---
title: Installation
description: Install Morphik on your own infrastructure
---

For users who need to run Morphik on their own infrastructure, we provide two installation options: Direct Installation and Docker.

## Direct Installation
<Steps>
  <Step title="Prerequisites">
    <Accordion title="Python">
      Please ensure that you have Python 3.12 installed on your machine. Guides for installing Python can be found on the [Python website](https://www.python.org/downloads/release/python-3129/).
    </Accordion>
    <Accordion title="Rust">
      Morphik requires the Rust toolchain for optimized performance operations (binary quantization, base64 encoding, text processing). Install Rust using rustup:

      <Tabs>
        <Tab title="macOS/Linux">
          ```bash
          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
          ```

          After installation, restart your terminal or run:
          ```bash
          source $HOME/.cargo/env
          ```
        </Tab>
        <Tab title="Windows">
          Download and run the installer from [rustup.rs](https://rustup.rs/), or use winget:
          ```powershell
          winget install Rustlang.Rustup
          ```
        </Tab>
      </Tabs>

      Verify the installation:
      ```bash
      rustc --version
      cargo --version
      ```
    </Accordion>
    <Accordion title="PostgreSQL">
      Morphik requires PostgreSQL with the pgvector extension for vector storage and similarity search capabilities. Follow the installation instructions for your operating system:

      <Tabs>
        <Tab title="macOS">
          On macOS, you can use Homebrew to install PostgreSQL and pgvector:

          ```bash
          brew install postgresql@14
          brew install pgvector
          ```

          Start the PostgreSQL service:

          ```bash
          brew services start postgresql@14
          ```

          Create a database and user for Morphik:

          ```bash
          createdb morphik
          createuser -s postgres
          ```

          These commands create a database named "morphik" and a superuser named "postgres" that the application will use to connect.
        </Tab>
        <Tab title="Ubuntu/Debian">
          Install PostgreSQL from the official repositories. **We recommend version 14.** Other versions may work, but haven't been extensively tested!

          ```bash
          sudo apt update
          sudo apt install postgresql postgresql-contrib
          ```

          Install pgvector:

          ```bash
          sudo apt install postgresql-14-pgvector
          ```

          Start and enable the PostgreSQL service:

          ```bash
          sudo systemctl start postgresql
          sudo systemctl enable postgresql
          ```

          Create a database and user for Morphik:

          ```bash
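          sudo -u postgres createdb morphik
          # The postgres superuser role usually already exists on Debian/Ubuntu;
          # if this command reports that the role already exists, you can skip it.
          sudo -u postgres createuser -s postgres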
          ```
        </Tab>
        <Tab title="Windows">
          1. Download and install PostgreSQL from the [official website](https://www.postgresql.org/download/windows/).
          2. During installation, make note of the password you set for the postgres user.
          3. Install pgvector:
             - Open pgAdmin (installed with PostgreSQL)
             - Connect to your PostgreSQL server
             - Right-click on "Extensions" and select "Create \> Extension"
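             - Select "vector" (the name pgvector registers under; it only appears once pgvector is installed for your PostgreSQL version) from the dropdown and click "Save"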
          4. Create a database for Morphik:
             - Right-click on "Databases" and select "Create \> Database"
             - Name it "morphik" and click "Save"
        </Tab>
      </Tabs>
      After installation, verify that PostgreSQL is running correctly:

      ```bash
      psql -U postgres -c "SELECT version();"
      ```

      You should see the following output:

      ```
       version                                                            
      ------------------------------------------------------------------------------------------------------------------------------
      <Your postgres version and some compilation details>
      ```
    </Accordion>
    <Accordion title="Additional Dependencies">
      Some system-level dependencies might be required for processing various document types:

      <Tabs>
        <Tab title="macOS">
          ```bash
          # Install via Homebrew
          brew install poppler libmagic
          ```
        </Tab>
        <Tab title="Ubuntu/Debian">
          ```bash
          # Install via apt
          sudo apt-get update
          sudo apt-get install -y poppler-utils libmagic-dev
          ```
        </Tab>
        <Tab title="Windows">
          For Windows, you may need to install these dependencies manually:

          1. **Poppler**: Download from [poppler for Windows](https://github.com/oschwartz10612/poppler-windows/releases/)
          2. **libmagic**: This is included in the python-magic-bin package which will be installed with pip
        </Tab>
      </Tabs>
      If you encounter database initialization issues within Docker, you may need to manually initialize the schema:

      ```bash
      psql -U postgres -d morphik -a -f init.sql
      ```
    </Accordion>
    <Accordion title="Docker (required for background services)">
      Docker is used to spin up auxiliary services automatically (e.g., a
      local Redis container for Morphik's task queue). Install Docker
      Desktop (macOS/Windows) or the Docker engine (Linux) and make sure the
      daemon is running.
    </Accordion>
    <Accordion title="Optional: Running Local Models">
      Morphik supports fully local inference for both embeddings and completions through two powerful engines:
      
      - **Lemonade SDK** - Windows only, optimized for AMD GPUs/NPUs
      - **Ollama** - Cross-platform (Windows, macOS, Linux)
      
      Both are pre-configured in Morphik. For detailed setup instructions, see our [Local Inference Guide](/local-inference).
    </Accordion>
  </Step>

  <Step title="Cloning the Repository">
    To get started with Morphik, we first need to set up the server. This involves cloning the [repository](https://github.com/morphik-org/morphik-core/), installing the dependencies, and running the server. You are just a few steps away from accurate, agentic RAG over your multi-modal data!

    First, let's clone the repository from GitHub.

    ```bash
    git clone https://github.com/morphik-org/morphik-core.git
    ```

    After cloning the repository, navigate into the `morphik-core` folder.

    ```bash
    cd morphik-core
    ```
  </Step>

  <Step title="Run the Installer Script">
    Choose the installer for your OS:

    <Tabs>
      <Tab title="macOS/Linux">
      From the project root, run:

      ```bash
      # Make the script executable
      chmod +x install_and_start.sh

      # Run the installer
      ./install_and_start.sh
      ```
      </Tab>
      <Tab title="Windows (PowerShell)">
      From the project root, run in PowerShell:

      ```powershell
      Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
      ./install_and_start.ps1
      ```

      This script installs dependencies with uv, tries to install a prebuilt
      `llama-cpp-python` wheel, and starts the server.
      </Tab>
    </Tabs>

    The installers will:

    1. Create & activate a `.venv` with **uv**
    2. Ask about GPU availability for multimodal embeddings (macOS/Linux)
    3. Install `colpali-engine` (for multimodal document understanding)
    4. Install or build `llama-cpp-python` (Metal on Apple Silicon, wheel on Windows)
    5. Launch the server via `uv run start_server.py`

    <Note>
      You only need to run `./install_and_start.sh` the first time to set up
      the environment. For future sessions, change into the project directory
      and start the server with:

      ```bash
      uv run start_server.py
      ```
    </Note>
  </Step>

  <Step title="Setting up the Server Parameters">
    At this point, you may want to customize the server, for example to use a different model or to enable or disable certain features. You can do so by editing the `morphik.toml` file.
    
    Morphik uses a registered models approach, which allows you to define hundreds of different models in one place and reference them throughout your configuration. This makes it easy to mix and match models based on your needs (e.g., smaller models for simpler tasks). You can find more details about configuration [here](/configuration).

    The installer copies `.env.example` to `.env` automatically if it's
    missing. After the script finishes, open `.env` to add any API keys (e.g.
    `OPENAI_API_KEY`) or secrets you need. You can tweak `morphik.toml`
    anytime to switch completion/embedding models, adjust chunking, or enable
    advanced features.
  </Step>

  <Step title="Launching the Server">
    You are now ready to launch the Morphik server\! Just run the following command to start the server.

    ```bash
    uv run start_server.py
    ```

    You should see the following output:

    ```
    INFO:     Started server process [15169]
    INFO:     Waiting for application startup.
    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
    ```

    This means that the server is running on [http://localhost:8000](http://localhost:8000). You can now interact with the server using the API or the Python SDK.
  </Step>
</Steps>

## Using Docker

Morphik provides a streamlined Docker-based setup that includes all necessary components: the core API, PostgreSQL with pgvector, and Redis for task queuing.

<Note>
  If you are using an Apple Silicon (M-series) Mac, we highly recommend using the Direct Installation method instead of Docker. GPU passthrough is not supported via Docker on Apple Silicon, which can significantly impact performance.
</Note>

<Warning>
  **Multimodal Embeddings and GPU Recommendations**
  
  Morphik achieves ultra-accurate document understanding through advanced multimodal embeddings that excel at processing images, PDFs, and complex layouts. While Morphik will work without a GPU, for best results we recommend using a GPU-enabled machine.
  
  The installer asks whether you want multimodal embeddings. You can turn them on or off later.
  
  To manually adjust multimodal embeddings, edit `morphik.toml`:
  ```toml
  [morphik]
  enable_colpali = false  # Set to true when GPU is available
  ```
</Warning>

### Prerequisites

- Docker and Docker Compose V2 installed on your system
- At least 10GB of free disk space (for models and data)
- 8GB\+ RAM recommended
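
The checklist above can be verified before running the installer. The following is a minimal sketch and not part of Morphik: the function name `check_prerequisites` and its parameters are hypothetical, and it only covers Docker, the Compose V2 plugin, and free disk space (the 10 GB threshold comes from the list above).

```python
import shutil
import subprocess


def check_prerequisites(path=".", min_free_gb=10, which=shutil.which,
                        disk_usage=shutil.disk_usage, run=subprocess.run):
    """Return a list of problems found; an empty list means every check passed.

    The which, disk_usage, and run parameters exist only so the checks can be
    exercised without Docker installed (hypothetical design, not a Morphik API).
    """
    problems = []

    if which("docker") is None:
        problems.append("docker not found on PATH")
    else:
        # `docker compose version` exits non-zero when the Compose V2 plugin is missing.
        result = run(["docker", "compose", "version"], capture_output=True, text=True)
        if result.returncode != 0:
            problems.append("Docker Compose V2 plugin not available")

    # Compare the free space in the target directory against the recommended minimum.
    free_gb = disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {path!r}, need {min_free_gb} GB")

    return problems
```

Calling `check_prerequisites()` from the directory where you plan to install returns an empty list when all three checks pass. RAM is not checked here because there is no portable standard-library call for it.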

### Quick Start

**Option&nbsp;A — Using Pre-built Image (Recommended)**

This is the easiest way to get started with Morphik. Run the command for your OS:

<Tabs>
  <Tab title="macOS/Linux">
  ```bash
  curl -sSL https://raw.githubusercontent.com/morphik-org/morphik-core/main/install_docker.sh | bash
  ```
  </Tab>
  <Tab title="Windows (PowerShell)">
  ```powershell
  Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
  iwr -useb https://raw.githubusercontent.com/morphik-org/morphik-core/main/install_docker.ps1 -OutFile install_docker.ps1; .\install_docker.ps1
  ```
  </Tab>
</Tabs>

<Note>
  Run the installer from the folder where you want Morphik to live (for example `~/morphik-core`). The script writes `docker-compose.run.yml`, `.env`, `morphik.toml`, and the helper scripts (`start-morphik.*` / `stop-morphik.*`) into the directory where you run the command.
</Note>

<video controls>
  <source src="/assets/download_morphik.mp4" type="video/mp4" />
  Your browser does not support the video tag.
</video>

*This video demonstrates the installation flow where we select "No" for GPU support and "Yes" for the Admin UI. The initial startup took approximately 3-4 minutes without GPU. Note that with multimodal embeddings enabled, the initial startup may take longer.*

The script handles all setup tasks including:
- Checking prerequisites (Docker, Docker Compose V2)
- Downloading configurations
- Prompting for your OpenAI API key (optional - press Enter to skip)
- **Setting up authentication** (optional but recommended for production)
- **Offering to install the Admin UI** (optional but recommended)
- Copying the Admin UI bundle directly from the Morphik image (with a GitHub fallback) when you opt in, so the UI launches even if you haven't cloned the repo
- Generating `start-morphik.*` and `stop-morphik.*` helpers so you can restart or clean up services safely later
- Starting all services

During installation, you'll be prompted:
1. To enter your OpenAI API key (optional - press Enter to skip and configure later)
2. To set a `LOCAL_URI_PASSWORD` for authentication:
   - **Skip for local-only access**: Press Enter to enable auth bypass (`bypass_auth_mode=true`)
   - **Set for production/external access**: Enter a secure password to keep authentication enabled (`bypass_auth_mode=false`)
3. Whether you'd like to install the Admin UI (recommended for easier management)

<Note>
  Morphik supports hundreds of models including OpenAI, Anthropic (Claude), Google Gemini, local models, and even custom models! You can configure your preferred provider in `morphik.toml` after installation.
</Note>

After installation:

<Tabs>
  <Tab title="macOS/Linux">
  ```bash
  # To restart Morphik later (auto-detects UI if installed)
  ./start-morphik.sh

  # To stop all services and clean up containers/networks/volumes
  ./stop-morphik.sh
  # (Runs docker compose down --volumes --remove-orphans with any active profiles)
  ```
  </Tab>
  <Tab title="Windows (PowerShell)">
  ```powershell
  # To restart Morphik later (auto-detects UI if installed)
  ./start-morphik.ps1

  # To stop all services and clean up containers/networks/volumes
  ./stop-morphik.ps1
  # (Runs docker compose down --volumes --remove-orphans with any active profiles)
  ```
  </Tab>
</Tabs>

<Note>
  Use the generated `./stop-morphik.sh` (or `./stop-morphik.ps1` on Windows) whenever you need to shut Morphik down. It automatically includes the UI profile if installed and runs `docker compose down --volumes --remove-orphans`, so no containers or networks linger. Because `--volumes` also removes the named volumes declared in the Compose file, back up any data stored in them that you want to keep. After stopping, update `morphik.toml` as needed and restart with `./start-morphik.sh` (or `./start-morphik.ps1` on Windows).
</Note>

The services will be available at:
- **API**: http://localhost:8000 (or your configured port)
- **API Documentation**: http://localhost:8000/docs
- **Admin UI** (if installed): http://localhost:3003

### Authentication and Connection URIs

Morphik provides flexible authentication options:

**Local Development (bypass_auth_mode = true)**
- When `bypass_auth_mode = true`, authentication is disabled for local API access
- Useful for local development and testing
- **Not secure for production or external access**

**Authenticated Mode (bypass_auth_mode = false)**
- Authentication required for all API requests
- Set `JWT_SECRET_KEY` and `LOCAL_URI_PASSWORD` in your environment
- To generate authorized connection URIs:

<Steps>
  <Step title="Access the API Documentation">
    Navigate to http://localhost:8000/docs in your browser
    
    ![Generate Local URI endpoint in Swagger UI](/assets/generate_local_uri_api_docs.png)
  </Step>
  <Step title="Find the /local/generate_uri endpoint">
    Scroll down to find the `/local/generate_uri` endpoint in the API docs
  </Step>
  <Step title="Generate your connection URI">
    1. Click on the endpoint to expand it
    2. Click "Try it out"
    3. Fill in the request parameters:
       - **name**: A descriptive name for this connection (e.g., "admin")
       - **expiry_days**: How long the token should remain valid (default: 30)
       - **password_token**: Your `LOCAL_URI_PASSWORD` from your environment
       - **server_mode**: 
         - Set to `true` if you want to access Morphik from outside the server
         - Set to `false` for local access only (uses localhost)
    4. Click "Execute"
    
    The response will contain a secure connection URI that includes authentication tokens for your client applications.
  </Step>
</Steps>

<Note>
  The generated URI contains authentication tokens and should be kept secure. Anyone with this URI can access your Morphik instance with the permissions embedded in the token.
</Note>

### Using Your Connection URI

Once you have generated a connection URI, you can use it to connect to Morphik through:

**Option 1: Admin UI**

If you have the Admin UI installed, you can paste your connection URI directly in the UI:

![Paste connection URI in Admin UI](/assets/paste_in_ui.png)

The URI field is highlighted in red, showing where to paste your `morphik://` connection string.

**Option 2: Python SDK**

```python
from morphik import Morphik

# Initialize Morphik with your connection URI
morphik = Morphik("morphik://admin:eyJhbGc...")  # Use your generated URI

# Ingest a file
doc = morphik.ingest_file(file_path="document.pdf")
doc.wait_for_completion()

# Query the ingested content
response = morphik.query("What is Morphik?")
print(response)  # "Morphik is the most accurate end-to-end RAG system"
```

**Option 3: TypeScript/JavaScript SDK**

```typescript
import Morphik from 'morphik';
import * as fs from 'fs';

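// Extract the token from your URI
// URI format: morphik://name:token@host
const uri = "morphik://admin:eyJhbGc...";  // Your generated URI
// Drop the "morphik://" prefix first: splitting the full URI on ':' would
// return "//admin" at index 1 instead of the token.
const token = uri.slice('morphik://'.length).split(':')[1].split('@')[0];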

// Initialize Morphik client
const morphik = new Morphik({
  apiKey: token,
  baseURL: 'http://localhost:8000'  // Adjust based on your server_mode
});

// Ingest a file
const file = fs.createReadStream('document.pdf');
const doc = await morphik.ingest.ingestFile({ file });

// Wait for processing
await new Promise(resolve => setTimeout(resolve, 5000));

// Query the content
const response = await morphik.query.generateCompletion({
  query: 'What is Morphik?'
});
console.log(response.completion);  // "Morphik is the most accurate end-to-end RAG system"
```
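
If you need the individual parts of a connection URI outside the SDKs, for example to pass the token as an API key, the `morphik://name:token@host` format described above can be split with the Python standard library. This is a sketch under that assumption only: `parse_connection_uri` is a hypothetical helper, not part of the Morphik SDK, and the token in the usage example is a made-up placeholder.

```python
from urllib.parse import urlsplit


def parse_connection_uri(uri):
    """Split a morphik://name:token@host URI into its name, token, and host parts."""
    parts = urlsplit(uri)
    if parts.scheme != "morphik":
        raise ValueError(f"expected a morphik:// URI, got scheme {parts.scheme!r}")
    if not parts.username or not parts.password or not parts.hostname:
        raise ValueError("URI must look like morphik://name:token@host")
    # Keep the port with the host when the URI specifies one.
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return {"name": parts.username, "token": parts.password, "host": host}


# Placeholder values for illustration only.
info = parse_connection_uri("morphik://admin:example-token@localhost:8000")
print(info["name"], info["host"])
```

Using `urlsplit` rather than manual string splitting avoids off-by-one mistakes with the `://` separator and handles an optional port in the host part.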

<Note>
  **What is Morphik?** Morphik is the most accurate end-to-end RAG (Retrieval-Augmented Generation) system, designed to understand and process multi-modal documents with exceptional precision.
</Note>

---

**Option&nbsp;B — Local Development**

For developing and testing changes to Morphik:

```bash
# Clone and enter the repo
git clone https://github.com/morphik-org/morphik-core.git
cd morphik-core

# Start development environment
./start-dev.sh
```

This builds the Docker image locally and starts all services. 

To include local inference models, see our [Local Inference Guide](/local-inference) for setting up Ollama or Lemonade SDK.

---

### Changing the Port

To run Morphik on a different port, edit the `morphik.toml` file:

```toml
[api]
port = 9000  # Your custom port
```

Then restart using the appropriate start script (`./start-morphik.sh` for production or `./start-dev.sh` for development).

### Advanced Configuration

- **Change models** – Edit `morphik.toml` to switch between different models. See the `[registered_models]` section for available options.
- **Environment variables** – Customize settings in the `.env` file (API keys, database connection, etc.)
- **Persist data** – Docker volumes automatically persist PostgreSQL data and uploaded files across restarts, until the volumes themselves are removed (for example by `docker compose down --volumes`).

---

### Configuration

The default configuration works out of the box and includes:

- PostgreSQL with pgvector for document storage
- Redis for task queuing
- Local file storage
- Basic authentication
- Optional: Local inference with [Ollama or Lemonade](/local-inference)

You can customize your setup by creating a `.env` file:

```bash
JWT_SECRET_KEY=your-secure-key-here  # Important: Change in production
OPENAI_API_KEY=sk-...                # Only if using OpenAI
HOST=0.0.0.0                         # Leave as is for Docker
PORT=8000                            # Change if needed
```
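
For the `JWT_SECRET_KEY` value, a randomly generated string is safer than a hand-typed one. A minimal sketch using Python's `secrets` module follows. It assumes Morphik accepts any sufficiently long string as the secret, which this page does not state explicitly, and `jwt_secret_line` is a hypothetical helper name.

```python
import secrets


def jwt_secret_line(num_bytes=32):
    """Return a .env line containing a random, URL-safe JWT secret."""
    # token_urlsafe draws num_bytes from the OS random source and base64url-encodes them.
    return f"JWT_SECRET_KEY={secrets.token_urlsafe(num_bytes)}"


print(jwt_secret_line())
```

Paste the printed line into `.env` in place of the placeholder value, and keep the file out of version control.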

### Accessing Services

- Morphik API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
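
Because startup can take a few minutes, a script may want to wait for the health check URL above before sending requests. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and it assumes only that the endpoint answers with HTTP 200 once the API is ready, which you should confirm against your deployment.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_until_healthy(url="http://localhost:8000/health", attempts=30, delay=2.0,
                       fetch=http_status, sleep=time.sleep):
    """Poll url until it returns HTTP 200; return False after all attempts fail."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical use is `if not wait_until_healthy(): raise SystemExit("Morphik did not become healthy")` at the top of an ingestion script. The `fetch` and `sleep` parameters are there so the polling logic can be tested without a running server.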

### Troubleshooting

1. **Service Won't Start**

   ```bash
   # View all logs
   docker compose logs
   
   # View specific service logs
   docker compose logs morphik
   docker compose logs postgres
   docker compose logs redis
   ```
2. **Database Issues**
   - Check PostgreSQL is healthy: `docker compose ps`
   - Verify database connection: `docker compose exec postgres psql -U morphik -d morphik`

3. **Local Model Issues**
   - For local inference setup and troubleshooting, see our [Local Inference Guide](/local-inference)
   - Verify that your Redis configuration in `morphik.toml` matches your deployment:
     - For Redis in Docker, use `host = "redis"` (not "localhost")
4. **Memory Issues with Local Models**
   - If using local models and encountering memory issues:
     - Increase Docker memory allocation in Docker Desktop (Settings > Resources)
     - Use smaller quantized models
     - See our [Local Inference Guide](/local-inference) for model recommendations
   - Alternatively, switch to cloud providers (OpenAI, Anthropic, etc.) in `morphik.toml`
5. **Performance Issues**
   - Monitor resources: `docker stats`
   - Ensure sufficient RAM (8GB\+ recommended)
   - Check disk space: `df -h`