70 changes: 45 additions & 25 deletions README.md
@@ -15,13 +15,13 @@ uv sync --no-dev --frozen
uv run gen-protos
```

-## Usage
+## Configuration

-This section explains how to run the SYNC Server LLM using different methods.
+Before running the server, you need to:

-1. Configure the server by editing `configs/config.toml`
+1. Configure the server settings in `configs/config.toml`

-2. Set up the required environment variables by adding them to a `.env` file
+2. Create a `.env` file with the following environment variables:

| Variable | Description |
| ------------------- | ----------------------------- |
@@ -30,38 +30,58 @@ This section explains how to run the SYNC Server LLM using different methods.
| `QDRANT_PORT` | The Qdrant host REST API port |
| `QDRANT_COLLECTION` | The Qdrant collection name |
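
A minimal `.env` sketch based on the variables above. The values are placeholders for a local Qdrant instance: `6333` is Qdrant's default REST port, and the collection name is a hypothetical example.

```shell
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=documents
```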

-3. Start the server:
+## Running the Server

-- To run the server locally:
+You can run SYNC Server LLM using one of the following methods:

-```shell
-uv run scripts/serve.py --config configs/config.toml
-```
+### Method 1: Running Locally

-- To run the server using Docker:
+```shell
+uv run scripts/serve.py --config configs/config.toml
+```

### Method 2: Using Docker

1. Build the Docker image:

```shell
docker build -t sync/backend-llm .
```

2. Run the container:

```shell
docker run -p 50051:50051 \
  --env-file .env \
  -v $(pwd)/path/to/configs:/app/configs/config.toml \
  -v $(pwd)/path/to/hf_cache:/tmp/llama_index \
  sync/backend-llm
```

> Notes:
> - For Windows users, add `--gpus=all` to use GPU capabilities (requires Docker with GPU support)
> - We strongly recommend mounting the `hf_cache` directory to avoid re-downloading Hugging Face models on container restart
> - Make sure to [set up and run the Qdrant server](https://qdrant.tech/documentation/guides/installation/#docker-and-docker-compose) before starting

### Method 3: Using Docker Compose

-Build the Docker image:
+A `docker-compose.yaml` file is included in the repository to simplify deployment with both the server and Qdrant database.

-```shell
-docker build -t sync/backend-llm .
-```
+1. Build the services:

-Run the container:
+```shell
+docker-compose build
+```

-```shell
-docker run -p 50051:50051 \
-  --env-file .env \
-  -v $(pwd)/path/to/configs:/app/configs/config.toml \
-  -v $(pwd)/path/to/hf_cache:/tmp/llama_index \
-  sync/backend-llm
-```
+2. Start the services:

-> 1. If you are using Windows, you can add `--gpus=all` to the `docker run` command. Ensure that your Docker installation supports GPU usage.
-> 2. It is strongly recommended to mount the `hf_cache` directory to a persistent volume to avoid re-downloading the Hugging Face models every time the container is started.
+```shell
+docker-compose up -d
+```

## Client Example

-You can refer to `scripts/client.py` for an example implementation of a client:
+To test the server, you can use the provided client example:

```shell
uv run scripts/client.py
```
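
Before invoking the client, it can help to confirm that the server port is actually accepting connections. A stdlib-only sketch; the host and port are assumptions matching the `50051` used in the examples above:

```python
import socket


def server_reachable(host: str = "localhost", port: int = 50051, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


if __name__ == "__main__":
    # 50051 matches the port published in the docker run / docker-compose examples
    print(server_reachable())
```

Note this only checks TCP reachability, not that the gRPC service itself is healthy.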
27 changes: 27 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,27 @@
# Docker Compose configuration for SYNC Server LLM
# This sets up both the backend-llm service and its required Qdrant vector database

services:
  # Main backend service for SYNC Server LLM
  backend-llm:
    build: sync-server-llm # Path to the directory with Dockerfile of sync-server-llm


**medium**

It might be helpful to specify the context for the build, e.g., `.` if the Dockerfile is in the same directory. This makes the compose file more explicit.

```yaml
    build:
      context: ./sync-server-llm # Path to the directory with Dockerfile of sync-server-llm
      dockerfile: Dockerfile
```

    restart: always
    ports:
      # Maps the container port to host (must match server.port in config.toml)
      - 50051:50051
    env_file:
      - .env
    environment:
      QDRANT_HOST: qdrant
    volumes:
      # Mount configuration and cache for persistence
      - ./configs/config.toml:/app/configs/config.toml
      - ./.hf_cache:/tmp/llama_index
Comment on lines +20 to +21


**medium**

Consider using environment variables for the volume paths to make the compose file more configurable.

```yaml
    volumes:
      # Mount configuration and cache for persistence
      - ${CONFIG_PATH}:/app/configs/config.toml
      - ${HF_CACHE_PATH}:/tmp/llama_index
```


  # Qdrant vector database service
  qdrant:
    image: qdrant/qdrant:latest

Copilot AI Mar 10, 2025


Using the 'latest' tag for the Qdrant image can lead to unpredictable behavior with future releases. It is recommended to pin to a specific version.

Suggested change:

```diff
-    image: qdrant/qdrant:latest
+    image: qdrant/qdrant:v1.0.0
```

    restart: always
    volumes:
      # Mount storage for persistence
      - ./qdrant_storage:/qdrant/storage


**medium**

Consider using an environment variable for the storage path to make the compose file more configurable.

```yaml
    volumes:
      # Mount storage for persistence
      - ${QDRANT_STORAGE}:/qdrant/storage
```