58 changes: 58 additions & 0 deletions .dockerignore
@@ -0,0 +1,58 @@
# Git
.git/
.gitignore
.gitmodules

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
.venv/
venv/
*.egg-info/
.installed.cfg
*.egg

# Cache
.mypy_cache/
.pytest_cache/
.coverage
htmlcov/
.cache/

# Build artifacts
build/
dist/
*.manifest
*.spec

# Logs
*.log
logs/

# Local development files
.pytest_cache
.coverage
*.swp
.DS_Store

# IDE
.idea/
.vscode/
*.sublime-project
*.sublime-workspace

# Project specific
.python-version
.pre-commit-config.yaml
.github/

# Environment
.env
.env.*
env/

# Generated files
llm_backend/protos/
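To sanity-check which paths an ignore list like the one above would exclude from the build context, a rough matcher can be sketched with `fnmatch`. This is only an approximation for illustration: Docker's real matcher follows Go's `filepath.Match` semantics plus `**` and `!` exception patterns, so it is not a faithful reimplementation.

```python
import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Rough approximation of .dockerignore matching for simple patterns."""
    for pattern in patterns:
        pattern = pattern.rstrip("/")
        # Match the path itself, or anything underneath a matched directory.
        if fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(path, pattern + "/*"):
            return True
        # Bare patterns like "*.log" should match at any depth in this sketch.
        if "/" not in pattern and any(
            fnmatch.fnmatch(part, pattern) for part in path.split("/")
        ):
            return True
    return False


# A few patterns taken from the .dockerignore above.
patterns = [".git/", "__pycache__/", "*.log", ".env"]
```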
35 changes: 35 additions & 0 deletions Dockerfile
@@ -0,0 +1,35 @@
FROM ghcr.io/astral-sh/uv:bookworm-slim AS builder

ENV UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy \
    UV_PYTHON_INSTALL_DIR=/python \
    UV_PYTHON_PREFERENCE=only-managed

RUN uv python install 3.12

WORKDIR /app

RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --frozen --no-dev --no-install-project

COPY . /app

RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --no-dev

FROM debian:bookworm-slim

COPY --from=builder --chown=python:python /python /python
COPY --from=builder --chown=app:app /app /app

ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app

# Generate the protos
RUN ["python3", "scripts/gen_protos.py"]

> **Review comment (high):** Ensure that a failure in `gen_protos.py` fails the Docker build as well. Note that `RUN` already aborts the build when its command exits with a nonzero status, and shell operators such as `|| exit 1` are not interpreted in the exec (JSON) form used here; they would only apply in the shell form, e.g. `RUN python3 scripts/gen_protos.py`.
# Run the application
ENTRYPOINT ["python3", "scripts/serve.py", "--config", "configs/config.toml"]
58 changes: 48 additions & 10 deletions README.md
@@ -1,32 +1,70 @@
# SYNC Server LLM

## Overview

SYNC Server LLM is a gRPC-based server that performs document retrieval and summarization. It leverages Qdrant for vector search and OpenAI models to generate summaries of retrieved content based on user-provided keywords.

## Installation

```shell
git clone --recurse-submodules https://github.com/NCTU-SYNC/sync-server-llm.git
cd sync-server-llm

uv sync --no-dev --frozen

uv run gen-protos
```

## Usage

Please configure the `configs/config.toml` file.
The following environment variables are required (`export` them or place them in a `.env` file):
This section explains how to run the SYNC Server LLM using different methods.

- `OPENAI_API_KEY`: Your ChatGPT API key.
- `QDRANT_HOST`: The Qdrant host address.
- `QDRANT_PORT`: The Qdrant host port.
- `QDRANT_COLLECTION`: The Qdrant collection name.
1. Configure the server by editing `configs/config.toml`

```shell
python3 scripts/serve.py --config configs/config.toml
```
2. Set up the required environment variables by adding them to a `.env` file

| Variable | Description |
| ------------------- | ----------------------------- |
| `OPENAI_API_KEY` | Your ChatGPT API key |
| `QDRANT_HOST` | The Qdrant host address |
| `QDRANT_PORT` | The Qdrant host REST API port |
| `QDRANT_COLLECTION` | The Qdrant collection name |

3. Start the server:

- To run the server locally:

```shell
uv run scripts/serve.py --config configs/config.toml
```

- To run the server using Docker:

Build the Docker image:

```shell
docker build -t sync/backend-llm .
```

Run the container:

```shell
docker run -p 50051:50051 \
--env-file .env \
-v $(pwd)/path/to/configs:/app/configs \
-v $(pwd)/path/to/hf_cache:/tmp/llama_index \
sync/backend-llm
```

> 1. If your Docker installation supports GPU usage, you can add `--gpus=all` to the `docker run` command.
> 2. It is strongly recommended to mount the `hf_cache` directory to a persistent volume to avoid re-downloading the Hugging Face models every time the container is started.
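The environment variables listed in step 2 can be validated at startup so a missing key fails fast instead of surfacing as an opaque error later. A minimal sketch, assuming the variable names from the table above; the `check_env` helper itself is hypothetical, not part of the server's API:

```python
import os

# Variable names taken from the README table above.
REQUIRED_VARS = ["OPENAI_API_KEY", "QDRANT_HOST", "QDRANT_PORT", "QDRANT_COLLECTION"]


def check_env(environ=os.environ):
    """Return the required settings, or raise listing everything missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```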

## Client Example

You can refer to `scripts/client.py` for an example implementation of a client:

```shell
python3 scripts/client.py
uv run scripts/client.py
```

> **Review comment (medium), on lines 66 to 67:** For consistency with the other script execution examples, consider using `uv run scripts/client.py` here as well, rather than `python3 scripts/client.py`.
## Features
8 changes: 5 additions & 3 deletions scripts/gen_protos.py
@@ -9,9 +9,7 @@ def generate():
proto_files = glob.glob(f"{proto_dir}/*.proto")

command = [
"uv",
"run",
"python",
"python3",
"-m",
"grpc_tools.protoc",
f"-I{target_dir}={proto_dir}",
@@ -21,3 +19,7 @@ def generate():
] + proto_files

subprocess.run(command, shell=False, check=True)


if __name__ == "__main__":
generate()
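Because `gen_protos.py` invokes `grpc_tools.protoc` via `subprocess.run(..., check=True)`, a nonzero exit from the compiler raises `CalledProcessError`, which in turn makes the calling process (including a Docker `RUN` step) fail. A small self-contained demonstration of that behavior, using `sys.executable` in place of the real protoc invocation:

```python
import subprocess
import sys


def run_checked(args):
    # check=True turns a nonzero exit status into a CalledProcessError,
    # so the caller cannot silently continue past a failed command.
    return subprocess.run(args, shell=False, check=True)


# A command that succeeds returns a CompletedProcess.
ok = run_checked([sys.executable, "-c", "print('generated')"])

# A command that fails raises instead of being ignored.
try:
    run_checked([sys.executable, "-c", "import sys; sys.exit(2)"])
    failed_silently = True
except subprocess.CalledProcessError as err:
    failed_silently = False
    exit_code = err.returncode
```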
2 changes: 1 addition & 1 deletion scripts/serve.py
@@ -49,7 +49,7 @@ async def serve(config: Config, logger: logging.Logger):

async def server_graceful_shutdown():
logging.info("Starting graceful shutdown...")
await server.stop(3)
await server.stop(1)

_cleanup_coroutines.append(server_graceful_shutdown())
