
Commit 840481c

docker (#11)

* fix(serve): reduce server shutdown timeout from 3 seconds to 1 second for quicker graceful shutdown
* fix(gen_protos): update command to use python3 and add main guard for script execution
* feat(docker): add Docker files
* docs(docker): enhance documentation with overview, installation, usage instructions, and Docker setup
1 parent 3b45ef6 commit 840481c

File tree

5 files changed: +147 −14 lines changed


.dockerignore

Lines changed: 58 additions & 0 deletions (new file)

```gitignore
# Git
.git/
.gitignore
.gitmodules

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
.venv/
venv/
*.egg-info/
.installed.cfg
*.egg

# Cache
.mypy_cache/
.pytest_cache/
.coverage
htmlcov/
.cache/

# Build artifacts
build/
dist/
*.manifest
*.spec

# Logs
*.log
logs/

# Local development files
.pytest_cache
.coverage
*.swp
.DS_Store

# IDE
.idea/
.vscode/
*.sublime-project
*.sublime-workspace

# Project specific
.python-version
.pre-commit-config.yaml
.github/

# Environment
.env
.env.*
env/

# Generated files
llm_backend/protos/
```

Dockerfile

Lines changed: 35 additions & 0 deletions (new file)

```dockerfile
FROM ghcr.io/astral-sh/uv:bookworm-slim AS builder

ENV UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy \
    UV_PYTHON_INSTALL_DIR=/python \
    UV_PYTHON_PREFERENCE=only-managed

RUN uv python install 3.12

WORKDIR /app

RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-dev --no-install-project

COPY . /app

RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

FROM debian:bookworm-slim

COPY --from=builder --chown=python:python /python /python
COPY --from=builder --chown=app:app /app /app

ENV PATH="/app/.venv/bin:$PATH"

WORKDIR /app

# Generate the protos
RUN ["python3", "scripts/gen_protos.py"]

# Run the application
ENTRYPOINT ["python3", "scripts/serve.py", "--config", "configs/config.toml"]
```
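The final stage depends on `ENV PATH="/app/.venv/bin:$PATH"` so that the `python3` in the `RUN` and `ENTRYPOINT` lines resolves to the venv interpreter copied from the builder stage. A minimal stdlib-only sketch of that lookup rule, assuming nothing about the image itself (the temp directory and stub executable below are illustrative):

```python
# Sketch: why the Dockerfile prepends /app/.venv/bin to PATH.
# We simulate PATH resolution with a throwaway "python3" stub
# (hypothetical names; this is not part of the repo).
import os
import shutil
import stat
import tempfile

tmp = tempfile.mkdtemp()
fake = os.path.join(tmp, "python3")
with open(fake, "w") as f:
    f.write("#!/bin/sh\necho venv\n")
os.chmod(fake, os.stat(fake).st_mode | stat.S_IEXEC)

# Prepending the directory makes its python3 win the lookup,
# just like ENV PATH="/app/.venv/bin:$PATH" in the final stage.
path = tmp + os.pathsep + os.environ.get("PATH", "")
resolved = shutil.which("python3", path=path)
print(resolved == fake)  # the prepended copy shadows any system interpreter
```

The same mechanism is what lets the image skip any explicit venv activation step.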

README.md

Lines changed: 48 additions & 10 deletions

````diff
@@ -1,32 +1,70 @@
 # SYNC Server LLM
 
+## Overview
+
+SYNC Server LLM is a gRPC-based server that performs document retrieval and summarization. It leverages Qdrant for vector search and OpenAI models to generate summaries of retrieved content based on user-provided keywords.
+
 ## Installation
 
 ```shell
 git clone --recurse-submodules https://github.com/NCTU-SYNC/sync-server-llm.git
 cd sync-server-llm
 
+uv sync --no-dev --frozen
+
 uv run gen-protos
 ```
 
 ## Usage
 
-Please configure the `configs/config.toml` file.
-The following environment variables are required (`export` them or place them in a `.env` file):
+This section explains how to run the SYNC Server LLM using different methods.
 
-- `OPENAI_API_KEY`: Your ChatGPT API key.
-- `QDRANT_HOST`: The Qdrant host address.
-- `QDRANT_PORT`: The Qdrant host port.
-- `QDRANT_COLLECTION`: The Qdrant collection name.
+1. Configure the server by editing `configs/config.toml`
 
-```shell
-python3 scripts/serve.py --config configs/config.toml
-```
+2. Set up the required environment variables by adding them to a `.env` file
+
+   | Variable            | Description                   |
+   | ------------------- | ----------------------------- |
+   | `OPENAI_API_KEY`    | Your ChatGPT API key          |
+   | `QDRANT_HOST`       | The Qdrant host address       |
+   | `QDRANT_PORT`       | The Qdrant host REST API port |
+   | `QDRANT_COLLECTION` | The Qdrant collection name    |
+
+3. Start the server:
+
+   - To run the server locally:
+
+     ```shell
+     uv run scripts/serve.py --config configs/config.toml
+     ```
+
+   - To run the server using Docker:
+
+     Build the Docker image:
+
+     ```shell
+     docker build -t sync/backend-llm .
+     ```
+
+     Run the container:
+
+     ```shell
+     docker run -p 50051:50051 \
+       --env-file .env \
+       -v $(pwd)/path/to/configs:/app/configs/config.toml \
+       -v $(pwd)/path/to/hf_cache:/tmp/llama_index \
+       sync/backend-llm
+     ```
+
+> 1. If you are using Windows, you can add `--gpus=all` to the `docker run` command. Ensure that your Docker installation supports GPU usage.
+> 2. It is strongly recommended to mount the `hf_cache` directory to a persistent volume to avoid re-downloading the Hugging Face models every time the container is started.
+
+## Client Example
 
 You can refer to `scripts/client.py` for an example implementation of a client:
 
 ```shell
-python3 scripts/client.py
+uv run scripts/client.py
 ```
 
 ## Features
````

scripts/gen_protos.py

Lines changed: 5 additions & 3 deletions

```diff
@@ -9,9 +9,7 @@ def generate():
     proto_files = glob.glob(f"{proto_dir}/*.proto")
 
     command = [
-        "uv",
-        "run",
-        "python",
+        "python3",
         "-m",
         "grpc_tools.protoc",
         f"-I{target_dir}={proto_dir}",
@@ -21,3 +19,7 @@ def generate():
     ] + proto_files
 
     subprocess.run(command, shell=False, check=True)
+
+
+if __name__ == "__main__":
+    generate()
```
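The new main guard is what lets the Dockerfile run the script directly with `RUN ["python3", "scripts/gen_protos.py"]` while keeping it importable as a module (for the `uv run gen-protos` entry point). A minimal sketch of the same shape, with a stand-in command instead of the real `grpc_tools.protoc` invocation:

```python
# Sketch of the pattern scripts/gen_protos.py now follows: build an
# argv list and run it with shell=False so no shell parsing happens.
# The command below is a stand-in; the real script runs
# `python3 -m grpc_tools.protoc ...` over the .proto files.
import subprocess
import sys


def generate() -> str:
    command = [sys.executable, "-c", "print('protos generated')"]
    result = subprocess.run(
        command, shell=False, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Runs only when executed directly, not when imported.
    print(generate())
```

Passing the interpreter as `sys.executable` (rather than a hard-coded `"python3"`) is one way to keep such a script portable across venvs; the commit instead relies on the Docker image's PATH putting the venv first.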

scripts/serve.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -49,7 +49,7 @@ async def serve(config: Config, logger: logging.Logger):
 
     async def server_graceful_shutdown():
         logging.info("Starting graceful shutdown...")
-        await server.stop(3)
+        await server.stop(1)
 
     _cleanup_coroutines.append(server_graceful_shutdown())
 
```
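In gRPC's asyncio API, `server.stop(grace)` waits up to `grace` seconds for in-flight RPCs to finish before cancelling them, so this change shortens the shutdown window from 3 s to 1 s. A stdlib-only sketch of that grace-window idea, with no gRPC involved (the handler and timings below are illustrative):

```python
# Sketch of the grace-period semantics behind server.stop(grace):
# wait up to `grace` seconds for in-flight work, then cancel it.
import asyncio


async def handler() -> None:
    # Simulates an in-flight RPC that outlives the grace window.
    await asyncio.sleep(10)


async def stop(task: asyncio.Task, grace: float) -> bool:
    try:
        # shield() keeps the timeout from cancelling the task itself,
        # so we decide explicitly once the grace window expires.
        await asyncio.wait_for(asyncio.shield(task), timeout=grace)
        return True   # finished within the grace period
    except asyncio.TimeoutError:
        task.cancel()
        return False  # cancelled after the grace period


async def main() -> bool:
    task = asyncio.create_task(handler())
    return await stop(task, grace=0.1)  # short window, like stop(1)


print(asyncio.run(main()))  # -> False: the handler outlives the window
```

A shorter grace period makes restarts snappier at the cost of cutting off any RPC that takes longer than the window; 1 s is presumably acceptable here because the server's requests either finish quickly or are safe to retry.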
