Commit a820f4e

adding load balanced replicas' docker compose configuration
updating README with the replicas compose docs
1 parent 0a4c871 commit a820f4e

3 files changed: 74 additions and 3 deletions

README.md

Lines changed: 26 additions & 3 deletions
@@ -21,15 +21,38 @@ The service uses the following environment variables:
 
 ### Running the Service
 
+#### Running a single instance
+
+1. **Build and Run:** Execute the following command in the root directory:
+
+```bash
+docker compose -f docker/compose.yml up --build -d
+```
+
+The first run downloads the `clip-ViT-B-32` model by default (approx. 600MB), or the model set in the environment variables, and stores it in the persistent `model_cache` volume.
+
+2. **Access the API:** The service will be available at `http://localhost:8000`.
+
+#### Running n load-balanced instances
+
+Load balancing is handled by nginx, which sits in front of a set of replica containers. Requests sent to the nginx server are routed to a target instance automatically.
+
+**Note:** So that only one instance downloads the default model, run the single-instance compose file first to populate the volume with the model files.
+After that, all instances just read the files and load their own copy of the model into VRAM.
+
+**Note:** Ensure that you have enough VRAM to hold all the model copies; otherwise some instances might crash or spill into system RAM.
+
 1. **Build and Run:** Execute the following command in the root directory:
 
 ```bash
-docker compose up --build -d
+docker compose -f docker/compose_load_balanced.yml up --build -d
 ```
+This will also download the default model if the volume is not already populated.
+
+2. **Access the API:** The service will be available at `http://localhost:8004`.
+
 
-The first run will download the `clip-ViT-B-32` model by default (approx. 600MB) and store it in the persistent `model_cache` volume.
 
-2. **Access the API:** The service will be available at `http://localhost:8000`. You can view the interactive documentation (Swagger UI) at `http://localhost:8000/docs`.
 
 ## 💡 API Endpoints
 
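The README steps in this diff (warm the model cache with a single instance, then start the load-balanced stack) could be chained roughly as follows. This is a minimal sketch and not part of the commit: it assumes `docker/compose.yml` declares the same `model_cache` volume under the same Compose project, and `REPLICAS`, `DRY_RUN`, `run`, and `warm_and_scale` are hypothetical names introduced here. `--scale` is the standard `docker compose up` option and overrides the `deploy.replicas` value in the compose file.

```shell
#!/bin/sh
# Sketch: warm the model cache once, then start the load-balanced stack.
# REPLICAS, DRY_RUN, run() and warm_and_scale() are illustrative names,
# not part of the repository.
REPLICAS="${REPLICAS:-3}"
DRY_RUN="${DRY_RUN:-1}"   # default only prints the commands; set DRY_RUN=0 to execute

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

warm_and_scale() {
  # 1. Single instance first, so only one container downloads the model.
  run docker compose -f docker/compose.yml up --build -d
  # (When executing for real, wait here until the model download has finished.)
  # 2. Stop it without -v, so the model_cache volume is kept.
  run docker compose -f docker/compose.yml down
  # 3. Start nginx plus N replicas; --scale overrides deploy.replicas.
  run docker compose -f docker/compose_load_balanced.yml up --build -d \
    --scale "embedding-service=$REPLICAS"
}

warm_and_scale
```

With the defaults the script only prints the three commands, which makes it easy to review before running `DRY_RUN=0 REPLICAS=4 sh script.sh` on a machine with enough VRAM for four model copies.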

docker/compose_load_balanced.yml

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+services:
+  embedding-service:
+    build:
+      context: ../
+      dockerfile: Dockerfile
+    #image: ghcr.io/moda20/mes:latest
+    ports:
+      - "8000"
+    environment:
+      TRANSFORMERS_CACHE: ${MODEL_CACHE_DIR:-/app/model_cache}
+    volumes:
+      - model_cache:/app/model_cache
+    restart: unless-stopped
+    deploy:
+      replicas: 1
+  nginx:
+    image: nginx:latest
+    ports:
+      - "8004:80"
+    volumes:
+      - ./nginx.conf:/etc/nginx/nginx.conf:ro # Nginx configuration mount
+volumes:
+  model_cache:
+    driver: local

docker/nginx.conf

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+events {
+    worker_connections 1024;
+}
+
+
+http {
+    resolver 127.0.0.11 valid=10s;
+    upstream backend {
+        zone backend 64k;
+        least_conn;
+        server embedding-service:8000 resolve;
+    }
+
+    server {
+        listen 80;
+
+        location / {
+            proxy_pass http://backend;
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        }
+    }
+}
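Because `location /` forwards every path to the `backend` upstream, a quick way to check the proxy is to send a handful of requests to the published nginx port and tally the status codes. The sketch below is an illustration only: `BASE_URL`, `REQUESTS`, `probe`, and `summarize` are hypothetical names, and it assumes the service's `/docs` page (mentioned in the earlier README text for port 8000) is also reachable through nginx on port 8004.

```shell
#!/bin/sh
# Sketch: probe the nginx front end and summarize the HTTP status codes.
# BASE_URL, REQUESTS, probe() and summarize() are illustrative names.
BASE_URL="${BASE_URL:-http://localhost:8004}"
REQUESTS="${REQUESTS:-10}"

probe() {
  # Print one HTTP status code per request; curl reports 000 when
  # the connection itself fails.
  i=0
  while [ "$i" -lt "$REQUESTS" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$BASE_URL/docs"
    i=$((i + 1))
  done
}

summarize() {
  # Count how often each status code appears on stdin.
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage (requires the load-balanced stack to be running):
#   probe | summarize
```

A run that reports only `200` suggests nginx can reach at least one replica; `502` responses would typically mean nginx could not reach any upstream server. The output does not show which replica served each request.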
