
Commit 56712af

docs: k8s quickstart and observability with k8s (#225)
* fix typo & add k8s quickstart doc
  Signed-off-by: JaredforReal <[email protected]>
* change docker to deploy quickstart
  Signed-off-by: JaredforReal <[email protected]>
* refactor deploy-quickstart.md
  Signed-off-by: JaredforReal <[email protected]>
* declare k8s needs seperate llm endpoint and envoy set up
  Signed-off-by: JaredforReal <[email protected]>
* add some reference in k8s requirement
  Signed-off-by: JaredforReal <[email protected]>
* change docker to deploy quickstart
  Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
1 parent 386d7aa commit 56712af

File tree

6 files changed

+375
-202
lines changed

docker-compose.yml

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -69,9 +69,9 @@ services:
6969
image: prom/prometheus:v2.53.0
7070
container_name: prometheus
7171
volumes:
72-
- ./config/prometheus.yml:/etc/prometheus/prometheus.yml:ro
72+
- ./config/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro
7373
command:
74-
- --config.file=/etc/prometheus/prometheus.yml
74+
- --config.file=/etc/prometheus/prometheus.yaml
7575
- --storage.tsdb.retention.time=15d
7676
ports:
7777
- "9090:9090"
@@ -87,8 +87,8 @@ services:
8787
ports:
8888
- "3000:3000"
8989
volumes:
90-
- ./config/grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml:ro
91-
- ./config/grafana/dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml:ro
90+
- ./config/grafana/datasource.yaml:/etc/grafana/provisioning/datasources/datasource.yaml:ro
91+
- ./config/grafana/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml:ro
9292
- ./deploy/llm-router-dashboard.json:/etc/grafana/provisioning/dashboards/llm-router-dashboard.json:ro
9393
networks:
9494
- semantic-network

tools/mock-vllm/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -6,10 +6,10 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
66
curl \
77
&& rm -rf /var/lib/apt/lists/*
88

9-
COPY requirements.txt
9+
COPY requirements.txt ./
1010
RUN pip install --no-cache-dir -r requirements.txt
1111

12-
COPY app.py
12+
COPY app.py ./
1313

1414
EXPOSE 8000
1515

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
1+
---
2+
sidebar_position: 3
3+
---
4+
5+
# Containerized Deployment
6+
7+
This unified guide helps you quickly run Semantic Router locally (Docker Compose) or in a cluster (Kubernetes) and explains when to choose each path. Both share the same configuration concepts: **Docker Compose** is ideal for rapid iteration and demos, while **Kubernetes** is suited for long‑running workloads, elasticity, and upcoming Operator / CRD scenarios.
8+
9+
## Choosing a Path
10+
11+
**Docker Compose path** = semantic-router + Envoy proxy + optional mock vLLM (testing profile) + Prometheus + Grafana. It gives you an end-to-end local playground with minimal friction.
12+
13+
**Kubernetes path** (current manifests) = ONLY the semantic-router Deployment (gRPC + metrics), a PVC for model cache, its ConfigMap, and two Services (gRPC + metrics). It does NOT yet bundle Envoy, a real LLM inference backend, Istio, or any CRDs/Operator.
14+
15+
| Scenario / Goal | Recommended Path | Why |
16+
| ------------------------------------------- | -------------------------------- | -------------------------------------------------------------------------------- |
17+
| Local dev, quickest iteration, hacking code | Docker Compose | One command starts router + Envoy + (optionally) mock vLLM + observability stack |
18+
| Demo with dashboard quickly | Docker Compose (testing profile) | Bundled Prometheus + Grafana + mock responses |
19+
| Team shared staging / pre‑prod | Kubernetes | Declarative config, rolling upgrades, persistent model volume |
20+
| Performance, scalability, autoscaling | Kubernetes | HPA, scheduling, resource isolation |
21+
| Future Operator / CRD driven config | Kubernetes | Native controller pattern |
22+
23+
You can seamlessly reuse the same configuration concepts in both paths.
24+
25+
---
26+
27+
## Common Prerequisites
28+
29+
- **Docker Engine:** see [Docker Engine Installation](https://docs.docker.com/engine/install/)
30+
31+
- **Clone repo:**
32+
33+
```bash
34+
git clone https://github.com/vllm-project/semantic-router.git
35+
cd semantic-router
36+
```
37+
38+
- **Download classification models (≈1.5GB, first run only):**
39+
40+
```bash
41+
make download-models
42+
```
43+
44+
This downloads the classification models used by the router:
45+
46+
- Category classifier (ModernBERT-base)
47+
- PII classifier (ModernBERT-base)
48+
- Jailbreak classifier (ModernBERT-base)
49+
50+
---
51+
52+
## Path A: Docker Compose Quick Start
53+
54+
### Requirements
55+
56+
- Docker Compose v2 (`docker compose` command, not the legacy `docker-compose`)
57+
58+
Install the Docker Compose plugin if it is missing (see [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository)):
59+
60+
```bash
61+
# For Debian / Ubuntu
62+
sudo apt-get update
63+
sudo apt-get install -y docker-compose-plugin
64+
65+
# For RHEL / CentOS / Fedora
66+
sudo yum update -y
67+
sudo yum install -y docker-compose-plugin
68+
69+
# Verify
70+
docker compose version
71+
```
72+
73+
- Ensure ports 8801, 50051, 19000, 3000 and 9090 are free
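
One way to check those ports before starting the stack is sketched below. This is a minimal sketch and not part of the repository: the helper names `port_in_use` and `report_ports` are hypothetical, and the check relies on bash's `/dev/tcp` pseudo-device, so it assumes bash rather than a plain POSIX shell. It only detects listeners reachable on the loopback address.

```bash
# Hypothetical helpers (not part of the repository) for checking local ports.

# Succeeds (exit 0) if something accepts TCP connections on 127.0.0.1:<port>.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the descriptor on exit.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints one "<port> in use" or "<port> free" line per argument.
report_ports() {
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}
```

For the ports listed above, the call would be `report_ports 8801 50051 19000 3000 9090`.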
74+
75+
### Start Services
76+
77+
```bash
78+
# Core (router + envoy)
79+
docker compose up --build
80+
81+
# Detached (recommended once OK)
82+
docker compose up -d --build
83+
84+
# Include mock vLLM + testing profile (points router to mock endpoint)
85+
CONFIG_FILE=/app/config/config.testing.yaml \
86+
docker compose --profile testing up --build
87+
```
88+
89+
### Verify
90+
91+
- gRPC: `localhost:50051`
92+
- Envoy HTTP: `http://localhost:8801`
93+
- Envoy Admin: `http://localhost:19000`
94+
- Prometheus: `http://localhost:9090`
95+
- Grafana: `http://localhost:3000` (`admin` / `admin` for first login)
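
The HTTP endpoints above can also be probed from a script. The sketch below is an illustration only: `probe` and `check_stack` are hypothetical helper names, and the paths `/ready` (Envoy admin), `/-/healthy` (Prometheus), and `/api/health` (Grafana) are the health endpoints those tools normally expose, not something this guide defines. It assumes `curl` is installed and the default port mappings are unchanged.

```bash
# Hypothetical helpers (not part of the repository) for probing the local stack.

# Succeeds if the URL answers without an HTTP error (status < 400) within 3 seconds.
probe() {
  curl -fsS -o /dev/null --max-time 3 "$1"
}

# Probes each endpoint, prints OK/FAIL per URL, and returns 1 if any probe failed.
check_stack() {
  failed=0
  for url in \
    "http://localhost:19000/ready" \
    "http://localhost:9090/-/healthy" \
    "http://localhost:3000/api/health"; do
    if probe "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `check_stack` after `docker compose up -d --build` gives a quick pass/fail summary; the gRPC port 50051 is not covered because it does not speak plain HTTP.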
96+
97+
### Common Operations
98+
99+
```bash
100+
# View service status
101+
docker compose ps
102+
103+
# Follow logs for the router service
104+
docker compose logs -f semantic-router
105+
106+
# Exec into the router container
107+
docker compose exec semantic-router bash
108+
109+
# Recreate after config change
110+
docker compose up -d --build
111+
112+
# Stop and clean up containers
113+
docker compose down
114+
```
115+
116+
---
117+
118+
## Path B: Kubernetes Quick Start
119+
120+
### Requirements
121+
122+
- Kubernetes cluster
123+
- [Kubernetes Official docs](https://kubernetes.io/docs/home/)
124+
- [kind (local clusters)](https://kind.sigs.k8s.io/)
125+
- [k3d (k3s in Docker)](https://k3d.io/)
126+
- [minikube](https://minikube.sigs.k8s.io/docs/)
127+
- [`kubectl`](https://kubernetes.io/docs/tasks/tools/) access (CLI)
128+
- *Optional: Prometheus metrics stack (e.g. [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))*
129+
- *(Planned / not yet merged) Service Mesh or advanced gateway:*
130+
- *[Istio](https://istio.io/latest/docs/setup/getting-started/) / [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/)*
131+
- Separate deployment of **Envoy** (or another gateway) + real **LLM endpoints** (follow [Installation guide](https://vllm-semantic-router.com/docs/getting-started/installation)).
132+
- Replace placeholder IPs in `deploy/kubernetes/config.yaml` once services exist.
133+
134+
### Deploy (Kustomize)
135+
136+
```bash
137+
kubectl apply -k deploy/kubernetes/
138+
139+
# Check pod status (repeat until the pod is Running)
140+
kubectl -n semantic-router get pods
141+
```
142+
143+
Manifests create:
144+
145+
- Deployment (main container + init model downloader)
146+
- Service `semantic-router` (gRPC 50051)
147+
- Service `semantic-router-metrics` (metrics 9190)
148+
- ConfigMap (base config)
149+
- PVC (model cache)
150+
151+
### Port Forward (Ad-hoc)
152+
153+
```bash
154+
kubectl -n semantic-router port-forward svc/semantic-router 50051:50051 &
155+
kubectl -n semantic-router port-forward svc/semantic-router-metrics 9190:9190 &
156+
```
157+
158+
### Observability (Summary)
159+
160+
- Add a `ServiceMonitor` or a static scrape rule
161+
- Import `deploy/llm-router-dashboard.json` (see `observability.md`)
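
For the static scrape rule option, the sketch below prints a Prometheus `scrape_configs` fragment that targets the metrics Service by its in-cluster DNS name. It is an illustration under stated assumptions: `render_scrape_job` is a hypothetical helper, Prometheus is assumed to run inside the same cluster, and the `/metrics` path is the Prometheus default rather than something confirmed by the manifests.

```bash
# Hypothetical helper (not part of the repository): prints a static scrape job.
# Arguments (all optional): namespace, service name, metrics port.
render_scrape_job() {
  ns="${1:-semantic-router}"
  svc="${2:-semantic-router-metrics}"
  port="${3:-9190}"
  # Unquoted heredoc so the variables above are expanded into the YAML.
  cat <<EOF
scrape_configs:
  - job_name: semantic-router
    metrics_path: /metrics
    static_configs:
      - targets: ["${svc}.${ns}.svc:${port}"]
EOF
}
```

The output of `render_scrape_job` can be merged into an existing Prometheus configuration; clusters running the Prometheus Operator would normally use a `ServiceMonitor` instead.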
162+
163+
### Updating Config
164+
165+
After updating `deploy/kubernetes/config.yaml`, re-apply the manifests and restart the Deployment:
166+
167+
```bash
168+
kubectl apply -k deploy/kubernetes/
169+
kubectl -n semantic-router rollout restart deploy/semantic-router
170+
```
171+
172+
### Typical Customizations
173+
174+
| Goal | Change |
175+
| ------------------ | --------------------------------------------------- |
176+
| Scale horizontally | `kubectl -n semantic-router scale deploy/semantic-router --replicas=N` |
177+
| Resource tuning | Edit `resources:` in `deployment.yaml` |
178+
| Add HTTP readiness | Switch TCP probe -> HTTP `/health` (port 8080) |
179+
| PVC size | Adjust storage request in PVC manifest |
180+
| Metrics scraping | Add ServiceMonitor / scrape rule |
181+
182+
---
183+
184+
## Feature Comparison
185+
186+
| Capability | Docker Compose | Kubernetes |
187+
| ------------------------ | ------------------- | ---------------------------------------------- |
188+
| Startup speed | Fast (seconds) | Depends on cluster/image pull |
189+
| Config reload | Manual recreate | Rolling restart / future Operator / hot reload |
190+
| Model caching | Host volume/bind | PVC persistent across pods |
191+
| Observability | Bundled stack | Integrate existing stack |
192+
| Autoscaling | Manual | HPA / custom metrics |
193+
| Isolation / multi-tenant | Single host network | Namespaces / RBAC |
194+
| Rapid hacking | Minimal friction | YAML overhead |
195+
| Production lifecycle | Basic | Full (probes, rollout, scaling) |
196+
197+
---
198+
199+
## Troubleshooting (Unified)
200+
201+
### HF model download failure / DNS errors
202+
Example log line: `Dns Failed: resolve huggingface.co`. See [Network Tips](https://vllm-semantic-router.com/docs/troubleshooting/network-tips/) for solutions.
203+
204+
### Port conflicts
205+
206+
Adjust external port mappings in `docker-compose.yml`, or free local ports 8801 / 50051 / 19000 / 3000 / 9090.
207+
208+
Extra tip: If you use the testing profile, also pass the testing config so the router targets the mock service:
209+
210+
```bash
211+
CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build
212+
```
213+
214+
### Envoy/Router up but requests fail
215+
216+
- Ensure `mock-vllm` is healthy (testing profile only):
217+
- `docker compose ps` should show mock-vllm healthy; logs show 200 on `/health`.
218+
- Verify the router config in use:
219+
- Router logs print `Starting vLLM Semantic Router ExtProc with config: ...`. If it shows `/app/config/config.yaml` while testing, you forgot `CONFIG_FILE`.
220+
- Basic smoke test via Envoy (OpenAI-compatible):
221+
- Send a POST to `http://localhost:8801/v1/chat/completions` with `{"model":"auto", "messages":[{"role":"user","content":"hi"}]}` and check that the mock responds with `[mock-openai/gpt-oss-20b]` content when testing profile is active.
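
The smoke test described in the last item can be scripted roughly as follows. This is a sketch, not a supported tool: `chat_payload` and `smoke_test` are hypothetical helper names, `curl` is assumed to be installed, and the payload builder does not escape JSON special characters, so it only suits simple prompts.

```bash
# Hypothetical helpers (not part of the repository) for a quick smoke test.

# Builds the OpenAI-compatible request body for a single user message.
# Note: the prompt is inserted verbatim, with no JSON escaping.
chat_payload() {
  printf '{"model":"auto","messages":[{"role":"user","content":"%s"}]}' "$1"
}

# Sends the request through Envoy and prints the raw response body.
smoke_test() {
  curl -sS --max-time 30 \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "${1:-hi}")" \
    "http://localhost:8801/v1/chat/completions"
}
```

With the testing profile active, `smoke_test hi` should return a response whose content includes `[mock-openai/gpt-oss-20b]`.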
222+
223+
### DNS problems inside containers
224+
225+
If DNS is flaky in your Docker environment, add DNS servers to the `semantic-router` service in `docker-compose.yml`:
226+
227+
```yaml
228+
services:
229+
semantic-router:
230+
# ...
231+
dns:
232+
- 1.1.1.1
233+
- 8.8.8.8
234+
```
235+
236+
For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the service `environment`.
237+
238+
Make sure 8801, 50051, 19000 are not bound by other processes. Adjust ports in `docker-compose.yml` if needed.
