| 1 | +--- |
| 2 | +sidebar_position: 3 |
| 3 | +--- |
| 4 | + |
| 5 | +# Containerized Deployment |
| 6 | + |
| 7 | +This unified guide helps you quickly run Semantic Router locally (Docker Compose) or in a cluster (Kubernetes) and explains when to choose each path. Both share the same configuration concepts: **Docker Compose** is ideal for rapid iteration and demos, while **Kubernetes** is suited for long‑running workloads, elasticity, and upcoming Operator / CRD scenarios. |
| 8 | + |
| 9 | +## Choosing a Path |
| 10 | + |
| 11 | +**Docker Compose path** = semantic-router + Envoy proxy + optional mock vLLM (testing profile) + Prometheus + Grafana. It gives you an end-to-end local playground with minimal friction. |
| 12 | + |
| 13 | +**Kubernetes path** (current manifests) = ONLY the semantic-router Deployment (gRPC + metrics), a PVC for model cache, its ConfigMap, and two Services (gRPC + metrics). It does NOT yet bundle Envoy, a real LLM inference backend, Istio, or any CRDs/Operator. |
| 14 | + |
| 15 | +| Scenario / Goal | Recommended Path | Why | |
| 16 | +| ------------------------------------------- | -------------------------------- | -------------------------------------------------------------------------------- | |
| 17 | +| Local dev, quickest iteration, hacking code | Docker Compose | One command starts router + Envoy + (optionally) mock vLLM + observability stack | |
| 18 | +| Demo with dashboard quickly | Docker Compose (testing profile) | Bundled Prometheus + Grafana + mock responses | |
| 19 | +| Team shared staging / pre‑prod | Kubernetes | Declarative config, rolling upgrades, persistent model volume | |
| 20 | +| Performance, scalability, autoscaling | Kubernetes | HPA, scheduling, resource isolation | |
| 21 | +| Future Operator / CRD driven config | Kubernetes | Native controller pattern | |
| 22 | + |
| 23 | +You can seamlessly reuse the same configuration concepts in both paths. |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## Common Prerequisites |
| 28 | + |
| 29 | +- **Docker Engine:** see the [Docker Engine installation guide](https://docs.docker.com/engine/install/) |
| 30 | + |
| 31 | +- **Clone repo:** |
| 32 | + |
| 33 | + ```bash |
| 34 | + git clone https://github.com/vllm-project/semantic-router.git |
| 35 | + cd semantic-router |
| 36 | + ``` |
| 37 | + |
| 38 | +- **Download classification models (≈1.5GB, first run only):** |
| 39 | + |
| 40 | + ```bash |
| 41 | + make download-models |
| 42 | + ``` |
| 43 | + |
| 44 | + This downloads the classification models used by the router: |
| 45 | + |
| 46 | + - Category classifier (ModernBERT-base) |
| 47 | + - PII classifier (ModernBERT-base) |
| 48 | + - Jailbreak classifier (ModernBERT-base) |
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +## Path A: Docker Compose Quick Start |
| 53 | + |
| 54 | +### Requirements |
| 55 | + |
| 56 | +- Docker Compose v2 (`docker compose` command, not the legacy `docker-compose`) |
| 57 | + |
| 58 | + Install the Docker Compose plugin if it is missing (see [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository)): |
| 59 | + |
| 60 | + ```bash |
| 61 | + # For Debian / Ubuntu |
| 62 | + sudo apt-get update |
| 63 | + sudo apt-get install -y docker-compose-plugin |
| 64 | + |
| 65 | + # For RHEL / CentOS / Fedora |
| 66 | + sudo yum update -y |
| 67 | + sudo yum install -y docker-compose-plugin |
| 68 | + |
| 69 | + # Verify |
| 70 | + docker compose version |
| 71 | + ``` |
| 72 | + |
| 73 | +- Ensure ports 8801, 50051, 19000, 3000 and 9090 are free |
| 74 | + |
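The port requirement above can be checked before starting the stack. The sketch below is one possible approach, not part of the repository: it assumes Bash (it relies on Bash's `/dev/tcp` redirection) and only probes listeners reachable on `127.0.0.1`. The function name `check_ports` is hypothetical.

```bash
# Report whether each port used by the Compose stack already has a listener.
# Assumption: Bash is the shell, since /dev/tcp is a Bash-specific feature.
check_ports() {
  for port in 8801 50051 19000 3000 9090; do
    # A successful connect means something is already listening on the port.
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "port ${port}: in use"
    else
      echo "port ${port}: free"
    fi
  done
}

check_ports
```

Any port reported as in use needs to be freed, or remapped in `docker-compose.yml`, before running `docker compose up`.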
| 75 | +### Start Services |
| 76 | + |
| 77 | +```bash |
| 78 | +# Core (router + envoy) |
| 79 | +docker compose up --build |
| 80 | + |
| 81 | +# Detached (recommended once OK) |
| 82 | +docker compose up -d --build |
| 83 | + |
| 84 | +# Include mock vLLM + testing profile (points router to mock endpoint) |
| 85 | +CONFIG_FILE=/app/config/config.testing.yaml \ |
| 86 | + docker compose --profile testing up --build |
| 87 | +``` |
| 88 | + |
| 89 | +### Verify |
| 90 | + |
| 91 | +- gRPC: `localhost:50051` |
| 92 | +- Envoy HTTP: `http://localhost:8801` |
| 93 | +- Envoy Admin: `http://localhost:19000` |
| 94 | +- Prometheus: `http://localhost:9090` |
| 95 | +- Grafana: `http://localhost:3000` (`admin` / `admin` for first login) |
| 96 | + |
| 97 | +### Common Operations |
| 98 | + |
| 99 | +```bash |
| 100 | +# View service status |
| 101 | +docker compose ps |
| 102 | + |
| 103 | +# Follow logs for the router service |
| 104 | +docker compose logs -f semantic-router |
| 105 | + |
| 106 | +# Exec into the router container |
| 107 | +docker compose exec semantic-router bash |
| 108 | + |
| 109 | +# Recreate after config change |
| 110 | +docker compose up -d --build |
| 111 | + |
| 112 | +# Stop and clean up containers |
| 113 | +docker compose down |
| 114 | +``` |
| 115 | + |
| 116 | +--- |
| 117 | + |
| 118 | +## Path B: Kubernetes Quick Start |
| 119 | + |
| 120 | +### Requirements |
| 121 | + |
| 122 | +- Kubernetes cluster |
| 123 | + - [Kubernetes Official docs](https://kubernetes.io/docs/home/) |
| 124 | + - [kind (local clusters)](https://kind.sigs.k8s.io/) |
| 125 | + - [k3d (k3s in Docker)](https://k3d.io/) |
| 126 | + - [minikube](https://minikube.sigs.k8s.io/docs/) |
| 127 | +- [`kubectl`](https://kubernetes.io/docs/tasks/tools/) access (CLI) |
| 128 | +- *Optional: Prometheus metrics stack (e.g. [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator))* |
| 129 | +- *(Planned / not yet merged) Service Mesh or advanced gateway:* |
| 130 | + - *[Istio](https://istio.io/latest/docs/setup/getting-started/) / [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/)* |
| 131 | +- Separate deployment of **Envoy** (or another gateway) + real **LLM endpoints** (follow [Installation guide](https://vllm-semantic-router.com/docs/getting-started/installation)). |
| 132 | + - Replace placeholder IPs in `deploy/kubernetes/config.yaml` once services exist. |
| 133 | + |
| 134 | +### Deploy (Kustomize) |
| 135 | + |
| 136 | +```bash |
| 137 | +kubectl apply -k deploy/kubernetes/ |
| 138 | + |
| 139 | +# Check pod status (re-run until the pod is Running) |
| 140 | +kubectl -n semantic-router get pods |
| 141 | +``` |
| 142 | + |
| 143 | +Manifests create: |
| 144 | + |
| 145 | +- Deployment (main container + init model downloader) |
| 146 | +- Service `semantic-router` (gRPC 50051) |
| 147 | +- Service `semantic-router-metrics` (metrics 9190) |
| 148 | +- ConfigMap (base config) |
| 149 | +- PVC (model cache) |
| 150 | + |
| 151 | +### Port Forward (Ad-hoc) |
| 152 | + |
| 153 | +```bash |
| 154 | +kubectl -n semantic-router port-forward svc/semantic-router 50051:50051 & |
| 155 | +kubectl -n semantic-router port-forward svc/semantic-router-metrics 9190:9190 & |
| 156 | +``` |
| 157 | + |
| 158 | +### Observability (Summary) |
| 159 | + |
| 160 | +- Add a `ServiceMonitor` or a static scrape rule |
| 161 | +- Import `deploy/llm-router-dashboard.json` (see `observability.md`) |
| 162 | + |
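For clusters without the Prometheus Operator, a static scrape rule might look like the sketch below. The Service name `semantic-router-metrics`, the `semantic-router` namespace, and port 9190 come from the manifests above. The `/metrics` path, the job name, the default `cluster.local` DNS suffix, and the scratch file name are assumptions, so check them against your deployment. The snippet only writes the fragment to a local file for review before you merge it into your Prometheus configuration.

```bash
# Write a static scrape fragment to a scratch file (file name is arbitrary).
cat > semantic-router-scrape.yaml <<'EOF'
scrape_configs:
  - job_name: semantic-router          # assumed job name
    metrics_path: /metrics             # assumed metrics path
    static_configs:
      - targets:
          # <service>.<namespace>.svc.<cluster domain>:<metrics port>
          - semantic-router-metrics.semantic-router.svc.cluster.local:9190
EOF

# Show the fragment for review.
cat semantic-router-scrape.yaml
```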
| 163 | +### Updating Config |
| 164 | + |
| 165 | +After editing `deploy/kubernetes/config.yaml`, re-apply the manifests and restart the Deployment: |
| 166 | + |
| 167 | +```bash |
| 168 | +kubectl apply -k deploy/kubernetes/ |
| 169 | +kubectl -n semantic-router rollout restart deploy/semantic-router |
| 170 | +``` |
| 171 | + |
| 172 | +### Typical Customizations |
| 173 | + |
| 174 | +| Goal | Change | |
| 175 | +| ------------------ | --------------------------------------------------- | |
| 176 | +| Scale horizontally | `kubectl -n semantic-router scale deploy/semantic-router --replicas=N` | |
| 177 | +| Resource tuning | Edit `resources:` in `deployment.yaml` | |
| 178 | +| Add HTTP readiness | Switch TCP probe -> HTTP `/health` (port 8080) | |
| 179 | +| PVC size | Adjust storage request in PVC manifest | |
| 180 | +| Metrics scraping | Add ServiceMonitor / scrape rule | |
| 181 | + |
| 182 | +--- |
| 183 | + |
| 184 | +## Feature Comparison |
| 185 | + |
| 186 | +| Capability | Docker Compose | Kubernetes | |
| 187 | +| ------------------------ | ------------------- | ---------------------------------------------- | |
| 188 | +| Startup speed | Fast (seconds) | Depends on cluster/image pull | |
| 189 | +| Config reload | Manual recreate | Rolling restart / future Operator / hot reload | |
| 190 | +| Model caching | Host volume/bind | PVC persistent across pods | |
| 191 | +| Observability | Bundled stack | Integrate existing stack | |
| 192 | +| Autoscaling | Manual | HPA / custom metrics | |
| 193 | +| Isolation / multi-tenant | Single host network | Namespaces / RBAC | |
| 194 | +| Rapid hacking | Minimal friction | YAML overhead | |
| 195 | +| Production lifecycle | Basic | Full (probes, rollout, scaling) | |
| 196 | + |
| 197 | +--- |
| 198 | + |
| 199 | +## Troubleshooting (Unified) |
| 200 | + |
| 201 | +### HF model download failure / DNS errors |
| 202 | +Example log: `Dns Failed: resolve huggingface.co`. See [Network Tips](https://vllm-semantic-router.com/docs/troubleshooting/network-tips/) for solutions. |
| 203 | + |
| 204 | +### Port conflicts |
| 205 | + |
| 206 | +Adjust external port mappings in `docker-compose.yml`, or free local ports 8801 / 50051 / 19000 / 3000 / 9090. |
| 207 | + |
| 208 | +Extra tip: If you use the testing profile, also pass the testing config so the router targets the mock service: |
| 209 | + |
| 210 | +```bash |
| 211 | +CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build |
| 212 | +``` |
| 213 | + |
| 214 | +### Envoy/Router up but requests fail |
| 215 | + |
| 216 | +- Ensure `mock-vllm` is healthy (testing profile only): |
| 217 | + - `docker compose ps` should show mock-vllm healthy; logs show 200 on `/health`. |
| 218 | +- Verify the router config in use: |
| 219 | + - Router logs print `Starting vLLM Semantic Router ExtProc with config: ...`. If it shows `/app/config/config.yaml` while testing, you forgot `CONFIG_FILE`. |
| 220 | +- Basic smoke test via Envoy (OpenAI-compatible): |
| 221 | + - Send a POST to `http://localhost:8801/v1/chat/completions` with `{"model":"auto", "messages":[{"role":"user","content":"hi"}]}` and check that the mock responds with `[mock-openai/gpt-oss-20b]` content when the testing profile is active. |
| 222 | + |
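The smoke test described above can be sketched with `curl`. The URL and request body are taken from the step itself. The variable names and the `smoke_test` function name are hypothetical, and the snippet assumes `curl` is installed on the host.

```bash
# Envoy HTTP listener exposed by the Compose setup.
ENVOY_URL="http://localhost:8801/v1/chat/completions"

# Request body copied from the smoke-test step.
PAYLOAD='{"model":"auto", "messages":[{"role":"user","content":"hi"}]}'

# Send one OpenAI-compatible chat completion request through Envoy.
smoke_test() {
  # -sS hides the progress bar but still prints errors.
  curl -sS -X POST "$ENVOY_URL" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
}
```

Run `smoke_test` once the stack is up. With the testing profile active, `smoke_test | grep -F '[mock-openai/gpt-oss-20b]'` should find a match in the mock's response.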
| 223 | +### DNS problems inside containers |
| 224 | + |
| 225 | +If DNS is flaky in your Docker environment, add DNS servers to the `semantic-router` service in `docker-compose.yml`: |
| 226 | + |
| 227 | +```yaml |
| 228 | +services: |
| 229 | + semantic-router: |
| 230 | + # ... |
| 231 | + dns: |
| 232 | + - 1.1.1.1 |
| 233 | + - 8.8.8.8 |
| 234 | +``` |
| 235 | + |
| 236 | +For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the service `environment`. |
| 237 | + |
| 238 | +Make sure ports 8801, 50051, and 19000 are not bound by other processes. Adjust the port mappings in `docker-compose.yml` if needed. |