| 1 | +# Semantic Router Observability on Kubernetes |
| 2 | + |
| 3 | +This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster. |
| 4 | + |
| 5 | +> **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace. |
| 6 | + |
| 7 | +## What Gets Installed |
| 8 | + |
| 9 | +| Component | Purpose | Key Files | |
| 10 | +|--------------|---------|-----------| |
| 11 | +| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)| |
| 12 | +| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)| |
| 13 | +| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`| |
| 14 | +| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`| |
| 15 | + |
| 16 | +Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack. |
| 17 | + |
| 18 | +## 1. Prerequisites |
| 19 | + |
| 20 | +- Semantic Router workload already deployed via `deploy/kubernetes/` |
| 21 | +- A Kubernetes cluster (managed, on-prem, or kind) |
| 22 | +- `kubectl` v1.23+ |
| 23 | +- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access |
| 24 | + |
| 25 | +## 2. Directory Layout |
| 26 | + |
| 27 | +``` |
| 28 | +deploy/kubernetes/observability/ |
| 29 | +├── README.md |
| 30 | +├── kustomization.yaml # (created in the next step) |
| 31 | +├── ingress.yaml # optional HTTPS ingress examples |
| 32 | +├── prometheus/ |
| 33 | +│ ├── configmap.yaml # Scrape config (Kubernetes SD) |
| 34 | +│ ├── deployment.yaml |
| 35 | +│ ├── pvc.yaml |
| 36 | +│ ├── rbac.yaml # SA + ClusterRole + binding |
| 37 | +│ └── service.yaml |
| 38 | +└── grafana/ |
| 39 | + ├── configmap-dashboard.yaml # Bundled LLM router dashboard |
| 40 | + ├── configmap-provisioning.yaml # Datasource + provider config |
| 41 | + ├── deployment.yaml |
| 42 | + ├── pvc.yaml |
| 43 | + ├── secret.yaml # Admin credentials (override in prod) |
| 44 | + └── service.yaml |
| 45 | +``` |
| 46 | + |
| 47 | +## 3. Prometheus Configuration Highlights |
| 48 | + |
| 49 | +- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system` |
| 50 | +- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`) |
| 51 | +- Stores metrics in a `PersistentVolumeClaim` named `prometheus-data` |
| 52 | +- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices |
| 53 | + |
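The read-only RBAC described above could look roughly like the following. This is a minimal sketch, not the contents of `prometheus/rbac.yaml`: the ClusterRole name `prometheus` is an assumption, and the real manifest may group or scope the rules differently.

```yaml
# Illustrative sketch only; the name "prometheus" is assumed.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Core API group: objects used by Kubernetes service discovery
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods", "nodes"]
    verbs: ["get", "list", "watch"]
  # EndpointSlices live in the discovery.k8s.io API group
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```

Only `get`, `list`, and `watch` are granted, which is what keeps the access read-only.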
| 54 | +### Scrape configuration snippet |
| 55 | + |
| 56 | +```yaml |
| 57 | +scrape_configs: |
| 58 | + - job_name: semantic-router |
| 59 | + kubernetes_sd_configs: |
| 60 | + - role: endpoints |
| 61 | + namespaces: |
| 62 | + names: |
| 63 | + - vllm-semantic-router-system |
| 64 | + relabel_configs: |
| 65 | + - source_labels: [__meta_kubernetes_service_name] |
| 66 | + regex: semantic-router-metrics |
| 67 | + action: keep |
| 68 | + - source_labels: [__meta_kubernetes_endpoint_port_name] |
| 69 | + regex: metrics |
| 70 | + action: keep |
| 71 | +``` |
| 72 | + |
| 73 | +Modify the namespace or service name if you changed them in your primary deployment. |
| 74 | + |
| 75 | +## 4. Grafana Configuration Highlights |
| 76 | + |
| 77 | +- Stateful deployment backed by the `grafana-storage` PVC |
| 78 | +- Datasource provisioned automatically pointing to `http://prometheus:9090` |
| 79 | +- Dashboard provider watches `/var/lib/grafana-dashboards` |
| 80 | +- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json` |
| 81 | +- Admin credentials pulled from the `grafana-admin` secret (default `admin/admin` – **change this!**) |
| 82 | + |
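For orientation, the datasource and dashboard provider described above map onto Grafana's provisioning file format roughly as follows. This is a sketch under assumptions: the datasource name, the provider name, and the split into two files are illustrative, and the actual contents of `grafana/configmap-provisioning.yaml` may differ.

```yaml
# File 1 (assumed layout): datasource provisioning
apiVersion: 1
datasources:
  - name: Prometheus            # assumed display name
    type: prometheus
    access: proxy
    url: http://prometheus:9090 # in-cluster Prometheus service
    isDefault: true
---
# File 2 (assumed layout): dashboard provider
apiVersion: 1
providers:
  - name: semantic-router       # assumed provider name
    folder: Semantic Router     # folder shown in the Grafana UI
    type: file
    options:
      path: /var/lib/grafana-dashboards
```

Grafana reads these files at startup from its provisioning directory, so a change to the ConfigMap normally requires restarting the Grafana pod.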
| 83 | +### Updating credentials |
| 84 | + |
| 85 | +```bash |
| 86 | +kubectl create secret generic grafana-admin \ |
| 87 | + --namespace vllm-semantic-router-system \ |
| 88 | + --from-literal=admin-user=monitor \ |
| 89 | + --from-literal=admin-password='pick-a-strong-password' \ |
| 90 | + --dry-run=client -o yaml | kubectl apply -f - |
| 91 | +``` |
| 92 | + |
| 93 | +Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach. |
| 94 | + |
| 95 | +## 5. Deployment Steps |
| 96 | + |
| 97 | +### 5.1. Create the Kustomization |
| 98 | + |
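| 99 | +Create `deploy/kubernetes/observability/kustomization.yaml` to assemble all manifests: list every file under `prometheus/` and `grafana/` in its `resources`, plus `ingress.yaml` if you want external access. This guide assumes you keep Prometheus and Grafana in the same namespace as the router. |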
| 100 | + |
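A minimal sketch of such a file, assuming the file names from the directory layout in section 2 and the default namespace, might look like this. Treat it as a starting point, not the canonical manifest.

```yaml
# deploy/kubernetes/observability/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Change this value to deploy into a different namespace
namespace: vllm-semantic-router-system

resources:
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  - ingress.yaml   # optional: drop this entry to keep the stack private
```

If you change `namespace`, remember that the scrape configuration in `prometheus/configmap.yaml` names the namespace explicitly and needs the same update.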
| 101 | +### 5.2. Apply manifests |
| 102 | + |
| 103 | +```bash |
| 104 | +kubectl apply -k deploy/kubernetes/observability/ |
| 105 | +``` |
| 106 | + |
| 107 | +Verify pods: |
| 108 | + |
| 109 | +```bash |
| 110 | +kubectl get pods -n vllm-semantic-router-system |
| 111 | +``` |
| 112 | + |
| 113 | +You should see `prometheus-...` and `grafana-...` pods in `Running` state. |
| 114 | + |
| 115 | +### 5.3. Integration with the core deployment |
| 116 | + |
| 117 | +1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`). |
| 118 | +2. Deploy observability stack (`kubectl apply -k deploy/kubernetes/observability/`). |
| 119 | +3. Confirm the metrics service (`semantic-router-metrics`) has endpoints: |
| 120 | + |
| 121 | + ```bash |
| 122 | + kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system |
| 123 | + ``` |
| 124 | + |
| 125 | +4. Prometheus target should transition to **UP** within ~15 seconds. |
| 126 | + |
| 127 | +### 5.4. Accessing the UIs |
| 128 | + |
| 129 | +> **Optional Ingress** – If you prefer to keep the stack private, remove the `ingress.yaml` entry from the `resources` list in `kustomization.yaml` before applying. |
| 130 | + |
| 131 | +- **Port-forward (quick check)** |
| 132 | + |
| 133 | + ```bash |
| 134 | + kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system |
| 135 | + kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system |
| 136 | + ``` |
| 137 | + |
| 138 | + Prometheus → http://localhost:9090, Grafana → http://localhost:3000 |
| 139 | + |
| 140 | +- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider. |
| 141 | + |
| 142 | +## 6. Verifying Metrics Collection |
| 143 | + |
| 144 | +1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → confirm the `semantic-router` job shows its target as **UP**. |
| 145 | +2. Query `rate(llm_model_completion_tokens_total[5m])` – should return data after traffic. |
| 146 | +3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder. |
| 147 | +4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating: |
| 148 | + - Prompt Category counts |
| 149 | + - Token usage rate per model |
| 150 | + - Routing modifications between models |
| 151 | + - Latency histograms (TTFT, completion p95) |
| 152 | + |
| 153 | +## 7. Dashboard Customization |
| 154 | + |
| 155 | +- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template. |
| 156 | +- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers. |
| 157 | +- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap. |
| 158 | +- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters. |
| 159 | + |
| 160 | +## 8. Best Practices |
| 161 | + |
| 162 | +### Resource Sizing |
| 163 | + |
| 164 | +- Prometheus: increase CPU/memory with higher scrape cardinality or retention > 15 days. |
| 165 | +- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen. |
| 166 | + |
| 167 | +### Storage |
| 168 | + |
| 169 | +- Use SSD-backed storage classes for Prometheus when the retention window is large. |
| 170 | +- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements. |
| 171 | +- Enable volume snapshots or backups for dashboards and alert history. |
| 172 | + |
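As an illustration of the sizing advice above, a Prometheus claim with a larger request and an explicit storage class could be sketched as follows. The class name `fast-ssd` and the `50Gi` figure are hypothetical; substitute a class that exists in your cluster.

```yaml
# Illustrative sketch; "fast-ssd" and 50Gi are placeholder values.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # hypothetical SSD-backed class
  resources:
    requests:
      storage: 50Gi            # the bundled default is 20Gi
```

Growing an existing claim in place only works when its storage class sets `allowVolumeExpansion: true`; otherwise the data has to be migrated to a new volume.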
| 173 | +### Security |
| 174 | + |
| 175 | +- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager. |
| 176 | +- Restrict ingress access with network policies, OAuth proxies, or SSO integrations. |
| 177 | +- Enable Grafana role-based access control and API keys for automation. |
| 178 | +- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config. |
| 179 | + |
| 180 | +### Maintenance |
| 181 | + |
| 182 | +- Monitor Prometheus disk usage; shorten retention or expand the PVC before it fills up. |
| 183 | +- Back up Grafana dashboards or store them in Git (the bundled dashboard already lives in `grafana/configmap-dashboard.yaml`). |
| 184 | +- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches. |
| 185 | +- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`. |
| 186 | + |
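For readers on the Prometheus Operator, a `ServiceMonitor` for the router could be sketched as below. This is not the sample from the referenced tutorial; it assumes the metrics service carries the labels `app=semantic-router` and `service=metrics` and exposes a port named `metrics`, and the 15-second interval is an illustrative choice.

```yaml
# Illustrative sketch; requires the Prometheus Operator CRDs to be installed.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: semantic-router
  namespace: vllm-semantic-router-system
spec:
  selector:
    matchLabels:
      app: semantic-router
      service: metrics
  endpoints:
    - port: metrics      # name of the service port, not its number
      interval: 15s      # assumed scrape interval
```

Depending on how your operator-managed Prometheus selects monitors, the `ServiceMonitor` may also need extra labels (for example a release label) before it is picked up.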
| 187 | +## 9. Troubleshooting |
| 188 | + |
| 189 | +| Symptom | Checks | Fix | |
| 190 | +|---------|--------|-----| |
| 191 | +| Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` | |
| 192 | +| Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable | |
| 193 | +| Login fails | `kubectl get secret grafana-admin -n vllm-semantic-router-system -o yaml` | Update the secret to match the credentials you expect | |
| 194 | +| PVC Pending | `kubectl describe pvc prometheus-data -n vllm-semantic-router-system` | Provide a storage class via `storageClassName`, or provision storage manually | |
| 195 | +| Ingress 404 | `kubectl describe ingress grafana -n vllm-semantic-router-system` | Update hostnames and TLS secrets, and ensure an ingress controller is installed | |
| 196 | + |
| 197 | +## 10. Next Steps |
| 198 | + |
| 199 | +- Configure alerts for critical metrics (Prometheus alerting rules + Alertmanager) |
| 200 | +- Add log aggregation (Loki, Elasticsearch, or cloud-native logging) |
| 201 | +- Automate stack deployment through CI/CD pipelines using `kubectl apply -k` |
| 202 | + |
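As a starting point for the alerting item above, a Prometheus rule file could contain something like the following. The group name, alert name, `5m` hold time, and severity label are assumptions; the `job` label matches the `semantic-router` scrape job from section 3.

```yaml
# Illustrative sketch of a Prometheus alerting rule file.
groups:
  - name: semantic-router          # assumed group name
    rules:
      - alert: SemanticRouterTargetDown
        # "up" is 0 whenever the last scrape of a target failed
        expr: up{job="semantic-router"} == 0
        for: 5m                    # assumed hold time before firing
        labels:
          severity: critical
        annotations:
          summary: Semantic Router metrics target is down
```

The file only takes effect once it is referenced under `rule_files` in the Prometheus configuration, and notifications additionally require an Alertmanager, neither of which this stack configures by default.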
| 203 | +With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment. |