-After applying `deploy/kubernetes/`, you get services:
+This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.

-- `semantic-router` (gRPC)
-- `semantic-router-metrics` (metrics 9190)
+> **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.

-### 3.1 Prometheus Operator (ServiceMonitor)
+## What Gets Installed

-```yaml
-apiVersion: monitoring.coreos.com/v1
-kind: ServiceMonitor
-metadata:
-  name: semantic-router
-  namespace: semantic-router
-spec:
-  selector:
-    matchLabels:
-      app: semantic-router
-      service: metrics
-  namespaceSelector:
-    matchNames: ["semantic-router"]
-  endpoints:
-    - port: metrics
-      interval: 15s
-      path: /metrics
+| Component | Purpose | Key Files |
+|--------------|---------|-----------|
+| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
+| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
+| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml` |
+| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml` |
+
+Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.
+
+### 1. Prerequisites
+
+- Deployed Semantic Router workload via `deploy/kubernetes/`
+- A Kubernetes cluster (managed, on-prem, or kind)
+- `kubectl` v1.23+
+- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access
+
+### 2. Directory Layout
+
+```
+deploy/kubernetes/observability/
+├── README.md
+├── kustomization.yaml # (created in the next step)
+Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach.

-If using kube-prometheus-stack or a Grafana sidecar:
+### 5. Deployment Steps

-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: semantic-router-dashboard
-  namespace: semantic-router
-  labels:
-    grafana_dashboard: "1"
-data:
-  llm-router-dashboard.json: |
-    # Paste JSON from deploy/llm-router-dashboard.json
-```
+#### 5.1. Create the Kustomization
+
+Create `deploy/kubernetes/observability/kustomization.yaml` (see below) to assemble all manifests. This guide assumes you keep Prometheus & Grafana in the same namespace as the router.
+
+#### 5.2. Apply manifests
+
+```bash
+kubectl apply -k deploy/kubernetes/observability/
+```

-Otherwise import the JSON manually in Grafana UI.
+Verify pods:

----
+```bash
+kubectl get pods -n vllm-semantic-router-system
+```
+
+You should see `prometheus-...` and `grafana-...` pods in `Running` state.
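
The kustomization created in step 5.1 is not shown in this excerpt. A minimal sketch follows, assuming the file names listed under "What Gets Installed" and assuming the Grafana ConfigMaps are the two named elsewhere in this guide (`configmap-provisioning.yaml` and `configmap-dashboard.yaml`). Kustomize does not expand globs in `resources`, so each file is listed explicitly; the actual file in the repository may differ.

```yaml
# deploy/kubernetes/observability/kustomization.yaml (illustrative sketch, not the committed file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Keep Prometheus and Grafana in the same namespace as the router.
namespace: vllm-semantic-router-system

resources:
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  - grafana/secret.yaml
  # Assumption: these are the only two files matching configmap-*.yaml.
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  # Optional: uncomment once hostnames and TLS are customized.
  # - ingress.yaml
```

Changing the `namespace` field here is one way to apply the namespace override mentioned at the top of the guide.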
+
+#### 5.3. Integration with the core deployment
+
+1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
+- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
+
+### 6. Verifying Metrics Collection
+
+1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → ensure `semantic-router` job is green.
+2. Query `rate(llm_model_completion_tokens_total[5m])` – should return data after traffic.
+3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
+4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
+   - Prompt Category counts
+   - Token usage rate per model
+   - Routing modifications between models
+   - Latency histograms (TTFT, completion p95)
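
The scrape configuration behind the `semantic-router` job in step 1 is not shown in this excerpt. If the target never appears, it can help to compare the shipped config against a sketch like the one below, which uses Prometheus endpoint discovery and keeps only the `semantic-router-metrics` service. The job name, service name, and namespace come from this guide; the 15s interval and the `/metrics` path are carried over from the earlier `ServiceMonitor` sample and are assumptions for the new stack.

```yaml
# Illustrative fragment of prometheus.yml (assumed to live in prometheus/configmap.yaml)
scrape_configs:
  - job_name: semantic-router
    scrape_interval: 15s        # assumption, taken from the old ServiceMonitor sample
    metrics_path: /metrics      # assumption, taken from the old ServiceMonitor sample
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
    relabel_configs:
      # Keep only endpoints that belong to the metrics service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
```

Endpoint discovery needs list and watch permissions on endpoints, services, and pods in that namespace, which is presumably what `prometheus/rbac.yaml` grants.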
+
+### 7. Dashboard Customization
+
+- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
+- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers.
+- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
+- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.
+
+### 8. Best Practices
+
+#### Resource Sizing
+
+- Prometheus: increase CPU/memory with higher scrape cardinality or retention > 15 days.
+- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.
+
+#### Storage
+
+- Use SSD-backed storage classes for Prometheus when retention/window is large.
+- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements.
+- Enable volume snapshots or backups for dashboards and alert history.
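
One way to apply the storage advice above without editing `prometheus/pvc.yaml` directly is a Kustomize patch. The sketch below assumes the claim is named `prometheus-data` (the name used in the troubleshooting table); `fast-ssd` and `50Gi` are hypothetical values to replace with your own.

```yaml
# Addition to kustomization.yaml (illustrative sketch)
patches:
  - target:
      kind: PersistentVolumeClaim
      name: prometheus-data       # assumption: matches metadata.name in prometheus/pvc.yaml
    patch: |-
      - op: replace
        path: /spec/resources/requests/storage
        value: 50Gi               # hypothetical size, up from the 20Gi default
      - op: add
        path: /spec/storageClassName
        value: fast-ssd           # hypothetical SSD-backed storage class
```

This is easiest to set before the first deploy: `storageClassName` cannot be changed on an existing claim, and growing `storage` in place only works when the storage class allows volume expansion.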
+
+#### Security
+
+- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
+- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
+- Enable Grafana role-based access control and API keys for automation.
+- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.
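
As an illustration of the network-policy option above, the following sketch admits traffic to Grafana only from the ingress controller's namespace. The policy name, the `app: grafana` pod label, and the `ingress-nginx` namespace are assumptions to adapt to your manifests; port 3000 is Grafana's default HTTP port.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-from-ingress-only   # hypothetical name
  namespace: vllm-semantic-router-system
spec:
  podSelector:
    matchLabels:
      app: grafana                  # assumption: label on the Grafana pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace by Kubernetes 1.21+.
              kubernetes.io/metadata.name: ingress-nginx   # assumption: controller namespace
      ports:
        - protocol: TCP
          port: 3000
```

The policy only takes effect on clusters whose network plugin enforces `NetworkPolicy`.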
+
+#### Maintenance
+
+- Monitor Prometheus disk usage; prune retention or scale PVC before it fills up.
+- Back up Grafana dashboards or store them in Git (already done through this ConfigMap).
+- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
+- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`.
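
For the separate image upgrades mentioned above, Kustomize's `images` transformer is often simpler than a hand-written patch. The sketch below assumes the deployments reference the upstream `prom/prometheus` and `grafana/grafana` images; the tags are placeholders, not recommendations.

```yaml
# Addition to kustomization.yaml (illustrative sketch)
images:
  - name: prom/prometheus     # assumption: image name used in prometheus/deployment.yaml
    newTag: v2.53.0           # placeholder tag
  - name: grafana/grafana     # assumption: image name used in grafana/deployment.yaml
    newTag: 11.1.0            # placeholder tag
```

Changing one entry at a time and re-running `kubectl apply -k deploy/kubernetes/observability/` rolls each component independently.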
+
+### 9. Troubleshooting
+
+| Symptom | Checks | Fix |
+|---------|--------|-----|
+| Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` |
+| Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable |
+| Login fails | `kubectl get secret grafana-admin -o yaml` | Update the secret to match the credentials you expect |
+| PVC Pending | `kubectl describe pvc prometheus-data` | Provide a storage class via `storageClassName`, or provision storage manually |
+| Ingress 404 | `kubectl describe ingress grafana` | Update hostnames, TLS secrets, and ensure ingress controller is installed |
+- Add log aggregation (Loki, Elasticsearch, or Cloud-native logging)
+- Automate stack deployment through CI/CD pipelines using `kubectl apply -k`
+
+With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment.