Commit dff45b3

JaredforReal and rootfs authored
feat: add Grafana+Prometheus in k8s (#294)
* feat: add Grafava+Prometheus in k8s
* Update docs of observability k8s part
* get rig of redudent part in doc
* add comments of 472 and 65534
* add network tips of k8s
* update uid in dashboard

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
1 parent f295f45 commit dff45b3
17 files changed: +1440 -56 lines changed
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
# Semantic Router Observability on Kubernetes

This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.

> **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.
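
As a rough sketch of such an override, an overlay `kustomization.yaml` could set a different namespace for every manifest at once. The overlay path and the `monitoring` namespace below are assumptions for illustration and are not part of this commit.

```yaml
# overlays/monitoring/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Rewrites metadata.namespace on every namespaced resource pulled in below.
namespace: monitoring  # assumed target namespace

resources:
  # Adjust this relative path to wherever the overlay lives in your repository.
  - ../../deploy/kubernetes/observability
```

Kustomize's namespace setting does not rewrite strings inside ConfigMap data, so the namespace named in the Prometheus scrape config still has to be changed separately if the router itself moves.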

## What Gets Installed

| Component | Purpose | Key Files |
|--------------|---------|-----------|
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`) |
| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml` |
| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml` |

Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.

## 1. Prerequisites

- A Kubernetes cluster (managed, on-prem, or kind)
- A Semantic Router workload already deployed via `deploy/kubernetes/`
- `kubectl` v1.23+
- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access

## 2. Directory Layout

```
deploy/kubernetes/observability/
├── README.md
├── kustomization.yaml              # created in section 5.1
├── ingress.yaml                    # optional HTTPS ingress examples
├── prometheus/
│   ├── configmap.yaml              # Scrape config (Kubernetes SD)
│   ├── deployment.yaml
│   ├── pvc.yaml
│   ├── rbac.yaml                   # SA + ClusterRole + binding
│   └── service.yaml
└── grafana/
    ├── configmap-dashboard.yaml    # Bundled LLM router dashboard
    ├── configmap-provisioning.yaml # Datasource + provider config
    ├── deployment.yaml
    ├── pvc.yaml
    ├── secret.yaml                 # Admin credentials (override in prod)
    └── service.yaml
```

## 3. Prometheus Configuration Highlights

- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system`
- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`)
- Stores metrics in a `PersistentVolumeClaim` named `prometheus-data`
- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices
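
The read-only RBAC rules in the last bullet might look roughly like the `ClusterRole` below. This is a sketch rather than the committed `prometheus/rbac.yaml`: the role name is an assumption, and the real file also defines the ServiceAccount and binding.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus  # assumed name; check prometheus/rbac.yaml for the real one
rules:
  # Core API group: the objects Kubernetes service discovery reads.
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods", "nodes"]
    verbs: ["get", "list", "watch"]  # read-only verbs
  # EndpointSlices live in their own API group.
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```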

### Scrape configuration snippet

```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```

Modify the namespace or service name if you changed them in your primary deployment.
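
For the two `keep` rules to match, the target Service needs the name `semantic-router-metrics` and a port named `metrics`. A Service of roughly the following shape would satisfy them. The name, port number, and labels come from this guide; the selector and `targetPort` are assumptions, so compare against the Service in `deploy/kubernetes/`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: semantic-router-metrics      # matched by the first relabel rule
  namespace: vllm-semantic-router-system
  labels:
    app: semantic-router
    service: metrics
spec:
  selector:
    app: semantic-router             # assumed pod label
  ports:
    - name: metrics                  # matched by the second relabel rule
      port: 9190
      targetPort: 9190               # assumed container port
```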

## 4. Grafana Configuration Highlights

- Stateful deployment backed by the `grafana-storage` PVC
- Datasource provisioned automatically pointing to `http://prometheus:9090`
- Dashboard provider watches `/var/lib/grafana-dashboards`
- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json`
- Admin credentials pulled from the `grafana-admin` secret (default `admin/admin` – **change this!**)
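
The datasource and dashboard provider described above use Grafana's file-based provisioning format. A minimal sketch of the two files follows; the datasource and provider names are assumptions, and the actual contents of `grafana/configmap-provisioning.yaml` may differ.

```yaml
# Datasource file, typically mounted under /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Prometheus                 # assumed display name
    type: prometheus
    access: proxy                    # Grafana's backend proxies the queries
    url: http://prometheus:9090      # in-cluster Service from this guide
    isDefault: true
---
# Dashboard provider file, typically mounted under /etc/grafana/provisioning/dashboards/
apiVersion: 1
providers:
  - name: semantic-router            # assumed provider name
    folder: Semantic Router          # folder shown in the Grafana UI
    type: file
    options:
      path: /var/lib/grafana-dashboards
```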

### Updating credentials

```bash
kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=monitor \
  --from-literal=admin-password='pick-a-strong-password' \
  --dry-run=client -o yaml | kubectl apply -f -
```

Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach.
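
A Deployment usually consumes such a secret through environment variables. The fragment below is a sketch of how the Grafana container could read the `admin-user` and `admin-password` keys created above; its placement inside `grafana/deployment.yaml` is an assumption.

```yaml
# Fragment of the Grafana container spec (assumed layout)
env:
  - name: GF_SECURITY_ADMIN_USER
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-user
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-password
```

Environment variables are read when the pod starts, so a changed secret takes effect only after the pod restarts. Grafana also applies the admin password only when it first initializes its database, so with a persistent volume an existing admin account may have to be updated from inside Grafana instead.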

## 5. Deployment Steps

### 5.1. Create the Kustomization

Create `deploy/kubernetes/observability/kustomization.yaml` to assemble the manifests under `prometheus/` and `grafana/`, plus `ingress.yaml` if you want external access. This guide assumes you keep Prometheus and Grafana in the same namespace as the router.
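
A minimal sketch of that file, assuming the directory layout from section 2, could list each manifest as a resource:

```yaml
# deploy/kubernetes/observability/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Prometheus
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  # Grafana
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  # Optional: drop this entry to keep the UIs cluster-internal.
  - ingress.yaml
```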

### 5.2. Apply manifests

```bash
kubectl apply -k deploy/kubernetes/observability/
```

Verify pods:

```bash
kubectl get pods -n vllm-semantic-router-system
```

You should see `prometheus-...` and `grafana-...` pods in `Running` state.

### 5.3. Integration with the core deployment

1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
2. Deploy the observability stack (`kubectl apply -k deploy/kubernetes/observability/`).
3. Confirm the metrics service (`semantic-router-metrics`) has endpoints:

   ```bash
   kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
   ```

4. The Prometheus target should transition to **UP** within ~15 seconds.

### 5.4. Accessing the UIs

> **Optional Ingress** – If you prefer to keep the stack private, remove the `ingress.yaml` entry from `kustomization.yaml` before applying.

- **Port-forward (quick check)**

  ```bash
  # Each command blocks, so run them in separate terminals.
  kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
  kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
  ```

  Prometheus → http://localhost:9090, Grafana → http://localhost:3000

- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
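
To illustrate the fields that need customizing, here is a sketch of a Grafana Ingress. The host, TLS secret name, and ingress class are placeholders chosen for this example, and the committed `ingress.yaml` may be structured differently.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: vllm-semantic-router-system
spec:
  ingressClassName: nginx            # assumption: replace with your ingress class
  tls:
    - hosts:
        - grafana.example.com        # placeholder domain
      secretName: grafana-tls        # hypothetical TLS secret
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana        # Service used in the port-forward example
                port:
                  number: 3000
```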

## 6. Verifying Metrics Collection

1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → ensure the `semantic-router` job is green.
2. Query `rate(llm_model_completion_tokens_total[5m])`. It should return data once the router has handled traffic (see step 4).
3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
   - Prompt Category counts
   - Token usage rate per model
   - Routing modifications between models
   - Latency histograms (TTFT, completion p95)

## 7. Dashboard Customization

- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers.
- Add more dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.
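
As one way to add a dashboard through a separate ConfigMap, the sketch below carries a single, nearly empty dashboard definition. The ConfigMap name, file key, title, and `uid` are all hypothetical, and the ConfigMap still has to be mounted into the directory the dashboard provider watches.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-extra-dashboards     # hypothetical name
  namespace: vllm-semantic-router-system
data:
  # Each key becomes one JSON file once the ConfigMap is mounted as a volume.
  extra-dashboard.json: |
    {
      "title": "Extra Dashboard",
      "uid": "extra-dashboard",
      "panels": []
    }
```

In practice, exporting a dashboard's JSON from the Grafana UI and pasting it under a new key is usually easier than writing it by hand.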

## 8. Best Practices

### Resource Sizing

- Prometheus: increase CPU and memory as series cardinality grows or when retention exceeds 15 days.
- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.
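
The Grafana starting point above translates into a container `resources` stanza like the following sketch. The requests come from this guide; the limits are assumed values to tune for your cluster.

```yaml
# Fragment of the Grafana container spec (sketch)
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"          # assumed ceiling
    memory: 2Gi       # assumed ceiling
```

Running more than one Grafana replica generally also requires a shared database instead of the default embedded one, so plan for that before scaling horizontally.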

### Storage

- Use SSD-backed storage classes for Prometheus when the retention window is large.
- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements.
- Enable volume snapshots or backups for dashboards and alert history.
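
A larger, SSD-backed claim for Prometheus could be sketched as below. The claim name comes from this guide, while the storage class name, access mode, and the 50Gi size are assumptions for illustration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce              # assumed: a single Prometheus pod mounts the volume
  storageClassName: fast-ssd     # hypothetical SSD-backed class; list yours with `kubectl get storageclass`
  resources:
    requests:
      storage: 50Gi              # raised from the 20Gi default
```

Growing a claim that is already bound only works when its storage class allows volume expansion.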

### Security

- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
- Enable Grafana role-based access control and API keys for automation.
- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.
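
One possible network policy for the second bullet admits traffic to Grafana only from the ingress controller's namespace. The pod label and the `ingress-nginx` namespace name are assumptions, and the policy has an effect only if the cluster's network plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-ingress-controller   # hypothetical name
  namespace: vllm-semantic-router-system
spec:
  podSelector:
    matchLabels:
      app: grafana                         # assumed Grafana pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label that Kubernetes sets automatically on every namespace.
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 3000
```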

### Maintenance

- Monitor Prometheus disk usage; reduce retention or expand the PVC before it fills up.
- Back up Grafana dashboards or store them in Git (the bundled dashboard already lives in Git through `grafana/configmap-dashboard.yaml`).
- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`.
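
For image upgrades, Kustomize's `images` field can override a tag without editing the Deployment manifests. In the sketch below the image names are assumed to match what the deployments reference, and the tags are placeholders to replace with the versions you have tested.

```yaml
# Addition to deploy/kubernetes/observability/kustomization.yaml (sketch)
images:
  - name: prom/prometheus            # assumed image name in prometheus/deployment.yaml
    newTag: "<prometheus-version>"   # placeholder
  - name: grafana/grafana            # assumed image name in grafana/deployment.yaml
    newTag: "<grafana-version>"      # placeholder
```

Changing one entry at a time keeps the Prometheus and Grafana rollouts independent.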

## 9. Troubleshooting

| Symptom | Checks | Fix |
|---------|--------|-----|
| Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` |
| Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable |
| Login fails | `kubectl get secret grafana-admin -n vllm-semantic-router-system -o yaml` | Update the secret to match the credentials you expect |
| PVC Pending | `kubectl describe pvc prometheus-data -n vllm-semantic-router-system` | Provide a storage class via `storageClassName`, or provision storage manually |
| Ingress 404 | `kubectl describe ingress grafana -n vllm-semantic-router-system` | Update hostnames, TLS secrets, and ensure ingress controller is installed |

## 10. Next Steps

- Configure alerts for critical metrics (Prometheus alerting rules + Alertmanager)
- Add log aggregation (Loki, Elasticsearch, or cloud-native logging)
- Automate stack deployment through CI/CD pipelines using `kubectl apply -k`
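
As a starting point for the alerting item, a Prometheus rules file could flag the router's scrape target going down. The group name, alert name, `for` duration, and severity label are assumptions. The file also has to be referenced from `rule_files` in the Prometheus configuration, which this stack does not set up.

```yaml
groups:
  - name: semantic-router            # hypothetical group name
    rules:
      - alert: SemanticRouterTargetDown
        # `up` is 0 when Prometheus fails to scrape a discovered target.
        expr: up{job="semantic-router"} == 0
        for: 5m                      # assumed grace period before firing
        labels:
          severity: critical
        annotations:
          summary: Semantic Router metrics endpoint has been unreachable for 5 minutes
```

If no endpoints are discovered at all, `up` has no series for the job and this rule stays silent. A companion rule on `absent(up{job="semantic-router"})` would cover that case.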

With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment.