Commit cedd13e

Update docs of observability k8s part
Signed-off-by: JaredforReal <[email protected]>
1 parent 4096e4e commit cedd13e

1 file changed

+174
-41
lines changed

website/docs/tutorials/observability/observability.md

Lines changed: 174 additions & 41 deletions
@@ -49,74 +49,207 @@ Expected Prometheus targets:
4949

5050
## 3. Kubernetes Observability
5151

52-
After applying `deploy/kubernetes/`, you get services:
52+
This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.
5353

54-
- `semantic-router` (gRPC)
55-
- `semantic-router-metrics` (metrics 9190)
54+
> **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.
5655
57-
### 3.1 Prometheus Operator (ServiceMonitor)
56+
## What Gets Installed
5857

59-
```yaml
60-
apiVersion: monitoring.coreos.com/v1
61-
kind: ServiceMonitor
62-
metadata:
63-
  name: semantic-router
64-
  namespace: semantic-router
65-
spec:
66-
  selector:
67-
    matchLabels:
68-
      app: semantic-router
69-
      service: metrics
70-
  namespaceSelector:
71-
    matchNames: ["semantic-router"]
72-
  endpoints:
73-
    - port: metrics
74-
      interval: 15s
75-
      path: /metrics
58+
| Component | Purpose | Key Files |
59+
|--------------|---------|-----------|
60+
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
61+
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
62+
| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`|
63+
| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`|
64+
65+
Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.
66+
67+
### 1. Prerequisites
68+
69+
- A Semantic Router workload already deployed via `deploy/kubernetes/`
70+
- A Kubernetes cluster (managed, on-prem, or kind)
71+
- `kubectl` v1.23+
72+
- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access
73+
74+
### 2. Directory Layout
75+
76+
```
77+
deploy/kubernetes/observability/
78+
├── README.md
79+
├── kustomization.yaml # (created in the next step)
80+
├── ingress.yaml # optional HTTPS ingress examples
81+
├── prometheus/
82+
│ ├── configmap.yaml # Scrape config (Kubernetes SD)
83+
│ ├── deployment.yaml
84+
│ ├── pvc.yaml
85+
│ ├── rbac.yaml # SA + ClusterRole + binding
86+
│ └── service.yaml
87+
└── grafana/
88+
    ├── configmap-dashboard.yaml # Bundled LLM router dashboard
89+
    ├── configmap-provisioning.yaml # Datasource + provider config
90+
    ├── deployment.yaml
91+
    ├── pvc.yaml
92+
    ├── secret.yaml # Admin credentials (override in prod)
93+
    └── service.yaml
7694
```
7795

78-
Ensure the metrics Service carries a label like `service: metrics`. (It does in the provided manifests.)
96+
### 3. Prometheus Configuration Highlights
7997

80-
### 3.2 Plain Prometheus Static Scrape
98+
- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system`
99+
- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`)
100+
- Stores metrics in a `PersistentVolumeClaim` named `prometheus-data`
101+
- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices
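The read-only RBAC rules described above could look roughly like the following ClusterRole. This is a sketch rather than a copy of `prometheus/rbac.yaml`: the role name `prometheus` is an assumption, and the actual manifest may group the rules differently.

```yaml
# Illustrative ClusterRole; the name "prometheus" is assumed, not taken from the repo.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Core API group: resources needed for endpoint-based service discovery.
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods", "nodes"]
    verbs: ["get", "list", "watch"]
  # EndpointSlices live in the discovery.k8s.io API group.
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```

Only `get`, `list`, and `watch` are granted, which is what keeps the access read-only.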
102+
103+
#### Scrape configuration snippet
81104

82105
```yaml
83106
scrape_configs:
84107
  - job_name: semantic-router
85108
    kubernetes_sd_configs:
86109
      - role: endpoints
110+
        namespaces:
111+
          names:
112+
            - vllm-semantic-router-system
87113
    relabel_configs:
88114
      - source_labels: [__meta_kubernetes_service_name]
89115
        regex: semantic-router-metrics
90116
        action: keep
117+
      - source_labels: [__meta_kubernetes_endpoint_port_name]
118+
        regex: metrics
119+
        action: keep
91120
```
92121
93-
### 3.3 Port Forward for Spot Checks
122+
Modify the namespace or service name if you changed them in your primary deployment.
123+
124+
### 4. Grafana Configuration Highlights
125+
126+
- Stateful deployment backed by the `grafana-storage` PVC
127+
- Datasource provisioned automatically pointing to `http://prometheus:9090`
128+
- Dashboard provider watches `/var/lib/grafana-dashboards`
129+
- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json`
130+
- Admin credentials pulled from the `grafana-admin` secret (default `admin/admin` – **change this!**)
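To make the provisioning bullets above more concrete, here is a sketch of what the two Grafana provisioning files might contain, using Grafana's standard provisioning format. The datasource name `Prometheus` and the provider name `semantic-router` are assumptions; the URL, dashboard path, and *Semantic Router* folder come from this guide.

```yaml
# Datasource provisioning (assumed to live under provisioning/datasources/)
apiVersion: 1
datasources:
  - name: Prometheus            # assumed display name
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
---
# Dashboard provider (assumed to live under provisioning/dashboards/)
apiVersion: 1
providers:
  - name: semantic-router       # hypothetical provider name
    folder: Semantic Router
    type: file
    options:
      path: /var/lib/grafana-dashboards
```

The actual contents of `grafana/configmap-provisioning.yaml` may differ; check the file before relying on these names.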
131+
132+
#### Updating credentials
94133

95134
```bash
96-
kubectl -n semantic-router port-forward svc/semantic-router-metrics 9190:9190
97-
curl -s localhost:9190/metrics | head
135+
kubectl create secret generic grafana-admin \
136+
--namespace vllm-semantic-router-system \
137+
--from-literal=admin-user=monitor \
138+
--from-literal=admin-password='pick-a-strong-password' \
139+
--dry-run=client -o yaml | kubectl apply -f -
98140
```
99141

100-
### 3.4 Grafana Dashboard Provision
142+
Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach.
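For context, a Grafana container typically reads such a secret through environment variables. The fragment below is an assumed wiring, not copied from `grafana/deployment.yaml`; it uses Grafana's `GF_SECURITY_ADMIN_USER` and `GF_SECURITY_ADMIN_PASSWORD` settings and the key names from the command above.

```yaml
# Fragment of a Grafana container spec (assumed wiring; verify against grafana/deployment.yaml)
env:
  - name: GF_SECURITY_ADMIN_USER
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-user
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-password
```

If the deployment is wired this way, whichever tool manages the secret only needs to produce a Secret named `grafana-admin` with these two keys.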
101143

102-
If using kube-prometheus-stack or a Grafana sidecar:
144+
### 5. Deployment Steps
103145

104-
```yaml
105-
apiVersion: v1
106-
kind: ConfigMap
107-
metadata:
108-
  name: semantic-router-dashboard
109-
  namespace: semantic-router
110-
  labels:
111-
    grafana_dashboard: "1"
112-
data:
113-
  llm-router-dashboard.json: |
114-
    # Paste JSON from deploy/llm-router-dashboard.json
146+
#### 5.1. Create the Kustomization
147+
148+
Create `deploy/kubernetes/observability/kustomization.yaml` to assemble the manifests listed in the directory layout. This guide assumes you keep Prometheus and Grafana in the same namespace as the router.
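A minimal sketch of such a kustomization, assuming the file names from the directory layout and the default namespace used throughout this guide, might look like this. The ordering of resources is a choice made for readability, not a requirement.

```yaml
# deploy/kubernetes/observability/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Change this if the router runs in a different namespace.
namespace: vllm-semantic-router-system

resources:
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  - ingress.yaml   # optional: remove this entry to keep the stack private
```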
149+
150+
#### 5.2. Apply manifests
151+
152+
```bash
153+
kubectl apply -k deploy/kubernetes/observability/
115154
```
116155

117-
Otherwise import the JSON manually in Grafana UI.
156+
Verify pods:
118157

119-
---
158+
```bash
159+
kubectl get pods -n vllm-semantic-router-system
160+
```
161+
162+
You should see `prometheus-...` and `grafana-...` pods in `Running` state.
163+
164+
#### 5.3. Integration with the core deployment
165+
166+
1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
167+
2. Deploy observability stack (`kubectl apply -k deploy/kubernetes/observability/`).
168+
3. Confirm the metrics service (`semantic-router-metrics`) has endpoints:
169+
170+
```bash
171+
kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
172+
```
173+
174+
4. The Prometheus target should transition to **UP** within ~15 seconds.
175+
176+
#### 5.4. Accessing the UIs
177+
178+
> **Optional Ingress** – If you prefer to keep the stack private, remove the `ingress.yaml` entry from `kustomization.yaml` before applying.
179+
180+
- **Port-forward (quick check)**
181+
182+
```bash
183+
kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
184+
kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
185+
```
186+
187+
Prometheus → http://localhost:9090, Grafana → http://localhost:3000
188+
189+
- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
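As an illustration of the customization described above, a Grafana ingress could look like the following. The ingress class `nginx`, the host `grafana.example.com`, and the TLS secret name `grafana-tls` are placeholders; the bundled `ingress.yaml` may be structured differently.

```yaml
# Illustrative Ingress for Grafana; host, class, and TLS secret are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: vllm-semantic-router-system
spec:
  ingressClassName: nginx          # assumption: an NGINX ingress controller is installed
  tls:
    - hosts:
        - grafana.example.com
      secretName: grafana-tls      # hypothetical secret, e.g. issued by cert-manager
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```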
190+
191+
### 6. Verifying Metrics Collection
192+
193+
1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → confirm the `semantic-router` job is **UP** (shown in green).
194+
2. Query `rate(llm_model_completion_tokens_total[5m])`; it should return data once the router has served traffic.
195+
3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
196+
4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
197+
- Prompt Category counts
198+
- Token usage rate per model
199+
- Routing modifications between models
200+
- Latency histograms (TTFT, completion p95)
201+
202+
### 7. Dashboard Customization
203+
204+
- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
205+
- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers.
206+
- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
207+
- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.
208+
209+
### 8. Best Practices
210+
211+
#### Resource Sizing
212+
213+
- Prometheus: raise CPU and memory requests as scrape cardinality grows or when retention exceeds 15 days.
214+
- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.
215+
216+
#### Storage
217+
218+
- Use SSD-backed storage classes for Prometheus when the retention window is long.
219+
- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements.
220+
- Enable volume snapshots or backups for dashboards and alert history.
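The storage advice above could translate into a claim like the one below. The storage class `fast-ssd` and the `50Gi` size are hypothetical values chosen for illustration; the claim name `prometheus-data` comes from this guide.

```yaml
# Illustrative PVC override; storage class and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # hypothetical SSD-backed class in your cluster
  resources:
    requests:
      storage: 50Gi            # raised from the 20Gi default for longer retention
```

Note that many storage classes do not allow shrinking a volume, and expansion requires `allowVolumeExpansion: true` on the class, so it is usually easier to size generously up front.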
221+
222+
#### Security
223+
224+
- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
225+
- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
226+
- Enable Grafana role-based access control and API keys for automation.
227+
- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.
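For the multi-namespace case mentioned in the last bullet, the scrape job can simply list each namespace. In this sketch, `another-namespace` is a hypothetical second namespace; everything else mirrors the scrape configuration shown earlier in this guide.

```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
            - another-namespace        # hypothetical additional namespace
    relabel_configs:
      # Keep only the metrics service and its named metrics port.
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```

Prometheus then only needs RBAC access in the listed namespaces, which pairs well with namespaced `Role` and `RoleBinding` objects instead of a cluster-wide role.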
228+
229+
#### Maintenance
230+
231+
- Monitor Prometheus disk usage; shorten retention or expand the PVC before it fills up.
232+
- Back up Grafana dashboards or store them in Git (the bundled dashboard is already versioned through `grafana/configmap-dashboard.yaml`).
233+
- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
234+
- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`.
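One way to roll image upgrades through `kustomization.yaml` is the built-in `images` transformer. The fragment below assumes the deployments reference the `prom/prometheus` and `grafana/grafana` images; both the image names and the tags are placeholders to adapt to what your manifests actually use.

```yaml
# Fragment to add to kustomization.yaml; image names and tags are assumptions.
images:
  - name: prom/prometheus
    newTag: v2.53.0     # placeholder tag
  - name: grafana/grafana
    newTag: 11.1.0      # placeholder tag
```

Changing one entry at a time and re-running `kubectl apply -k` keeps the two upgrades independent.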
235+
236+
### 9. Troubleshooting
237+
238+
| Symptom | Checks | Fix |
239+
|---------|--------|-----|
240+
| Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` |
241+
| Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable |
242+
| Login fails | `kubectl get secret grafana-admin -n vllm-semantic-router-system -o yaml` | Update the secret to match the credentials you expect |
243+
| PVC Pending | `kubectl describe pvc prometheus-data -n vllm-semantic-router-system` | Provide a storage class via `storageClassName`, or provision storage manually |
244+
| Ingress 404 | `kubectl describe ingress grafana -n vllm-semantic-router-system` | Update hostnames and TLS secrets, and ensure an ingress controller is installed |
245+
246+
### 10. Next Steps
247+
248+
- Configure alerts for critical metrics (Prometheus alerting rules + Alertmanager)
249+
- Add log aggregation (Loki, Elasticsearch, or cloud-native logging)
250+
- Automate stack deployment through CI/CD pipelines using `kubectl apply -k`
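As a starting point for the alerting item above, a Prometheus rule file might contain a single availability alert like this one. The group name, alert name, `5m` duration, and severity label are illustrative choices; the `job` label matches the `semantic-router` scrape job from this guide.

```yaml
# Illustrative Prometheus alerting rule file; names and thresholds are placeholders.
groups:
  - name: semantic-router
    rules:
      - alert: SemanticRouterTargetDown
        # "up" is 0 when Prometheus fails to scrape the target.
        expr: up{job="semantic-router"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Semantic Router metrics target has been down for 5 minutes
```

The file has to be referenced from the `rule_files` section of the Prometheus configuration, and notifications require an Alertmanager, which this stack does not deploy.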
251+
252+
With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment.
120253

121254
## 4. Key Metrics (Sample)
122255
