Merged
Changes from 7 commits
203 changes: 203 additions & 0 deletions deploy/kubernetes/observability/README.md
@@ -0,0 +1,203 @@
# Semantic Router Observability on Kubernetes

This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.

Review comment (Member): Can move this to tutorials in website, https://vllm-semantic-router.com/docs/tutorials/observability/ ?

Reply (Collaborator Author @JaredforReal, Sep 30, 2025): Sure, scheduled.

> **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace.
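
As a rough illustration of the Kustomize override mentioned above, an overlay could set a different namespace on top of this directory. The overlay path and the namespace name below are hypothetical and not part of this PR.

```yaml
# Hypothetical overlay file:
# deploy/kubernetes/observability/overlays/custom-ns/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Points at the directory that holds the observability kustomization.yaml
resources:
  - ../..

# Hypothetical namespace; replaces metadata.namespace on namespaced resources
namespace: my-router-namespace
```

Note that the `namespace` field does not rewrite strings inside ConfigMap data, so the namespace listed under `kubernetes_sd_configs` in `prometheus/configmap.yaml` would still need to be edited by hand.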

## What Gets Installed

| Component | Purpose | Key Files |
|--------------|---------|-----------|
| Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
| Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)|
| Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`|
| Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`|

Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.

## 1. Prerequisites

- Semantic Router already deployed via `deploy/kubernetes/`
- A Kubernetes cluster (managed, on-prem, or kind)
- `kubectl` v1.23+
- Optional: an ingress controller (NGINX, ALB, etc.) if you want external access

## 2. Directory Layout

```
deploy/kubernetes/observability/
├── README.md
├── kustomization.yaml              # (created in the next step)
├── ingress.yaml                    # optional HTTPS ingress examples
├── prometheus/
│   ├── configmap.yaml              # Scrape config (Kubernetes SD)
│   ├── deployment.yaml
│   ├── pvc.yaml
│   ├── rbac.yaml                   # SA + ClusterRole + binding
│   └── service.yaml
└── grafana/
    ├── configmap-dashboard.yaml    # Bundled LLM router dashboard
    ├── configmap-provisioning.yaml # Datasource + provider config
    ├── deployment.yaml
    ├── pvc.yaml
    ├── secret.yaml                 # Admin credentials (override in prod)
    └── service.yaml
```

## 3. Prometheus Configuration Highlights

- Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system`
- Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`)
- Stores metrics in a `PersistentVolumeClaim` named `prometheus-data`
- RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices
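
The read-only RBAC rule in the last bullet might look roughly like the following. This is a sketch rather than the contents of `prometheus/rbac.yaml`; the ClusterRole name is an assumption.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus   # assumed name, check rbac.yaml for the real one
rules:
  # Core API group: the objects Kubernetes service discovery reads
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods", "nodes"]
    verbs: ["get", "list", "watch"]
  # EndpointSlices live in their own API group
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```

Only `get`, `list`, and `watch` are granted, which is what makes the access read-only.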

### Scrape configuration snippet

```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```

Modify the namespace or service name if you changed them in your primary deployment.
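
The two `keep` rules retain only targets whose Service is named `semantic-router-metrics` and whose port is named `metrics`. A Service shaped roughly like the one below would match. The selector is an assumption, and the labels are taken from the troubleshooting table in this guide rather than from the actual manifest.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: semantic-router-metrics        # matched by the first keep rule
  namespace: vllm-semantic-router-system
  labels:
    app: semantic-router
    service: metrics
spec:
  selector:
    app: semantic-router               # assumed pod label
  ports:
    - name: metrics                    # matched by the second keep rule
      port: 9190
      targetPort: 9190                 # assumed to equal the service port
```

If the port is unnamed or named differently, the second rule drops the target even though the Service name matches.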

## 4. Grafana Configuration Highlights

- Stateful deployment backed by the `grafana-storage` PVC
- Datasource provisioned automatically pointing to `http://prometheus:9090`
- Dashboard provider watches `/var/lib/grafana-dashboards`
- Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json`
- Admin credentials pulled from the `grafana-admin` secret (default `admin/admin` – **change this!**)
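
For orientation, the datasource and dashboard provider described above could be provisioned with files along these lines. This is a sketch using Grafana's standard provisioning format, not a copy of `grafana/configmap-provisioning.yaml`; the provider name and the mount paths in the comments are assumptions.

```yaml
# Datasource file, assumed to be mounted under
# /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090
    isDefault: true
---
# Dashboard provider file, assumed to be mounted under
# /etc/grafana/provisioning/dashboards/
apiVersion: 1
providers:
  - name: semantic-router          # hypothetical provider name
    folder: Semantic Router
    type: file
    options:
      path: /var/lib/grafana-dashboards
```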

### Updating credentials

```bash
kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=monitor \
  --from-literal=admin-password='pick-a-strong-password' \
  --dry-run=client -o yaml | kubectl apply -f -
```

Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach.

## 5. Deployment Steps

### 5.1. Create the Kustomization

Create `deploy/kubernetes/observability/kustomization.yaml` to assemble all manifests, listing each file from the directory layout in section 2 under `resources`. This guide assumes you keep Prometheus and Grafana in the same namespace as the router.
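
A minimal sketch of such a file, assuming it simply lists the manifests from the directory layout in section 2. The file committed in the repository may differ in ordering or contain additional settings.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Same namespace as the core Semantic Router deployment
namespace: vllm-semantic-router-system

resources:
  # Prometheus
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  # Grafana
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  # Optional: drop this entry to keep the stack private
  - ingress.yaml
```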

### 5.2. Apply manifests

```bash
kubectl apply -k deploy/kubernetes/observability/
```

Verify pods:

```bash
kubectl get pods -n vllm-semantic-router-system
```

You should see `prometheus-...` and `grafana-...` pods in `Running` state.

### 5.3. Integration with the core deployment

1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`).
2. Deploy observability stack (`kubectl apply -k deploy/kubernetes/observability/`).
3. Confirm the metrics service (`semantic-router-metrics`) has endpoints:

   ```bash
   kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
   ```

4. Prometheus target should transition to **UP** within ~15 seconds.

### 5.4. Accessing the UIs

> **Optional Ingress** – If you prefer to keep the stack private, remove the `ingress.yaml` entry from `kustomization.yaml` before applying.

- **Port-forward (quick check)**

  ```bash
  # Each port-forward blocks its terminal, so run the two commands in separate terminals
  kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
  kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system
  ```

  Prometheus → http://localhost:9090, Grafana → http://localhost:3000

- **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider.
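
A customized Grafana ingress might look roughly like this. It is a sketch, not the contents of `ingress.yaml`: the ingress class, the cert-manager issuer, and the TLS secret name are assumptions, and `grafana.example.com` is a placeholder to replace with a real domain.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: vllm-semantic-router-system
  annotations:
    # Hypothetical issuer; only relevant if cert-manager is installed
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx              # assumed ingress class
  tls:
    - hosts:
        - grafana.example.com          # placeholder domain
      secretName: grafana-tls          # hypothetical TLS secret
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```

A Prometheus ingress would follow the same shape with the `prometheus` service on port `9090`.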

## 6. Verifying Metrics Collection

1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → ensure `semantic-router` job is green.
2. Query `rate(llm_model_completion_tokens_total[5m])` – should return data after traffic.
3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder.
4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
   - Prompt Category counts
   - Token usage rate per model
   - Routing modifications between models
   - Latency histograms (TTFT, completion p95)

## 7. Dashboard Customization

- Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
- Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers.
- Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap.
- Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.

## 8. Best Practices

### Resource Sizing

- Prometheus: increase CPU and memory as scrape cardinality grows or when retention exceeds 15 days.
- Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.
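
One way to apply the Grafana starting point is a strategic merge patch, sketched below. The Deployment and container names are assumptions, and treating `500m` / `1Gi` as resource requests is an interpretation of the guidance above rather than something this guide specifies.

```yaml
# Hypothetical file: patch-grafana-resources.yaml
# Reference it from kustomization.yaml, for example:
#   patches:
#     - path: patch-grafana-resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana                # assumed Deployment name
spec:
  template:
    spec:
      containers:
        - name: grafana        # assumed container name
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
```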

### Storage

- Use SSD-backed storage classes for Prometheus when the retention window is large.
- Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements.
- Enable volume snapshots or backups for dashboards and alert history.
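
As an illustration of the first two points, a larger Prometheus claim on an SSD-backed class could be declared as follows. The storage class name and the `50Gi` size are hypothetical; the claim name and namespace come from this guide.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # hypothetical SSD-backed class
  resources:
    requests:
      storage: 50Gi            # example size, up from the 20Gi default
```

Growing an existing claim in place only works when its storage class sets `allowVolumeExpansion: true`.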

### Security

- Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager.
- Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
- Enable Grafana role-based access control and use API keys (service account tokens in recent Grafana releases) for automation.
- Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.
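
For the multi-namespace case in the last bullet, the scrape job could list each namespace explicitly, as sketched here. The second namespace name is hypothetical.

```yaml
scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - vllm-semantic-router-system
            - semantic-router-staging   # hypothetical second namespace
    relabel_configs:
      # Same filters as the single-namespace job
      - source_labels: [__meta_kubernetes_service_name]
        regex: semantic-router-metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```

Prometheus only discovers endpoints in the listed namespaces, so the RBAC grants need to cover each of them.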

### Maintenance

- Monitor Prometheus disk usage; reduce retention or expand the PVC before it fills up.
- Back up Grafana dashboards or store them in Git (the bundled dashboard is already versioned through `grafana/configmap-dashboard.yaml`).
- Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches.
- Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`.
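
One possible way to pin image versions from `kustomization.yaml` is Kustomize's `images` field, sketched below as a fragment to merge into the existing file. The image names are assumed to match what the Deployments reference, and the tags are placeholders rather than recommendations.

```yaml
# Fragment of kustomization.yaml
images:
  - name: prom/prometheus      # assumed image name in prometheus/deployment.yaml
    newTag: v2.53.0            # placeholder tag
  - name: grafana/grafana      # assumed image name in grafana/deployment.yaml
    newTag: 11.1.0             # placeholder tag
```

Changing one entry at a time keeps the Prometheus and Grafana upgrades separate.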

## 9. Troubleshooting

| Symptom | Checks | Fix |
|---------|--------|-----|
| Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` |
| Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable |
| Login fails | `kubectl get secret grafana-admin -n vllm-semantic-router-system -o yaml` | Update the secret to match the credentials you expect |
| PVC Pending | `kubectl describe pvc prometheus-data -n vllm-semantic-router-system` | Provide a storage class via `storageClassName`, or provision storage manually |
| Ingress 404 | `kubectl describe ingress grafana -n vllm-semantic-router-system` | Update hostnames, TLS secrets, and ensure ingress controller is installed |

## 10. Next Steps

- Configure alerts for critical metrics (Prometheus alerting rules + Alertmanager)
- Add log aggregation (Loki, Elasticsearch, or cloud-native logging)
- Automate stack deployment through CI/CD pipelines using `kubectl apply -k`
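
As a starting point for the alerting item, a rules file could watch the `semantic-router` scrape job itself. This is a sketch: the group name, alert names, durations, and severity labels are assumptions, and Prometheus would need to load the file through `rule_files` in its configuration.

```yaml
groups:
  - name: semantic-router-availability   # hypothetical group name
    rules:
      # Fires when a discovered target fails its scrapes for 5 minutes
      - alert: SemanticRouterTargetDown
        expr: up{job="semantic-router"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Semantic Router metrics target is down
      # Fires when service discovery finds no targets at all
      - alert: SemanticRouterTargetMissing
        expr: absent(up{job="semantic-router"})
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: No Semantic Router metrics targets discovered
```

The second rule covers the case where the `up` series does not exist, which the first rule cannot detect.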

With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment.