feat: add Grafana+Prometheus in k8s #294
Merged
Changes from 7 commits
10 commits
0665cbf feat: add Grafava+Prometheus in k8s (JaredforReal)
cb70540 Merge branch 'main' into obser (JaredforReal)
4096e4e Merge branch 'main' into obser (JaredforReal)
cedd13e Update docs of observability k8s part (JaredforReal)
9405f47 get rig of redudent part in doc (JaredforReal)
abad95a Merge branch 'main' into obser (rootfs)
ebbce20 Merge branch 'main' into obser (JaredforReal)
5249c10 add comments of 472 and 65534 (JaredforReal)
496a8bd add network tips of k8s (JaredforReal)
790c919 update uid in dashboard (JaredforReal)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,203 @@ | ||
| # Semantic Router Observability on Kubernetes | ||
|
|
||
| This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster. | ||
|
|
||
| > **Namespace** – All manifests default to the `vllm-semantic-router-system` namespace to match the core deployment. Override it with Kustomize if you use a different namespace. | ||
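As a rough illustration of the namespace override mentioned above, a Kustomize overlay could look like the sketch below. The overlay path, the target namespace, and the relative path to the base are all assumptions for illustration, not files that ship with this PR.

```yaml
# overlays/custom-namespace/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Rewrites metadata.namespace on every namespaced resource pulled in below.
namespace: my-observability            # hypothetical target namespace

resources:
  - ../../deploy/kubernetes/observability   # adjust to where the base lives
```

The `namespace` field does not rewrite text inside ConfigMap data, so the namespace listed in the Prometheus scrape configuration still has to be checked by hand.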
|
|
||
| ## What Gets Installed | ||
|
|
||
| | Component | Purpose | Key Files | | ||
| |--------------|---------|-----------| | ||
| | Prometheus | Scrapes Semantic Router metrics and stores them with persistent retention | `prometheus/` (`rbac.yaml`, `configmap.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)| | ||
| | Grafana | Visualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasource | `grafana/` (`secret.yaml`, `configmap-*.yaml`, `deployment.yaml`, `pvc.yaml`, `service.yaml`)| | ||
| | Ingress (optional) | Exposes the UIs outside the cluster | `ingress.yaml`| | ||
| | Dashboard provisioning | Automatically loads `deploy/llm-router-dashboard.json` into Grafana | `grafana/configmap-dashboard.yaml`| | ||
|
|
||
| Prometheus is configured to discover the `semantic-router-metrics` service (port `9190`) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack. | ||
|
|
||
| ## 1. Prerequisites | ||
|
|
||
| - A Semantic Router workload already deployed via `deploy/kubernetes/` | ||
| - A Kubernetes cluster (managed, on-prem, or kind) | ||
| - `kubectl` v1.23+ | ||
| - Optional: an ingress controller (NGINX, ALB, etc.) if you want external access | ||
|
|
||
| ## 2. Directory Layout | ||
|
|
||
| ``` | ||
| deploy/kubernetes/observability/ | ||
| ├── README.md | ||
| ├── kustomization.yaml # (created in step 5.1) | ||
| ├── ingress.yaml # optional HTTPS ingress examples | ||
| ├── prometheus/ | ||
| │ ├── configmap.yaml # Scrape config (Kubernetes SD) | ||
| │ ├── deployment.yaml | ||
| │ ├── pvc.yaml | ||
| │ ├── rbac.yaml # SA + ClusterRole + binding | ||
| │ └── service.yaml | ||
| └── grafana/ | ||
| ├── configmap-dashboard.yaml # Bundled LLM router dashboard | ||
| ├── configmap-provisioning.yaml # Datasource + provider config | ||
| ├── deployment.yaml | ||
| ├── pvc.yaml | ||
| ├── secret.yaml # Admin credentials (override in prod) | ||
| └── service.yaml | ||
| ``` | ||
|
|
||
| ## 3. Prometheus Configuration Highlights | ||
|
|
||
| - Uses `kubernetes_sd_configs` to enumerate endpoints in `vllm-semantic-router-system` | ||
| - Keeps 15 days of metrics by default (`--storage.tsdb.retention.time=15d`) | ||
| - Stores metrics in a `PersistentVolumeClaim` named `prometheus-data` | ||
| - RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices | ||
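The read-only RBAC rules described above could be expressed roughly as follows. This is a sketch based on the resource list in the bullet, not a copy of `prometheus/rbac.yaml`; the role name is an assumption.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus                     # assumed name
rules:
  # Core API group: the objects Kubernetes service discovery reads.
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods", "nodes"]
    verbs: ["get", "list", "watch"]    # read-only
  # EndpointSlices live in their own API group.
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
```

Only `get`, `list`, and `watch` are granted, which is what service discovery needs; nothing here lets Prometheus modify cluster state.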
|
|
||
| ### Scrape configuration snippet | ||
|
|
||
| ```yaml | ||
| scrape_configs: | ||
| - job_name: semantic-router | ||
| kubernetes_sd_configs: | ||
| - role: endpoints | ||
| namespaces: | ||
| names: | ||
| - vllm-semantic-router-system | ||
| relabel_configs: | ||
| - source_labels: [__meta_kubernetes_service_name] | ||
| regex: semantic-router-metrics | ||
| action: keep | ||
| - source_labels: [__meta_kubernetes_endpoint_port_name] | ||
| regex: metrics | ||
| action: keep | ||
| ``` | ||
|
|
||
| Modify the namespace or service name if you changed them in your primary deployment. | ||
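For the two `keep` rules above to match, the metrics Service needs the expected name and a port named `metrics`. A minimal sketch of such a Service is shown here; the labels come from the troubleshooting table, while the selector and `targetPort` are assumptions about the core deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: semantic-router-metrics        # matched by the first keep rule
  namespace: vllm-semantic-router-system
  labels:
    app: semantic-router
    service: metrics
spec:
  selector:
    app: semantic-router               # assumed pod label
  ports:
    - name: metrics                    # matched by the second keep rule
      port: 9190
      targetPort: 9190                 # assumed container port
```

If the port is unnamed or named differently, the second relabel rule drops every endpoint and the job shows no targets at all rather than a failing one.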
|
|
||
| ## 4. Grafana Configuration Highlights | ||
|
|
||
| - Stateful deployment backed by the `grafana-storage` PVC | ||
| - Datasource provisioned automatically pointing to `http://prometheus:9090` | ||
| - Dashboard provider watches `/var/lib/grafana-dashboards` | ||
| - Bundled `llm-router-dashboard.json` is identical to `deploy/llm-router-dashboard.json` | ||
| - Admin credentials pulled from the `grafana-admin` secret (default `admin/admin`; **change this!**) | ||
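The datasource and dashboard provider described in this list follow Grafana's file-based provisioning format. The sketch below shows what the two provisioning files might contain; the file names and the provider name are assumptions, and the two YAML documents are separate files in practice.

```yaml
# provisioning/datasources/prometheus.yaml (file name is an assumption)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                      # Grafana's backend proxies the queries
    url: http://prometheus:9090        # in-cluster Service name and port
    isDefault: true
---
# provisioning/dashboards/provider.yaml (file name is an assumption)
apiVersion: 1
providers:
  - name: semantic-router              # hypothetical provider name
    folder: Semantic Router            # folder referenced in the verification steps
    type: file
    options:
      path: /var/lib/grafana-dashboards
```

The short hostname `prometheus` resolves only when Grafana and Prometheus share a namespace; otherwise the URL needs the namespace-qualified Service name.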
|
|
||
| ### Updating credentials | ||
|
|
||
| ```bash | ||
| kubectl create secret generic grafana-admin \ | ||
| --namespace vllm-semantic-router-system \ | ||
| --from-literal=admin-user=monitor \ | ||
| --from-literal=admin-password='pick-a-strong-password' \ | ||
| --dry-run=client -o yaml | kubectl apply -f - | ||
| ``` | ||
|
|
||
| Remove or overwrite the committed `secret.yaml` when you adopt a different secret management approach. | ||
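One common way for a Grafana Deployment to consume this secret is through environment variables. The fragment below is a sketch of that wiring, assuming the Deployment uses `secretKeyRef`; the key names match the `kubectl create secret` command above.

```yaml
# Fragment of the Grafana container spec (sketch, not the shipped manifest)
env:
  - name: GF_SECURITY_ADMIN_USER
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-user
  - name: GF_SECURITY_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: grafana-admin
        key: admin-password
```

Environment variables sourced from a Secret are read when the container starts, so restart the Grafana pod after updating the secret. Grafana applies the admin password setting on first run, so with persistent storage an existing admin account may need its password reset from inside Grafana instead.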
|
|
||
| ## 5. Deployment Steps | ||
|
|
||
| ### 5.1. Create the Kustomization | ||
|
|
||
| Create `deploy/kubernetes/observability/kustomization.yaml` to assemble the manifests under `prometheus/` and `grafana/`, plus `ingress.yaml` if you want external access. This guide assumes you keep Prometheus and Grafana in the same namespace as the router. | ||
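One possible shape for that `kustomization.yaml`, assuming the file names from the directory layout in section 2:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: vllm-semantic-router-system

resources:
  # Prometheus
  - prometheus/rbac.yaml
  - prometheus/configmap.yaml
  - prometheus/pvc.yaml
  - prometheus/deployment.yaml
  - prometheus/service.yaml
  # Grafana
  - grafana/secret.yaml
  - grafana/configmap-provisioning.yaml
  - grafana/configmap-dashboard.yaml
  - grafana/pvc.yaml
  - grafana/deployment.yaml
  - grafana/service.yaml
  # Optional: remove this entry to keep the UIs cluster-internal
  - ingress.yaml
```

The order of entries under `resources` is for readability only.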
|
|
||
| ### 5.2. Apply manifests | ||
|
|
||
| ```bash | ||
| kubectl apply -k deploy/kubernetes/observability/ | ||
| ``` | ||
|
|
||
| Verify pods: | ||
|
|
||
| ```bash | ||
| kubectl get pods -n vllm-semantic-router-system | ||
| ``` | ||
|
|
||
| You should see `prometheus-...` and `grafana-...` pods in `Running` state. | ||
|
|
||
| ### 5.3. Integration with the core deployment | ||
|
|
||
| 1. Deploy or update Semantic Router (`kubectl apply -k deploy/kubernetes/`). | ||
| 2. Deploy observability stack (`kubectl apply -k deploy/kubernetes/observability/`). | ||
| 3. Confirm the metrics service (`semantic-router-metrics`) has endpoints: | ||
|
|
||
| ```bash | ||
| kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system | ||
| ``` | ||
|
|
||
| 4. The Prometheus target should transition to **UP** within ~15 seconds. | ||
|
|
||
| ### 5.4. Accessing the UIs | ||
|
|
||
| > **Optional Ingress** – If you prefer to keep the stack private, remove the `ingress.yaml` entry from `kustomization.yaml` before applying. | ||
|
|
||
| - **Port-forward (quick check)** | ||
|
|
||
| ```bash | ||
| # Run each port-forward in its own terminal; the command blocks until interrupted. | ||
| kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system | ||
| kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system | ||
| ``` | ||
|
|
||
| Prometheus → http://localhost:9090, Grafana → http://localhost:3000 | ||
|
|
||
| - **Ingress (production)** – Customize `ingress.yaml` with real domains, TLS secrets, and your ingress class before applying. Replace `*.example.com` and configure HTTPS certificates via cert-manager or your provider. | ||
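A customized Grafana ingress might end up looking like the sketch below. The ingress class, hostname, and TLS secret name are placeholders to replace; the Service name and port match the port-forward command above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: vllm-semantic-router-system
spec:
  ingressClassName: nginx              # assumption: NGINX ingress controller
  tls:
    - hosts:
        - grafana.example.com          # replace with a real domain
      secretName: grafana-tls          # hypothetical TLS secret
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
```

A matching rule for Prometheus would point at the `prometheus` Service on port `9090`; consider whether Prometheus needs external exposure at all, since it has no built-in login.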
|
|
||
| ## 6. Verifying Metrics Collection | ||
|
|
||
| 1. Open Prometheus (port-forward or ingress) → **Status ▸ Targets** → ensure `semantic-router` job is green. | ||
| 2. Query `rate(llm_model_completion_tokens_total[5m])`; it returns data only after the router has served traffic (see step 4). | ||
| 3. Open Grafana, log in with the admin credentials, and confirm the **LLM Router Metrics** dashboard exists under the *Semantic Router* folder. | ||
| 4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating: | ||
| - Prompt Category counts | ||
| - Token usage rate per model | ||
| - Routing modifications between models | ||
| - Latency histograms (TTFT, completion p95) | ||
|
|
||
| ## 7. Dashboard Customization | ||
|
|
||
| - Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template. | ||
| - Update Grafana provisioning (`grafana/configmap-provisioning.yaml`) to point to alternate folders or add new providers. | ||
| - Add additional dashboards by extending `grafana/configmap-dashboard.yaml` or mounting a different ConfigMap. | ||
| - Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters. | ||
|
|
||
| ## 8. Best Practices | ||
|
|
||
| ### Resource Sizing | ||
|
|
||
| - Prometheus: increase CPU and memory as series cardinality grows or when retention exceeds 15 days. | ||
| - Grafana: start with `500m` CPU / `1Gi` RAM; scale replicas horizontally when concurrent viewers exceed a few dozen. | ||
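The Grafana starting point above could be applied without editing the base manifest by using a Kustomize patch. This is a sketch that assumes the Deployment is named `grafana`, has a single container, and that the figures are meant as resource requests.

```yaml
# Added to kustomization.yaml (sketch)
patches:
  - target:
      kind: Deployment
      name: grafana                    # assumed Deployment name
    patch: |-
      - op: add
        path: /spec/template/spec/containers/0/resources
        value:
          requests:
            cpu: 500m
            memory: 1Gi
```

The JSON patch `add` operation replaces any existing `resources` block on the first container, so include limits in `value` as well if the base manifest sets them.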
|
|
||
| ### Storage | ||
|
|
||
| - Use SSD-backed storage classes for Prometheus when the retention window is large. | ||
| - Increase `prometheus/pvc.yaml` (default 20Gi) and `grafana/pvc.yaml` (default 10Gi) to match retention requirements. | ||
| - Enable volume snapshots or backups for dashboards and alert history. | ||
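As an example of the storage points above, a resized Prometheus claim on an SSD-backed class might look like this. The storage class name, size, and access mode are assumptions for illustration; the claim name comes from section 3.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce                    # assumption: single Prometheus replica
  storageClassName: fast-ssd           # hypothetical SSD-backed class
  resources:
    requests:
      storage: 50Gi                    # hypothetical size; default is 20Gi
```

Growing an existing claim in place works only if its storage class sets `allowVolumeExpansion: true`, and `storageClassName` cannot be changed after the claim is created.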
|
|
||
| ### Security | ||
|
|
||
| - Replace the demo `grafana-admin` secret with credentials stored in your preferred secret manager. | ||
| - Restrict ingress access with network policies, OAuth proxies, or SSO integrations. | ||
| - Enable Grafana role-based access control and API keys for automation. | ||
| - Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config. | ||
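To illustrate the network policy suggestion, the sketch below admits traffic to Grafana only from the ingress controller's namespace. The pod label and the controller namespace are assumptions, and the policy has an effect only if the cluster's network plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: grafana-allow-ingress-controller   # hypothetical name
  namespace: vllm-semantic-router-system
spec:
  podSelector:
    matchLabels:
      app: grafana                         # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on namespaces by recent Kubernetes versions
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 3000
```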
|
|
||
| ### Maintenance | ||
|
|
||
| - Monitor Prometheus disk usage; prune retention or scale PVC before it fills up. | ||
| - Back up Grafana dashboards or store them in Git (already done through this ConfigMap). | ||
| - Roll upgrades separately: update Prometheus and Grafana images via `kustomization.yaml` patches. | ||
| - Consider adopting the Prometheus Operator (`ServiceMonitor` + `PodMonitor`) if you already run kube-prometheus-stack. A sample `ServiceMonitor` is in `website/docs/tutorials/observability/observability.md`. | ||
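For the separate upgrade path mentioned above, Kustomize's `images` transformer is one way to pin versions without touching the Deployments. The image names below are assumptions about what the manifests reference, and the tags are placeholders.

```yaml
# Added to kustomization.yaml (sketch)
images:
  - name: prom/prometheus              # must match the image name in prometheus/deployment.yaml
    newTag: "<prometheus-version>"     # placeholder; pin an explicit release
  - name: grafana/grafana              # must match the image name in grafana/deployment.yaml
    newTag: "<grafana-version>"        # placeholder
```

Changing one `newTag` at a time and re-running `kubectl apply -k` rolls only the affected Deployment.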
|
|
||
| ## 9. Troubleshooting | ||
|
|
||
| | Symptom | Checks | Fix | | ||
| |---------|--------|-----| | ||
| | Prometheus target **DOWN** | `kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system` | Ensure the Semantic Router deployment is running and the service labels match `app=semantic-router`, `service=metrics` | | ||
| | Grafana dashboard empty | **Configuration → Data Sources** | Confirm Prometheus datasource URL resolves and the Prometheus service is reachable | | ||
| Login fails | `kubectl get secret grafana-admin -n vllm-semantic-router-system -o yaml` | Update the secret to match the credentials you expect | ||
| PVC Pending | `kubectl describe pvc prometheus-data -n vllm-semantic-router-system` | Provide a storage class via `storageClassName`, or provision storage manually | ||
| Ingress 404 | `kubectl describe ingress grafana -n vllm-semantic-router-system` | Update hostnames and TLS secrets, and ensure an ingress controller is installed | ||
|
|
||
| ## 10. Next Steps | ||
|
|
||
| - Configure alerts for critical metrics (Prometheus alerting rules + Alertmanager) | ||
| - Add log aggregation (Loki, Elasticsearch, or cloud-native logging) | ||
| - Automate stack deployment through CI/CD pipelines using `kubectl apply -k` | ||
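As a starting point for the alerting item, a Prometheus rule file could watch the scrape job defined in section 3. This is a sketch: the group name, alert names, durations, and severity label are assumptions, and the file has to be referenced from `rule_files` in the Prometheus configuration with an Alertmanager configured to deliver notifications.

```yaml
groups:
  - name: semantic-router.rules        # hypothetical group name
    rules:
      # Fires when a discovered target fails its scrapes.
      - alert: SemanticRouterMetricsDown
        expr: up{job="semantic-router"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Semantic Router metrics endpoint is failing scrapes"
      # Fires when service discovery finds no targets at all,
      # in which case no `up` series exists for the job.
      - alert: SemanticRouterTargetsMissing
        expr: absent(up{job="semantic-router"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No Semantic Router targets discovered"
```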
|
|
||
| With this observability stack in place, you can track Semantic Router health, routing accuracy, latency distributions, and usage trends across any Kubernetes environment. | ||
Can we move this to the tutorials on the website, https://vllm-semantic-router.com/docs/tutorials/observability/ ?
Sure, scheduled.