The observability stack is based on the kube-prometheus-stack Helm chart, installed under the release name `kube-prom-stack`.
To launch the observability stack:
Make sure to have:

- A running Kubernetes (K8s) environment with GPUs
  - Run `cd utils && bash install-minikube-cluster.sh`
  - Or follow our tutorial

After that, you can run:
```bash
bash install.sh
```

After installing, the dashboard can be accessed through the service `kube-prom-stack-grafana` in the `monitoring` namespace.
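Before moving on, it can help to confirm that the monitoring components came up (a quick sanity check; pod names vary with the chart version):

```bash
# All pods in the monitoring namespace should reach Running/Completed.
kubectl get pods --namespace monitoring
```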
Forward the Grafana dashboard port to the local node-port:

```bash
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80 --address 0.0.0.0
```

Forward the Prometheus dashboard:

```bash
kubectl --namespace monitoring port-forward prometheus-kube-prom-stack-kube-prome-prometheus-0 9090:9090
```

Open the webpage at `http://<IP of your node>:3000` to access the Grafana web page. The default username is `admin`, and the password can be configured in the `adminPassword` field of `kube-prom-stack.yaml` (default is `prom-operator`).
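If you changed the password (or don't want to check the YAML), it can also be read back from the Secret the chart creates. The Secret name `kube-prom-stack-grafana` below is an assumption based on the release name; adjust it if yours differs:

```bash
# Decode the Grafana admin password from the chart-managed Secret.
# Secret name assumed to follow the "<release>-grafana" convention.
kubectl --namespace monitoring get secret kube-prom-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode; echo
```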
To import the dashboard, click + in the top-right corner and then upload the vllm-dashboard.json file from this folder.
If you use the LMCache image in the production stack, you can try the LMCache dashboard. It contains the following six panels showing the benefits of CPU offloading:

- Average time to first token (sec)
- Cache hit rate (%) in the last 1 minute
- LMCache retrieve speed (K Tokens / sec)
- Local CPU cache usage (GB)
- Number of requested tokens in total
- Number of hit tokens in total
```bash
kubectl apply -f lmcache-dashboard-cm.yaml
kubectl -n monitoring rollout restart deployment kube-prom-stack-grafana
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80 --address 0.0.0.0
```

The vLLM router can export metrics to Prometheus using the Prometheus Adapter.
Running the `install.sh` script also installs and configures the Prometheus Adapter to export the vLLM metrics.
We provide a minimal example of how to use the Prometheus Adapter to export vLLM metrics. See `prom-adapter.yaml` for more details.
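For orientation, an adapter rule that maps a vLLM queue-length gauge from Prometheus into the custom metrics API can look roughly like the sketch below. The Prometheus series name is an assumption; the authoritative rules are the ones shipped in `prom-adapter.yaml`:

```yaml
rules:
  # Expose the vLLM "requests waiting" gauge through the custom metrics API.
  # Series name "vllm:num_requests_waiting" is an assumption; check
  # prom-adapter.yaml for the rules actually installed.
  - seriesQuery: 'vllm:num_requests_waiting'
    resources:
      overrides:
        namespace: {resource: "namespace"}
    name:
      as: "vllm_num_requests_waiting"
    metricsQuery: sum(<<.Series>>) by (<<.GroupBy>>)
```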
The exported metrics can be used for different purposes, such as horizontal scaling of the vLLM deployments.
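As one illustration, a HorizontalPodAutoscaler could scale a vLLM deployment on the exported queue length. This is a sketch, not part of the stack: the deployment name `vllm-deployment` and the target value are hypothetical, and the `Object` metric shape matches the namespace-described metric that the adapter exposes:

```yaml
# Hypothetical HPA scaling a vLLM deployment on vllm_num_requests_waiting.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm-deployment   # assumption: your vLLM deployment's name
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        metric:
          name: vllm_num_requests_waiting
        describedObject:
          apiVersion: v1
          kind: Namespace
          name: default
        target:
          type: Value
          value: "10"       # assumption: scale out past 10 waiting requests
```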
To verify the metrics are being exported, you can use the following command:
```bash
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq | grep vllm_num_requests_waiting -C 10
```

You should see something like the following:

```json
{
  "name": "namespaces/vllm_num_requests_waiting",
  "singularName": "",
  "namespaced": false,
  "kind": "MetricValueList",
  "verbs": [
    "get"
  ]
}
```

The following command will show the current value of the metric:
```bash
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/metrics/vllm_num_requests_waiting | jq
```

The output should look like the following:

```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Namespace",
        "name": "default",
        "apiVersion": "/v1"
      },
      "metricName": "vllm_num_requests_waiting",
      "timestamp": "2025-03-02T01:56:01Z",
      "value": "0",
      "selector": null
    }
  ]
}
```

To uninstall the observability stack, run:

```bash
bash uninstall.sh
```