Commit 21805bc

docs: correct dead kvstats metric names in Grafana dashboards and doc… (#7235)
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
1 parent da312ee commit 21805bc

3 files changed: +14 −28 lines


deploy/observability/grafana_dashboards/DASHBOARD_METRICS.md

Lines changed: 4 additions & 4 deletions

@@ -19,7 +19,7 @@ The dashboard is organized in **logical request flow order** (21 panels across 6
   - Request Throughput (x=0), Avg Request Duration (x=8), KV Cache Utilization (%) (x=16)
 
 **Row 5: KV Cache + GPU** (y=32)
-  - KV Cache Blocks (Active/Total) (x=0), GPU Compute Utilization (x=8), GPU Memory Used (x=16)
+  - KV Cache Blocks (Total) (x=0), GPU Compute Utilization (x=8), GPU Memory Used (x=16)
 
 **Row 6: NIXL Transfer Metrics** (y=40)
   - GPU Memory Bandwidth (x=0), NVLink Bandwidth (GB/s) (x=8), Worker CPU Usage (x=16)
@@ -69,8 +69,8 @@ These metrics come from the decode worker pods' system endpoints (port 9090). In
 | **Component Latency - Prefill vs Decode** | `dynamo_component_request_duration_seconds_{sum,count}{dynamo_component="prefill",dynamo_endpoint="generate"}` & `{dynamo_component="backend",dynamo_endpoint="generate"}` | `rate(sum[5m]) / rate(count[5m])` | Average request duration for prefill workers (includes NIXL transfer) vs decode workers (entire decode session for all output tokens) over the last 5 minutes. **Note**: Decode worker latency measures the FULL decode session duration, not just time to first token. Only shows `generate` endpoint (filters out `clear_kv_blocks` maintenance operations) |
 | **Decode Worker - Request Throughput** | `dynamo_component_requests_total{dynamo_component="backend"}` | `rate(...[5m])` | Rate of requests processed by decode workers in requests/second |
 | **Decode Worker - Avg Request Duration** | `dynamo_component_request_duration_seconds_{sum,count}{dynamo_component="backend"}` | `rate(sum[5m]) / rate(count[5m])` | Average time decode workers spend processing requests (decode phase only) over the last 5 minutes |
-| **KV Cache Utilization** | `dynamo_component_kvstats_gpu_cache_usage_percent` | Raw value (0-100%) | GPU memory utilization for KV cache storage of active requests. High values (>90%) indicate workers are at capacity and requests are queueing. **Note**: Only available for decode workers - prefill workers in disaggregated mode don't expose this metric. Monitor Prefill Worker Processing Time instead for prefill capacity |
-| **KV Cache Blocks (Active/Total)** | `dynamo_component_kvstats_active_blocks` & `dynamo_component_kvstats_total_blocks` | Raw values | Number of KV cache blocks in use vs total available for decode workers. When active approaches total, decode workers are at capacity. Shows numeric values (e.g., 2048/5297). **Note**: Only for decode workers |
+| **KV Cache Utilization** | `dynamo_component_gpu_cache_usage_percent` | Raw value (0-100%) | GPU memory utilization for KV cache storage of active requests. High values (>90%) indicate workers are at capacity and requests are queueing. **Note**: Only available for decode workers - prefill workers in disaggregated mode don't expose this metric. Monitor Prefill Worker Processing Time instead for prefill capacity |
+| **KV Cache Blocks (Total)** | `dynamo_component_total_blocks` | Raw value | Total number of KV cache blocks available on decode workers. **Note**: Only for decode workers |
 
 ### CPU Metrics (from cAdvisor and Node Exporter)
 These metrics come from Kubernetes cAdvisor (container metrics) and Node Exporter (node-level metrics). CPU bottlenecks can impact prefill/decode performance.
@@ -194,7 +194,7 @@ The DCGM ServiceMonitor must be manually created (see `dcgm-servicemonitor.yaml`
 - Check deployment mode and request routing configuration
 
 ### KV Cache metrics only showing decode workers:
-**Important Limitation**: In disaggregated mode, prefill workers (`--disaggregation-mode prefill`) do NOT expose `dynamo_component_kvstats_*` metrics. Only decode workers expose these.
+**Important Limitation**: In disaggregated mode, prefill workers (`--disaggregation-mode prefill`) do NOT expose `dynamo_component_total_blocks` or `dynamo_component_gpu_cache_usage_percent` metrics. Only decode workers expose these.
 
 **Why this happens:**
 - Prefill workers transfer KV cache to decode workers via NIXL
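Put together, the renamed metrics slot into the same query shapes the metrics table above describes; a minimal PromQL sketch (`$namespace` is the dashboards' Grafana template variable):

```promql
# KV Cache Utilization: raw gauge, 0-100%, decode workers only
dynamo_component_gpu_cache_usage_percent{namespace="$namespace"}

# KV Cache Blocks (Total): raw gauge, decode workers only
dynamo_component_total_blocks{namespace="$namespace"}

# Decode Worker - Avg Request Duration over the last 5 minutes
rate(dynamo_component_request_duration_seconds_sum{dynamo_component="backend"}[5m])
  / rate(dynamo_component_request_duration_seconds_count{dynamo_component="backend"}[5m])
```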

deploy/observability/grafana_dashboards/disagg-dashboard.json

Lines changed: 5 additions & 12 deletions

@@ -1136,7 +1136,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_gpu_cache_usage_percent{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
+          "expr": "(dynamo_component_gpu_cache_usage_percent{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
           "legendFormat": "{{pod}}",
           "range": true,
           "refId": "A"
@@ -1150,7 +1150,7 @@
         "type": "prometheus",
         "uid": "${datasource}"
       },
-      "description": "Active KV cache blocks vs total available blocks for decode workers. Shows numeric capacity utilization. When active approaches total, workers are at capacity.",
+      "description": "Total KV cache blocks available on decode workers. Shows numeric capacity.",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -1245,20 +1245,13 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_active_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
-          "legendFormat": "Active - {{pod}}",
-          "range": true,
-          "refId": "A"
-        },
-        {
-          "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_total_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
+          "expr": "(dynamo_component_total_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
           "legendFormat": "Total - {{pod}}",
           "range": true,
-          "refId": "B"
+          "refId": "A"
         }
       ],
-      "title": "KV Cache Blocks (Active/Total)",
+      "title": "KV Cache Blocks (Total)",
       "type": "timeseries"
     },
     {

deploy/observability/k8s/grafana-disagg-dashboard-configmap.yaml

Lines changed: 5 additions & 12 deletions

@@ -1148,7 +1148,7 @@ data:
       "targets": [
         {
           "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_gpu_cache_usage_percent{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
+          "expr": "(dynamo_component_gpu_cache_usage_percent{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
           "legendFormat": "{{pod}}",
           "range": true,
           "refId": "A"
@@ -1162,7 +1162,7 @@ data:
         "type": "prometheus",
         "uid": "${datasource}"
       },
-      "description": "Active KV cache blocks vs total available blocks for decode workers. Shows numeric capacity utilization. When active approaches total, workers are at capacity.",
+      "description": "Total KV cache blocks available on decode workers. Shows numeric capacity.",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -1257,20 +1257,13 @@ data:
       "targets": [
         {
           "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_active_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
-          "legendFormat": "Active - {{pod}}",
-          "range": true,
-          "refId": "A"
-        },
-        {
-          "editorMode": "code",
-          "expr": "(dynamo_component_kvstats_total_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
+          "expr": "(dynamo_component_total_blocks{namespace=\"$namespace\"}) * on(pod, namespace) group_left() kube_pod_status_phase{phase=\"Running\"}",
           "legendFormat": "Total - {{pod}}",
           "range": true,
-          "refId": "B"
+          "refId": "A"
        }
      ],
-      "title": "KV Cache Blocks (Active/Total)",
+      "title": "KV Cache Blocks (Total)",
       "type": "timeseries"
     },
     {
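Because the rename touches both the standalone dashboard JSON and its ConfigMap copy, a small audit can confirm that no stale `kvstats` names survive. The sketch below is illustrative only (not part of the commit); the function name and sample expressions are made up for the example:

```python
import re

# Regex for the metric family removed in this commit; the kvstats_ prefix
# no longer exists, so any match indicates a stale dashboard reference.
STALE_KVSTATS = re.compile(r"dynamo_component_kvstats_\w+")

def stale_metric_names(dashboard_text: str) -> list[str]:
    """Return the distinct removed kvstats metric names still present."""
    return sorted(set(STALE_KVSTATS.findall(dashboard_text)))

# Sample expressions: one pre-rename, one post-rename.
old_expr = '(dynamo_component_kvstats_total_blocks{namespace="$namespace"})'
new_expr = '(dynamo_component_total_blocks{namespace="$namespace"})'

print(stale_metric_names(old_expr))  # ['dynamo_component_kvstats_total_blocks']
print(stale_metric_names(new_expr))  # []
```

Running it over the two dashboard files changed here (e.g. reading each with `open(path).read()`) would flag any missed occurrence of the removed `dynamo_component_kvstats_*` family.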
