Commit 690cc3e

docs: update metrics design doc to use new vllm:kv_cache_usage_perc (vllm-project#30041)
Signed-off-by: Tim <tim.wang03@sap.com>
1 parent 1f0d184 commit 690cc3e

1 file changed: +1 -1 lines changed

docs/design/metrics.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ The subset of metrics exposed in the Grafana dashboard gives us an indication of
 - `vllm:time_per_output_token_seconds` - Inter-token latency (Time Per Output Token, TPOT) in seconds.
 - `vllm:time_to_first_token_seconds` - Time to First Token (TTFT) latency in seconds.
 - `vllm:num_requests_running` (also, `_swapped` and `_waiting`) - Number of requests in the RUNNING, WAITING, and SWAPPED states.
-- `vllm:gpu_cache_usage_perc` - Percentage of used cache blocks by vLLM.
+- `vllm:kv_cache_usage_perc` - Percentage of used cache blocks by vLLM.
 - `vllm:request_prompt_tokens` - Request prompt length.
 - `vllm:request_generation_tokens` - Request generation length.
 - `vllm:request_success` - Number of finished requests by their finish reason: either an EOS token was generated or the max sequence length was reached.
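
For anyone scraping these metrics directly, the rename in this diff may mean a client has to cope with either name during a transition. The following is a minimal sketch, not part of the commit: it assumes the metrics are served in the Prometheus text exposition format, and the helper name `read_cache_usage`, the sample text, and the `model_name` label value are all hypothetical. Only the two metric names come from the diff.

```python
import re

# Metric names taken from the diff: new name first, old name as a fallback.
CACHE_USAGE_METRICS = ("vllm:kv_cache_usage_perc", "vllm:gpu_cache_usage_perc")


def read_cache_usage(metrics_text):
    """Return (metric_name, value) for the first cache-usage sample found, else None.

    Hypothetical helper: assumes Prometheus text exposition format, where a
    sample line is the metric name, an optional {label} set, and a value.
    Label values containing '}' are not handled by this simple pattern.
    """
    for name in CACHE_USAGE_METRICS:
        pattern = re.compile(
            r"^" + re.escape(name) + r"(?:\{[^}]*\})?[ \t]+(\S+)",
            re.MULTILINE,
        )
        match = pattern.search(metrics_text)
        if match:
            return name, float(match.group(1))
    return None


# Hypothetical scrape output, for illustration only.
SAMPLE = (
    "# HELP vllm:kv_cache_usage_perc KV-cache usage.\n"
    "# TYPE vllm:kv_cache_usage_perc gauge\n"
    'vllm:kv_cache_usage_perc{model_name="example-model"} 0.25\n'
)

if __name__ == "__main__":
    print(read_cache_usage(SAMPLE))
```

Trying the new name first and the old name second lets one client read either form; the `# HELP` and `# TYPE` comment lines are skipped because the pattern only matches lines that begin with the metric name.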
