
Conversation

Pavloveuge

What does the PR do?

Report more counter, histogram, and gauge metrics from vLLM to the Triton metrics server.

Checklist:

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request
  • Related issues are referenced
  • Populated github labels field
  • Added test plan and verified test passes
  • Verified that the PR passes existing CI
  • Verified copyright is correct on all changed files
  • Added succinct git squash message before merging ref.
  • All template sections are filled out

Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:
n/a

Where should the reviewer start?
n/a

Test plan:
ci/L0_backend_vllm/metrics_test

CI Pipeline ID:
n/a

Caveats:
n/a

Background
Customers requested additional histogram metrics from vLLM.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a

@Pavloveuge
Author

@yinggeh
Hello, can you check this, please?

@yinggeh
Contributor

yinggeh commented Jun 5, 2025

> @yinggeh Hello, can you check this, please?

@Pavloveuge Thanks for contributing to Triton. I'll keep you posted once there's an update.

@kanlas-net

@yinggeh Any update? These metrics are important for production environments.

@yinggeh yinggeh requested review from yinggeh and oandreeva-nv July 17, 2025 22:56
@yinggeh yinggeh self-assigned this Jul 17, 2025
@yinggeh yinggeh added the enhancement New feature or request label Jul 17, 2025
README.md Outdated
Comment on lines 256 to 269
# Number of requests currently running on GPU.
gauge_scheduler_running
# Number of requests waiting to be processed.
gauge_scheduler_waiting
# Number of requests swapped to CPU.
gauge_scheduler_swapped
# GPU KV-cache usage. 1 means 100 percent usage.
gauge_gpu_cache_usage
# CPU KV-cache usage. 1 means 100 percent usage.
gauge_cpu_cache_usage
# CPU prefix cache block hit rate.
gauge_cpu_prefix_cache_hit_rate
# GPU prefix cache block hit rate.
gauge_gpu_prefix_cache_hit_rate
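For reference, gauges like the ones above are exposed through Triton's Prometheus-format metrics endpoint. Below is a minimal, self-contained sketch of filtering such gauges out of a scraped payload; the sample text and the `vllm:` metric prefix are illustrative assumptions, not taken from this PR.

```python
# Sketch: parse gauge values out of a Prometheus text-format payload,
# such as one scraped from Triton's /metrics endpoint. The sample
# payload and the "vllm:" prefix are assumptions for illustration.

def parse_gauges(payload: str, prefix: str = "vllm:") -> dict:
    """Return {metric_name: value} for metric lines starting with prefix."""
    gauges = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comment lines and blanks
        if not line.startswith(prefix):
            continue
        name, _, value = line.rpartition(" ")
        # Drop any {label="..."} block from the metric name.
        name = name.split("{", 1)[0]
        gauges[name] = float(value)
    return gauges

sample = """\
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model="m"} 2
vllm:gpu_cache_usage_perc{model="m"} 0.25
"""
```

`parse_gauges(sample)` maps each metric name to its numeric value, which is handy for a quick smoke test of which vLLM gauges a deployment actually reports.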
Contributor

Can you move gauges before counters?

Author

Done

Comment on lines 400 to 401
(self.metrics.gauge_gpu_cache_usage, stats.gpu_cache_usage_sys),
(self.metrics.gauge_cpu_cache_usage, stats.cpu_cache_usage_sys),
Contributor

deprecated

Comment on lines 397 to 400
(self.metrics.gauge_num_requests_running, stats.num_running_sys),
(self.metrics.gauge_num_requests_waiting, stats.num_waiting_sys),
(self.metrics.gauge_num_requests_swapped, stats.num_swapped_sys),
(self.metrics.gauge_gpu_cache_usage, stats.gpu_cache_usage_sys),
Contributor

Please update namings to be consistent with https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py

Author

Updated:

gauge_metrics = [
(self.metrics.gauge_num_requests_running, stats.num_running_sys),
(self.metrics.gauge_num_requests_waiting, stats.num_waiting_sys),
(self.metrics.gauge_num_requests_swapped, stats.num_swapped_sys),
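The excerpt above follows a simple reporting pattern: collect `(gauge, stat)` pairs and set each gauge to the latest scheduler value. A hypothetical stand-alone sketch of that loop; the `Gauge` and `Stats` classes here are stand-ins, not the real Triton or vLLM types.

```python
# Stand-in gauge type: stores the last value it was set to.
class Gauge:
    def __init__(self, name):
        self.name = name
        self.value = None

    def set(self, value):
        self.value = value

# Stand-in for vLLM's per-iteration scheduler stats.
class Stats:
    num_running_sys = 3
    num_waiting_sys = 1

gauge_running = Gauge("vllm:num_requests_running")
gauge_waiting = Gauge("vllm:num_requests_waiting")
stats = Stats()

# The same (gauge, value) pair list shape as in the PR excerpt.
gauge_metrics = [
    (gauge_running, stats.num_running_sys),
    (gauge_waiting, stats.num_waiting_sys),
]
for gauge, value in gauge_metrics:
    gauge.set(value)

print(gauge_running.value)  # 3
```

Keeping the pairs in one list means adding a new gauge is a one-line change, which is why this shape shows up repeatedly in the diff.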
Contributor

# Deprecated in 0.8 - KV cache offloading is not used in V1  
# Hidden in 0.9, due to be removed in 0.10

Author

Removed the deprecated metrics, thanks.

@yinggeh
Contributor

yinggeh commented Jul 24, 2025

@Pavloveuge Left comments

@Pavloveuge
Author

@yinggeh Done

@Pavloveuge Pavloveuge force-pushed the report_more_vllm_metric branch 2 times, most recently from 1cb26b7 to 9ad54c2 Compare July 24, 2025 18:31
@Pavloveuge Pavloveuge force-pushed the report_more_vllm_metric branch from 9ad54c2 to 9b96279 Compare July 24, 2025 18:32
@yinggeh
Contributor

yinggeh commented Jul 30, 2025

@Pavloveuge Thanks. There is a concern over the performance degradation with more metrics being reported. While I am getting the data, can you continue your work by applying the change to /opt/tritonserver/backends/vllm/utils/metrics.py in your container?

@Pavloveuge
Author

Yes, I've already applied the change to /opt/tritonserver/backends/vllm/utils/metrics.py in my container. I ran my scenarios and haven't noticed any performance degradation on my end.
