base: main
feat: Report more vllm metrics #92
Conversation
@yinggeh
@Pavloveuge Thanks for contributing to Triton. I'll keep you posted once there's an update.
@yinggeh Any update? These metrics are important for production environments.
README.md
Outdated
# Number of requests currently running on GPU.
gauge_scheduler_running
# Number of requests waiting to be processed.
gauge_scheduler_waiting
# Number of requests swapped to CPU.
gauge_scheduler_swapped
# GPU KV-cache usage. 1 means 100 percent usage.
gauge_gpu_cache_usage
# CPU KV-cache usage. 1 means 100 percent usage.
gauge_cpu_cache_usage
# CPU prefix cache block hit rate.
gauge_cpu_prefix_cache_hit_rate
# GPU prefix cache block hit rate.
gauge_gpu_prefix_cache_hit_rate
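For context, here is a minimal sketch of how one of these gauges could be exposed through Triton's Python-backend custom metrics API. The metric family name and labels are illustrative assumptions for the example, not necessarily the exact values used in this PR, and the code only runs inside a Triton Python backend model:

```python
import triton_python_backend_utils as pb_utils

# Register a gauge family and create one labeled metric from it.
# Family name and labels are assumptions for illustration.
running_family = pb_utils.MetricFamily(
    name="vllm:num_requests_running",
    description="Number of requests currently running on GPU.",
    kind=pb_utils.MetricFamily.GAUGE,
)
gauge_scheduler_running = running_family.Metric(
    labels={"model": "vllm_model", "version": "1"}
)

# Each vLLM stats snapshot overwrites the gauge with the latest value.
gauge_scheduler_running.set(3)
```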
Can you move gauges before counters?
Done
src/utils/metrics.py
Outdated
(self.metrics.gauge_gpu_cache_usage, stats.gpu_cache_usage_sys),
(self.metrics.gauge_cpu_cache_usage, stats.cpu_cache_usage_sys),
deprecated
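One way to handle stats fields that newer vLLM versions deprecate is to read them defensively so they simply drop out when absent. This is a fragment of a hypothetical logger method (`self.metrics` and `stats` are assumed to be the Triton metrics holder and vLLM's `Stats` object); the thread below ultimately resolves the issue by removing the deprecated metrics instead:

```python
def _log_cache_usage(self, stats) -> None:
    """Sketch: report KV-cache usage only if the installed vLLM still provides it."""
    gauge_metrics = [
        (self.metrics.gauge_gpu_cache_usage, getattr(stats, "gpu_cache_usage_sys", None)),
        (self.metrics.gauge_cpu_cache_usage, getattr(stats, "cpu_cache_usage_sys", None)),
    ]
    for gauge, value in gauge_metrics:
        if value is not None:  # deprecated or missing fields are skipped
            gauge.set(value)
```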
src/utils/metrics.py
Outdated
(self.metrics.gauge_num_requests_running, stats.num_running_sys),
(self.metrics.gauge_num_requests_waiting, stats.num_waiting_sys),
(self.metrics.gauge_num_requests_swapped, stats.num_swapped_sys),
(self.metrics.gauge_gpu_cache_usage, stats.gpu_cache_usage_sys),
Please update the naming to be consistent with https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py
Updated.
src/utils/metrics.py
Outdated
gauge_metrics = [
    (self.metrics.gauge_num_requests_running, stats.num_running_sys),
    (self.metrics.gauge_num_requests_waiting, stats.num_waiting_sys),
    (self.metrics.gauge_num_requests_swapped, stats.num_swapped_sys),
# Deprecated in 0.8 - KV cache offloading is not used in V1
# Hidden in 0.9, due to be removed in 0.10
Remove the deprecated metrics. Thanks.
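After the renaming and the removal of the deprecated entries, the gauge path of the logger would look roughly like the sketch below. It is illustrative only: attribute names follow vLLM's `Metrics` class (`gauge_scheduler_*`), and the surrounding logger class and `stats` object are assumed rather than copied from the PR:

```python
def log(self, stats) -> None:
    """Sketch of the gauge path after review: vLLM-consistent names, deprecated entries dropped."""
    gauge_metrics = [
        (self.metrics.gauge_scheduler_running, stats.num_running_sys),
        (self.metrics.gauge_scheduler_waiting, stats.num_waiting_sys),
        (self.metrics.gauge_gpu_cache_usage, stats.gpu_cache_usage_sys),
    ]
    for gauge, data in gauge_metrics:
        gauge.set(data)  # gauges report the latest point-in-time value
```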
@Pavloveuge Left comments
Co-authored-by: Yingge He <[email protected]>
@yinggeh Done
Force-pushed from 1cb26b7 to 9ad54c2
Force-pushed from 9ad54c2 to 9b96279
@Pavloveuge Thanks. There is a concern about performance degradation when more metrics are reported. While I am getting the data, can you continue your work by applying the change to
Yes, I've already applied the change to
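To put numbers behind the performance concern, a small timing harness around the logger's `log()` call is one option. Everything below is a hypothetical stand-in (the `logger` object and the stats field names are assumptions), meant only to show how the overhead could be measured:

```python
import time
from types import SimpleNamespace

def measure_log_overhead(logger, stats, iterations=10_000):
    """Return the average per-call latency of logger.log(stats), in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        logger.log(stats)
    return (time.perf_counter() - start) / iterations * 1e6

# Hypothetical stand-in for a vLLM Stats snapshot; field names are assumptions.
fake_stats = SimpleNamespace(num_running_sys=4, num_waiting_sys=2, gpu_cache_usage_sys=0.35)
# overhead_us = measure_log_overhead(real_logger, fake_stats)
```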
What does the PR do?
Report more counter, gauge, and histogram metrics from vLLM to the Triton metrics server.
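At a high level, the reporting path is a vLLM stat logger that forwards each stats snapshot to Triton metric objects. The sketch below is illustrative only: it assumes a metrics holder built on `pb_utils.MetricFamily`, and the attribute and stats field names are assumptions; it shows the three update kinds (counters increment, gauges set, histograms observe):

```python
class TritonVllmStatLogger:
    """Illustrative stat logger: forwards each vLLM stats snapshot to Triton metrics.

    `metrics` is assumed to hold Triton metric objects created from
    pb_utils.MetricFamily; names here may differ from the PR.
    """

    def __init__(self, metrics):
        self.metrics = metrics

    def log(self, stats) -> None:
        # Counters accumulate totals across snapshots.
        self.metrics.counter_prompt_tokens.increment(stats.num_prompt_tokens_iter)
        self.metrics.counter_generation_tokens.increment(stats.num_generation_tokens_iter)
        # Gauges reflect the latest point-in-time value.
        self.metrics.gauge_scheduler_running.set(stats.num_running_sys)
        self.metrics.gauge_scheduler_waiting.set(stats.num_waiting_sys)
        # Histograms record one observation per value in the iteration snapshot.
        for ttft in stats.time_to_first_tokens_iter:
            self.metrics.histogram_time_to_first_token.observe(ttft)
```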
Checklist:
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
n/a
Where should the reviewer start?
n/a
Test plan:
ci/L0_backend_vllm/metrics_test
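The metrics test broadly follows the pattern of scraping Triton's Prometheus endpoint and checking for the expected families. A minimal sketch of that pattern is below; the endpoint port (Triton's default metrics port 8002) and the metric family name are assumptions, not quotes from the actual test:

```python
import requests

def test_new_vllm_metrics_exposed():
    # Triton's Prometheus metrics endpoint defaults to port 8002.
    body = requests.get("http://localhost:8002/metrics").text
    # Assumed metric family name; the real test checks the full set added by this PR.
    assert "vllm:num_requests_running" in body
```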
CI Pipeline ID:
n/a
Caveats:
n/a
Background
Customers requested additional histogram metrics from vLLM.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
n/a