[Metrics] Log multi-modal cache stats #16478
vllm/v1/core/kv_cache_utils.py
Moved to vllm.v1.metrics.stats.CachingMetrics
Just ran a mistral-eval... there is a problem where the mirrored caches' hit rate remains at zero because we don't actually call …
Edit: I have updated those caches to call …
Edit 2: This change has been split out into #16593
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
Tests should pass now.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
Closing as superseded by #26285
For easier debugging and optimization, this PR introduces metrics logging and a reset API (`reset_mm_cache`) for the multi-modal processing cache. It is mostly based on the existing code for KV cache metrics.

Based on these stats, users can adjust the capacity of the multi-modal cache to achieve a better balance between memory usage and cache hit rate.
Notes
- `vllm.metrics.loggers.StatLoggerBase` has been updated to accept `mm_cache_stats`.
- … `reset_mm_cache` is called. Since it is meant to be just a debugging tool, you should only call it when the engine is not being used.
- … `VLLM_SERVER_DEV_MODE=1`.

Example logs
V0 Engine:
V1 Engine:
Notes
- … `mm_hash` if there is a cache hit in P0, which implies a cache hit in P1 since each item from the P0 cache is passed to P1.