[Metrics] Log multi-modal cache stats #16478

DarkLight1337 · 2025-04-11T11:29:16Z

For easier debugging and optimization, this PR introduces metrics logging and a reset API (reset_mm_cache) for the multi-modal processing cache. It is mostly based on the existing code for KV cache metrics.

Based on these stats, users can adjust the capacity of the multi-modal cache to achieve a better balance between memory usage and cache hit rate.

Notes

- The interface of vllm.metrics.loggers.StatLoggerBase has been updated to accept mm_cache_stats.
- In V1, the three internal caches (P0 processor, P0 mirror, P1 mirror) may become desynced if a request is in progress when reset_mm_cache is called. Since it is meant to be just a debugging tool, you should only call it when the engine is not being used.
  - Also, for online serving this is only available if VLLM_SERVER_DEV_MODE=1.

Example logs

V0 Engine:

INFO 04-12 16:24:41 [metrics.py:518] Avg prompt throughput: 6051.4 tokens/s, Avg generation throughput: 10.3 tokens/s, Running: 8 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 7.2%, CPU KV cache usage: 0.0%.
INFO 04-12 16:24:41 [metrics.py:533] MM cache usage: 2.86% (13 items = 0.11 GiB)

V1 Engine:

INFO 04-12 16:32:05 [loggers.py:109] Engine 000: Avg prompt throughput: 12550.4 tokens/s, Avg generation throughput: 21.9 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.6%, Prefix cache hit rate: 76.9%
INFO 04-12 16:32:05 [loggers.py:126] P0 Processor MM cache usage: 17.91% (81 items = 0.72 GiB), hit rate: 71.28%; P0 Mirrored MM cache usage: 17.91% (81 items = 0.72 GiB), hit rate: 71.28%; P1 Mirrored MM cache usage: 17.70% (80 items = 0.71 GiB), hit rate: 100.00%
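
As a rough illustration of how these log lines could be used for capacity tuning, the sketch below parses the "MM cache usage" fragments and estimates the configured capacity as size divided by usage. The names `MMCacheStats` and `parse_mm_cache_stats` are hypothetical helpers written for this example and are not part of vLLM; the regular expression assumes the exact log format shown above, which may change.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches one "MM cache usage" fragment from the example logs above.
# Assumption: the log format is exactly as shown in this PR's examples.
_MM_CACHE_RE = re.compile(
    r"(?:(?P<name>[\w ]+?) )?MM cache usage: (?P<usage>[\d.]+)% "
    r"\((?P<items>\d+) items = (?P<gib>[\d.]+) GiB\)"
    r"(?:, hit rate: (?P<hit>[\d.]+)%)?"
)


@dataclass
class MMCacheStats:  # hypothetical container, not a vLLM class
    name: str  # e.g. "P0 Processor"; empty for the V0 log line
    usage_pct: float
    num_items: int
    size_gib: float
    hit_rate_pct: Optional[float]  # None when the line has no hit rate

    @property
    def est_capacity_gib(self) -> float:
        """Coarse capacity estimate; the logged values are rounded."""
        if self.usage_pct == 0:
            return float("nan")
        return self.size_gib / (self.usage_pct / 100.0)


def parse_mm_cache_stats(line: str) -> list[MMCacheStats]:
    """Extract every MM cache entry found in a single log line."""
    stats = []
    for m in _MM_CACHE_RE.finditer(line):
        hit = m.group("hit")
        stats.append(
            MMCacheStats(
                name=(m.group("name") or "").strip(),
                usage_pct=float(m.group("usage")),
                num_items=int(m.group("items")),
                size_gib=float(m.group("gib")),
                hit_rate_pct=float(hit) if hit is not None else None,
            )
        )
    return stats
```

Applied to the V0 example line, the estimate comes out at about 3.8 GiB, and for the V1 "P0 Processor" entry at about 4.0 GiB. Because the logged percentages and sizes are rounded, the estimate is only indicative.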

Notes

In V1, the number of items and memory of the three internal caches should remain in sync with each other, but since the stats are collected at different times, it is possible for the logged metrics to have minor differences.
In V1, the items in P0 processor and P0 mirror MM caches are the same instances, therefore the memory between them is shared and there is no memory duplication.
In V1, the hit rate of P1 mirror MM cache should always be 100% because the cache is only queried with mm_hash if there is a cache hit in P0, which implies cache hit in P1 since each item from P0 cache is passed to P1.
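
The reasoning behind the 100% hit rate in P1 can be illustrated with a toy model. The sketch below is not vLLM code: `CountingLRU` and `simulate` are hypothetical names, P0 is modelled as a single cache rather than a processor cache plus a mirror, and capacity is counted in items instead of bytes. It only shows that if P1 is queried solely after a P0 hit, and every P0 miss forwards the item to P1, then P1 never records a miss.

```python
from collections import OrderedDict


class CountingLRU:
    """Tiny LRU cache that counts queries and hits (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.queries = 0
        self.hits = 0

    def get(self, key):
        self.queries += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

    @property
    def hit_rate(self):
        return self.hits / self.queries if self.queries else 0.0


def simulate(mm_hashes, capacity=2):
    """Replay a stream of item hashes through a P0 cache and its P1 mirror."""
    p0 = CountingLRU(capacity)
    p1 = CountingLRU(capacity)
    for mm_hash in mm_hashes:
        item = p0.get(mm_hash)
        if item is None:
            # P0 miss: process the item and pass the full item on to P1.
            item = f"processed:{mm_hash}"
            p0.put(mm_hash, item)
            p1.put(mm_hash, item)
        else:
            # P0 hit: only the hash is sent, so P1 must look the item up.
            assert p1.get(mm_hash) is not None
    return p0, p1


p0, p1 = simulate(["a", "b", "a", "c", "a", "b"])
print(f"P0 hit rate: {p0.hit_rate:.2%}, P1 hit rate: {p1.hit_rate:.2%}")
```

In this model both caches see the same sequence of insertions and lookups, so they evict the same entries and the lookup in P1 always succeeds. This relies on the mirror calling `get` on every hit so that its LRU order tracks P0.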

github-actions · 2025-04-11T11:29:26Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

DarkLight1337 · 2025-04-12T14:11:51Z

vllm/v1/core/kv_cache_utils.py

Moved to vllm.v1.metrics.stats.CachingMetrics

DarkLight1337 · 2025-04-12T15:50:57Z

Just ran a mistral-eval... there is a problem where the mirrored caches' hit rate remains at zero because we don't actually call .get for those caches.

Edit: I have updated those caches to call .get now so they actually function as LRU caches. This may also fix #16273 (comment)

Edit 2: This change has been split out into #16593

vllm/v1/engine/async_llm.py

mergify · 2025-04-15T21:44:01Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-04-25T15:53:47Z

~~@markmc do you know why the test is getting a duplicated timeseries error?~~

mergify · 2025-04-26T05:10:32Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 · 2025-04-26T09:37:33Z

Tests should pass now.

mergify · 2025-04-30T10:21:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: DarkLight1337 <[email protected]>

mergify · 2025-05-15T03:14:14Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

github-actions · 2025-10-02T02:04:46Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

DarkLight1337 · 2025-10-06T10:22:14Z

Closing as superseded by #26285

mergify bot added the frontend, multi-modality, and v1 labels Apr 11, 2025

DarkLight1337 force-pushed the log-mm-cache branch 3 times, most recently from fa40cbe to aea9da7 April 11, 2025 16:02

DarkLight1337 added this to Multi-modality Core Apr 12, 2025

DarkLight1337 moved this to In Progress in Multi-modality Core Apr 12, 2025

DarkLight1337 force-pushed the log-mm-cache branch 5 times, most recently from cb2eb10 to eea7385 April 12, 2025 14:05

DarkLight1337 marked this pull request as ready for review April 12, 2025 14:05

DarkLight1337 requested review from robertgshaw2-redhat, simon-mo, WoosukKwon, njhill, ywang96, comaniac, alexm-redhat, zhuohan123 and youkaichao as code owners April 12, 2025 14:06

DarkLight1337 force-pushed the log-mm-cache branch from 32d0ee9 to 6a12874 April 12, 2025 16:10

DarkLight1337 mentioned this pull request Apr 12, 2025

[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 #16273

Merged

njhill reviewed Apr 15, 2025

vllm/v1/engine/async_llm.py (outdated, resolved)

DarkLight1337 added the ready label Apr 25, 2025

DarkLight1337 added 2 commits April 25, 2025 09:50

Fix circular import

6fb5b0e

Signed-off-by: DarkLight1337 <[email protected]>

Merge branch 'main' into log-mm-cache

4282d9d

mergify bot added the needs-rebase label Apr 26, 2025

DarkLight1337 added 2 commits April 26, 2025 05:11

Merge branch 'main' into log-mm-cache

255cc53

Signed-off-by: DarkLight1337 <[email protected]>

Fix incorrect class

d95047c

Signed-off-by: DarkLight1337 <[email protected]>

mergify bot removed the needs-rebase label Apr 26, 2025

DarkLight1337 mentioned this pull request Apr 28, 2025

[Optim] Compute multimodal hash only once per item #17314

Merged

mergify bot added the needs-rebase label Apr 30, 2025

DarkLight1337 added 2 commits April 30, 2025 10:51

Merge branch 'main' into log-mm-cache

7200adf

Signed-off-by: DarkLight1337 <[email protected]>

Fix merge

7cb1ed6

Signed-off-by: DarkLight1337 <[email protected]>

mergify bot removed the needs-rebase label Apr 30, 2025

Reduce diff

cd48bf0

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 mentioned this pull request May 10, 2025

[Bugfix] Avoid repeatedly creating dummy data during engine startup #17935

Merged

Merge branch 'main' into log-mm-cache

d943036

Signed-off-by: DarkLight1337 <[email protected]>

mergify bot added the needs-rebase label May 15, 2025

mergify bot added the speculative-decoding label Jul 3, 2025

DarkLight1337 mentioned this pull request Jul 30, 2025

[Feature]: Multimodal Benchmarking Support (MMLM) #21887

Open

1 task

github-actions bot added the stale label Oct 2, 2025

DarkLight1337 mentioned this pull request Oct 6, 2025

[Metrics] Log multi-modal cache stats #26285

Open

5 tasks

DarkLight1337 closed this Oct 6, 2025

github-project-automation bot moved this from In Progress to Done in Multi-modality Core Oct 6, 2025

DarkLight1337 deleted the log-mm-cache branch October 6, 2025 10:22
