[Core] Use individual MM items in P0/P1 cache and model runner #22570

DarkLight1337 · 2025-08-09T15:50:31Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Follow-up to #22457, in preparation for moving the processing cache from P0 to P1.

Key changes:

MultiModalKwargsItem can now contain empty data.
The P0/P1 cache now accepts a list of MultiModalKwargsItem, and modifies MultiModalKwargsItem in place.
EngineCoreRequest, Request, NewRequestData, CachedRequestState now use mm_kwargs: list[MultiModalKwargsItem] instead of mm_inputs: list[MultiModalKwargs]. (cc @wangxiyuan please update vllm/vllm-ascend accordingly after this PR)
Reworked merge_and_sort_multimodal_metadata -> argsort_mm_positions and group_mm_inputs_by_modality -> group_mm_kwargs_by_modality with new semantics to enhance code reuse.
Added support for the pin_memory argument when merging MultiModalFieldElems (unused for now; see the comment inside group_mm_kwargs_by_modality).
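
The cache behavior described above, where items are emptied on one side and filled back in place on the other, can be sketched as a pair of mirrored LRU caches. This is a minimal illustration under stated assumptions, not vLLM's implementation. ItemSketch, MirroredCacheSketch, update_p0 and update_p1 are hypothetical names. An item whose data is None plays the role of an empty MultiModalKwargsItem. Both "processes" share one Python list here, whereas real P0/P1 communication would serialize the items.

```python
# Hypothetical sketch of a mirrored P0/P1 item cache; not vLLM's actual code.
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemSketch:
    """Hypothetical stand-in for MultiModalKwargsItem; data=None marks an empty item."""
    modality: str
    data: Optional[bytes]


class MirroredCacheSketch:
    """P0 and P1 each keep an LRU of item hashes with the same capacity.

    Because both sides see the same hashes in the same order, they evict
    identically, so P0 always knows which items P1 already holds.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.p0 = OrderedDict()  # hash -> None (P0 only needs the keys)
        self.p1 = OrderedDict()  # hash -> data (P1 keeps the payloads)

    def _touch(self, cache: OrderedDict, key: str, value) -> bool:
        """Record an access in LRU order; return True on a cache hit."""
        if key in cache:
            cache.move_to_end(key)
            return True
        cache[key] = value
        if len(cache) > self.capacity:
            cache.popitem(last=False)  # evict the least recently used hash
        return False

    def update_p0(self, items: list[ItemSketch], hashes: list[str]) -> None:
        """Sender side: empty out items that P1 is known to have cached."""
        for item, key in zip(items, hashes):
            if self._touch(self.p0, key, None):
                item.data = None  # modified in place; nothing to re-send

    def update_p1(self, items: list[ItemSketch], hashes: list[str]) -> None:
        """Receiver side: fill empty items back in from the local cache."""
        for item, key in zip(items, hashes):
            if item.data is None:
                item.data = self.p1[key]  # modified in place
            self._touch(self.p1, key, item.data)
```

Keeping the eviction policy identical on both sides is what makes the mirroring safe in this sketch: if P1 evicted a hash that P0 still considered cached, the lookup in update_p1 would raise KeyError.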

Test Plan

Test Result

(Optional) Documentation Update

Signed-off-by: DarkLight1337 <[email protected]>

github-actions · 2025-08-09T15:50:38Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, a small and essential subset of CI tests that catches errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request is a significant refactoring of how multimodal inputs are handled, moving from MultiModalKwargs per request to a list of MultiModalKwargsItem. This change is aimed at improving the design for caching and processing of multimodal data. The changes are extensive, touching many files in the core engine, workers, and tests. The tests have been updated to reflect the new logic, which is a positive sign. However, I've identified a critical issue in the new MultiModalKwargsItem.__init__ method that can lead to runtime errors with empty inputs. Additionally, there's a potential data loss bug in gpu_model_runner.py when handling raw multimodal inputs with mixed modalities, which could silently drop data. These issues should be addressed to ensure the correctness of the new implementation.

vllm/multimodal/inputs.py

vllm/v1/worker/gpu_model_runner.py

vllm/v1/worker/tpu_model_runner.py
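
The mixed-modality data-loss concern raised in this review can be illustrated with a simplified version of the ordering-and-grouping step listed in the key changes (argsort_mm_positions and group_mm_kwargs_by_modality). This is a sketch under stated assumptions: argsort_positions and group_consecutive_by_modality are hypothetical names that do not reproduce vLLM's functions, placeholder positions are reduced to (offset, length) tuples, and payloads are plain strings.

```python
# Hypothetical sketch of ordering MM items by prompt position, then grouping
# consecutive runs of the same modality; not vLLM's actual code.
from itertools import groupby


def argsort_positions(
    mm_positions: dict[str, list[tuple[int, int]]],
) -> list[tuple[str, int]]:
    """Return (modality, item_index) pairs in the order items appear in the prompt.

    Each position is simplified to an (offset, length) tuple.
    """
    flat = [
        (offset, modality, idx)
        for modality, ranges in mm_positions.items()
        for idx, (offset, _length) in enumerate(ranges)
    ]
    flat.sort(key=lambda entry: entry[0])  # order by offset in the prompt
    return [(modality, idx) for _offset, modality, idx in flat]


def group_consecutive_by_modality(
    items: list[tuple[str, str]],
) -> list[tuple[str, list[str]]]:
    """Split (modality, payload) items into runs of one modality, keeping order.

    Every item lands in exactly one group, so an image/audio/image sequence
    yields three groups rather than silently dropping the audio item.
    """
    return [
        (modality, [payload for _, payload in run])
        for modality, run in groupby(items, key=lambda item: item[0])
    ]
```

Grouping only consecutive runs keeps the batch in prompt order, at the cost of more and smaller groups when modalities interleave. A version that instead took the modality of the first item and filtered on it would lose the audio item in an image/audio/image prompt.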

DarkLight1337 · 2025-08-09T16:05:46Z

vllm/multimodal/inputs.py

        if len(batch) > 0 and is_list_of(batch, torch.Tensor, check="all"):
            if len(batch) == 1:
                # An optimization when `batch` contains only one tensor:
                # - produce exactly same result as `torch.concat(batch)`
                # - will achieve zero-copy if the tensor is contiguous
                return batch[0].contiguous()

-            def _expect_same_shape(tensor: torch.Tensor):
-                return tensor.shape[:self.dim] + tensor.shape[self.dim + 1:]
+            dim = self.dim + (self.dim < 0) * len(batch[0].shape)


The extra (self.dim < 0) term converts a negative dim into the equivalent non-negative index, so negative values of dim can be passed to this field.
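
The effect of that normalization can be checked on plain shape tuples. This is an illustrative sketch: the helper names are hypothetical, and tuples stand in for torch.Size so the example runs without PyTorch. naive_shape_without_dim mirrors the slicing in the removed _expect_same_shape helper.

```python
# Hypothetical sketch of negative-dim normalization for concatenation checks.


def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dim onto [0, ndim), using the arithmetic from the diff."""
    return dim + (dim < 0) * ndim


def shape_without_dim(shape: tuple[int, ...], dim: int) -> tuple[int, ...]:
    """Drop the concatenation axis; the remaining axes must match across the batch."""
    d = normalize_dim(dim, len(shape))
    return shape[:d] + shape[d + 1:]


def naive_shape_without_dim(shape: tuple[int, ...], dim: int) -> tuple[int, ...]:
    """Slicing without normalization. For dim == -1, dim + 1 == 0, so the
    second slice is the whole shape instead of an empty tuple."""
    return shape[:dim] + shape[dim + 1:]


def can_concat(shapes: list[tuple[int, ...]], dim: int) -> bool:
    """True if all shapes agree on every axis except dim."""
    return len({shape_without_dim(s, dim) for s in shapes}) == 1
```

With shape (2, 3, 4) and dim=-1, the naive slicing yields (2, 3, 2, 3, 4), so two tensors that differ only along the last axis would wrongly be reported as incompatible. Other negative values such as -2 happen to slice correctly, which makes the -1 case easy to miss.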

DarkLight1337 · 2025-08-09T16:07:10Z

vllm/v1/worker/gpu_input_batch.py

@@ -51,6 +53,13 @@ def __post_init__(self):
    def num_tokens(self) -> int:
        return self.num_prompt_tokens + len(self.output_token_ids)

+    # Temporary back-compatibility for plugins that define model runner


This fallback is required because of the change at https://github.com/vllm-project/vllm/pull/22570/files#diff-629bb642993061658312f62ddfdfc2fabe3bf7a335eee5451e7cde5b23fbc2bbL335
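
A back-compat shim of this kind is commonly written as a read-only property that rebuilds the old field from the new one and warns on access. The sketch below assumes that approach. RequestStateSketch and LegacyKwargs are hypothetical stand-ins for CachedRequestState and MultiModalKwargs, and wrapping each item in its own container is an assumption about what legacy plugins expect.

```python
# Hypothetical sketch of a deprecated-field shim; not vLLM's actual code.
import warnings
from dataclasses import dataclass, field


@dataclass
class LegacyKwargs:
    """Hypothetical stand-in for the old per-request MultiModalKwargs container."""
    items: list


@dataclass
class RequestStateSketch:
    """Hypothetical stand-in for CachedRequestState after the field rename."""
    req_id: str
    mm_kwargs: list = field(default_factory=list)

    # Temporary back-compatibility for plugins that define model runner
    @property
    def mm_inputs(self) -> list[LegacyKwargs]:
        warnings.warn(
            "`mm_inputs` is deprecated; use `mm_kwargs` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: legacy callers expect one container per item.
        return [LegacyKwargs([item]) for item in self.mm_kwargs]
```

New code reads mm_kwargs directly; only out-of-tree model runners that still access mm_inputs pay the cost of rebuilding the legacy list and see the warning.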

Isotr0py

Overall LGTM, just left some nits.

vllm/multimodal/inputs.py

vllm/multimodal/utils.py

vllm/multimodal/inputs.py

DarkLight1337 · 2025-08-10T05:14:36Z

Added the ready label just to check CI. Please don't merge yet, as this is pending discussion with @ywang96 @WoosukKwon.

vllm/v1/worker/gpu_input_batch.py

huachenheli

Mark MultiModalKwargs class as deprecated?

DarkLight1337 · 2025-08-11T03:45:48Z

> Mark MultiModalKwargs class as deprecated?

It is still used by BaseMultiModalProcessor to remain compatible with V0.

[Core] Use individual MM items in P0/P1 cache and model runner

ec347bf

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested a review from Isotr0py August 9, 2025 15:50

DarkLight1337 requested review from ywang96, WoosukKwon, robertgshaw2-redhat and njhill as code owners August 9, 2025 15:50

DarkLight1337 added this to Multi-modality Core Aug 9, 2025

DarkLight1337 requested review from comaniac and alexm-redhat as code owners August 9, 2025 15:50

DarkLight1337 moved this to In Progress in Multi-modality Core Aug 9, 2025

mergify bot added the multi-modality, v1, and tpu labels Aug 9, 2025

gemini-code-assist bot reviewed Aug 9, 2025

vllm/multimodal/inputs.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

vllm/v1/worker/gpu_model_runner.py (resolved)

vllm/v1/worker/tpu_model_runner.py (resolved)

DarkLight1337 added 2 commits August 9, 2025 15:56

Address comment

c4da5dc

Signed-off-by: DarkLight1337 <[email protected]>

Address comment

3a36adb

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 commented Aug 9, 2025

DarkLight1337 added 2 commits August 9, 2025 16:10

Assertion

d5b74ad

Signed-off-by: DarkLight1337 <[email protected]>

Merge branch 'main' into mm-cache-item

bd05abb

Isotr0py approved these changes Aug 10, 2025

vllm/multimodal/inputs.py (resolved)

vllm/multimodal/utils.py (outdated, resolved)

vllm/multimodal/inputs.py (resolved)

DarkLight1337 added the ready label Aug 10, 2025

Address comment; add back-compat

426061d

Signed-off-by: DarkLight1337 <[email protected]>

huachenheli reviewed Aug 11, 2025

vllm/v1/worker/gpu_input_batch.py (resolved)

huachenheli reviewed Aug 11, 2025

DarkLight1337 added 2 commits August 11, 2025 12:56

Merge branch 'main' into mm-cache-item

9c4406c

Signed-off-by: DarkLight1337 <[email protected]>

Update type annotations and message

193f9e8

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added 2 commits August 11, 2025 16:32

Fix wrong is_empty

e691d97

Signed-off-by: DarkLight1337 <[email protected]>

Rename

e3c85e4

Signed-off-by: DarkLight1337 <[email protected]>
