[Core] Use individual MM items in P0/P1 cache and model runner #22570

Open: wants to merge 10 commits into base: main

Conversation

@DarkLight1337 (Member) commented Aug 9, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Follow-up to #22457, in preparation for moving the processing cache from P0 to P1.

Key changes:

  • MultiModalKwargsItem can now contain empty data.
  • The P0/P1 cache now accepts a list of MultiModalKwargsItem and modifies each MultiModalKwargsItem in place.
  • EngineCoreRequest, Request, NewRequestData, and CachedRequestState now use mm_kwargs: list[MultiModalKwargsItem] instead of mm_inputs: list[MultiModalKwargs]. (cc @wangxiyuan: please update vllm/vllm-ascend accordingly after this PR.)
  • Reworked merge_and_sort_multimodal_metadata -> argsort_mm_positions and group_mm_inputs_by_modality -> group_mm_kwargs_by_modality with new semantics to improve code reuse.
  • Support a pin_memory argument for merging MultiModalFieldElems (unused for now; see the comment inside group_mm_kwargs_by_modality).
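
The renamed grouping helper can be illustrated with a minimal sketch. The function name group_mm_kwargs_by_modality comes from this PR, but the signature, the `modality` attribute, and the yielded tuple shape here are assumptions; the real vLLM implementation may differ.

```python
from itertools import groupby
from types import SimpleNamespace


def group_mm_kwargs_by_modality(mm_kwargs):
    """Sketch: group consecutive multi-modal kwargs items by modality.

    Assumes each item exposes a `modality` attribute (a hypothetical
    stand-in for the real MultiModalKwargsItem interface).
    """
    for modality, group in groupby(mm_kwargs, key=lambda item: item.modality):
        items = list(group)
        yield modality, len(items), items


# Hypothetical usage: two image items followed by one audio item.
items = [
    SimpleNamespace(modality="image"),
    SimpleNamespace(modality="image"),
    SimpleNamespace(modality="audio"),
]
groups = [(modality, count) for modality, count, _ in
          group_mm_kwargs_by_modality(items)]
```

Note that `itertools.groupby` only merges consecutive items with the same key, which matches per-request ordering of multi-modal inputs.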

Test Plan

Test Result

(Optional) Documentation Update

github-actions bot commented Aug 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 DarkLight1337 moved this to In Progress in Multi-modality Core Aug 9, 2025
@mergify mergify bot added multi-modality Related to multi-modality (#4194) v1 tpu Related to Google TPUs labels Aug 9, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request is a significant refactoring of how multimodal inputs are handled, moving from MultiModalKwargs per request to a list of MultiModalKwargsItem. This change is aimed at improving the design for caching and processing of multimodal data. The changes are extensive, touching many files in the core engine, workers, and tests. The tests have been updated to reflect the new logic, which is a positive sign. However, I've identified a critical issue in the new MultiModalKwargsItem.__init__ method that can lead to runtime errors with empty inputs. Additionally, there's a potential data loss bug in gpu_model_runner.py when handling raw multimodal inputs with mixed modalities, which could silently drop data. These issues should be addressed to ensure the correctness of the new implementation.

Signed-off-by: DarkLight1337 <[email protected]>
```python
if len(batch) > 0 and is_list_of(batch, torch.Tensor, check="all"):
    if len(batch) == 1:
        # An optimization when `batch` contains only one tensor:
        # - produce exactly same result as `torch.concat(batch)`
        # - will achieve zero-copy if the tensor is contiguous
        return batch[0].contiguous()

    def _expect_same_shape(tensor: torch.Tensor):
        return tensor.shape[:self.dim] + tensor.shape[self.dim + 1:]

    dim = self.dim + (self.dim < 0) * len(batch[0].shape)
```
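
The single-tensor fast path relies on `Tensor.contiguous()` returning the same storage (no copy) when the tensor is already contiguous. A minimal check of that equivalence, outside of vLLM:

```python
import torch

batch = [torch.arange(6).reshape(2, 3)]

# Fast path: equivalent to `torch.concat(batch)` for a one-element
# batch, but `.contiguous()` returns the original storage (zero-copy)
# when the tensor is already contiguous.
fast = batch[0].contiguous()
reference = torch.concat(batch)
```

Here `fast.data_ptr() == batch[0].data_ptr()` holds because no copy was made, whereas `torch.concat` always allocates a new tensor.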
@DarkLight1337 (Member, Author) commented:

The extra `self.dim < 0` check allows a negative `dim` to be passed to this field.
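
The normalization trick can be checked in isolation. Since a Python bool multiplies as 0 or 1, `(dim < 0) * ndim` offsets only negative dims by the tensor's rank (`normalize_dim` is a hypothetical name for illustration):

```python
def normalize_dim(dim: int, ndim: int) -> int:
    # Mirrors `self.dim + (self.dim < 0) * len(batch[0].shape)`:
    # a negative dim is offset by the tensor's rank, while a
    # non-negative dim passes through unchanged.
    return dim + (dim < 0) * ndim
```

For example, with a rank-3 tensor, `dim=-1` normalizes to 2 and `dim=1` stays 1.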

```python
@@ -51,6 +53,13 @@ def __post_init__(self):
    def num_tokens(self) -> int:
        return self.num_prompt_tokens + len(self.output_token_ids)

    # Temporary back-compatibility for plugins that define model runner
```
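
A sketch of how such a temporary back-compat shim is commonly written: the old attribute name is kept as a deprecated property aliasing the new field. The property name `mm_inputs` follows this PR's rename, but this class body and warning text are assumptions, not the actual vLLM implementation.

```python
import warnings


class CachedRequestState:
    """Sketch only: the real vLLM shim may be implemented differently."""

    def __init__(self, mm_kwargs):
        self.mm_kwargs = mm_kwargs

    @property
    def mm_inputs(self):
        # Temporary back-compat alias for plugins that still read
        # the old attribute name; warn so callers can migrate.
        warnings.warn(
            "`mm_inputs` is deprecated; use `mm_kwargs` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.mm_kwargs


state = CachedRequestState(mm_kwargs=["item0"])
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy = state.mm_inputs
```

Reading the old name still works, which gives plugin authors a deprecation window instead of an immediate AttributeError.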
@Isotr0py (Member) left a comment:

Overall LGTM, just left some nits.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 10, 2025
@DarkLight1337 (Member, Author) commented:

Added the ready label just to check CI; please don't merge yet, as this is pending discussion with @ywang96 and @WoosukKwon.

@huachenheli (Contributor) left a comment:

Mark MultiModalKwargs class as deprecated?

@DarkLight1337 (Member, Author) commented Aug 11, 2025

Mark MultiModalKwargs class as deprecated?

It is still used by BaseMultiModalProcessor to remain compatible with V0.

Signed-off-by: DarkLight1337 <[email protected]>
Labels: multi-modality (Related to multi-modality (#4194)), ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1
Project status: In Progress
3 participants