[Bugfix] Standardize getting number of image patches/tokens by DarkLight1337 · Pull Request #34358 · vllm-project/vllm

DarkLight1337 · 2026-02-11T16:58:38Z

Purpose

Consider mm_kwargs when determining number of image tokens.
Disallow passing processor=None to simplify the code.
Fix Idefics3 and SmolVLM tests not passing mm_kwargs to the reference processor call.
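
To illustrate why mm_kwargs matter for the token count, here is a minimal sketch. Everything in it is hypothetical: `ToyImageProcessor`, `num_tiles`, `get_num_image_tokens`, and the `tokens_per_tile` value are illustrative stand-ins, not the actual vLLM or Transformers code. It assumes a processor that resizes to a `longest_edge` and splits the image into fixed-size tiles, so a per-request `size` override changes the number of placeholder tokens.

```python
import math
from typing import Any, Mapping, Optional


class ToyImageProcessor:
    """Hypothetical stand-in for an image processor (not a real library class)."""

    def __init__(self, longest_edge: int = 1456, tile_size: int = 364) -> None:
        self.longest_edge = longest_edge
        self.tile_size = tile_size

    def num_tiles(
        self, height: int, width: int, longest_edge: Optional[int] = None
    ) -> int:
        # Resize so the longest side equals the effective longest_edge,
        # then count how many tiles cover the resized image.
        edge = self.longest_edge if longest_edge is None else longest_edge
        scale = edge / max(height, width)
        rows = math.ceil(height * scale / self.tile_size)
        cols = math.ceil(width * scale / self.tile_size)
        return rows * cols


def get_num_image_tokens(
    *,
    image_height: int,
    image_width: int,
    processor: ToyImageProcessor,
    mm_kwargs: Mapping[str, Any],
    tokens_per_tile: int = 64,  # assumed value, for illustration only
) -> int:
    """Count placeholder tokens, honoring per-request overrides in mm_kwargs."""
    if processor is None:
        # The processor is required; there is no implicit default lookup.
        raise TypeError("processor must not be None")
    size = mm_kwargs.get("size") or {}
    tiles = processor.num_tiles(
        image_height, image_width, longest_edge=size.get("longest_edge")
    )
    return tiles * tokens_per_tile


processor = ToyImageProcessor()
default_count = get_num_image_tokens(
    image_height=1456, image_width=1456, processor=processor, mm_kwargs={}
)
override_count = get_num_image_tokens(
    image_height=1456,
    image_width=1456,
    processor=processor,
    mm_kwargs={"size": {"longest_edge": 364}},
)
print(default_count, override_count)
```

In this toy setup the two counts differ, which is the kind of mismatch that arises when a reference processor call receives the overrides but the counting helper does not.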

FIX Idefics3 test in #34334

Test Plan

Test Result


Signed-off-by: DarkLight1337 <[email protected]>

gemini-code-assist

Code Review

This pull request standardizes the methods for calculating the number of image tokens across various multimodal models. The changes correctly enforce that a processor must be passed and that mm_kwargs are considered when applicable. This simplifies the code, improves consistency, and fixes bugs where these arguments were previously ignored. The refactoring is well-executed across multiple files. I have found one critical issue that could lead to a runtime error.

vllm/model_executor/models/molmo2.py


DarkLight1337 · 2026-02-11T17:04:53Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a significant and valuable refactoring across multiple multimodal models to standardize how the number of image patches and tokens are calculated. By making the processor argument non-optional and consistently passing mm_kwargs, the changes eliminate boilerplate code, improve clarity, and enhance correctness. The bug fixes in the Idefics3 and SmolVLM tests, as well as the fix for SmolVLMProcessingInfo._get_image_token, are also important improvements. The code is now more robust and easier to maintain. Overall, this is a well-executed and beneficial change.


DarkLight1337 · 2026-02-12T04:56:06Z

I think there is some bug with get_number_of_image_patches in the idefics3 image processor (possibly same with smolvlm as well).

>>> images
[[<PIL.Image.Image image mode=RGB size=1456x1456 at 0x7FF451210A90>]]
>>> output_kwargs["images_kwargs"]
{'return_row_col_info': True, 'size': {'longest_edge': 364}, 'return_tensors': 'pt', 'input_data_format': 'channels_last'}
>>> image_inputs = self.image_processor(images, **output_kwargs["images_kwargs"])
>>> {k: image_inputs[k] for k in ("rows", "cols")}
{'rows': [[0]], 'cols': [[0]]}
>>> self.image_processor.get_number_of_image_patches(1456, 1456, output_kwargs["images_kwargs"])
(1, 1, 1)

get_number_of_image_patches should be returning (0, 0, 0).
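
A check like the following could make this kind of disagreement easy to spot in a test. It is only a sketch: `find_rows_cols_mismatch` is a hypothetical helper, and it assumes the tuple returned by the utility is ordered as (num_patches, num_rows, num_cols), which this thread does not confirm.

```python
from typing import Dict, Optional, Sequence, Tuple


def find_rows_cols_mismatch(
    rows: Sequence[Sequence[int]],
    cols: Sequence[Sequence[int]],
    reported: Tuple[int, int, int],
) -> Optional[Dict[str, Tuple[int, int]]]:
    """Compare rows/cols from a full processor call with the utility's result.

    Assumes a single image (rows[0][0], cols[0][0]) and that `reported` is
    ordered as (num_patches, num_rows, num_cols); both are assumptions.
    Returns None when the values agree, otherwise a dict with both sides.
    """
    _, reported_rows, reported_cols = reported
    actual = (rows[0][0], cols[0][0])
    utility = (reported_rows, reported_cols)
    if actual == utility:
        return None
    return {"processor": actual, "utility": utility}


# Values taken from the session above.
print(find_rows_cols_mismatch([[0]], [[0]], (1, 1, 1)))
```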

cc @hmellor @ArthurZucker @zucchini-nlp

zucchini-nlp · 2026-02-12T08:19:48Z

By default it was meant to be "one", meaning no cropping into patches, iirc. But it is indeed confusing if the number doesn't match the rows/cols we get from calling the processor. That seems to have introduced the bug.
Have you already tested with various numbers of cols and rows to see whether the final number of placeholders differs? Otherwise I can test it myself and fix it.

DarkLight1337 · 2026-02-12T08:21:34Z

I will be afk for much of the day, would be much appreciated if you could help test this!

zucchini-nlp · 2026-02-12T08:25:24Z

Will do, no prob!

zucchini-nlp · 2026-02-12T13:53:06Z

@DarkLight1337 can you check if huggingface/transformers#43948 solves your problem? I found issues with a few other models and fixed them as well

DarkLight1337 · 2026-02-12T15:28:57Z

The processor can at least run without errors but the test still fails due to incorrect output.

zucchini-nlp · 2026-02-12T15:32:11Z

Is that the test I pointed out yesterday? Also, can you make sure that the height/width passed to the utility is correct?


DarkLight1337 · 2026-02-12T15:34:25Z

Ok the test has been fixed by f3705cd, I was just passing the kwargs incorrectly.

DarkLight1337 · 2026-02-12T15:35:28Z

For now let's disable the tests until your patch has been merged.

DarkLight1337 · 2026-02-12T15:36:07Z

Can we assume that your patch will land in v5.2?

zucchini-nlp · 2026-02-12T15:37:25Z

Yes, it will

DarkLight1337 · 2026-02-12T15:37:39Z

Alright, then this PR should be good to go, thanks!


[Bugfix] Standardize getting number of image patches/tokens

d713500

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested review from Isotr0py and hmellor February 11, 2026 16:58

DarkLight1337 requested review from patrickvonplaten and ywang96 as code owners February 11, 2026 16:58

DarkLight1337 added the ready and multi-modality labels Feb 11, 2026

mergify bot added the bug label Feb 11, 2026

gemini-code-assist bot reviewed Feb 11, 2026


vllm/model_executor/models/molmo2.py (outdated)

Fix

9a52232

Signed-off-by: DarkLight1337 <[email protected]>

gemini-code-assist bot reviewed Feb 11, 2026


hmellor mentioned this pull request Feb 11, 2026

Update to transformers v5 #30566

Open

Isotr0py approved these changes Feb 11, 2026


DarkLight1337 added 4 commits February 11, 2026 18:13

Fix

3698039

Signed-off-by: DarkLight1337 <[email protected]>

Fixes

79febb6

Signed-off-by: DarkLight1337 <[email protected]>

Remove outdated marke

ac772c7

Signed-off-by: DarkLight1337 <[email protected]>

Fix

1382734

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 marked this pull request as draft February 12, 2026 04:51

zucchini-nlp mentioned this pull request Feb 12, 2026

Fix get_number_of_image_tokens huggingface/transformers#43948

Merged

Fix

f3705cd

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 force-pushed the fix-image-patches branch from bb09dd8 to f3705cd Compare February 12, 2026 15:33

Rename

2610738

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 marked this pull request as ready for review February 12, 2026 15:35

DarkLight1337 requested review from NickLucche, WoosukKwon, mgoin, tjtanaa, tlrmchlsmth and yewentao256 as code owners February 12, 2026 15:35

DarkLight1337 enabled auto-merge (squash) February 12, 2026 15:37

Update more models

552f640

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 requested a review from sighingnow as a code owner February 12, 2026 17:24


3 participants
