fix: _get_num_multimodal_tokens video branch (#43329) #43330

floor-licker · 2026-01-17T03:23:44Z

What does this PR do?

Summary

_get_num_multimodal_tokens(video_sizes=...) crashed in multiple multimodal processors because the
video branch called self.video_processor.get_number_of_video_patches(...) (not implemented on several video
processors), and used merge_size without initializing it (it was only set in the image branch).

Because the tests only covered image_sizes, CI never exercised the video route, so the crash went unnoticed.
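The merge_size half of the failure can be reproduced in isolation. The snippet below is a minimal sketch, not the library code: count_tokens_buggy, its parameters, and the default values are hypothetical and only mimic the control flow described above, where merge_size is assigned on the image path and read on the video path.

```python
def count_tokens_buggy(image_sizes=None, video_sizes=None, patch_size=14):
    """Toy stand-in for the pre-fix control flow (hypothetical, not library code)."""
    result = {}
    if image_sizes is not None:
        merge_size = 2  # assumption: illustrative default, assigned only on the image path
        tokens = []
        for height, width in image_sizes:
            patches = (height // patch_size) * (width // patch_size)
            tokens.append(patches // merge_size**2)
        result["num_image_tokens"] = tokens
    if video_sizes is not None:
        tokens = []
        for frames, height, width in video_sizes:
            patches = frames * (height // patch_size) * (width // patch_size)
            # merge_size is unbound here when image_sizes was not passed
            tokens.append(patches // merge_size**2)
        result["num_video_tokens"] = tokens
    return result


try:
    count_tokens_buggy(video_sizes=[(8, 224, 224)])
except UnboundLocalError as err:
    print(f"video-only call fails: {err}")
```

A call that passes image_sizes as well would not fail in this toy version, which is consistent with image-only tests never surfacing the problem.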

Changes

Initialize merge_size in the video_sizes branch (from videos_kwargs.get("merge_size"), falling back to
self.video_processor.merge_size).
Add get_number_of_video_patches() to the relevant *VideoProcessor implementations (either as an
alias to the existing helper or as a small utility consistent with preprocessing).
Add unit coverage that calls _get_num_multimodal_tokens(video_sizes=[(8, 224, 224)]) so the video
branch is exercised in CI.
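The shape of these changes can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: ToyVideoProcessor, count_video_tokens, and the patch_size, temporal_patch_size, and merge_size values are hypothetical, and the patch helper ignores any resizing that real preprocessing may apply.

```python
class ToyVideoProcessor:
    """Hypothetical stand-in for a *VideoProcessor; all values are illustrative."""

    patch_size = 14
    temporal_patch_size = 2
    merge_size = 2

    def get_number_of_video_patches(self, num_frames, height, width, videos_kwargs=None):
        # Simplification: assumes frames are not resized before patching.
        kwargs = videos_kwargs or {}
        patch_size = kwargs.get("patch_size", self.patch_size)
        temporal = kwargs.get("temporal_patch_size", self.temporal_patch_size)
        grid_t = max(num_frames // temporal, 1)
        return grid_t * (height // patch_size) * (width // patch_size)


def count_video_tokens(video_processor, video_sizes, videos_kwargs=None):
    """Toy version of the fixed video branch (hypothetical helper)."""
    videos_kwargs = videos_kwargs or {}
    # Resolve merge_size inside the video branch, falling back to the processor default.
    merge_size = videos_kwargs.get("merge_size")
    if merge_size is None:
        merge_size = video_processor.merge_size
    tokens = []
    for num_frames, height, width in video_sizes:
        patches = video_processor.get_number_of_video_patches(
            num_frames, height, width, videos_kwargs
        )
        tokens.append(patches // merge_size**2)
    return tokens


# Same input shape as the new unit coverage: 8 frames of 224x224.
print(count_video_tokens(ToyVideoProcessor(), [(8, 224, 224)]))  # [256] with the toy values
```

With these toy values, 8 frames give 4 temporal steps of 16 x 16 patches (1024 patches), and a merge_size of 2 reduces that to 256 tokens. The real processors may compute different counts.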

Affected processors

Ernie4_5_VL_MoeProcessor
Glm4vProcessor
Glm46VProcessor
Qwen2VLProcessor
Qwen2_5_VLProcessor
Qwen3VLProcessor
VideoLlama3Processor

Relevant Tests

python -m pytest -q \
  tests/models/qwen2_vl/test_processing_qwen2_vl.py::Qwen2VLProcessorTest::test_get_num_vision_tokens \
  tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_get_num_vision_tokens \
  tests/models/qwen3_vl/test_processing_qwen3_vl.py::Qwen3VLProcessorTest::test_get_num_vision_tokens \
  tests/models/video_llama_3/test_processing_video_llama_3.py::VideoLlama3ProcessorTest::test_get_num_vision_tokens \
  tests/models/ernie4_5_vl_moe/test_processing_ernie4_5_vl_moe.py::Ernie4_5_VL_MoeProcessorTest::test_get_num_vision_tokens \
  tests/models/glm4v/test_processor_glm4v.py::Glm4vProcessorTest::test_get_num_vision_tokens_video \
  tests/models/glm46v/test_processor_glm46v.py::Glm46VProcessorTest::test_get_num_vision_tokens_video

github-actions · 2026-01-17T04:26:55Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_vl_moe, glm46v, glm4v, qwen2_5_vl, qwen2_vl, qwen3_vl, video_llama_3

floor-licker added 2 commits January 16, 2026 22:11

Fix _get_num_multimodal_tokens video branch (huggingface#43329)

7f5b723

Fix _get_num_multimodal_tokens video branch (huggingface#43329)

84476f7
