@floor-licker

What does this PR do?

Fixes #43329

Summary

_get_num_multimodal_tokens(video_sizes=...) crashed in several multimodal processors for two reasons:
the video branch called self.video_processor.get_number_of_video_patches(...), which several video
processors do not implement, and it used merge_size without initializing it (merge_size was only set
in the image branch).

The crash went unnoticed because CI never exercised the video branch (tests only covered image_sizes).
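
To make the failure mode concrete, here is a minimal sketch of the control flow described above. ToyProcessor and ToyVideoProcessor are hypothetical stand-ins, not the library classes, and the patch arithmetic (patch size 14, temporal patch size 2) is an assumption chosen only for illustration.

```python
class ToyVideoProcessor:
    """Hypothetical stand-in: exposes merge_size but no get_number_of_video_patches()."""

    merge_size = 2


class ToyProcessor:
    """Hypothetical stand-in that mimics the control flow before the fix."""

    def __init__(self):
        self.video_processor = ToyVideoProcessor()

    def _get_num_multimodal_tokens(self, image_sizes=None, video_sizes=None):
        result = {}
        if image_sizes is not None:
            merge_size = 2  # only ever assigned in the image branch
            tokens = []
            for height, width in image_sizes:
                tokens.append((height // 14) * (width // 14) // merge_size**2)
            result["num_image_tokens"] = tokens
        if video_sizes is not None:
            tokens = []
            for size in video_sizes:
                # Failure 1: the method does not exist on the video processor.
                patches = self.video_processor.get_number_of_video_patches(*size)
                # Failure 2: merge_size is unbound when image_sizes is None.
                tokens.append(patches // merge_size**2)
            result["num_video_tokens"] = tokens
        return result


errors = []
processor = ToyProcessor()

try:
    processor._get_num_multimodal_tokens(video_sizes=[(8, 224, 224)])
except AttributeError as exc:
    errors.append(type(exc).__name__)

# Patch in the missing method to expose the second, previously hidden failure.
ToyVideoProcessor.get_number_of_video_patches = (
    lambda self, t, h, w: (t // 2) * (h // 14) * (w // 14)
)

try:
    processor._get_num_multimodal_tokens(video_sizes=[(8, 224, 224)])
except UnboundLocalError as exc:
    errors.append(type(exc).__name__)

print(errors)
```

The second error only shows up once the first is fixed, which is why both changes are needed together.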

Changes

  • Initialize merge_size in the video_sizes branch (from videos_kwargs.get("merge_size"), falling back
    to self.video_processor.merge_size).
  • Add get_number_of_video_patches() to the relevant *VideoProcessor implementations (either as an
    alias to the existing helper or as a small utility consistent with preprocessing).
  • Add unit coverage that calls _get_num_multimodal_tokens(video_sizes=[(8, 224, 224)]) so the video
    branch is exercised in CI.
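
The three changes above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: the class names are hypothetical, the patch size (14), temporal patch size (2), and merge size (2) are illustrative defaults, and the sketch skips any resizing that a real video processor may apply before counting patches.

```python
class ToyVideoProcessor:
    """Hypothetical video processor with the newly added helper."""

    patch_size = 14
    temporal_patch_size = 2
    merge_size = 2

    def get_number_of_video_patches(self, num_frames, height, width, videos_kwargs=None):
        # Assumption: no resizing; patches are counted on the raw dimensions.
        kwargs = videos_kwargs or {}
        patch_size = kwargs.get("patch_size", self.patch_size)
        temporal_patch_size = kwargs.get("temporal_patch_size", self.temporal_patch_size)
        grid_t = num_frames // temporal_patch_size
        grid_h = height // patch_size
        grid_w = width // patch_size
        return grid_t * grid_h * grid_w


class ToyProcessor:
    """Hypothetical processor whose video branch initializes merge_size itself."""

    def __init__(self):
        self.video_processor = ToyVideoProcessor()

    def _get_num_multimodal_tokens(self, image_sizes=None, video_sizes=None, **kwargs):
        result = {}
        if video_sizes is not None:
            videos_kwargs = kwargs.get("videos_kwargs", {})
            # Explicit override first, then the video processor's default.
            merge_size = videos_kwargs.get("merge_size") or self.video_processor.merge_size
            tokens = []
            for num_frames, height, width in video_sizes:
                patches = self.video_processor.get_number_of_video_patches(
                    num_frames, height, width, videos_kwargs
                )
                tokens.append(patches // merge_size**2)
            result["num_video_tokens"] = tokens
        return result


def test_get_num_vision_tokens_video():
    # Mirrors the added coverage: call the video branch without image_sizes.
    output = ToyProcessor()._get_num_multimodal_tokens(video_sizes=[(8, 224, 224)])
    assert "num_video_tokens" in output
    assert len(output["num_video_tokens"]) == 1
    return output


video_output = test_get_num_vision_tokens_video()
print(video_output)
```

With these toy defaults, an (8, 224, 224) video gives 4 x 16 x 16 = 1024 patches and 1024 / 4 = 256 tokens; the real processors may report different counts.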

Affected processors

  • Ernie4_5_VL_MoeProcessor
  • Glm4vProcessor
  • Glm46VProcessor
  • Qwen2VLProcessor
  • Qwen2_5_VLProcessor
  • Qwen3VLProcessor
  • VideoLlama3Processor

Relevant Tests

python -m pytest -q \
  tests/models/qwen2_vl/test_processing_qwen2_vl.py::Qwen2VLProcessorTest::test_get_num_vision_tokens \
  tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py::Qwen2_5_VLProcessorTest::test_get_num_vision_tokens \
  tests/models/qwen3_vl/test_processing_qwen3_vl.py::Qwen3VLProcessorTest::test_get_num_vision_tokens \
  tests/models/video_llama_3/test_processing_video_llama_3.py::VideoLlama3ProcessorTest::test_get_num_vision_tokens \
  tests/models/ernie4_5_vl_moe/test_processing_ernie4_5_vl_moe.py::Ernie4_5_VL_MoeProcessorTest::test_get_num_vision_tokens \
  tests/models/glm4v/test_processor_glm4v.py::Glm4vProcessorTest::test_get_num_vision_tokens_video \
  tests/models/glm46v/test_processor_glm46v.py::Glm46VProcessorTest::test_get_num_vision_tokens_video

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_vl_moe, glm46v, glm4v, qwen2_5_vl, qwen2_vl, qwen3_vl, video_llama_3
