### Description

### System Info

- `transformers` version: 4.57.3
- Platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.12.9
- Huggingface_hub version: 0.34.3
- Safetensors version: 0.5.3
- Accelerate version: 1.9.0
- Accelerate config: - compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: bf16
- use_cpu: False
- debug: False
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: all
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: True
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
- DeepSpeed version: 0.17.4
- PyTorch version (accelerator?): 2.7.1+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
- Using GPU in script?: no
- GPU type: NVIDIA H100 80GB HBM3
### Who can help?

### Information
- The official example scripts
- My own modified scripts
### Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction
These are pure Python logic bugs, and they exist in multiple processors:
- The video branch of `_get_num_multimodal_tokens` calls `get_number_of_video_patches()`, but it is not implemented for any of these processors.
- The video branch of `_get_num_multimodal_tokens` uses `merge_size` without defining it in that branch.
Affected processors/files:
- `src/transformers/models/ernie4_5_vl_moe/processing_ernie4_5_vl_moe.py`
- `src/transformers/models/glm46v/processing_glm46v.py`
- `src/transformers/models/glm4v/processing_glm4v.py`
- `src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py`
- `src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py`
- `src/transformers/models/qwen2_vl/processing_qwen2_vl.py`
- `src/transformers/models/qwen3_vl/processing_qwen3_vl.py`
- `src/transformers/models/video_llama_3/processing_video_llama_3.py`
Minimal repro (any affected processor):
```python
# assumes you have a processor instance for any of the above models
processor._get_num_multimodal_tokens(video_sizes=[(8, 224, 224)])
```

This call hits two issues:
- First issue: `AttributeError: 'Qwen3VLVideoProcessor' object has no attribute 'get_number_of_video_patches'`. All of the affected processors call this function, but it is not defined anywhere. In contrast, the image counterpart `get_number_of_image_patches` exists and all of its tests pass; the video branch has no tests at all.
- Second issue: after you fix the first by implementing `get_number_of_video_patches`, the undefined `merge_size` surfaces next: `UnboundLocalError: cannot access local variable 'merge_size' where it is not associated with a value`.
### Expected behavior
Calling `_get_num_multimodal_tokens(video_sizes=...)` should not crash.
The video branch should define `merge_size` and implement `get_number_of_video_patches`, mirroring what the image branch already does.
It would also be worth adding a unit test that calls `_get_num_multimodal_tokens(video_sizes=[...])` so CI actually exercises the video path.
I’m happy to send a PR with the fixes + tests.