Image features and image tokens do not match #948

@Resurgamm

I configured the environment according to the instructions and ran the following script:

export HF_HOME="$HOME/.cache/huggingface"  # "~" is not expanded inside double quotes
# pip3 install transformers==4.57.1 (Qwen3VL models)
# pip3 install ".[qwen]" (for Qwen's dependencies)

# Example with Qwen3-VL-4B-Instruct: https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct

accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model qwen3_vl \
    --model_args=pretrained=Qwen/Qwen3-VL-4B-Instruct,max_pixels=12845056,attn_implementation=flash_attention_2,interleave_visuals=False \
    --tasks "mmmu_val" \
    --batch_size 1 \
    --verbosity=DEBUG \
    --output_path ./logs/qwen3vl_4b_instruct

I got the following traceback:

[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/__main__.py", line 549, in <module>
[rank0]:     cli_evaluate()
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/__main__.py", line 368, in cli_evaluate
[rank0]:     raise e
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/__main__.py", line 349, in cli_evaluate
[rank0]:     results, samples = cli_evaluate_single(args)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/__main__.py", line 484, in cli_evaluate_single
[rank0]:     results = evaluator.simple_evaluate(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/utils.py", line 590, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/evaluator.py", line 268, in simple_evaluate
[rank0]:     results = evaluate(
[rank0]:               ^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/utils.py", line 590, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/evaluator.py", line 506, in evaluate
[rank0]:     resps = getattr(lm, reqtype)(cloned_reqs)  # Choiszt run generate until
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/lmms_eval/models/chat/qwen3_vl.py", line 110, in generate_until
[rank0]:     cont = self.model.generate(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2564, in generate
[rank0]:     result = decoding_method(
[rank0]:              ^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2784, in _sample
[rank0]:     outputs = self(**model_inputs, return_dict=True)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
[rank0]:     output = module._old_forward(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 1064, in wrapper
[rank0]:     outputs = func(self, *args, **kwargs)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1344, in forward
[rank0]:     outputs = self.model(
[rank0]:               ^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 1064, in wrapper
[rank0]:     outputs = func(self, *args, **kwargs)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1140, in forward
[rank0]:     image_mask, _ = self.get_placeholder_mask(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/yifan26/LiBoyi/lmms-eval/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1093, in get_placeholder_mask
[rank0]:     raise ValueError(
[rank0]: ValueError: Image features and image tokens do not match: tokens: 812, features 812
Model Responding:   0%|                                      | 0/2374 [00:23<?, ?it/s]
[rank0]:[W1223 09:50:10.377108539 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

How can I resolve the error "Image features and image tokens do not match: tokens: 812, features 812"?
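
One detail that may help with triage: the message reports the same count on both sides (812 and 812), so whatever comparison failed is presumably not a comparison of those two numbers. Below is a minimal sketch of how that could happen, assuming the check compares total element counts (rows times hidden size) while the message only prints row counts. The function name `check_placeholder_match` and the hidden sizes 2560 and 2048 are hypothetical and chosen for illustration; this is not the transformers implementation.

```python
# Hypothetical stand-in for a placeholder/feature consistency check.
# Assumption: the real check compares total element counts, while the
# error message only reports row counts.

def check_placeholder_match(n_image_tokens, embed_dim, feature_shape):
    """Raise if the masked embedding slots and the features differ in element count."""
    n_features, feature_dim = feature_shape
    slots = n_image_tokens * embed_dim    # elements selected by the image-token mask
    provided = n_features * feature_dim   # elements in the vision features
    if slots != provided:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {n_image_tokens}, features {n_features}"
        )

# Same row count (812) but different, illustrative hidden sizes:
# the check fails even though the message shows 812 on both sides.
try:
    check_placeholder_match(812, 2560, (812, 2048))
except ValueError as err:
    message = str(err)
    print(message)
```

If this assumption holds, it might be worth printing the full shapes of the input embeddings and the image features just before the failing call, since a difference in the last dimension would not show up in the message.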
