Hello, we are currently trying to reproduce the evaluation results of Qwen3-VL-8B-Instruct using the lmms-eval framework. However, we are unable to reproduce the officially reported results on either the image or the video benchmarks.
Our experimental settings are as follows:
- For image benchmarks: max_pixels = 12845056, attn_implementation = flash_attention_2, interleave_visuals = False
- For video benchmarks: max_pixels = 301056, attn_implementation = flash_attention_2, max_num_frames = 64 (for benchmarks such as MVBench, fps = 1 is used)
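To make the settings above easy to compare against other runs, here is a minimal sketch that joins them into the comma-separated key=value string that lmms-eval accepts through `--model_args`. Some assumptions apply. The helper `to_model_args` and the dict names are hypothetical. The Hugging Face repo id `Qwen/Qwen3-VL-8B-Instruct` is assumed. Whether `fps` is passed as a model argument, rather than configured elsewhere, is also an assumption. Check the exact key names against the Qwen3-VL model wrapper in your lmms-eval version.

```python
# Settings taken from this issue; the dict layout itself is only illustrative.
IMAGE_ARGS = {
    "max_pixels": 12845056,
    "attn_implementation": "flash_attention_2",
    "interleave_visuals": False,
}

VIDEO_ARGS = {
    "max_pixels": 301056,
    "attn_implementation": "flash_attention_2",
    "max_num_frames": 64,
}

# Assumption: fps is passed the same way as the other video settings.
VIDEO_ARGS_MVBENCH = {**VIDEO_ARGS, "fps": 1}


def to_model_args(pretrained: str, args: dict) -> str:
    """Hypothetical helper: build a comma-separated key=value string."""
    parts = [f"pretrained={pretrained}"]
    parts += [f"{key}={value}" for key, value in args.items()]
    return ",".join(parts)


if __name__ == "__main__":
    # Assumed repo id; replace with the local path or id actually used.
    print(to_model_args("Qwen/Qwen3-VL-8B-Instruct", IMAGE_ARGS))
    print(to_model_args("Qwen/Qwen3-VL-8B-Instruct", VIDEO_ARGS_MVBENCH))
```

Printing the exact string for each run makes it easier to spot differences from the official evaluation setup, such as a different max_pixels or frame budget.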