Skip to content

[Bug]: The same, wrong results of FP8-Dynamic Qwen2-VL, Llava-OneVision regardless input imagesย #2072

@tu-canva

Description

@tu-canva

โš™๏ธ Your current environment

The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-6.8.0-1043-aws-x86_64-with-glibc2.35`
Python Version: `3.10.12 (main, Nov  4 2025, 08:48:33) [GCC 11.4.0]`
llm-compressor Version: `0.8.1`
compressed-tensors Version: `0.12.2`
transformers Version: `4.53.3`
torch Version: `2.6.0+cu126`
CUDA Devices: `['NVIDIA L40S']`
AMD Devices: `None`

๐Ÿ› Describe the bug

Hi team, I encounter an issue that FP8-quantized Llava-Onevision, Qwen2-VL models produce identical outputs for all input images (e.g., "blue" for red/blue/green images), while non-quantized models work correctly. FP8-quantized Qwen2.5-VL models work well.
Could you please help to take a look?

๐Ÿ› ๏ธ Steps to reproduce

  • Reproduction: Load nm-testing/llava-onevision-qwen2-7b-ov-hf-FP8-dynamic, nm-testing/Qwen2-VL-7B-Instruct-FP8-dynamic with vLLM, test with different colored images (red/blue/green) - all produce identical outputs. RedHatAI/Qwen2.5-VL-3B-Instruct-FP8-dynamic answers correctly.

    • Expected: Different outputs for different images (works with non-quantized)
    • Actual: All images produce "blue" / "purple" (identical outputs)
  • Scripts:
    test_nm_testing_fp8_qwen2vl.py
    test_redhat_fp8.py
    test_nm_testing_fp8.py

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions