Status: Open
Labels: bug (Something isn't working)
Description
⚙️ Your current environment
The output of `python collect_env.py`:
### Environment Information ###
Operating System: `Linux-6.8.0-1043-aws-x86_64-with-glibc2.35`
Python Version: `3.10.12 (main, Nov 4 2025, 08:48:33) [GCC 11.4.0]`
llm-compressor Version: `0.8.1`
compressed-tensors Version: `0.12.2`
transformers Version: `4.53.3`
torch Version: `2.6.0+cu126`
CUDA Devices: `['NVIDIA L40S']`
AMD Devices: `None`
🐛 Describe the bug
Hi team, I've encountered an issue where FP8-quantized Llava-Onevision and Qwen2-VL models produce identical outputs for all input images (e.g., "blue" for red/blue/green images), while the non-quantized models work correctly. FP8-quantized Qwen2.5-VL models work well.
Could you please take a look?
🛠️ Steps to reproduce
- Reproduction: load `nm-testing/llava-onevision-qwen2-7b-ov-hf-FP8-dynamic` or `nm-testing/Qwen2-VL-7B-Instruct-FP8-dynamic` with vLLM and test with differently colored images (red/blue/green): all produce identical outputs. `RedHatAI/Qwen2.5-VL-3B-Instruct-FP8-dynamic` answers correctly.
- Expected: different outputs for different images (as with the non-quantized models)
- Actual: all images produce "blue" / "purple" (identical outputs)

Scripts:
- test_nm_testing_fp8_qwen2vl.py
- test_redhat_fp8.py
- test_nm_testing_fp8.py
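A minimal sketch of the reproduction described above, in case the attached scripts aren't handy. Assumptions: vLLM's offline `LLM` multimodal API (`multi_modal_data={"image": ...}`), one of the model IDs from this report, and a placeholder prompt string — for a real run, build the prompt with the model's own chat template. The helpers at the top are dependency-free; the heavy imports happen only under `__main__`.

```python
def solid_rgb_bytes(color, width=64, height=64):
    """Raw RGB pixel buffer for a solid-color test image."""
    palette = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
    r, g, b = palette[color]
    return bytes((r, g, b)) * (width * height)


def answers_are_degenerate(answers):
    """True if the model gave the same answer for every color -- the reported bug."""
    return len({a.strip().lower() for a in answers}) == 1


if __name__ == "__main__":
    # GPU-only part: imported here so the helpers above stay standalone.
    from PIL import Image
    from vllm import LLM, SamplingParams

    # Model ID taken from the report; swap in the Llava-Onevision ID to test that path.
    llm = LLM(model="nm-testing/Qwen2-VL-7B-Instruct-FP8-dynamic", max_model_len=4096)
    params = SamplingParams(temperature=0, max_tokens=16)

    answers = []
    for color in ("red", "green", "blue"):
        img = Image.frombytes("RGB", (64, 64), solid_rgb_bytes(color))
        # Placeholder prompt -- use the model's chat template in practice.
        out = llm.generate(
            {"prompt": "What color is this image?", "multi_modal_data": {"image": img}},
            params,
        )
        answers.append(out[0].outputs[0].text)

    print("answers:", answers)
    print("bug reproduced (identical outputs):", answers_are_degenerate(answers))
```

With the affected FP8 checkpoints, `answers_are_degenerate` should come back `True`; with `RedHatAI/Qwen2.5-VL-3B-Instruct-FP8-dynamic` or the unquantized models it should be `False`.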