Merged
49 changes: 6 additions & 43 deletions examples/autoround/quantization_w4a4_fp4/README.md
100755 → 100644
@@ -16,17 +16,15 @@ pip install -e .

## Quickstart

-The example includes end-to-end scripts for applying the AutoRound quantization algorithm.
-
-### Llama 3.1 Example
+The example includes an end-to-end script for applying the AutoRound quantization algorithm.

```bash
python3 llama3.1_example.py
```
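For orientation, the core of such a script typically pairs llm-compressor's `oneshot` entrypoint with the AutoRound modifier. The sketch below is hypothetical: the module path, scheme string, and argument names are assumptions, so treat `llama3.1_example.py` as the source of truth.

```python
# Hypothetical sketch of what llama3.1_example.py does; the module path,
# scheme string, and arguments here are assumptions, not the actual recipe.
from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier  # path assumed

recipe = AutoRoundModifier(
    targets="Linear",   # quantize Linear layers (assumed default)
    scheme="NVFP4",     # the W4A4 FP4 scheme used throughout this example
    iters=200,          # tuning iterations; iters=0 disables iterative tuning
)

oneshot(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    recipe=recipe,
    output_dir="Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound",
)
```

The `iters` knob corresponds to the `iters=0` and `iters=200` variants evaluated in the tables below.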

The resulting model `Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.

-#### Evaluate Accuracy
+### Evaluate Accuracy

With the model created, we can now load and run it in vLLM (after installing vLLM).
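Before running the evaluation harness, a quick smoke test with vLLM's offline-inference API confirms the checkpoint loads (a minimal sketch; the prompt is illustrative and a supported GPU is assumed):

```python
# Minimal sketch: offline generation with the quantized checkpoint.
# Assumes vLLM with NVFP4 support is installed and the model directory
# was produced by llama3.1_example.py above.
from vllm import LLM, SamplingParams

llm = LLM(model="./Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound")
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["What is 12 * 7? Show your reasoning."], params)
print(outputs[0].outputs[0].text)
```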

@@ -48,68 +46,33 @@ lm_eval --model vllm \
--batch_size 'auto'
```

-##### meta-llama/Meta-Llama-3.1-8B-Instruct
+#### meta-llama/Meta-Llama-3.1-8B-Instruct
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7710|± |0.0116|
| | |strict-match | 5|exact_match|↑ |0.7043|± |0.0126|

-##### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7248|± |0.0123|
| | |strict-match | 5|exact_match|↑ |0.6611|± |0.0130|


-##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7362|± |0.0121|
| | |strict-match | 5|exact_match|↑ |0.6702|± |0.0129|

-##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.7210|± |0.0124|
| | |strict-match | 5|exact_match|↑ |0.6945|± |0.0127|

> Note: quantized model accuracy may vary slightly due to nondeterminism.
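The tables above can be summarized as recovery rates against the unquantized baseline; the short script below recomputes them from the reported exact-match values (numbers copied from the tables, not new measurements):

```python
# Recovery rate (quantized / baseline) on GSM-8K exact-match,
# using the values reported in the tables above.
baseline = {"flexible-extract": 0.7710, "strict-match": 0.7043}

variants = {
    "QuantizationModifier":          {"flexible-extract": 0.7248, "strict-match": 0.6611},
    "AutoRoundModifier (iters=0)":   {"flexible-extract": 0.7362, "strict-match": 0.6702},
    "AutoRoundModifier (iters=200)": {"flexible-extract": 0.7210, "strict-match": 0.6945},
}

for name, scores in variants.items():
    recovery = {k: round(scores[k] / baseline[k], 3) for k in baseline}
    print(name, recovery)
```

Note that the iters=200 run gives the best strict-match recovery (about 98.6% of baseline) even though its flexible-extract score is slightly below the iters=0 run.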

-### Qwen3-VL Example
-
-```bash
-python3 qwen3_vl_example.py
-```
-
-The resulting model `Qwen3-VL-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.
-
-#### Evaluate Accuracy
-
-Run the following to test accuracy on GSM-8K:
-
-```bash
-lm_eval --model vllm-vlm \
-  --model_args pretrained="./Qwen3-VL-8B-Instruct-NVFP4-AutoRound",add_bos_token=true \
-  --tasks gsm8k \
-  --num_fewshot 5 \
-  --batch_size 'auto'
-```
-
-##### Qwen3-VL-8B-Instruct (Baseline)
-|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8628|± |0.0095|
-| | |strict-match | 5|exact_match|↑ |0.8453|± |0.0100|
-
-
-##### Qwen3-VL-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
-|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.8415|± |0.0101|
-| | |strict-match | 5|exact_match|↑ |0.8408|± |0.0101|
-
-> Note: quantized model accuracy may vary slightly due to nondeterminism.

### Questions or Feature Requests?

Please open an issue on [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) or [intel/auto-round](https://github.com/intel/auto-round).
60 changes: 0 additions & 60 deletions examples/autoround/quantization_w4a4_fp4/qwen3_vl_example.py

This file was deleted.
