@@ -16,17 +16,15 @@ pip install -e .

## Quickstart

- The example includes end-to-end scripts for applying the AutoRound quantization algorithm.
-
- ### Llama 3.1 Example
+ The example includes an end-to-end script for applying the AutoRound quantization algorithm.

```bash
python3 llama3.1_example.py
```

The resulting model `Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.

- #### Evaluate Accuracy
+ ### Evaluate Accuracy

With the model created, we can now load and run it in vLLM (after installing vLLM).

@@ -48,68 +46,33 @@ lm_eval --model vllm \
  --batch_size 'auto'
```
5048
- ##### meta-llama/Meta-Llama-3.1-8B-Instruct
+ #### meta-llama/Meta-Llama-3.1-8B-Instruct
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7710|±  |0.0116|
|     |       |strict-match    |     5|exact_match|↑  |0.7043|±  |0.0126|

- ##### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
+ #### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7248|±  |0.0123|
|     |       |strict-match    |     5|exact_match|↑  |0.6611|±  |0.0130|


- ##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
+ #### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7362|±  |0.0121|
|     |       |strict-match    |     5|exact_match|↑  |0.6702|±  |0.0129|

- ##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
+ #### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7210|±  |0.0124|
|     |       |strict-match    |     5|exact_match|↑  |0.6945|±  |0.0127|

> Note: quantized model accuracy may vary slightly due to nondeterminism.

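The gsm8k rows above report two filters. Roughly speaking, `flexible-extract` takes the last number appearing anywhere in the completion, while `strict-match` requires GSM8K's canonical `#### <answer>` format. A minimal sketch of the distinction, using simplified regexes of my own (assumed, not lm-eval's exact patterns):

```python
import re

def flexible_extract(text):
    """Take the last number anywhere in the completion."""
    nums = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return nums[-1].replace(",", "") if nums else None

def strict_match(text):
    """Require the canonical GSM8K '#### <answer>' format."""
    m = re.search(r"#### (-?[\d,]+\.?\d*)", text)
    return m.group(1).replace(",", "") if m else None

completion = "Adding 3 and 4 gives 7, doubled is 14.\n#### 14"
print(flexible_extract(completion), strict_match(completion))
```

A completion that reaches the right answer but omits the `#### ` marker scores under `flexible-extract` yet fails `strict-match`, which is why the strict numbers in the tables run lower.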
- ### Qwen3-VL Example
-
- ```bash
- python3 qwen3_vl_example.py
- ```
-
- The resulting model `Qwen3-VL-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.
-
- #### Evaluate Accuracy
-
- Run the following to test accuracy on GSM-8K:
-
- ```bash
- lm_eval --model vllm-vlm \
-   --model_args pretrained="./Qwen3-VL-8B-Instruct-NVFP4-AutoRound",add_bos_token=true \
-   --tasks gsm8k \
-   --num_fewshot 5 \
-   --batch_size 'auto'
- ```
-
- ##### Qwen3-VL-8B-Instruct (Baseline)
- |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
- |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
- |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8628|±  |0.0095|
- |     |       |strict-match    |     5|exact_match|↑  |0.8453|±  |0.0100|
-
-
- ##### Qwen3-VL-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
- |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
- |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
- |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8415|±  |0.0101|
- |     |       |strict-match    |     5|exact_match|↑  |0.8408|±  |0.0101|
-
- > Note: quantized model accuracy may vary slightly due to nondeterminism.
-
### Questions or Feature Requests?

Please open an issue on [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) or [intel/auto-round](https://github.com/intel/auto-round).
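For intuition about what the NVFP4 checkpoints above contain: NVFP4 stores weights as 4-bit FP4 (e2m1) values with small-block scales. Below is a rough, simplified sketch of rounding onto the e2m1 grid with an assumed per-block absmax scale; it is not the llm-compressor implementation, and real NVFP4 additionally quantizes the block scales themselves to FP8, which is omitted here.

```python
# Positive magnitudes representable in e2m1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block, grid=E2M1_GRID):
    """Scale a block so its absmax maps to 6.0, then round each value
    to the nearest e2m1 magnitude (sign handled separately)."""
    scale = max(abs(x) for x in block) / grid[-1]
    if scale == 0.0:
        scale = 1.0  # all-zero block: any scale works
    out = []
    for x in block:
        mag = min(grid, key=lambda g: abs(abs(x) / scale - g))
        out.append((mag if x >= 0 else -mag) * scale)
    return out

weights = [0.03, -0.11, 0.27, -0.6]
print(quantize_block(weights))
```

The coarseness of this 8-magnitude grid is why rounding choices matter, and why AutoRound (which learns the rounding rather than always taking the nearest point) can recover some of the accuracy lost to plain round-to-nearest.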