Commit 22fc354

Revert "add qwen3 vl autoround example (#2334)" (#2351)
This reverts commit 7b36671.
1 parent: 7b36671

File tree

2 files changed: +6 −103 lines


examples/autoround/quantization_w4a4_fp4/README.md

File mode changed: 100755 → 100644
Lines changed: 6 additions & 43 deletions
@@ -16,17 +16,15 @@ pip install -e .
 
 ## Quickstart
 
-The example includes end-to-end scripts for applying the AutoRound quantization algorithm.
-
-### Llama 3.1 Example
+The example includes an end-to-end script for applying the AutoRound quantization algorithm.
 
 ```bash
 python3 llama3.1_example.py
 ```
 
 The resulting model `Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.
 
-#### Evaluate Accuracy
+### Evaluate Accuracy
 
 With the model created, we can now load and run in vLLM (after installing).
 
@@ -48,68 +46,33 @@ lm_eval --model vllm \
   --batch_size 'auto'
 ```
 
-##### meta-llama/Meta-Llama-3.1-8B-Instruct
+#### meta-llama/Meta-Llama-3.1-8B-Instruct
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k| 3|flexible-extract| 5|exact_match||0.7710|± |0.0116|
 | | |strict-match | 5|exact_match||0.7043|± |0.0126|
 
-##### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4 (QuantizationModifier)
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k| 3|flexible-extract| 5|exact_match||0.7248|± |0.0123|
 | | |strict-match | 5|exact_match||0.6611|± |0.0130|
 
 
-##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=0)
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k| 3|flexible-extract| 5|exact_match||0.7362|± |0.0121|
 | | |strict-match | 5|exact_match||0.6702|± |0.0129|
 
-##### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
+#### Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k| 3|flexible-extract| 5|exact_match||0.7210|± |0.0124|
 | | |strict-match | 5|exact_match||0.6945|± |0.0127|
 
 > Note: quantized model accuracy may vary slightly due to nondeterminism.
 
-### Qwen3-VL Example
-
-```bash
-python3 qwen3_vl_example.py
-```
-
-The resulting model `Qwen3-VL-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.
-
-#### Evaluate Accuracy
-
-Run the following to test accuracy on GSM-8K:
-
-```bash
-lm_eval --model vllm-vlm \
-  --model_args pretrained="./Qwen3-VL-8B-Instruct-NVFP4-AutoRound",add_bos_token=true \
-  --tasks gsm8k \
-  --num_fewshot 5 \
-  --batch_size 'auto'
-```
-
-##### Qwen3-VL-8B-Instruct (Baseline)
-|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k| 3|flexible-extract| 5|exact_match||0.8628|± |0.0095|
-| | |strict-match | 5|exact_match||0.8453|± |0.0100|
-
-
-##### Qwen3-VL-8B-Instruct-NVFP4-AutoRound (AutoRoundModifier, iters=200)
-|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
-|gsm8k| 3|flexible-extract| 5|exact_match||0.8415|± |0.0101|
-| | |strict-match | 5|exact_match||0.8408|± |0.0101|
-
-> Note: quantized model accuracy may vary slightly due to nondeterminism.
-
 ### Questions or Feature Request?
 
 Please open up an issue on [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) or [intel/auto-round](https://github.com/intel/auto-round).
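For context on what the W4A4 FP4 (NVFP4) recipe in this README produces, here is a minimal stdlib-only sketch of FP4 block quantization: values are rounded onto the small FP4 (E2M1) magnitude grid using one scale per block. The grid, the block size of 16, and the max-abs scaling policy are assumptions for illustration only; this is not llm-compressor's or AutoRound's actual implementation.

```python
# Toy sketch of NVFP4-style block quantization. Assumptions (not from the
# source): E2M1 magnitude grid, block size 16, max-abs scale, nearest rounding.

# Representable magnitudes of the FP4 E2M1 format.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed micro-block size sharing one scale

def quantize_block(values):
    """Quantize one block: pick a scale so the max magnitude maps to 6.0
    (the largest FP4 value), then round each element to the nearest grid
    point and return the dequantized block plus its scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for v in values:
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(mag * scale * (1 if v >= 0 else -1))
    return out, scale

def quantize(weights):
    """Quantize a flat list of weights block by block, returning the
    dequantized (lossy) values the model would effectively compute with."""
    deq = []
    for i in range(0, len(weights), BLOCK_SIZE):
        block, _ = quantize_block(weights[i:i + BLOCK_SIZE])
        deq.extend(block)
    return deq

w = [0.9, -0.31, 0.05, 2.4, -6.0, 0.0, 1.1, -0.74]
print(quantize(w))  # each value snapped to the scaled FP4 grid
```

Algorithms like AutoRound then tune the rounding decisions (rather than always rounding to nearest) to reduce the accuracy loss this snapping causes, which is what the iters=0 vs. iters=200 rows above compare.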

examples/autoround/quantization_w4a4_fp4/qwen3_vl_example.py

Lines changed: 0 additions & 60 deletions
This file was deleted.

0 commit comments
