
Commit 9534d2a

improve the robustness of scheme (#803)
1 parent 78fce45 commit 9534d2a

File tree: README.md, auto_round/schemes.py (2 files changed, +10 lines, -13 lines)


README.md

Lines changed: 7 additions & 13 deletions
@@ -38,11 +38,7 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). For usage instructions,
 all bits other than 3 bits. Example
 models: [Intel/Qwen3-235B-A22B-q2ks-mixed-AutoRound](https://huggingface.co/Intel/Qwen3-235B-A22B-q2ks-mixed-AutoRound)
 and [Intel/DeepSeek-R1-0528-q2ks-mixed-AutoRound](https://huggingface.co/Intel/DeepSeek-R1-0528-q2ks-mixed-AutoRound). **A more advanced algorithm** tailored for specific configurations may be available in
-v0.6.2.
-
-[2025/05] AutoRound provides some recipes for **DeepSeek-R1-0528**, please refer
-to [OPEA/DeepSeek-R1-0528-int2-mixed-AutoRound](https://huggingface.co/OPEA/DeepSeek-R1-0528-int2-mixed-AutoRound) and [OPEA/DeepSeek-R1-0528-int4-AutoRound](https://huggingface.co/OPEA/DeepSeek-R1-0528-int4-AutoRound) for
-more details.
+v0.7.1.

 [2025/05] AutoRound has been integrated into **vLLM**. You can now run models in the AutoRound format directly with
 vLLM versions later than v0.85.post1.
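
As a minimal, illustrative sketch (not part of this commit), running an AutoRound-format checkpoint with vLLM's offline API might look like the following; `./tmp_autoround` is assumed to hold a model quantized by the CLI example further down in this diff, and the prompt and sampling settings are placeholders.

```python
# Minimal sketch: loading an AutoRound-format checkpoint with vLLM's offline API.
# Assumes ./tmp_autoround holds a model quantized and saved by auto-round.
from vllm import LLM, SamplingParams

llm = LLM(model="./tmp_autoround")  # a Hugging Face repo id in auto_round format also works
outputs = llm.generate(
    ["Explain weight-only quantization in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```
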
@@ -121,24 +117,24 @@ Please change to `auto-round-mllm` for visual-language models (VLMs) quantizatio
 auto-round \
     --model Qwen/Qwen3-0.6B \
     --scheme "W4A16" \
-    --format "auto_gptq,auto_awq,auto_round" \
+    --format "auto_round" \
     --output_dir ./tmp_autoround
 ```

-We offer another two configurations, `auto-round-best` and `auto-round-light`, designed for optimal accuracy and improved speed, respectively. Details are as follows.
+We offer another two recipes, `auto-round-best` and `auto-round-light`, designed for optimal accuracy and improved speed, respectively. Details are as follows.
 <details>
 <summary>Other Recipes</summary>

 ```bash
-## best accuracy, 3X slower, low_gpu_mem_usage could save ~20G but ~30% slower
+# Best accuracy, 3X slower, low_gpu_mem_usage could save ~20G but ~30% slower
 auto-round-best \
     --model Qwen/Qwen3-0.6B \
     --scheme "W4A16" \
     --low_gpu_mem_usage
 ```

 ```bash
-## light accuracy, 2-3X speedup, slight accuracy drop at W4 and larger accuracy drop at W2
+# 2-3X speedup, slight accuracy drop at W4 and larger accuracy drop at W2
 auto-round-light \
     --model Qwen/Qwen3-0.6B \
     --scheme "W4A16"
@@ -147,7 +143,7 @@ auto-round-light \

 <!-- ```bash
 auto-round-fast \
-## fast and low memory, 2-3X speedup, slight accuracy drop at W4G128
+# Fast and low memory, 2-3X speedup, slight accuracy drop at W4G128
     --model Qwen/Qwen3-0.6B \
     --bits 4 \
     --group_size 128 \
@@ -176,10 +172,8 @@ ar = AutoRound(model_name_or_path, scheme="W4A16")
 # Faster quantization (2–3× speedup) with slight accuracy drop at W4G128.
 # ar = AutoRound(model_name_or_path, nsamples=128, iters=50, lr=5e-3)

-# Save quantized model
-output_dir = "./tmp_autoround"
 # Supported formats: "auto_round" (default), "auto_gptq", "auto_awq", "llm_compressor", "gguf:q4_k_m", etc.
-ar.quantize_and_save(output_dir, format="auto_round")
+ar.quantize_and_save(output_dir="./tmp_autoround", format="auto_round")
 ```

 <details>
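
The Python API hunk above folds the separate `output_dir` variable into the `quantize_and_save` call. A minimal sketch of the resulting end-to-end flow, assembled from the snippets visible in this diff (the example model and output directory are the README's own examples):

```python
# Sketch of the README's Python quantization flow after this change.
# "Qwen/Qwen3-0.6B" is simply the example model used throughout the README.
from auto_round import AutoRound

model_name_or_path = "Qwen/Qwen3-0.6B"
ar = AutoRound(model_name_or_path, scheme="W4A16")

# Formats listed in the README: "auto_round" (default), "auto_gptq", "auto_awq",
# "llm_compressor", "gguf:q4_k_m", etc.
ar.quantize_and_save(output_dir="./tmp_autoround", format="auto_round")
```
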

auto_round/schemes.py

Lines changed: 3 additions & 0 deletions
@@ -101,6 +101,7 @@ def is_preset_scheme(name: str) -> bool:
         "data_type": "mx_fp",
         "act_bits": 4,
         "act_data_type": "mx_fp_rceil",
+        "act_group_size": 32,
     }
 )

@@ -111,6 +112,7 @@ def is_preset_scheme(name: str) -> bool:
         "data_type": "mx_fp",
         "act_bits": 8,
         "act_data_type": "mx_fp_rceil",
+        "act_group_size": 32,
     }
 )

@@ -121,6 +123,7 @@ def is_preset_scheme(name: str) -> bool:
         "data_type": "nv_fp",
         "act_bits": 4,
         "act_data_type": "nv_fp4_with_static_gs",
+        "act_group_size": 16,
     }
 )
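
The schemes.py hunks pin an explicit `act_group_size` on three activation-quantization presets: 32 for the two `mx_fp` presets and 16 for the `nv_fp` preset, consistent with the 32-element blocks of the OCP MX formats and the 16-element blocks of NVFP4. As a rough illustration (plain dicts with hypothetical names, not the library's actual scheme objects), the affected presets now carry:

```python
# Illustrative only: plain-dict approximations of the three presets touched by
# this commit. The real definitions in auto_round/schemes.py may include further
# fields (weight bits, group size, etc.) not shown in the diff.
MXFP4_LIKE = {
    "data_type": "mx_fp",
    "act_bits": 4,
    "act_data_type": "mx_fp_rceil",
    "act_group_size": 32,  # newly pinned by this commit
}
MXFP8_LIKE = {
    "data_type": "mx_fp",
    "act_bits": 8,
    "act_data_type": "mx_fp_rceil",
    "act_group_size": 32,  # newly pinned by this commit
}
NVFP4_LIKE = {
    "data_type": "nv_fp",
    "act_bits": 4,
    "act_data_type": "nv_fp4_with_static_gs",
    "act_group_size": 16,  # newly pinned by this commit
}
```
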
