and [Intel/DeepSeek-R1-0528-q2ks-mixed-AutoRound](https://huggingface.co/Intel/DeepSeek-R1-0528-q2ks-mixed-AutoRound). **A more advanced algorithm** tailored for specific configurations may be available in
v0.7.1.

[2025/05] AutoRound has been integrated into **vLLM**. You can now run models in the AutoRound format directly with
vLLM versions later than v0.8.5.post1.
Please change to `auto-round-mllm` for visual-language models (VLMs) quantization.

```bash
auto-round \
    --model Qwen/Qwen3-0.6B \
    --scheme "W4A16" \
    --format "auto_round" \
    --output_dir ./tmp_autoround
```
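To make the scheme concrete: a `W4A16` recipe with group size 128 stores each block of 128 weights as 4-bit integers plus one scale and zero point per group, while activations stay in 16-bit floats. The sketch below is a plain round-to-nearest illustration of that storage layout, not AutoRound's tuned algorithm, and the helper functions are hypothetical names invented for this example.

```python
# Illustrative sketch of W4A16 group-wise storage (NOT AutoRound's algorithm):
# round-to-nearest asymmetric 4-bit quantization with one scale/zero per group.
import random

def quantize_group(weights, bits=4):
    """Quantize one group of float weights to asymmetric ints + scale/zero."""
    qmax = (1 << bits) - 1                      # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0             # guard against a flat group
    zero = round(-lo / scale)                   # integer zero point
    q = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    return [(v - zero) * scale for v in q]

# One 128-element group, matching --group_size 128
random.seed(0)
w = [random.gauss(0, 0.02) for _ in range(128)]
q, s, z = quantize_group(w)
w_hat = dequantize_group(q, s, z)
err = max(abs(a - b) for a, b in zip(w, w_hat))
assert all(0 <= v <= 15 for v in q)             # every weight fits in 4 bits
assert err <= s                                 # error bounded by one quant step
```

Per-group scales keep the worst-case reconstruction error proportional to each group's own value range, which is why smaller group sizes generally trade more stored metadata for better accuracy.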

We offer two additional recipes, `auto-round-best` and `auto-round-light`, designed for optimal accuracy and improved speed, respectively. Details are as follows.

<details>
<summary>Other Recipes</summary>

```bash
# Best accuracy, 3X slower; --low_gpu_mem_usage saves ~20 GB but is ~30% slower
auto-round-best \
    --model Qwen/Qwen3-0.6B \
    --scheme "W4A16" \
    --low_gpu_mem_usage
```

```bash
# 2-3X speedup; slight accuracy drop at W4, larger drop at W2
auto-round-light \
    --model Qwen/Qwen3-0.6B \
    --scheme "W4A16"
```

<!-- ```bash
# Fast and low memory, 2-3X speedup, slight accuracy drop at W4G128
auto-round-fast \
    --model Qwen/Qwen3-0.6B \
    --bits 4 \
    --group_size 128 \
ar = AutoRound(model_name_or_path, scheme="W4A16")

# Faster quantization (2–3× speedup) with slight accuracy drop at W4G128.
# ar = AutoRound(model_name_or_path, nsamples=128, iters=50, lr=5e-3)

# Supported formats: "auto_round" (default), "auto_gptq", "auto_awq", "llm_compressor", "gguf:q4_k_m", etc.
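For intuition on why tuned rounding can beat plain round-to-nearest, the toy sketch below brute-forces the per-weight round-up/round-down choice to minimize a layer's output error on a single calibration sample. This only sketches the idea: the actual library learns a continuous rounding offset with signed gradient descent over calibration data, and none of these helper names come from its API.

```python
# Toy sketch: choosing rounding directions to minimize output error,
# rather than rounding each weight to its nearest grid point in isolation.
import itertools, math, random

random.seed(1)
s = 0.05                                            # fixed quantization step
w = [random.uniform(-0.5, 0.5) for _ in range(8)]   # one weight row
x = [random.uniform(-1.0, 1.0) for _ in range(8)]   # one calibration sample

def layer_err(q):
    """|x . w - x . (s*q)|: output error of the quantized row on sample x."""
    return abs(sum(xi * (wi - s * qi) for xi, wi, qi in zip(x, w, q)))

# Round-to-nearest baseline
q_rtn = [round(wi / s) for wi in w]

# "Tuned" rounding: pick floor or ceil per weight to minimize the OUTPUT error
best_err, best_q = min(
    (layer_err(q), q)
    for q in itertools.product(*[(math.floor(wi / s), math.ceil(wi / s)) for wi in w])
)
# RTN is one of the candidate choices, so tuned rounding is never worse
assert best_err <= layer_err(q_rtn)
```

Because round-to-nearest is always among the candidate up/down assignments, optimizing the choice over calibration data can only reduce the layer's output error, which is the intuition behind tuning the rounding instead of fixing it.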