README.md (8 additions, 8 deletions)
@@ -20,9 +20,9 @@
## 🚀 What is AutoRound?

-AutoRound is an advanced quantization library designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
-It delivers high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and offering broad hardware compatibility.
-See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage instructions, please refer to [User Guide](./docs/step_by_step.md).
+AutoRound is an advanced quantization toolkit designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
+It achieves high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and providing broad hardware compatibility.
+See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage instructions, please refer to the [User Guide](./docs/step_by_step.md).
@@ -34,7 +34,7 @@ See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage in
[2025/11] AutoRound now offers preliminary support for an enhanced GGUF quantization algorithm via `--enable_alg_ext`. For detailed accuracy benchmarks, please refer to the [documentation](./docs/gguf_alg_ext_acc.md).
-[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the latest SGLang later than v0.5.4.
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using SGLang versions newer than v0.5.4.
[2025/10] We enhanced the RTN mode (`--iters 0`) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
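To make the RTN mode concrete, here is a minimal sketch of round-to-nearest group quantization in plain Python: each group of weights is scaled and rounded independently, with no gradient-based tuning. The function names (`rtn_quantize`, `rtn_dequantize`) are hypothetical and this is not AutoRound's actual implementation.

~~~python
def rtn_quantize(weights, bits=4, group_size=32):
    """Symmetric per-group RTN: returns (int codes, per-group scales)."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit symmetric
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; all-zero groups fall back to scale 1.0.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        # Round each weight to the nearest representable integer level.
        codes.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def rtn_dequantize(codes, scales, group_size=32):
    """Map integer codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
~~~

The reconstruction error of RTN is bounded by half a quantization step per weight, which is why smaller groups (e.g. 32) tend to help accuracy at low bit widths.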
- **`enable_torch_compile` (bool)**: If no exception is raised, we typically recommend setting it to `True` for faster quantization with lower resource usage.
- **`low_gpu_mem_usage` (bool)**: Whether to offload intermediate features to CPU at the cost of ~20% more tuning time (default is `False`).
-- **`low_cpu_mem_usage` (bool)**: [Experimental Feature] Whether to enable saving immediately to save ram usage (default is `False`).
+- **`low_cpu_mem_usage` (bool)**: [Experimental Feature] Whether to enable immediate saving to reduce RAM usage (default is `False`).
- **`device_map` (str|dict|int)**: The device to be used for tuning, e.g., `"auto"`, `"cpu"`, `"cuda"`, `"0,1,2"` (default is `"0"`). When set to `"auto"`, it will try to use all available GPUs.
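The accepted `device_map` values can be pictured with a small normalization sketch. `normalize_device_map` is an invented helper for illustration only, not AutoRound's API, and the GPU count is passed in as an assumption rather than queried from hardware.

~~~python
def normalize_device_map(device_map, available_gpus=0):
    """Turn a device_map value into a list of torch-style device names."""
    if isinstance(device_map, int):        # e.g. 1 -> ["cuda:1"]
        return [f"cuda:{device_map}"]
    if device_map == "cpu":
        return ["cpu"]
    if device_map == "cuda":
        return ["cuda:0"]
    if device_map == "auto":               # use every visible GPU, else CPU
        return [f"cuda:{i}" for i in range(available_gpus)] or ["cpu"]
    # Comma-separated indices such as "0,1,2".
    return [f"cuda:{int(i)}" for i in device_map.split(",")]
~~~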
</details>
### Adaptive Bits/Dtype Usage
-AutoScheme provide automatically algorithm to provide mixed bits/data_type quantization recipes. For some accuracy result, please refer to this [doc](https://github.com/intel/auto-round/blob/main/docs/auto_scheme_acc.md).
+AutoScheme provides an automatic algorithm to generate adaptive mixed bits/data-type quantization recipes.
Please refer to the [user guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for more details on AutoScheme.
~~~python
from auto_round import AutoRound, AutoScheme
@@ -249,7 +249,7 @@ ar.quantize_and_save()
### API Usage for VLMs
-If you encounter issues during quantization, try setting iters=0 (to enable RTN) and use group_size=32 for better
+If you encounter issues during quantization, try setting `iters=0` (to enable RTN) and `group_size=32` for better
results.
@@ -320,7 +320,7 @@ for prompt, output in zip(prompts, outputs):
### Transformers (CPU/Intel GPU/Gaudi/CUDA)
-AutoRound support 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to
+AutoRound supports 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to
install additional libraries when a better backend is found.
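The general pattern behind this backend selection can be sketched as probing which libraries are importable and picking the highest-priority one. The backend names and priority order below are made up for illustration; they are not AutoRound's real backend list.

~~~python
import importlib.util

# Hypothetical priority list: best backend first, plain PyTorch as fallback.
PREFERRED_BACKENDS = [
    ("marlin_kernels", "marlin"),
    ("exllamav2_kernels", "exllamav2"),
    ("triton", "triton"),
    ("torch", "torch"),
]


def select_backend(candidates=PREFERRED_BACKENDS):
    """Return the first backend whose module is installed."""
    for module, backend in candidates:
        if importlib.util.find_spec(module) is not None:
            return backend
    raise RuntimeError("no supported backend installed")
~~~

A real implementation would also have to check hardware capability (e.g. GPU compute version), not just whether the library is installed.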
**Please avoid manually moving the quantized model to a different device** (e.g., model.to('cpu')) during inference, as
docs/step_by_step.md (2 additions, 2 deletions)
@@ -284,7 +284,7 @@ W2G64 Average Accuracy of 13 tasks and Time Cost Results(Testing was conducted o
### AutoScheme
-AutoScheme provide automatically algorithm to provide mixed bits/data_type quantization recipes. For some accuracy result, please refer this doc [here](./auto_scheme_acc.md)
+AutoScheme provides an automatic algorithm to generate adaptive mixed bits/data-type quantization recipes. For accuracy results, please refer to [this doc](./auto_scheme_acc.md).
**Please note that mixed data types are supported during tuning, but cannot be exported to real models at this time.**
-- Enable low_cpu_mem_usage (experimental): Only one export format is supported. The quantized model is saved immediately after each block is packed, reducing peak CPU memory usage.
+- Enable `low_cpu_mem_usage` (experimental): Only one export format is supported. The quantized model is saved immediately after each block is packed, reducing peak CPU memory usage.
- Trigger immediate packing: Packing will be triggered immediately when using the command-line interface or the
`quantize_and_save` API, as long as only one export format is specified.