Commit 2102257

refine readme (#1063)
1 parent 6d40e20 commit 2102257

File tree

2 files changed: +10 -10 lines changed

README.md

Lines changed: 8 additions & 8 deletions
@@ -20,9 +20,9 @@
 
 ## 🚀 What is AutoRound?
 
-AutoRound is an advanced quantization library designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
-It delivers high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and offering broad hardware compatibility.
-See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage instructions, please refer to [User Guide](./docs/step_by_step.md).
+AutoRound is an advanced quantization toolkit designed for Large Language Models (LLMs) and Vision-Language Models (VLMs).
+It achieves high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and providing broad hardware compatibility.
+See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage instructions, please refer to the [User Guide](./docs/step_by_step.md).
 
 <p align="center">
   <img src="docs/imgs/autoround_overview.png" alt="AutoRound Overview" width="80%">
@@ -34,7 +34,7 @@ See our [paper](https://arxiv.org/pdf/2309.05516) for more details. For usage in
 
 [2025/11] AutoRound now offers preliminary support for an enhanced GGUF quantization algorithm via `--enable_alg_ext`. For detailed accuracy benchmarks, please refer to the [documentation](./docs/gguf_alg_ext_acc.md).
 
-[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using the latest SGLang later than v0.5.4.
+[2025/10] AutoRound has been integrated into **SGLang**. You can now run models in the AutoRound format directly using SGLang versions newer than v0.5.4.
 
 [2025/10] We enhanced the RTN mode (--iters 0) to significantly reduce quantization cost compared to the default tuning mode. Check out [this doc](./docs/opt_rtn.md) for some accuracy results. If you don’t have sufficient resources, you can use this mode for 4-bit quantization.
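As a rough sketch of this RTN mode through the Python API (the model id and scheme below are placeholder assumptions, not values from this commit):

~~~python
# Hedged sketch: RTN-mode quantization via iters=0, per the note above.
# The model id and scheme are placeholder assumptions.
from auto_round import AutoRound

ar = AutoRound(
    "Qwen/Qwen2.5-7B-Instruct",  # placeholder model id (assumption)
    scheme="W4A16",              # 4-bit weight-only scheme (assumption)
    iters=0,                     # zero tuning iterations => cheap RTN mode
)
ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
~~~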

@@ -212,13 +212,13 @@ ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
 ##### Device/Speed Configuration
 - **`enable_torch_compile` (bool)**: If no exception is raised, we typically recommend setting it to `True` for faster quantization with lower resource usage.
 - **`low_gpu_mem_usage` (bool)**: Whether to offload intermediate features to CPU at the cost of ~20% more tuning time (default is `False`).
-- **`low_cpu_mem_usage` (bool)**: [Experimental Feature]Whether to enable saving immediately to save ram usage (default is `False`).
+- **`low_cpu_mem_usage` (bool)**: [Experimental Feature] Whether to enable saving immediately to reduce RAM usage (default is `False`).
 - **`device_map` (str|dict|int)**: The device to be used for tuning, e.g., `"auto"`, `"cpu"`, `"cuda"`, `"0,1,2"` (default is `'0'`). When using `"auto"`, it will try to use all available GPUs.
 
 </details>
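To make these knobs concrete, here is a hedged sketch combining them; the model id is a placeholder, and the exact constructor signature may differ across AutoRound versions:

~~~python
# Hedged sketch of the device/speed knobs documented above.
# "facebook/opt-125m" is a placeholder model id (assumption).
from auto_round import AutoRound

ar = AutoRound(
    "facebook/opt-125m",
    enable_torch_compile=True,  # usually faster tuning if no exception is raised
    low_gpu_mem_usage=True,     # offload intermediate features to CPU (~20% slower)
    device_map="auto",          # try to use all available GPUs
)
ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
~~~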

 ### Adaptive Bits/Dtype Usage
-AutoScheme provide automatically algorithm to provide mixed bits/data_type quantization recipes. For some accuracy result, please refer to this [doc](https://github.com/intel/auto-round/blob/main/docs/auto_scheme_acc.md).
+AutoScheme provides an automatic algorithm to generate adaptive mixed bits/data-type quantization recipes.
 Please refer to the [user guide](https://github.com/intel/auto-round/blob/main/docs/step_by_step.md#autoscheme) for more details on AutoScheme.
 ~~~python
 from auto_round import AutoRound, AutoScheme
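The `~~~python` block above is cut off by the diff view. As a hedged sketch of what an AutoScheme call can look like (argument names follow the linked user guide as best we can tell; the model id and bit settings are assumptions):

~~~python
# Hedged AutoScheme sketch; avg_bits/options follow the linked user guide
# (assumption), and the model id is a placeholder.
from auto_round import AutoRound, AutoScheme

scheme = AutoScheme(avg_bits=3, options=("W2A16", "W4A16"))  # mix 2- and 4-bit layers
ar = AutoRound("facebook/opt-125m", scheme=scheme)
ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
~~~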
@@ -249,7 +249,7 @@ ar.quantize_and_save()
 
 ### API Usage for VLMs
 
-If you encounter issues during quantization, try setting iters=0 (to enable RTN) and use group_size=32 for better
+If you encounter issues during quantization, try setting iters=0 (to enable RTN) and group_size=32 for better
 results.
 
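A hedged sketch of that fallback recipe for a VLM (the model id is a placeholder, and real VLM runs may need extra arguments such as a calibration dataset):

~~~python
# Hedged sketch of the VLM fallback suggested above: iters=0 (RTN) plus
# group_size=32. The model id is a placeholder assumption.
from auto_round import AutoRound

ar = AutoRound(
    "Qwen/Qwen2-VL-2B-Instruct",  # placeholder VLM id (assumption)
    iters=0,                      # RTN mode when tuning is unstable
    group_size=32,                # finer grouping for better low-bit accuracy
)
ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
~~~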

@@ -320,7 +320,7 @@ for prompt, output in zip(prompts, outputs):
 ### Transformers (CPU/Intel GPU/Gaudi/CUDA)
 
 
-AutoRound support 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to
+AutoRound supports 10+ backends and automatically selects the best available backend based on the installed libraries and prompts the user to
 install additional libraries when a better backend is found.
 
 **Please avoid manually moving the quantized model to a different device** (e.g., model.to('cpu')) during inference, as
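For illustration, a minimal Transformers inference sketch that respects the warning above (the checkpoint path is a placeholder; backend selection happens automatically at load time):

~~~python
# Minimal inference sketch; "./qmodel" is a placeholder path to an
# AutoRound-format checkpoint. Note: no manual model.to(...) calls.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./qmodel", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./qmodel")

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
~~~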

docs/step_by_step.md

Lines changed: 2 additions & 2 deletions
@@ -284,7 +284,7 @@ W2G64 Average Accuracy of 13 tasks and Time Cost Results(Testing was conducted o
 
 ### AutoScheme
 
-AutoScheme provide automatically algorithm to provide mixed bits/data_type quantization recipes. For some accuracy result, please refer this doc [here](./auto_scheme_acc.md)
+AutoScheme provides an automatic algorithm to generate adaptive mixed bits/data-type quantization recipes. For accuracy results, please refer to [this doc](./auto_scheme_acc.md).
 
 **Please note that mixed data types are supported during tuning, but cannot be exported to real models at this time.**

@@ -559,7 +559,7 @@ autoround.save_quantized(format="auto_awq", output_dir="tmp_autoround")
 
 
 - **Reduced CPU Memory Usage:**
-  - Enable low_cpu_mem_usage (experimental): Only one export format is supported. The quantized model is saved immediately after each block is packed, reducing peak CPU memory usage.
+  - Enable `low_cpu_mem_usage` (experimental): Only one export format is supported. The quantized model is saved immediately after each block is packed, reducing peak CPU memory usage.
 
   - Trigger immediate packing: Packing will be triggered immediately when using the command-line interface or the
     quantize_and_save API, as long as only one export format is specified.
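A hedged sketch of this reduced-CPU-memory path (the model id is a placeholder, and the flag is experimental per the docs):

~~~python
# Hedged sketch: single export format + low_cpu_mem_usage, so each block is
# saved right after packing. "facebook/opt-125m" is a placeholder (assumption).
from auto_round import AutoRound

ar = AutoRound("facebook/opt-125m", low_cpu_mem_usage=True)  # experimental flag
ar.quantize_and_save(output_dir="./qmodel", format="auto_round")  # exactly one format
~~~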
