---
title: "Advancing Low‑Bit Quantization for LLMs: AutoRound x LLM Compressor"
author: "Intel Neural Compressor Team, Red Hat AI Model Optimization Team"
---
AutoRound produces quantized models in a range of low‑bit formats designed to accelerate inference on **Intel® Xeon® processors**, **Intel® Gaudi® AI accelerators**, **Intel® Data Center GPUs**, **Intel® Arc™ B‑Series Graphics**, as well as other GPUs (e.g., CUDA‑based devices).
Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation **Data Center GPUs, codenamed Crescent Island**. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real‑world deployment.
For more details, please refer to the paper [AutoRound (EMNLP 2024)](https://aclanthology.org/2024.findings-emnlp.662.pdf) and the GitHub repository [intel/auto-round](https://github.com/intel/auto-round).
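For a quick sense of the workflow, here is a minimal standalone quantization run, sketched from the README of [intel/auto-round](https://github.com/intel/auto-round); exact argument names and export formats may vary between releases:

```python
# Minimal AutoRound quantization sketch, based on the intel/auto-round
# README; arguments may differ across versions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model used purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit symmetric weight-only quantization with group size 128
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Export the tuned model; other export formats are also supported
autoround.save_quantized("./opt-125m-w4g128", format="auto_round")
```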
## Why Integrate Into LLM Compressor?
**LLM Compressor** already provides a unified, modular system for compression primitives such as quantization and pruning. Integrating AutoRound into this ecosystem (see the sketch after this list):
- Aligns with the existing modifier architecture (e.g., `GPTQModifier`)
- Reuses the sequential calibration and layer‑onloading infrastructure
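Concretely, this lets AutoRound slot into the same recipe-style, one-shot workflow that users already know from `GPTQModifier`. The sketch below assumes an `AutoRoundModifier` exposed alongside the other quantization modifiers; the import path and constructor arguments are assumptions patterned on `GPTQModifier`, so consult the LLM Compressor documentation for the released interface:

```python
# Hypothetical AutoRound one-shot run in LLM Compressor, mirroring the
# existing GPTQModifier recipe pattern. The AutoRoundModifier import path
# and arguments are assumptions, not a confirmed API.
from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier  # assumed path

# Quantize all Linear layers to W4A16, keeping the output head in full precision
recipe = AutoRoundModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # any Hugging Face causal LM
    dataset="open_platypus",                     # calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

Because the modifier plugs into the existing sequential calibration pipeline, only the layer currently being calibrated needs to be onloaded, which keeps memory requirements close to those of the other one-shot modifiers.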
Note: The results may fluctuate due to non-determinism.
## Conclusion & Future Plans
### Acknowledgements
We wish to acknowledge the contributions of the LLM Compressor community. Specifically, we thank Kyle Sayers, Dipika Sikka, Brian Dellabetta, Charles Hernandez, and Robert Shaw for their invaluable feedback on the early proposal and their diligent review of the pull requests.