Commit e823e92

yiliu30 committed: update

Signed-off-by: yiliu30 <[email protected]>

1 parent: dbeed26

1 file changed (+7, -9 lines)
_posts/2025-11-27-intel-autoround-llmc.md renamed to _posts/2025-12-03-intel-autoround-llmc.md

Lines changed: 7 additions & 9 deletions
@@ -1,8 +1,7 @@
 ---
 layout: post
-title: "Advancing Low‑Bit Quantization for LLMs: AutoRound x LLM Compressor [Draft]"
-author: "Intel Neural Compressor Team"
-image: /assets/figures/2025-vllm-on-intel-arc/perf-figure1.png
+title: "Advancing Low‑Bit Quantization for LLMs: AutoRound x LLM Compressor"
+author: "Intel Neural Compressor Team, Red Hat AI Model Optimization Team"
 ---
 
 
@@ -30,13 +29,13 @@ Core strengths:
 
 AutoRound enables quantized models in a range of low‑bit formats that are designed to accelerate inference on **Intel® Xeon ® processors**, **Intel® Gaudi® AI accelerators**, **Intel® Data Center GPUs**, **Intel® Arc™ B‑Series Graphics**, as well as other GPUs (e.g., CUDA‑based devices).
 
-Looking forward, as Intel’s next‑generation GPUs—**including Intel® Crescent Island**—add native support for **FP8, MXFP8, and MXFP4** formats, models optimized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real‑world deployment.
+Looking forward, Intel is adding native support for FP8, MXFP8, and MXFP4 formats to its next-generation **Data Center GPUs, codenamed Crescent Island**. Models quantized with AutoRound will naturally scale to take advantage of these data types across the Intel AI hardware portfolio. This creates a consistent path from algorithmic innovation to real‑world deployment.
 
 For more details, please refer to the paper [AutoRound (EMNLP 2024)](https://aclanthology.org/2024.findings-emnlp.662.pdf) and the GitHub repository [intel/auto-round](https://github.com/intel/auto-round).
 
 ## Why Integrate Into LLM Compressor?
 
-**LLM** **Compressor** already provides a unified, modular system for compression primitives such as quantization, pruning, and distillation. Integrating AutoRound into this ecosystem:
+**LLM** **Compressor** already provides a unified, modular system for compression primitives such as quantization and pruning. Integrating AutoRound into this ecosystem:
 
 - Aligns with the existing modifier architecture (e.g., `GPTQModifier`)
 - Reuses the sequential calibration and layer‑onloading infrastructure
@@ -90,8 +89,6 @@ recipe = AutoRoundModifier(
     scheme="W4A16",
     ignore=["lm_head"],
     iters=200,
-    enable_torch_compile=False,
-    batch_size=2,
 )
 
 oneshot(
@@ -141,6 +138,7 @@ lm_eval --model vllm \
 |gsm8k| 3|flexible-extract| 5|exact_match||0.908|± |0.0091|
 | | |strict-match | 5|exact_match||0.907|± |0.0092|
 ```
+Note: The results may fluctuate due to non-determinism.
 
 ## Conclusion & Future Plans
 
@@ -152,7 +150,7 @@ If you’d like to influence which formats, models, and workflows we prioritize
 
 ### Acknowledgements
 
-We’d like to thank the **vLLM / LLM Compressor** community for extensive early discussions on the proposal and for their thoughtful reviews of the pull requests.
+We wish to acknowledge the contributions of the LLM Compressor community. Specifically, we thank Kyle Sayers, Dipika Sikka, Brian Dellabetta, Charles Hernandez, and Robert Shaw for their invaluable feedback on the early proposal and their diligent review of the pull requests.
 
 #### Related RFCs and PRs
 
@@ -162,5 +160,5 @@ PRs:
 
 - https://github.com/vllm-project/llm-compressor/pull/1994
 - https://github.com/vllm-project/llm-compressor/pull/2055
-- https://github.com/vllm-project/llm-compressor/pull/2062 (Under Review)
+- https://github.com/vllm-project/llm-compressor/pull/2062
 - https://github.com/vllm-project/vllm/pull/29484/ (Under Review)
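For readers who want to try the recipe fragment shown in the hunks above end to end, below is a minimal sketch of how the `AutoRoundModifier` settings from this post could be applied through LLM Compressor's `oneshot` API. The modifier's import path, the model id, and the calibration dataset are illustrative assumptions, not taken from the post or this commit.

```python
# Minimal sketch of a one-shot AutoRound W4A16 quantization run with LLM Compressor.
# The AutoRoundModifier import path is an assumption based on the PRs linked above;
# the model id and calibration dataset are placeholders, not from the original post.
from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier  # assumed path

recipe = AutoRoundModifier(
    scheme="W4A16",      # 4-bit weights, 16-bit activations
    ignore=["lm_head"],  # keep the output head unquantized
    iters=200,           # AutoRound tuning iterations
)

oneshot(
    model="Qwen/Qwen3-8B",        # example model id (assumption)
    dataset="open_platypus",      # example calibration dataset (assumption)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```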
