
Commit ec298db

aireilly and dsikka authored
v0.8.0 New in this release (#1892)
SUMMARY: Added What's new to the docs front page ~and a release notes draft~. v0.8.0 release notes are here: https://gist.github.com/aireilly/7866a8f71f99e7005a8d809c136c5984 Signed-off-by: Aidan Reilly <[email protected]> Co-authored-by: Dipika Sikka <[email protected]>
1 parent cb8d775 commit ec298db

File tree

1 file changed: +23 -6 lines changed


docs/index.md

Lines changed: 23 additions & 6 deletions
@@ -13,6 +13,29 @@
 <img alt="LLM Compressor Flow" src="assets/llmcompressor-user-flows.png" width="100%" style="max-width: 100%;"/>
 </p>
 
+## New in this release
+
+Review the [LLM Compressor v0.8.0 release notes](https://github.com/vllm-project/llm-compressor/releases/tag/0.8.0) for details about new features. Highlights include:
+
+!!! info "Support for multiple modifiers in oneshot compression runs"
+    LLM Compressor now supports using multiple modifiers in a single oneshot compression run, for example applying both AWQ and GPTQ to the same model.
+
+    Using multiple modifiers is an advanced use of LLM Compressor and an active area of research. See [Non-uniform Quantization](examples/quantization_non_uniform/) for more details and example usage.
+
+!!! info "Quantization and calibration support for Qwen3 models"
+    Quantization and calibration support for Qwen3 Next models has been added to LLM Compressor.
+
+    LLM Compressor now supports quantization for Qwen3 Next and Qwen3 VL MoE models. You can use data-free pathways such as FP8 channel-wise and block-wise quantization. Pathways that require data, such as W4A16 and NVFP4, are planned for a future release.
+
+    Examples for NVFP4 and FP8 quantization have been added for the Qwen3-Next-80B-A3B-Instruct model.
+
+    For the Qwen3 VL MoE model, support has been added for the data-free pathway, which applies FP8 quantization such as channel-wise and block-wise quantization.
+
+    **NOTE**: These models are not supported in transformers<=4.56.2. You may need to install transformers from source.
+
+!!! info "Transforms support for non-full-size rotation sizes"
+    You can now set a `transform_block_size` field on the transform-based modifier classes `SpinQuantModifier` and `QuIPModifier`. With this field you can configure transforms of variable size, so Hadamard rotations no longer need to match the full size of the weight.
+
 ## Recent Updates
 
 !!! info "QuIP and SpinQuant-style Transforms"
@@ -27,12 +50,6 @@
 !!! info "Llama4 Quantization Support"
     Quantize a Llama4 model to [W4A16](examples/quantization_w4a16.md) or [NVFP4](examples/quantization_w4a16.md). The checkpoint produced can seamlessly run in vLLM.
 
-!!! info "Large Model Support with Sequential Onloading"
-    As of llm-compressor>=0.6.0, you can now quantize very large language models on a single GPU. Models are broken into disjoint layers which are then onloaded to the GPU one layer at a time. For more information on sequential onloading, see [Big Modeling with Sequential Onloading](examples/big_models_with_sequential_onloading.md) as well as the [DeepSeek-R1 Example](examples/quantizing_moe.md).
-
-!!! info "Axolotl Sparse Finetuning Integration"
-    Seamlessly finetune sparse LLMs with our Axolotl integration. Learn how to create [fast sparse open-source models with Axolotl and LLM Compressor](https://developers.redhat.com/articles/2025/06/17/axolotl-meets-llm-compressor-fast-sparse-open). See also the [Axolotl integration docs](https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor).
-
 For more information, check out the [latest release on GitHub](https://github.com/vllm-project/llm-compressor/releases/latest).
 
 ## Key Features
