Add new-features section (#1408)

rahul-tuli · web-flow · commit 7bc1881e79ce · 2025-05-02T13:45:39.000-04:00
This PR updates the main README.md to introduce a "New Features" section, improving visibility for recent major additions to LLM Compressor. This section highlights:

- Axolotl Sparse Finetuning Integration (https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor)
- AutoAWQ Integration for low-bit weight quantization (#1177)
- Day 0 Llama 4 support and its use by Meta

This helps users quickly understand the latest capabilities of the library.

---------

Signed-off-by: Rahul Tuli <rtuli@redhat.com>
diff --git a/README.md b/README.md
@@ -12,6 +12,14 @@
    <img alt="LLM Compressor Flow" src="https://github.com/user-attachments/assets/adf07594-6487-48ae-af62-d9555046d51b" width="80%" />
 </p>
 
+## 🚀 What's New!
+
+Big updates have landed in LLM Compressor! Check out these exciting new features:
+
+* **Axolotl Sparse Finetuning Integration:** Finetune sparse LLMs through our integration with Axolotl. [Learn more here](https://docs.axolotl.ai/docs/custom_integrations.html#llmcompressor).
+* **AutoAWQ Integration:** Perform low-bit weight-only quantization efficiently using AutoAWQ, now part of LLM Compressor. *Note: This integration should be considered experimental for now. Enhanced support, including for MoE models and improved handling of larger models via layer sequential pipelining, is planned for upcoming releases.* [See the details](https://github.com/vllm-project/llm-compressor/pull/1177).
+* **Day 0 Llama 4 Support:** Meta used LLM Compressor to create the [FP8-quantized Llama-4-Maverick-17B-128E](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8), optimized for vLLM inference using the [compressed-tensors](https://github.com/neuralmagic/compressed-tensors) format.
+
 ### Supported Formats
 * Activation Quantization: W8A8 (int8 and fp8)
 * Mixed Precision: W4A16, W8A16