pip install torchao --index-url https://download.pytorch.org/whl/cu126 # CUDA 12.6
pip install torchao --index-url https://download.pytorch.org/whl/cpu   # CPU only

# For developers
USE_CUDA=1 python setup.py develop
USE_CPP=0 python setup.py develop
```

</details>
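Whichever route you take, a quick sanity check (a convenience snippet, not part of the official instructions) is to import the package and confirm which build is active:

```
# Convenience check only: confirms torchao imports and whether a CUDA-enabled wheel is in use.
import torch
import torchao

print(torchao.__version__, torch.__version__, torch.cuda.is_available())
```
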
Quantize your model weights to int4!

```
from torchao.quantization import Int4WeightOnlyConfig, quantize_
quantize_(model, Int4WeightOnlyConfig(group_size=32))
```

In the quick start benchmark this corresponds to a 6.9x inference speedup.

For the full model setup and benchmark details, check out our [quick start guide](https://docs.pytorch.org/ao/stable/quick_start.html). Alternatively, try quantizing your favorite model using our [HuggingFace space](https://huggingface.co/spaces/pytorch/torchao-my-repo)!
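
If you want a self-contained version of the snippet above to paste and run, here is a rough sketch with a toy model; the layer sizes, batch size, and `group_size` are arbitrary choices for illustration, and the int4 path assumes a CUDA device with bfloat16 weights:

```
import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# Toy model for illustration only; any nn.Module with Linear layers works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# One call, in place: Linear weights are replaced with int4 weight-only quantized tensors.
quantize_(model, Int4WeightOnlyConfig(group_size=32))

x = torch.randn(16, 1024, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    print(model(x).shape)  # torch.Size([16, 1024])
```
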
## 🔗 Integrations

TorchAO is integrated into some of the leading open-source libraries, including:

* HuggingFace transformers with a [builtin inference backend](https://huggingface.co/docs/transformers/main/quantization/torchao) and [low bit optimizers](https://github.com/huggingface/transformers/pull/31865) (a minimal usage sketch follows this list)
* HuggingFace diffusers best practices with `torch.compile` and TorchAO in a standalone repo [diffusers-torchao](https://github.com/huggingface/diffusers/blob/main/docs/source/en/quantization/torchao.md)
* HuggingFace PEFT for LoRA using TorchAO as their [quantization backend](https://huggingface.co/docs/peft/en/developer_guides/quantization#torchao-pytorch-architecture-optimization)
* Mobius HQQ backend leveraged our int4 kernels to get [195 tok/s on a 4090](https://github.com/mobiusml/hqq#faster-inference)
* TorchTune for our NF4 [QLoRA](https://docs.pytorch.org/torchtune/main/tutorials/qlora_finetune.html), [QAT](https://docs.pytorch.org/torchtune/main/recipes/qat_distributed.html), and [float8 quantized fine-tuning](https://github.com/pytorch/torchtune/pull/2546) recipes
* TorchTitan for [float8 pre-training](https://github.com/pytorch/torchtitan/blob/main/docs/float8.md)
* vLLM for LLM serving: [usage](https://docs.vllm.ai/en/latest/features/quantization/torchao.html), [detailed docs](https://docs.pytorch.org/ao/main/torchao_vllm_integration.html)
* SGLang for LLM serving: [usage](https://docs.sglang.ai/backend/server_arguments.html#server-arguments) and the major [PR](https://github.com/sgl-project/sglang/pull/1341).
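
For the transformers integration in the first bullet, the load path looks roughly like the sketch below. This is an illustrative example rather than the canonical snippet from the transformers docs: the checkpoint name is a placeholder, and the exact arguments `TorchAoConfig` accepts depend on your transformers version.

```
# Illustrative only: placeholder checkpoint; TorchAoConfig arguments vary by transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
from torchao.quantization import Int4WeightOnlyConfig

model_id = "facebook/opt-125m"  # placeholder checkpoint
quantization_config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,  # weights are quantized at load time
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("TorchAO makes it easy to", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```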