Commit d1d5549

Update README with PEFT integration + installation (#2559)
Highlight the PEFT LoRA integration that was previously missing from the README, and add an installation section (fixes #2483).
1 parent 9750e13 commit d1d5549


README.md

Lines changed: 28 additions & 19 deletions
@@ -17,7 +17,7 @@
 [![](https://img.shields.io/badge/torchao-documentation-blue?color=DE3412)](https://docs.pytorch.org/ao/stable/index.html)
 [![license](https://img.shields.io/badge/license-BSD_3--Clause-lightgrey.svg)](./LICENSE)

-[Latest News](#-latest-news) | [Overview](#-overview) | [Quick Start](#-quick-start) | [Integrations](#-integrations) | [Inference](#-inference) | [Training](#-training) | [Videos](#-videos) | [Citation](#-citation)
+[Latest News](#-latest-news) | [Overview](#-overview) | [Quick Start](#-quick-start) | [Installation](#-installation) | [Integrations](#-integrations) | [Inference](#-inference) | [Training](#-training) | [Videos](#-videos) | [Citation](#-citation)

 </div>


@@ -71,23 +71,6 @@ First, install TorchAO. We recommend installing the latest stable version:
 pip install torchao
 ```

-<details>
-<summary>Other installation options</summary>
-
-```
-# Nightly
-pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu126
-
-# Different CUDA versions
-pip install torchao --index-url https://download.pytorch.org/whl/cu126 # CUDA 12.6
-pip install torchao --index-url https://download.pytorch.org/whl/cpu # CPU only
-
-# For developers
-USE_CUDA=1 python setup.py develop
-```
-
-</details>
-
 Quantize your model weights to int4!
 ```
 from torchao.quantization import Int4WeightOnlyConfig, quantize_
@@ -106,14 +89,40 @@ speedup: 6.9x
 For the full model setup and benchmark details, check out our [quick start guide](https://docs.pytorch.org/ao/stable/quick_start.html). Alternatively, try quantizing your favorite model using our [HuggingFace space](https://huggingface.co/spaces/pytorch/torchao-my-repo)!


+## 🛠 Installation
+
+To install the latest stable version:
+```
+pip install torchao
+```
+
+<details>
+<summary>Other installation options</summary>
+
+```
+# Nightly
+pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu126
+
+# Different CUDA versions
+pip install torchao --index-url https://download.pytorch.org/whl/cu126 # CUDA 12.6
+pip install torchao --index-url https://download.pytorch.org/whl/cpu # CPU only
+
+# For developers
+USE_CUDA=1 python setup.py develop
+USE_CPP=0 python setup.py develop
+```
+</details>
+
+
 ## 🔗 Integrations

 TorchAO is integrated into some of the leading open-source libraries including:

 * HuggingFace transformers with a [builtin inference backend](https://huggingface.co/docs/transformers/main/quantization/torchao) and [low bit optimizers](https://github.com/huggingface/transformers/pull/31865)
 * HuggingFace diffusers best practices with `torch.compile` and TorchAO in a standalone repo [diffusers-torchao](https://github.com/huggingface/diffusers/blob/main/docs/source/en/quantization/torchao.md)
+* HuggingFace PEFT for LoRA using TorchAO as their [quantization backend](https://huggingface.co/docs/peft/en/developer_guides/quantization#torchao-pytorch-architecture-optimization)
 * Mobius HQQ backend leveraged our int4 kernels to get [195 tok/s on a 4090](https://github.com/mobiusml/hqq#faster-inference)
-* TorchTune for our [QLoRA](https://docs.pytorch.org/torchtune/main/tutorials/qlora_finetune.html), [QAT](https://docs.pytorch.org/torchtune/main/recipes/qat_distributed.html), and [float8 quantized fine-tuning](https://github.com/pytorch/torchtune/pull/2546) recipes
+* TorchTune for our NF4 [QLoRA](https://docs.pytorch.org/torchtune/main/tutorials/qlora_finetune.html), [QAT](https://docs.pytorch.org/torchtune/main/recipes/qat_distributed.html), and [float8 quantized fine-tuning](https://github.com/pytorch/torchtune/pull/2546) recipes
 * TorchTitan for [float8 pre-training](https://github.com/pytorch/torchtitan/blob/main/docs/float8.md)
 * VLLM for LLM serving: [usage](https://docs.vllm.ai/en/latest/features/quantization/torchao.html), [detailed docs](https://docs.pytorch.org/ao/main/torchao_vllm_integration.html)
 * SGLang for LLM serving: [usage](https://docs.sglang.ai/backend/server_arguments.html#server-arguments) and the major [PR](https://github.com/sgl-project/sglang/pull/1341).
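
The quick-start snippet in the second hunk above is cut off at the import line by the hunk boundary. For orientation only (this is not text from the commit), a minimal int4 weight-only quantization call with this API typically looks like the sketch below; the toy model, dtype, device, and `group_size` are illustrative assumptions.

```
# Illustrative sketch, not part of the commit: int4 weight-only quantization
# with torchao's quantize_ API on a toy bfloat16 model (CUDA assumed).
import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# Swap every linear layer's weight for an int4 weight-only quantized tensor;
# group_size is an illustrative choice trading accuracy against memory.
quantize_(model, Int4WeightOnlyConfig(group_size=128))

with torch.no_grad():
    out = model(torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda"))
print(out.shape)
```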
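
On the PEFT LoRA integration highlighted by this commit: the linked PEFT docs use TorchAO through transformers' quantization config and then attach LoRA adapters on top. A rough sketch of that flow is below; the model id, quantization type, target modules, and LoRA hyperparameters are placeholder assumptions rather than values taken from the commit or the docs.

```
# Illustrative sketch, not part of the commit: load a model quantized with
# TorchAO via transformers, then add LoRA adapters with PEFT.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from peft import LoraConfig, get_peft_model

# TorchAO int8 weight-only quantization applied at load time.
quant_config = TorchAoConfig(quant_type="int8_weight_only")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",              # placeholder model id
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)

# LoRA adapters on the attention projections (module names valid for OPT).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```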
