
Commit fd3cba4

ariG23498 and stevhliu authored
Apply suggestions from code review
Co-authored-by: Steven Liu <[email protected]>
1 parent 4a4d56f commit fd3cba4

docs/source/en/quantization/bitsandbytes.md

Lines changed: 27 additions & 23 deletions
@@ -21,9 +21,9 @@ fp16. This reduces the degradative effect outlier values have on a model's perfo
 4-bit quantization compresses a model even further, and it is commonly used with
 [QLoRA](https://hf.co/papers/2305.14314) to finetune quantized LLMs.
 
-We'll work with the
-[FLUX.1-dev model](https://huggingface.co/black-forest-labs/FLUX.1-dev),
-demonstrating how quantization can help you run it on less than 16GB of VRAM even on a free Google
+This guide demonstrates how quantization can enable running
+[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+on less than 16GB of VRAM and even on a free Google
 Colab instance.
 
 ![comparison image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/comparison.png)
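
The hunk above targets a budget of less than 16GB of VRAM. As a quick illustrative sketch (not part of the changed file), you can check how much VRAM is free before choosing between the 8-bit and 4-bit paths in the rest of the diff:

```py
import torch

# Illustrative only: report free VRAM before picking 8-bit vs 4-bit quantization.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
else:
    print("No CUDA device available")
```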
@@ -41,10 +41,10 @@ This works for any model in any modality, as long as it supports loading with
 <hfoptions id="bnb">
 <hfoption id="8-bit">
 
-Quantizing a model in 8-bit halves the memory-usage:
+Quantizing a model in 8-bit halves the memory usage.
 
-As `bitsandbytes` is supported in both `transformers` and `diffusers` we can quantize both the
-`FluxTransformer2DModel` and `T5EncoderModel`.
+bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
+[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -58,7 +58,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_8bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -77,7 +77,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(
 ```
 
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`.
-You can change the data type of these modules with the `torch_dtype` parameter if you want:
+You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -91,7 +91,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_8bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float32,
@@ -113,7 +113,7 @@ Let's generate an image using our quantized models.
 
 ```py
 pipe = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     transformer=transformer_8bit,
     text_encoder_2=text_encoder_2_8bit,
     torch_dtype=torch.float16,
@@ -137,7 +137,9 @@ image = pipe(
 image.resize((224, 224))
 ```
 
-![8 bit image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png)
+<div class="flex justify-center">
+  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png"/>
+</div>
 
 Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method.
 The quantization `config.json` file is pushed first, followed by the quantized model weights.
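
For context on the serialization step described in the context lines above, here is a minimal sketch (not part of the changed file). The Hub repo id and local path are hypothetical placeholders:

```py
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Load the transformer in 8-bit, then serialize it; the quantization config.json
# is written alongside the quantized weights.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

transformer_8bit.push_to_hub("your-username/FLUX.1-dev-bnb-8bit")  # hypothetical repo id
transformer_8bit.save_pretrained("./flux1-dev-transformer-8bit")   # or save locally instead
```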
@@ -146,10 +148,10 @@ You can also save the serialized 8-bit models locally with [`~ModelMixin.save_pr
 </hfoption>
 <hfoption id="4-bit">
 
-Quantizing a model in 4-bit reduces your memory-usage by 4x:
+Quantizing a model in 4-bit reduces your memory usage by 4x.
 
-As `bitsandbytes` is supported in both `transformers` and `diffusers` we can quantize both the
-`FluxTransformer2DModel` and `T5EncoderModel`.
+bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
+[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -163,7 +165,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -182,7 +184,7 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 ```
 
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`.
-You can change the data type of these modules with the `torch_dtype` parameter if you want:
+You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -218,7 +220,7 @@ Let's generate an image using our quantized models.
 
 ```py
 pipe = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     transformer=transformer_4bit,
     text_encoder_2=text_encoder_2_4bit,
     torch_dtype=torch.float16,
@@ -242,7 +244,9 @@ image = pipe(
 image.resize((224, 224))
 ```
 
-![4 bit image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/4bit.png)
+<div class="flex justify-center">
+  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/4bit.png"/>
+</div>
 
 Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method.
 The quantization `config.json` file is pushed first, followed by the quantized model weights.
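
To round out the serialization note above, a minimal sketch of reloading a serialized 4-bit checkpoint (illustrative, not part of the changed file; the local path is hypothetical and assumes the model was saved with `save_pretrained`):

```py
import torch
from diffusers import FluxTransformer2DModel

# Reload a previously serialized 4-bit checkpoint; the quantization config stored
# with the checkpoint is picked up on load, so no BitsAndBytesConfig is passed here.
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "./flux1-dev-transformer-4bit",  # hypothetical path from save_pretrained
    torch_dtype=torch.float16,
)
```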
@@ -376,7 +380,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -418,7 +422,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -439,8 +443,8 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 
 ## Dequantizing `bitsandbytes` models
 
-Once quantized, you can dequantize the model to the original precision but this might result in a
-small quality loss of the model. Make sure you have enough GPU RAM to fit the dequantized model.
+Once quantized, you can dequantize a model to its original precision, but this might result in a
+small loss of quality. Make sure you have enough GPU RAM to fit the dequantized model.
 
 ```python
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -455,7 +459,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
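
The dequantization hunks above only show the quantized loading step. A minimal sketch of the full round trip (illustrative, not part of the changed file, and assuming the model-level `dequantize` method described in the surrounding docs):

```py
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Quantize to 4-bit, then dequantize back to the original precision.
# Dequantizing needs enough GPU RAM to hold the full-precision weights,
# and may cost a small amount of quality relative to the unquantized model.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
transformer_4bit.dequantize()
```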
