@@ -21,9 +21,9 @@ fp16. This reduces the degradative effect outlier values have on a model's perfo
 4-bit quantization compresses a model even further, and it is commonly used with
 [QLoRA](https://hf.co/papers/2305.14314) to finetune quantized LLMs.
 
-We'll work with the
-[FLUX.1-dev model](https://huggingface.co/black-forest-labs/FLUX.1-dev),
-demonstrating how quantization can help you run it on less than 16GB of VRAM—even on a free Google
+This guide demonstrates how quantization can enable running
+[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+on less than 16GB of VRAM and even on a free Google
 Colab instance.
 
 ![comparison image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/comparison.png)
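
To confirm how much VRAM a run actually peaks at on your own GPU, PyTorch's CUDA memory counters can be used; a minimal sketch (reset the counter before calling the pipeline, then read it after generation):

```py
import torch

# reset the peak-memory counter before running the pipeline
torch.cuda.reset_peak_memory_stats()

# ... run pipe(...) here ...

# report the peak VRAM used during generation, in GB
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```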
@@ -41,10 +41,10 @@ This works for any model in any modality, as long as it supports loading with
 <hfoptions id="bnb">
 <hfoption id="8-bit">
 
-Quantizing a model in 8-bit halves the memory-usage:
+Quantizing a model in 8-bit halves the memory usage.
 
-As `bitsandbytes` is supported in both `transformers` and `diffusers` we can quantize both the
-`FluxTransformer2DModel` and `T5EncoderModel`.
+bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
+[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -58,7 +58,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_8bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -77,7 +77,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(
 ```
 
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`.
-You can change the data type of these modules with the `torch_dtype` parameter if you want:
+You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -91,7 +91,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_8bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float32,
@@ -113,7 +113,7 @@ Let's generate an image using our quantized models.
 
 ```py
 pipe = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     transformer=transformer_8bit,
     text_encoder_2=text_encoder_2_8bit,
     torch_dtype=torch.float16,
@@ -137,7 +137,9 @@ image = pipe(
 image.resize((224, 224))
 ```
 
-![8 bit image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png)
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png"/>
+</div>
 
 Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method.
 The quantization `config.json` file is pushed first, followed by the quantized model weights.
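
For example, pushing the quantized transformer from above to the Hub is a single call; a minimal sketch (the repository id is a placeholder):

```py
# pushes the quantization config.json first, then the quantized weights
# (the repository id below is a placeholder)
transformer_8bit.push_to_hub("your-username/FLUX.1-dev-transformer-8bit")
```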
@@ -146,10 +148,10 @@ You can also save the serialized 8-bit models locally with [`~ModelMixin.save_pr
 </hfoption>
 <hfoption id="4-bit">
 
-Quantizing a model in 4-bit reduces your memory-usage by 4x:
+Quantizing a model in 4-bit reduces your memory usage by 4x.
 
-As `bitsandbytes` is supported in both `transformers` and `diffusers` we can quantize both the
-`FluxTransformer2DModel` and `T5EncoderModel`.
+bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
+[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -163,7 +165,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -182,7 +184,7 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 ```
 
 By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`.
-You can change the data type of these modules with the `torch_dtype` parameter if you want:
+You can change the data type of these modules with the `torch_dtype` parameter.
 
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -218,7 +220,7 @@ Let's generate an image using our quantized models.
 
 ```py
 pipe = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     transformer=transformer_4bit,
     text_encoder_2=text_encoder_2_4bit,
     torch_dtype=torch.float16,
@@ -242,7 +244,9 @@ image = pipe(
 image.resize((224, 224))
 ```
 
-![4 bit image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/4bit.png)
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/4bit.png"/>
+</div>
 
 Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method.
 The quantization `config.json` file is pushed first, followed by the quantized model weights.
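
The 4-bit models can also be serialized locally and reloaded; a minimal sketch (the path is a placeholder, and reloading assumes bitsandbytes is installed so the quantization config stored with the weights is picked up):

```py
# save the serialized 4-bit transformer locally (placeholder path)
transformer_4bit.save_pretrained("./flux-transformer-4bit")

# reload it later; the quantization config saved alongside the weights is applied on load
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "./flux-transformer-4bit",
    torch_dtype=torch.float16,
)
```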
@@ -376,7 +380,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -418,7 +422,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,
@@ -439,8 +443,8 @@ transformer_4bit = FluxTransformer2DModel.from_pretrained(
 
 ## Dequantizing `bitsandbytes` models
 
-Once quantized, you can dequantize the model to the original precision but this might result in a
-small quality loss of the model. Make sure you have enough GPU RAM to fit the dequantized model.
+Once quantized, you can dequantize a model to its original precision, but this might result in a
+small loss of quality. Make sure you have enough GPU RAM to fit the dequantized model.
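
Dequantization itself is a single method call on each quantized model; a minimal sketch, assuming the `text_encoder_2_4bit` and `transformer_4bit` models are created as in the example below:

```py
# dequantize back to the original precision; this materializes the full-precision
# weights, so enough GPU RAM must be available
text_encoder_2_4bit.dequantize()
transformer_4bit.dequantize()
```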
 
 ```python
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
@@ -455,7 +459,7 @@ quant_config = TransformersBitsAndBytesConfig(
 )
 
 text_encoder_2_4bit = T5EncoderModel.from_pretrained(
-    "black-forest-labs/FLUX.1-dev,
+    "black-forest-labs/FLUX.1-dev",
     subfolder="text_encoder_2",
     quantization_config=quant_config,
     torch_dtype=torch.float16,