diff --git a/docs/source/en/quantization/bitsandbytes.md b/docs/source/en/quantization/bitsandbytes.md
index 744351c9b15e..b1c130b792c3 100644
--- a/docs/source/en/quantization/bitsandbytes.md
+++ b/docs/source/en/quantization/bitsandbytes.md
@@ -48,7 +48,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
-
+import torch
 from diffusers import AutoModel
 from transformers import T5EncoderModel
 
@@ -88,6 +88,8 @@ Setting `device_map="auto"` automatically fills all available space on the GPU(s
 CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
 
 ```py
+from diffusers import FluxPipeline
+
 pipe = FluxPipeline.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     transformer=transformer_8bit,
@@ -132,7 +134,7 @@ For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bf
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
-
+import torch
 from diffusers import AutoModel
 from transformers import T5EncoderModel
 
@@ -171,6 +173,8 @@ Let's generate an image using our quantized models.
 Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
 
 ```py
+from diffusers import FluxPipeline
+
 pipe = FluxPipeline.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     transformer=transformer_4bit,
@@ -214,6 +218,8 @@ Check your memory footprint with the `get_memory_footprint` method:
 print(model.get_memory_footprint())
 ```
 
+Note that this only tells you the memory footprint of the model params and does _not_ estimate the inference memory requirements.
+
 Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:
 
 ```py
@@ -413,4 +419,4 @@ transformer_4bit.dequantize()
 ## Resources
 
 * [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
-* [Training](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527)
\ No newline at end of file
+* [Training](https://github.com/huggingface/diffusers/blob/8c661ea586bf11cb2440da740dd3c4cf84679b85/examples/dreambooth/README_hidream.md#using-quantization)
\ No newline at end of file
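
For reference (not part of the patch), here is a minimal sketch of how the imports added above fit together with the 8-bit quantization flow the surrounding doc describes, and what the new `get_memory_footprint` note refers to. The `subfolder` names and dtype are assumptions based on the FLUX.1-dev repository layout and the rest of the page, not something introduced by this diff:

```py
# Sketch only: combines the imports this patch adds with the doc's quantization flow.
# subfolder names and torch_dtype are assumed from the FLUX.1-dev repo layout.
import torch

from diffusers import AutoModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# 8-bit quantize the T5 text encoder (a transformers model) ...
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=TransformersBitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# ... and the Flux transformer (a diffusers model).
transformer_8bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# Reports the footprint of the model parameters only; as the added note says,
# it does not estimate the extra memory needed at inference time.
print(transformer_8bit.get_memory_footprint())
```

The quantized `transformer_8bit` and `text_encoder_8bit` are then passed to `FluxPipeline.from_pretrained` exactly as shown in the hunks above; the newly added `from diffusers import FluxPipeline` line is what makes those snippets self-contained.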