
Commit d6b48ea

clarify recommendations.
1 parent 0ae2a9a commit d6b48ea

File tree

1 file changed: +18 −1 lines changed

docs/source/en/quantization/overview.md

Lines changed: 18 additions & 1 deletion
@@ -97,4 +97,21 @@ This `pipeline_quant_config` can now be passed to [`~DiffusionPipeline.from_pretrained`]
In this case, `quant_kwargs` will be used to initialize the quantization specifications
of the respective quantization configuration class of `quant_backend`. `components_to_quantize`
is used to denote the components that will be quantized. For most pipelines, you would want to
keep `transformer` in the list as it is often the most compute- and memory-intensive component.

The config below will work for most diffusion pipelines that have a `transformer` component. In
most cases, you will want to quantize the `transformer` since it is the most compute-intensive
part of a diffusion pipeline.

```py
import torch

from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer"],
)
```
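
As noted above, this `pipeline_quant_config` is then passed to [`~DiffusionPipeline.from_pretrained`].
A minimal sketch of that step, assuming the [`FluxPipeline`] checkpoint
`black-forest-labs/FLUX.1-dev` (any pipeline with a `transformer` component works the same way):

```py
from diffusers import DiffusionPipeline

# Illustrative checkpoint id; substitute the pipeline you are loading.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cat sitting on a windowsill").images[0]
```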

Diffusion pipelines can have multiple text encoders. [`FluxPipeline`] has two, for example. It's
recommended to quantize the text encoders that are memory-intensive; examples include T5, Llama,
and Gemma. In the earlier example, you quantized the T5 model of [`FluxPipeline`] through
`text_encoder_2` while keeping the CLIP model intact (accessible through `text_encoder`).
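
To do that with the config above, add the component name to `components_to_quantize`. A sketch,
reusing the same backend and kwargs:

```py
# Quantize both the transformer and Flux's T5 encoder (`text_encoder_2`).
# The CLIP model (`text_encoder`) is omitted from the list, so it stays unquantized.
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4", "bnb_4bit_compute_dtype": torch.bfloat16},
    components_to_quantize=["transformer", "text_encoder_2"],
)
```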
