Now that we have a working PoC (#9165) of NF4 quantization through bitsandbytes, and another through optimum.quanto, it's time to bring quantization into diffusers more formally 🎸
In this issue, I want to devise a rough plan of attack for the integration. We are going to start with bitsandbytes and then slowly grow the list of supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through peft (guide).
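For context on what NF4 quantization involves: weights are split into blocks, each block is scaled by its absolute maximum, and every scaled value is snapped to the nearest entry of a fixed 16-entry codebook. The sketch below illustrates that blockwise absmax scheme with a simplified, evenly spaced codebook; the real NF4 codebook in bitsandbytes uses 16 levels derived from normal-distribution quantiles, not a uniform grid.

```python
# Illustrative sketch of blockwise 4-bit absmax quantization, the scheme
# behind NF4. NOTE: the codebook here is a uniform grid for simplicity;
# actual NF4 uses levels derived from quantiles of a normal distribution.

CODEBOOK = [i / 7.5 - 1.0 for i in range(16)]  # 16 levels spanning [-1, 1]

def quantize_block(block):
    """Map a block of floats to 4-bit codebook indices plus an absmax scale."""
    scale = max(abs(x) for x in block) or 1.0
    idxs = [
        min(range(16), key=lambda i: abs(x / scale - CODEBOOK[i]))
        for x in block
    ]
    return idxs, scale

def dequantize_block(idxs, scale):
    """Recover approximate float values from indices and the block scale."""
    return [CODEBOOK[i] * scale for i in idxs]
```

Per-block absmax scaling is what keeps the 4-bit representation usable: each block's error is bounded by half a codebook step times that block's own scale.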
Three PRs are expected:

- Introduce a base quantization config class like we have in transformers.
- Introduce `bitsandbytes`-related utilities to handle processing and post-processing of layers for injecting `bitsandbytes` layers. Example is here.
- Introduce a `bitsandbytes` config (example) and a quantization loader mixin aka `QuantizationLoaderMixin`. This loader will enable passing a quantization config to `from_pretrained()` of a `ModelMixin` and will tackle how to modify and prepare the model for the provided quantization config. This will also allow us to serialize the model according to the quantization config.
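To make the first PR concrete, here is a hedged sketch of what a base quantization config could look like, loosely mirroring the `QuantizationConfigMixin` pattern in transformers. The class and field names below are illustrative stand-ins, not the final diffusers API.

```python
# Sketch of a base quantization config plus a bitsandbytes-flavored subclass.
# Names are hypothetical; the real API will land in the PRs described above.
import json
from dataclasses import asdict, dataclass

@dataclass
class QuantizationConfigMixin:
    quant_method: str = "none"

    def to_dict(self):
        return asdict(self)

    def to_json_string(self):
        # Serialization hook so a model can be saved with its quant config.
        return json.dumps(self.to_dict(), indent=2, sort_keys=True)

    @classmethod
    def from_dict(cls, config_dict):
        return cls(**config_dict)

@dataclass
class BitsAndBytesConfig(QuantizationConfigMixin):
    quant_method: str = "bitsandbytes"
    load_in_4bit: bool = False
    bnb_4bit_quant_type: str = "nf4"
    bnb_4bit_compute_dtype: str = "bfloat16"
```

Keeping serialization (`to_dict`/`from_dict`) on the base class is what lets `from_pretrained()` round-trip a quantized checkpoint regardless of which backend produced it.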
Notes:

- We could have done this with accelerate (guide), but it doesn't yet support NF4 serialization.
- Good example PR: Add TorchAOHfQuantizer (transformers#32306)
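The layer-injection utilities mentioned in the PR list above boil down to a recursive walk over the model tree that swaps every linear leaf for a quantized counterpart. A torch-free sketch of that recursion follows; `Module`, `Linear`, and `QuantLinear` are minimal stand-ins for the real `nn.Module` hierarchy, not diffusers or bitsandbytes classes.

```python
# Minimal stand-ins to show the recursive module-swap pattern used when
# injecting quantized layers. Not the actual diffusers/bitsandbytes API.

class Module:
    def __init__(self, **children):
        self._children = dict(children)

    def named_children(self):
        return self._children.items()

    def set_child(self, name, child):
        self._children[name] = child

class Linear(Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features

class QuantLinear(Linear):
    """Placeholder for a 4-bit quantized linear layer."""

def replace_linears(module):
    """Recursively swap every Linear for QuantLinear, leaving other modules intact."""
    for name, child in list(module.named_children()):
        if type(child) is Linear:
            module.set_child(name, QuantLinear(child.in_features, child.out_features))
        else:
            replace_linears(child)
    return module
```

In practice this is also where post-processing hooks live (skipping modules listed in the config, keeping some layers in higher precision, etc.).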