Now that we have a working PoC (#9165) of NF4 quantization through bitsandbytes, and another through optimum.quanto, it's time to bring quantization into diffusers more formally 🎸
In this issue, I want to devise a rough plan of attack for the integration. We are going to start with bitsandbytes and then slowly grow the list of supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through peft (guide).
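For context on what NF4 quantization involves: weights are split into blocks, each block is scaled by its absolute maximum, and every scaled value is snapped to the nearest entry of a fixed 16-entry codebook. The sketch below illustrates that blockwise absmax scheme with a simplified, evenly spaced codebook; the real NF4 codebook in bitsandbytes uses 16 levels derived from normal-distribution quantiles, not a uniform grid.

```python
# Illustrative sketch of blockwise 4-bit absmax quantization, the scheme
# behind NF4. NOTE: the codebook here is a uniform grid for simplicity;
# actual NF4 uses levels derived from quantiles of a normal distribution.

CODEBOOK = [i / 7.5 - 1.0 for i in range(16)]  # 16 levels spanning [-1, 1]

def quantize_block(block):
    """Map a block of floats to 4-bit codebook indices plus an absmax scale."""
    scale = max(abs(x) for x in block) or 1.0
    idxs = [
        min(range(16), key=lambda i: abs(x / scale - CODEBOOK[i]))
        for x in block
    ]
    return idxs, scale

def dequantize_block(idxs, scale):
    """Recover approximate float values from indices and the block scale."""
    return [CODEBOOK[i] * scale for i in idxs]
```

Per-block absmax scaling is what keeps the 4-bit representation usable: each block's error is bounded by half a codebook step times that block's own scale.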
Three PRs are expected:

- Introduce a base quantization config class like we have in transformers.
- Introduce `bitsandbytes`-related utilities to handle processing and post-processing of layers for injecting `bitsandbytes` layers. Example is here.
- Introduce a `bitsandbytes` config (example) and a quantization loader mixin aka `QuantizationLoaderMixin`. This loader will enable passing a quantization config to `from_pretrained()` of a `ModelMixin` and will tackle how to modify and prepare the model for the provided quantization config. This will also allow us to serialize the model according to the quantization config.
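To make the first PR concrete, here is a hedged sketch of what a base quantization config could look like, loosely mirroring the `QuantizationConfigMixin` pattern in transformers. The class and field names below are illustrative stand-ins, not the final diffusers API.

```python
# Sketch of a base quantization config plus a bitsandbytes-flavored subclass.
# Names are hypothetical; the real API will land in the PRs described above.
import json
from dataclasses import asdict, dataclass

@dataclass
class QuantizationConfigMixin:
    quant_method: str = "none"

    def to_dict(self):
        return asdict(self)

    def to_json_string(self):
        # Serialization hook so a model can be saved with its quant config.
        return json.dumps(self.to_dict(), indent=2, sort_keys=True)

    @classmethod
    def from_dict(cls, config_dict):
        return cls(**config_dict)

@dataclass
class BitsAndBytesConfig(QuantizationConfigMixin):
    quant_method: str = "bitsandbytes"
    load_in_4bit: bool = False
    bnb_4bit_quant_type: str = "nf4"
    bnb_4bit_compute_dtype: str = "bfloat16"
```

Keeping serialization (`to_dict`/`from_dict`) on the base class is what lets `from_pretrained()` round-trip a quantized checkpoint regardless of which backend produced it.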
Notes:

- We could have done this with accelerate (guide), but it doesn't yet support NF4 serialization.
- Good example PR: Add TorchAOHfQuantizer (transformers#32306)
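The layer-injection utilities mentioned in the PR list above boil down to a recursive walk over the model tree that swaps every linear leaf for a quantized counterpart. A torch-free sketch of that recursion follows; `Module`, `Linear`, and `QuantLinear` are minimal stand-ins for the real `nn.Module` hierarchy, not diffusers or bitsandbytes classes.

```python
# Minimal stand-ins to show the recursive module-swap pattern used when
# injecting quantized layers. Not the actual diffusers/bitsandbytes API.

class Module:
    def __init__(self, **children):
        self._children = dict(children)

    def named_children(self):
        return self._children.items()

    def set_child(self, name, child):
        self._children[name] = child

class Linear(Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features

class QuantLinear(Linear):
    """Placeholder for a 4-bit quantized linear layer."""

def replace_linears(module):
    """Recursively swap every Linear for QuantLinear, leaving other modules intact."""
    for name, child in list(module.named_children()):
        if type(child) is Linear:
            module.set_child(name, QuantLinear(child.in_features, child.out_features))
        else:
            replace_linears(child)
    return module
```

In practice this is also where post-processing hooks live (skipping modules listed in the config, keeping some layers in higher precision, etc.).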