Add optimum.quanto as supported load-time quantization_config  #10328

@vladmandic

Description

Recent additions to diffusers introduced BitsAndBytesConfig and TorchAoConfig, which can be passed as quantization_config when loading model components via from_pretrained.

For example:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)

The ask is to also support Hugging Face's own Optimum Quanto.
Right now it is possible to use it, but only as post-load, on-demand quantization; there is no option to use it like BnB or TorchAO, where quantization is applied automatically during the load itself. Both forms are sketched below.
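
For reference, the current workaround looks roughly like this; a minimal sketch using optimum.quanto's quantize/freeze API with int8 weights:

from diffusers import SD3Transformer2DModel
from optimum.quanto import freeze, qint8, quantize

# load in full precision first, then quantize in place afterwards
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer")
quantize(transformer, weights=qint8)  # swap module weights for quantized equivalents
freeze(transformer)                   # materialize the quantized weights

The requested interface would mirror the existing configs; the QuantoConfig class and its weights argument below are hypothetical, shown only to illustrate the ask:

# hypothetical config class, mirroring BitsAndBytesConfig / TorchAoConfig
quantization_config = QuantoConfig(weights="int8")
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)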

@yiyixuxu @sayakpaul @DN6 @asomoza
