
Commit 204f521

Merge branch 'main' into lora-hot-swapping
2 parents dec4d10 + 4e3ddd5 commit 204f521

File tree

71 files changed (+11578 / -543 lines)


.github/workflows/nightly_tests.yml

Lines changed: 2 additions & 0 deletions
@@ -418,6 +418,8 @@ jobs:
           test_location: "gguf"
         - backend: "torchao"
           test_location: "torchao"
+        - backend: "optimum_quanto"
+          test_location: "quanto"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -173,6 +173,8 @@
       title: gguf
     - local: quantization/torchao
       title: torchao
+    - local: quantization/quanto
+      title: quanto
   title: Quantization Methods
 - sections:
   - local: optimization/fp16

docs/source/en/api/pipelines/wan.md

Lines changed: 16 additions & 0 deletions
@@ -45,6 +45,22 @@ pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", scheduler
 pipe.scheduler = <CUSTOM_SCHEDULER_HERE>
 ```

+### Using single file loading with Wan
+
+The `WanTransformer3DModel` and `AutoencoderKLWan` models support loading checkpoints in their original format via the `from_single_file` loading method.
+
+```python
+import torch
+from diffusers import WanPipeline, WanTransformer3DModel
+
+ckpt_path = "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors"
+transformer = WanTransformer3DModel.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)
+
+pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", transformer=transformer)
+```
+
 ## WanPipeline

 [[autodoc]] WanPipeline

docs/source/en/api/quantization.md

Lines changed: 5 additions & 0 deletions
@@ -31,6 +31,11 @@ Learn how to quantize models in the [Quantization](../quantization/overview) gui
 ## GGUFQuantizationConfig

 [[autodoc]] GGUFQuantizationConfig
+
+## QuantoConfig
+
+[[autodoc]] QuantoConfig
+
 ## TorchAoConfig

 [[autodoc]] TorchAoConfig

docs/source/en/quantization/overview.md

Lines changed: 1 addition & 0 deletions
@@ -36,5 +36,6 @@ Diffusers currently supports the following quantization methods.
 - [BitsandBytes](./bitsandbytes)
 - [TorchAO](./torchao)
 - [GGUF](./gguf)
+- [Quanto](./quanto.md)

 [This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.

docs/source/en/quantization/quanto.md

Lines changed: 148 additions & 0 deletions

@@ -0,0 +1,148 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Quanto

[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend for [Optimum](https://huggingface.co/docs/optimum/en/index). It has been designed with versatility and simplicity in mind:

- All features are available in eager mode (works with non-traceable models)
- Supports quantization-aware training
- Quantized models are compatible with `torch.compile`
- Quantized models are device agnostic (e.g. CUDA, XPU, MPS, CPU)

To use the Quanto backend, first install `optimum-quanto>=0.2.6` and `accelerate`:

```shell
pip install optimum-quanto accelerate
```

Now you can quantize a model by passing a `QuantoConfig` object to the `from_pretrained()` method. Although the Quanto library allows quantizing `nn.Conv2d` and `nn.LayerNorm` modules, Diffusers currently only supports quantizing the weights in the `nn.Linear` layers of a model. The following snippet demonstrates how to apply `float8` quantization with Quanto.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
).images[0]
image.save("output.png")
```

## Skipping Quantization on specific modules

It is possible to skip applying quantization on certain modules with the `modules_to_not_convert` argument in `QuantoConfig`. Make sure that the names passed to this argument match the keys of the modules in the model's `state_dict` (a quick way to inspect these keys is sketched after the example below).

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8", modules_to_not_convert=["proj_out"])
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```
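
If you are unsure which names to pass, one quick way to check them is to look at the top-level prefixes of the model's `state_dict` keys. This is a minimal sketch, not part of the commit's diff, reusing the FLUX.1-dev checkpoint from the example above:

```python
import torch
from diffusers import FluxTransformer2DModel

# Load the unquantized transformer and list its top-level module names;
# entries such as "proj_out" are valid values for modules_to_not_convert.
model = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
print(sorted({key.split(".")[0] for key in model.state_dict().keys()}))
```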

## Using `from_single_file` with the Quanto Backend

`QuantoConfig` is compatible with `~FromOriginalModelMixin.from_single_file`.

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

ckpt_path = "https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_single_file(ckpt_path, quantization_config=quantization_config, torch_dtype=torch.bfloat16)
```

## Saving Quantized models

Diffusers supports serializing Quanto models using the `~ModelMixin.save_pretrained` method.

The serialization and loading requirements are different for models quantized directly with the Quanto library and models quantized with Diffusers using Quanto as the backend. It is currently not possible to load models quantized directly with Quanto into Diffusers using `~ModelMixin.from_pretrained`.

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
# save the quantized model to reuse it later
transformer.save_pretrained("<your quantized model save path>")

# you can reload your quantized model with
model = FluxTransformer2DModel.from_pretrained("<your quantized model save path>")
```

## Using `torch.compile` with Quanto

Currently the Quanto backend supports `torch.compile` for the following quantization types:

- `int8` weights

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)

pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.to("cuda")
image = pipe("A cat holding a sign that says hello").images[0]
image.save("flux-quanto-compile.png")
```

## Supported Quantization Types

### Weights

- float8
- int8
- int4
- int2
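
Any of the dtypes listed above is selected through the same `weights_dtype` argument used in the earlier examples. The following is a minimal sketch, not part of the diff, with `"int4"` chosen purely for illustration:

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# weights_dtype accepts "float8", "int8", "int4", or "int2"
quantization_config = QuantoConfig(weights_dtype="int4")
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```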

docs/source/en/quantization/torchao.md

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0).images[0]
 image.save("output.png")
 ```

-Some quantization methods, such as `uint4wo`, cannot be loaded directly and may result in an `UnpicklingError` when trying to load the models, but work as expected when saving them. In order to work around this, one can load the state dict manually into the model. Note, however, that this requires using `weights_only=False` in `torch.load`, so it should be run only if the weights were obtained from a trustable source.
+If you are using `torch<=2.6.0`, some quantization methods, such as `uint4wo`, cannot be loaded directly and may result in an `UnpicklingError` when trying to load the models, but they work as expected when saving them. To work around this, you can load the state dict manually into the model. Note, however, that this requires using `weights_only=False` in `torch.load`, so it should only be done if the weights were obtained from a trusted source.

 ```python
 import torch

examples/community/README.md

Lines changed: 50 additions & 0 deletions
@@ -10,6 +10,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif

| Example | Description | Code Example | Colab | Author |
|:--------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------:|
|Spatiotemporal Skip Guidance (STG)|[Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling](https://arxiv.org/abs/2411.18664) (CVPR 2025) enhances video diffusion models by generating a weaker model through layer skipping and using it as guidance, improving fidelity in models like HunyuanVideo, LTXVideo, and Mochi.|[Spatiotemporal Skip Guidance](#spatiotemporal-skip-guidance)|-|[Junha Hyung](https://junhahyung.github.io/), [Kinam Kim](https://kinam0252.github.io/)|
|Adaptive Mask Inpainting|Adaptive Mask Inpainting algorithm from [Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models](https://github.com/snuvclab/coma) (ECCV '24, Oral) provides a way to insert human inside the scene image without altering the background, by inpainting with adapting mask.|[Adaptive Mask Inpainting](#adaptive-mask-inpainting)|-|[Hyeonwoo Kim](https://sshowbiz.xyz),[Sookwan Han](https://jellyheadandrew.github.io)|
|Flux with CFG|[Flux with CFG](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md) provides an implementation of using CFG in [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).|[Flux with CFG](#flux-with-cfg)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/flux_with_cfg.ipynb)|[Linoy Tsaban](https://github.com/linoytsaban), [Apolinário](https://github.com/apolinario), and [Sayak Paul](https://github.com/sayakpaul)|
|Differential Diffusion|[Differential Diffusion](https://github.com/exx8/differential-diffusion) modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region.|[Differential Diffusion](#differential-diffusion)|[![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/exx8/differential-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/exx8/differential-diffusion/blob/main/examples/SD2.ipynb)|[Eran Levin](https://github.com/exx8) and [Ohad Fried](https://www.ohadf.com/)|

@@ -93,6 +94,55 @@ pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion

## Example usages

### Spatiotemporal Skip Guidance

**Junha Hyung\*, Kinam Kim\*, Susung Hong, Min-Jung Kim, Jaegul Choo**

**KAIST AI, University of Washington**

[*Spatiotemporal Skip Guidance (STG) for Enhanced Video Diffusion Sampling*](https://arxiv.org/abs/2411.18664) (CVPR 2025) is a simple, training-free sampling guidance method for enhancing transformer-based video diffusion models. STG employs an implicit weak model via self-perturbation, avoiding the need for external models or additional training. By selectively skipping spatiotemporal layers, STG produces an aligned, degraded version of the original model to boost sample quality without compromising diversity or dynamic degree.

The following is an example video of STG applied to Mochi.

https://github.com/user-attachments/assets/148adb59-da61-4c50-9dfa-425dcb5c23b3

More examples and information can be found on the [GitHub repository](https://github.com/junhahyung/STGuidance) and the [project website](https://junhahyung.github.io/STGuidance/).

#### Usage example
```python
import torch
from pipeline_stg_mochi import MochiSTGPipeline
from diffusers.utils import export_to_video

# Load the pipeline
pipe = MochiSTGPipeline.from_pretrained("genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16)

# Move the pipeline to the GPU
pipe = pipe.to("cuda")

#--------Option--------#
prompt = "A close-up of a beautiful woman's face with colored powder exploding around her, creating an abstract splash of vibrant hues, realistic style."
stg_applied_layers_idx = [34]
stg_mode = "STG"
stg_scale = 1.0  # 0.0 for CFG
do_rescaling = True  # assumed value; the original snippet used this variable without defining it
#----------------------#

# Generate video frames
frames = pipe(
    prompt,
    height=480,
    width=480,
    num_frames=81,
    stg_applied_layers_idx=stg_applied_layers_idx,
    stg_scale=stg_scale,
    generator=torch.Generator().manual_seed(42),
    do_rescaling=do_rescaling,
).frames[0]

export_to_video(frames, "output.mp4", fps=30)
```

### Adaptive Mask Inpainting

**Hyeonwoo Kim\*, Sookwan Han\*, Patrick Kwon, Hanbyul Joo**

examples/community/mixture_tiling_sdxl.py

Lines changed: 22 additions & 22 deletions
@@ -1,4 +1,4 @@
-# Copyright 2025 The HuggingFace Team. All rights reserved.
+# Copyright 2025 The DEVAIEXP Team and The HuggingFace Team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -1070,32 +1070,32 @@ def __call__(
                     text_encoder_projection_dim = int(pooled_prompt_embeds.shape[-1])
                 else:
                     text_encoder_projection_dim = self.text_encoder_2.config.projection_dim
-                    add_time_ids = self._get_add_time_ids(
-                        original_size,
-                        crops_coords_top_left[row][col],
-                        target_size,
+                add_time_ids = self._get_add_time_ids(
+                    original_size,
+                    crops_coords_top_left[row][col],
+                    target_size,
+                    dtype=prompt_embeds.dtype,
+                    text_encoder_projection_dim=text_encoder_projection_dim,
+                )
+                if negative_original_size is not None and negative_target_size is not None:
+                    negative_add_time_ids = self._get_add_time_ids(
+                        negative_original_size,
+                        negative_crops_coords_top_left[row][col],
+                        negative_target_size,
                         dtype=prompt_embeds.dtype,
                         text_encoder_projection_dim=text_encoder_projection_dim,
                     )
-                    if negative_original_size is not None and negative_target_size is not None:
-                        negative_add_time_ids = self._get_add_time_ids(
-                            negative_original_size,
-                            negative_crops_coords_top_left[row][col],
-                            negative_target_size,
-                            dtype=prompt_embeds.dtype,
-                            text_encoder_projection_dim=text_encoder_projection_dim,
-                        )
-                    else:
-                        negative_add_time_ids = add_time_ids
+                else:
+                    negative_add_time_ids = add_time_ids

-                    if self.do_classifier_free_guidance:
-                        prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
-                        add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
-                        add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
+                if self.do_classifier_free_guidance:
+                    prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
+                    add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
+                    add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)

-                    prompt_embeds = prompt_embeds.to(device)
-                    add_text_embeds = add_text_embeds.to(device)
-                    add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
+                prompt_embeds = prompt_embeds.to(device)
+                add_text_embeds = add_text_embeds.to(device)
+                add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
                 addition_embed_type_row.append((prompt_embeds, add_text_embeds, add_time_ids))
             embeddings_and_added_time.append(addition_embed_type_row)
