Commit 6d15594

Merge branch 'main' into parallel-shards-loading

2 parents 2fdc091 + 11d22e0

32 files changed (+3839, −130 lines)

docs/source/en/_toctree.yml (6 additions, 0 deletions)

```diff
@@ -366,6 +366,8 @@
       title: PixArtTransformer2DModel
     - local: api/models/prior_transformer
       title: PriorTransformer
+    - local: api/models/qwenimage_transformer2d
+      title: QwenImageTransformer2DModel
     - local: api/models/sana_transformer2d
       title: SanaTransformer2DModel
     - local: api/models/sd3_transformer2d
@@ -418,6 +420,8 @@
       title: AutoencoderKLMagvit
     - local: api/models/autoencoderkl_mochi
       title: AutoencoderKLMochi
+    - local: api/models/autoencoderkl_qwenimage
+      title: AutoencoderKLQwenImage
     - local: api/models/autoencoder_kl_wan
       title: AutoencoderKLWan
     - local: api/models/consistency_decoder_vae
@@ -554,6 +558,8 @@
       title: PixArt-α
     - local: api/pipelines/pixart_sigma
       title: PixArt-Σ
+    - local: api/pipelines/qwenimage
+      title: QwenImage
     - local: api/pipelines/sana
       title: Sana
     - local: api/pipelines/sana_sprint
```
docs/source/en/api/models/autoencoderkl_qwenimage.md (new file, 35 additions; path derived from the `_toctree.yml` entry above)

```diff
@@ -0,0 +1,35 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLQwenImage
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import AutoencoderKLQwenImage
+
+vae = AutoencoderKLQwenImage.from_pretrained("Qwen/QwenImage-20B", subfolder="vae")
+```
+
+## AutoencoderKLQwenImage
+
+[[autodoc]] AutoencoderKLQwenImage
+  - decode
+  - encode
+  - all
+
+## AutoencoderKLOutput
+
+[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
```
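The new VAE page documents `encode`/`decode` plus the `AutoencoderKLOutput`/`DecoderOutput` wrappers. As background, a KL autoencoder's `encode` returns a diagonal-Gaussian latent distribution that is sampled via the reparameterization trick. A minimal pure-Python sketch of such a distribution (names are illustrative, not the diffusers implementation):

```python
import math
import random

class DiagonalGaussian:
    """Minimal sketch of the latent distribution a KL VAE's encode() returns.
    Illustrative only; not the diffusers class."""

    def __init__(self, mean, logvar):
        self.mean = mean
        self.logvar = logvar

    def sample(self, rng=random):
        # Reparameterization trick: mean + std * eps, eps ~ N(0, 1)
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(self.mean, self.logvar)]

    def kl(self):
        # KL divergence to a standard normal, summed over latent dimensions
        return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                         for m, lv in zip(self.mean, self.logvar))

dist = DiagonalGaussian(mean=[0.0, 0.0], logvar=[0.0, 0.0])
print(dist.kl())  # standard normal against itself -> 0.0
```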
docs/source/en/api/models/qwenimage_transformer2d.md (new file; path derived from the `_toctree.yml` entry above; `import torch` added here since the snippet uses `torch.bfloat16`)

```diff
@@ -0,0 +1,29 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# QwenImageTransformer2DModel
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import QwenImageTransformer2DModel
+
+transformer = QwenImageTransformer2DModel.from_pretrained("Qwen/QwenImage-20B", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## QwenImageTransformer2DModel
+
+[[autodoc]] QwenImageTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
```
docs/source/en/api/pipelines/qwenimage.md (new file, 33 additions; path derived from the `_toctree.yml` entry above; the second heading read "QwenImagePipeline" but documents the output class, corrected below)

```diff
@@ -0,0 +1,33 @@
+<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License. -->
+
+# QwenImage
+
+<!-- TODO: update this section when model is out -->
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## QwenImagePipeline
+
+[[autodoc]] QwenImagePipeline
+  - all
+  - __call__
+
+## QwenImagePipelineOutput
+
+[[autodoc]] pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput
```

docs/source/en/api/pipelines/wan.md (6 additions, 0 deletions; "involed" corrected to "involved")

```diff
@@ -29,13 +29,17 @@
 You can find all the original Wan2.1 checkpoints under the [Wan-AI](https://huggingface.co/Wan-AI) organization.

 The following Wan models are supported in Diffusers:
+
 - [Wan 2.1 T2V 1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
 - [Wan 2.1 T2V 14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers)
 - [Wan 2.1 I2V 14B - 480P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers)
 - [Wan 2.1 I2V 14B - 720P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers)
 - [Wan 2.1 FLF2V 14B - 720P](https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers)
 - [Wan 2.1 VACE 1.3B](https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B-diffusers)
 - [Wan 2.1 VACE 14B](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B-diffusers)
+- [Wan 2.2 T2V 14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers)
+- [Wan 2.2 I2V 14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers)
+- [Wan 2.2 TI2V 5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)

 > [!TIP]
 > Click on the Wan2.1 models in the right sidebar for more examples of video generation.
@@ -327,6 +331,8 @@ The general rule of thumb to keep in mind when preparing inputs for the VACE pip

 - Try lower `shift` values (`2.0` to `5.0`) for lower resolution videos and higher `shift` values (`7.0` to `12.0`) for higher resolution images.

+- Wan 2.1 and 2.2 support using [LightX2V LoRAs](https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Lightx2v) to speed up inference. Using them on Wan 2.2 is slightly more involved. Refer to [this code snippet](https://github.com/huggingface/diffusers/pull/12040#issuecomment-3144185272) to learn more.
+
 ## WanPipeline

 [[autodoc]] WanPipeline
```

src/diffusers/__init__.py (6 additions, 0 deletions)

```diff
@@ -174,6 +174,7 @@
         "AutoencoderKLLTXVideo",
         "AutoencoderKLMagvit",
         "AutoencoderKLMochi",
+        "AutoencoderKLQwenImage",
         "AutoencoderKLTemporalDecoder",
         "AutoencoderKLWan",
         "AutoencoderOobleck",
@@ -215,6 +216,7 @@
         "OmniGenTransformer2DModel",
         "PixArtTransformer2DModel",
         "PriorTransformer",
+        "QwenImageTransformer2DModel",
         "SanaControlNetModel",
         "SanaTransformer2DModel",
         "SD3ControlNetModel",
@@ -486,6 +488,7 @@
         "PixArtAlphaPipeline",
         "PixArtSigmaPAGPipeline",
         "PixArtSigmaPipeline",
+        "QwenImagePipeline",
         "ReduxImageEncoder",
         "SanaControlNetPipeline",
         "SanaPAGPipeline",
@@ -832,6 +835,7 @@
         AutoencoderKLLTXVideo,
         AutoencoderKLMagvit,
         AutoencoderKLMochi,
+        AutoencoderKLQwenImage,
         AutoencoderKLTemporalDecoder,
         AutoencoderKLWan,
         AutoencoderOobleck,
@@ -873,6 +877,7 @@
         OmniGenTransformer2DModel,
         PixArtTransformer2DModel,
         PriorTransformer,
+        QwenImageTransformer2DModel,
         SanaControlNetModel,
         SanaTransformer2DModel,
         SD3ControlNetModel,
@@ -1119,6 +1124,7 @@
         PixArtAlphaPipeline,
         PixArtSigmaPAGPipeline,
         PixArtSigmaPipeline,
+        QwenImagePipeline,
         ReduxImageEncoder,
         SanaControlNetPipeline,
         SanaPAGPipeline,
```

src/diffusers/hooks/_helpers.py (10 additions, 0 deletions)

```diff
@@ -153,6 +153,7 @@ def _register_transformer_blocks_metadata():
     )
     from ..models.transformers.transformer_ltx import LTXVideoTransformerBlock
     from ..models.transformers.transformer_mochi import MochiTransformerBlock
+    from ..models.transformers.transformer_qwenimage import QwenImageTransformerBlock
     from ..models.transformers.transformer_wan import WanTransformerBlock

     # BasicTransformerBlock
@@ -255,6 +256,15 @@ def _register_transformer_blocks_metadata():
         ),
     )

+    # QwenImage
+    TransformerBlockRegistry.register(
+        model_class=QwenImageTransformerBlock,
+        metadata=TransformerBlockMetadata(
+            return_hidden_states_index=1,
+            return_encoder_hidden_states_index=0,
+        ),
+    )
+

 # fmt: off
 def _skip_attention___ret___hidden_states(self, *args, **kwargs):
```
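The registration above records, per block class, where `hidden_states` and `encoder_hidden_states` sit in the tuple the block's `forward` returns, so generic hooks can unpack any registered block's output. A standalone sketch of that registry pattern (class and field names here are illustrative stand-ins, not the diffusers internals):

```python
from dataclasses import dataclass

@dataclass
class BlockMetadata:
    # Indices of hidden_states / encoder_hidden_states in forward()'s return tuple
    return_hidden_states_index: int
    return_encoder_hidden_states_index: int

class BlockRegistry:
    _registry = {}

    @classmethod
    def register(cls, model_class, metadata):
        cls._registry[model_class] = metadata

    @classmethod
    def get(cls, model_class):
        return cls._registry[model_class]

class FakeQwenImageBlock:  # hypothetical stand-in for QwenImageTransformerBlock
    def forward(self):
        # Returns (encoder_hidden_states, hidden_states), matching indices (1, 0)
        return ("encoder_hidden_states", "hidden_states")

BlockRegistry.register(FakeQwenImageBlock, BlockMetadata(1, 0))

# A generic hook can now unpack any registered block without knowing its class:
meta = BlockRegistry.get(FakeQwenImageBlock)
out = FakeQwenImageBlock().forward()
print(out[meta.return_hidden_states_index])  # -> hidden_states
```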

src/diffusers/loaders/lora_conversion_utils.py (4 additions, 0 deletions)

```diff
@@ -1974,6 +1974,10 @@ def _convert_non_diffusers_wan_lora_to_diffusers(state_dict):
             converted_key = f"condition_embedder.image_embedder.{img_ours}.lora_B.weight"
             if original_key in original_state_dict:
                 converted_state_dict[converted_key] = original_state_dict.pop(original_key)
+            bias_key_theirs = original_key.removesuffix(f".{lora_up_key}.weight") + ".diff_b"
+            if bias_key_theirs in original_state_dict:
+                bias_key = converted_key.removesuffix(".weight") + ".bias"
+                converted_state_dict[bias_key] = original_state_dict.pop(bias_key_theirs)

     if len(original_state_dict) > 0:
         diff = all(".diff" in k for k in original_state_dict)
```
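The four added lines extend the Wan LoRA converter: alongside each converted LoRA-up weight, they look for the source checkpoint's `.diff_b` bias and store it under the converted prefix with a `.bias` suffix. A standalone sketch of that key rewriting (checkpoint key names invented for illustration; `str.removesuffix` needs Python 3.9+):

```python
def convert_with_bias(original_state_dict, original_key, converted_key, lora_up_key):
    """Move a LoRA-up weight to its converted key and, if present, the
    matching ".diff_b" bias to the converted ".bias" key. Sketch only."""
    converted = {}
    if original_key in original_state_dict:
        converted[converted_key] = original_state_dict.pop(original_key)
    # Strip ".{lora_up_key}.weight" to find the source bias key
    bias_key_theirs = original_key.removesuffix(f".{lora_up_key}.weight") + ".diff_b"
    if bias_key_theirs in original_state_dict:
        bias_key = converted_key.removesuffix(".weight") + ".bias"
        converted[bias_key] = original_state_dict.pop(bias_key_theirs)
    return converted

# Hypothetical checkpoint keys, for illustration only
state = {
    "img_emb.proj.lora_up.weight": "W",
    "img_emb.proj.diff_b": "b",
}
out = convert_with_bias(
    state,
    original_key="img_emb.proj.lora_up.weight",
    converted_key="condition_embedder.image_embedder.proj.lora_B.weight",
    lora_up_key="lora_up",
)
# Both entries are consumed: weight under ...lora_B.weight, bias under ...lora_B.bias
```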

src/diffusers/models/__init__.py (4 additions, 0 deletions)

```diff
@@ -38,6 +38,7 @@
     _import_structure["autoencoders.autoencoder_kl_ltx"] = ["AutoencoderKLLTXVideo"]
     _import_structure["autoencoders.autoencoder_kl_magvit"] = ["AutoencoderKLMagvit"]
     _import_structure["autoencoders.autoencoder_kl_mochi"] = ["AutoencoderKLMochi"]
+    _import_structure["autoencoders.autoencoder_kl_qwenimage"] = ["AutoencoderKLQwenImage"]
     _import_structure["autoencoders.autoencoder_kl_temporal_decoder"] = ["AutoencoderKLTemporalDecoder"]
     _import_structure["autoencoders.autoencoder_kl_wan"] = ["AutoencoderKLWan"]
     _import_structure["autoencoders.autoencoder_oobleck"] = ["AutoencoderOobleck"]
@@ -88,6 +89,7 @@
     _import_structure["transformers.transformer_lumina2"] = ["Lumina2Transformer2DModel"]
     _import_structure["transformers.transformer_mochi"] = ["MochiTransformer3DModel"]
     _import_structure["transformers.transformer_omnigen"] = ["OmniGenTransformer2DModel"]
+    _import_structure["transformers.transformer_qwenimage"] = ["QwenImageTransformer2DModel"]
     _import_structure["transformers.transformer_sd3"] = ["SD3Transformer2DModel"]
     _import_structure["transformers.transformer_skyreels_v2"] = ["SkyReelsV2Transformer3DModel"]
     _import_structure["transformers.transformer_temporal"] = ["TransformerTemporalModel"]
@@ -126,6 +128,7 @@
        AutoencoderKLLTXVideo,
        AutoencoderKLMagvit,
        AutoencoderKLMochi,
+       AutoencoderKLQwenImage,
        AutoencoderKLTemporalDecoder,
        AutoencoderKLWan,
        AutoencoderOobleck,
@@ -177,6 +180,7 @@
        OmniGenTransformer2DModel,
        PixArtTransformer2DModel,
        PriorTransformer,
+       QwenImageTransformer2DModel,
        SanaTransformer2DModel,
        SD3Transformer2DModel,
        SkyReelsV2Transformer3DModel,
```

src/diffusers/models/autoencoders/__init__.py (1 addition, 0 deletions)

```diff
@@ -8,6 +8,7 @@
 from .autoencoder_kl_ltx import AutoencoderKLLTXVideo
 from .autoencoder_kl_magvit import AutoencoderKLMagvit
 from .autoencoder_kl_mochi import AutoencoderKLMochi
+from .autoencoder_kl_qwenimage import AutoencoderKLQwenImage
 from .autoencoder_kl_temporal_decoder import AutoencoderKLTemporalDecoder
 from .autoencoder_kl_wan import AutoencoderKLWan
 from .autoencoder_oobleck import AutoencoderOobleck
```

0 commit comments
