
Commit 53fc1d0

Merge branch 'main' into fixes-issue-11005

2 parents 536b185 + a4f9c3c

357 files changed: +17,271 -6,487 lines


.github/workflows/nightly_tests.yml

Lines changed: 1 addition & 1 deletion
````diff
@@ -417,7 +417,7 @@ jobs:
           additional_deps: ["peft"]
         - backend: "gguf"
           test_location: "gguf"
-          additional_deps: []
+          additional_deps: ["peft"]
         - backend: "torchao"
           test_location: "torchao"
           additional_deps: []
````
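
For context, the gguf job now installs peft because the GGUF tests exercise loading LoRA adapters on top of GGUF-quantized checkpoints. A minimal sketch of that combination, assuming current diffusers GGUF support (the checkpoint URL and LoRA repo below are illustrative, not taken from this commit):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a GGUF-quantized transformer; GGUFQuantizationConfig sets the compute dtype
# used to dequantize weights on the fly.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)

# Loading LoRA weights on top of the quantized transformer is the step that needs peft.
pipe.load_lora_weights("ByteDance/Hyper-SD", weight_name="Hyper-FLUX.1-dev-8steps-lora.safetensors")
```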

docker/diffusers-onnxruntime-cpu/Dockerfile

Lines changed: 3 additions & 3 deletions
````diff
@@ -28,9 +28,9 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3 -m uv pip install --no-cache-dir \
-        torch==2.1.2 \
-        torchvision==0.16.2 \
-        torchaudio==2.1.2 \
+        torch \
+        torchvision \
+        torchaudio \
         onnxruntime \
         --extra-index-url https://download.pytorch.org/whl/cpu && \
     python3 -m uv pip install --no-cache-dir \
````

docs/source/en/_toctree.yml

Lines changed: 43 additions & 33 deletions
````diff
@@ -175,7 +175,7 @@
     title: gguf
   - local: quantization/torchao
     title: torchao
-  - local: quantization/quanto
+  - local: quantization/quanto
     title: quanto
   title: Quantization Methods
 - sections:
@@ -265,19 +265,23 @@
     sections:
     - local: api/models/overview
      title: Overview
+    - local: api/models/auto_model
+      title: AutoModel
     - sections:
       - local: api/models/controlnet
         title: ControlNetModel
+      - local: api/models/controlnet_union
+        title: ControlNetUnionModel
       - local: api/models/controlnet_flux
         title: FluxControlNetModel
       - local: api/models/controlnet_hunyuandit
         title: HunyuanDiT2DControlNetModel
+      - local: api/models/controlnet_sana
+        title: SanaControlNetModel
       - local: api/models/controlnet_sd3
         title: SD3ControlNetModel
       - local: api/models/controlnet_sparsectrl
         title: SparseControlNetModel
-      - local: api/models/controlnet_union
-        title: ControlNetUnionModel
       title: ControlNets
     - sections:
       - local: api/models/allegro_transformer3d
@@ -286,30 +290,32 @@
         title: AuraFlowTransformer2DModel
       - local: api/models/cogvideox_transformer3d
         title: CogVideoXTransformer3DModel
-      - local: api/models/consisid_transformer3d
-        title: ConsisIDTransformer3DModel
       - local: api/models/cogview3plus_transformer2d
         title: CogView3PlusTransformer2DModel
       - local: api/models/cogview4_transformer2d
         title: CogView4Transformer2DModel
+      - local: api/models/consisid_transformer3d
+        title: ConsisIDTransformer3DModel
       - local: api/models/dit_transformer2d
         title: DiTTransformer2DModel
       - local: api/models/easyanimate_transformer3d
         title: EasyAnimateTransformer3DModel
       - local: api/models/flux_transformer
         title: FluxTransformer2DModel
+      - local: api/models/hidream_image_transformer
+        title: HiDreamImageTransformer2DModel
       - local: api/models/hunyuan_transformer2d
         title: HunyuanDiT2DModel
       - local: api/models/hunyuan_video_transformer_3d
         title: HunyuanVideoTransformer3DModel
       - local: api/models/latte_transformer3d
         title: LatteTransformer3DModel
-      - local: api/models/lumina_nextdit2d
-        title: LuminaNextDiT2DModel
-      - local: api/models/lumina2_transformer2d
-        title: Lumina2Transformer2DModel
       - local: api/models/ltx_video_transformer3d
         title: LTXVideoTransformer3DModel
+      - local: api/models/lumina2_transformer2d
+        title: Lumina2Transformer2DModel
+      - local: api/models/lumina_nextdit2d
+        title: LuminaNextDiT2DModel
       - local: api/models/mochi_transformer3d
         title: MochiTransformer3DModel
       - local: api/models/omnigen_transformer
@@ -318,10 +324,10 @@
         title: PixArtTransformer2DModel
       - local: api/models/prior_transformer
         title: PriorTransformer
-      - local: api/models/sd3_transformer2d
-        title: SD3Transformer2DModel
       - local: api/models/sana_transformer2d
         title: SanaTransformer2DModel
+      - local: api/models/sd3_transformer2d
+        title: SD3Transformer2DModel
       - local: api/models/stable_audio_transformer
         title: StableAudioDiTModel
       - local: api/models/transformer2d
@@ -336,10 +342,10 @@
         title: StableCascadeUNet
       - local: api/models/unet
         title: UNet1DModel
-      - local: api/models/unet2d
-        title: UNet2DModel
       - local: api/models/unet2d-cond
         title: UNet2DConditionModel
+      - local: api/models/unet2d
+        title: UNet2DModel
       - local: api/models/unet3d-cond
         title: UNet3DConditionModel
       - local: api/models/unet-motion
@@ -348,6 +354,10 @@
         title: UViT2DModel
       title: UNets
     - sections:
+      - local: api/models/asymmetricautoencoderkl
+        title: AsymmetricAutoencoderKL
+      - local: api/models/autoencoder_dc
+        title: AutoencoderDC
       - local: api/models/autoencoderkl
         title: AutoencoderKL
       - local: api/models/autoencoderkl_allegro
@@ -364,10 +374,6 @@
         title: AutoencoderKLMochi
       - local: api/models/autoencoder_kl_wan
         title: AutoencoderKLWan
-      - local: api/models/asymmetricautoencoderkl
-        title: AsymmetricAutoencoderKL
-      - local: api/models/autoencoder_dc
-        title: AutoencoderDC
       - local: api/models/consistency_decoder_vae
         title: ConsistencyDecoderVAE
       - local: api/models/autoencoder_oobleck
@@ -420,6 +426,8 @@
         title: ControlNet with Stable Diffusion 3
       - local: api/pipelines/controlnet_sdxl
         title: ControlNet with Stable Diffusion XL
+      - local: api/pipelines/controlnet_sana
+        title: ControlNet-Sana
       - local: api/pipelines/controlnetxs
         title: ControlNet-XS
       - local: api/pipelines/controlnetxs_sdxl
@@ -444,6 +452,8 @@
         title: Flux
       - local: api/pipelines/control_flux_inpaint
         title: FluxControlInpaint
+      - local: api/pipelines/hidream
+        title: HiDream-I1
       - local: api/pipelines/hunyuandit
         title: Hunyuan-DiT
       - local: api/pipelines/hunyuan_video
@@ -511,40 +521,40 @@
     - sections:
       - local: api/pipelines/stable_diffusion/overview
         title: Overview
-      - local: api/pipelines/stable_diffusion/text2img
-        title: Text-to-image
+      - local: api/pipelines/stable_diffusion/depth2img
+        title: Depth-to-image
+      - local: api/pipelines/stable_diffusion/gligen
+        title: GLIGEN (Grounded Language-to-Image Generation)
+      - local: api/pipelines/stable_diffusion/image_variation
+        title: Image variation
       - local: api/pipelines/stable_diffusion/img2img
         title: Image-to-image
       - local: api/pipelines/stable_diffusion/svd
         title: Image-to-video
       - local: api/pipelines/stable_diffusion/inpaint
         title: Inpainting
-      - local: api/pipelines/stable_diffusion/depth2img
-        title: Depth-to-image
-      - local: api/pipelines/stable_diffusion/image_variation
-        title: Image variation
+      - local: api/pipelines/stable_diffusion/k_diffusion
+        title: K-Diffusion
+      - local: api/pipelines/stable_diffusion/latent_upscale
+        title: Latent upscaler
+      - local: api/pipelines/stable_diffusion/ldm3d_diffusion
+        title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
       - local: api/pipelines/stable_diffusion/stable_diffusion_safe
         title: Safe Stable Diffusion
+      - local: api/pipelines/stable_diffusion/sdxl_turbo
+        title: SDXL Turbo
       - local: api/pipelines/stable_diffusion/stable_diffusion_2
         title: Stable Diffusion 2
       - local: api/pipelines/stable_diffusion/stable_diffusion_3
         title: Stable Diffusion 3
       - local: api/pipelines/stable_diffusion/stable_diffusion_xl
         title: Stable Diffusion XL
-      - local: api/pipelines/stable_diffusion/sdxl_turbo
-        title: SDXL Turbo
-      - local: api/pipelines/stable_diffusion/latent_upscale
-        title: Latent upscaler
       - local: api/pipelines/stable_diffusion/upscale
         title: Super-resolution
-      - local: api/pipelines/stable_diffusion/k_diffusion
-        title: K-Diffusion
-      - local: api/pipelines/stable_diffusion/ldm3d_diffusion
-        title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
       - local: api/pipelines/stable_diffusion/adapter
         title: T2I-Adapter
-      - local: api/pipelines/stable_diffusion/gligen
-        title: GLIGEN (Grounded Language-to-Image Generation)
+      - local: api/pipelines/stable_diffusion/text2img
+        title: Text-to-image
       title: Stable Diffusion
     - local: api/pipelines/stable_unclip
       title: Stable unCLIP
````

docs/source/en/api/loaders/lora.md

Lines changed: 19 additions & 0 deletions
````diff
@@ -20,11 +20,15 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
 - [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
 - [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
+- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
 - [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
 - [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
 - [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
 - [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
+- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
+- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
+- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
 
 <Tip>
@@ -56,6 +60,9 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 ## Mochi1LoraLoaderMixin
 
 [[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
+## AuraFlowLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin
 
 ## LTXVideoLoraLoaderMixin
 
@@ -73,10 +80,22 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 
 [[autodoc]] loaders.lora_pipeline.Lumina2LoraLoaderMixin
 
+## CogView4LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.CogView4LoraLoaderMixin
+
+## WanLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
+
 ## AmusedLoraLoaderMixin
 
 [[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
 
+## HiDreamImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin
+
 ## LoraBaseMixin
 
 [[autodoc]] loaders.lora_base.LoraBaseMixin
````
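
All of these mixins surface the same user-facing entry point on their pipelines, `load_lora_weights`, plus the `LoraBaseMixin` utilities. A minimal sketch with one of the newly listed loaders (the adapter repo id is a placeholder, not from this commit):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16).to("cuda")

# CogView4Pipeline inherits CogView4LoraLoaderMixin, so the usual LoRA workflow applies.
pipe.load_lora_weights("your-username/your-cogview4-lora", adapter_name="style")  # placeholder repo id
pipe.fuse_lora()  # LoraBaseMixin utility: merges the adapter into the base weights
```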

docs/source/en/api/models/auto_model.md

Lines changed: 29 additions & 0 deletions

````diff
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AutoModel
+
+`AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. It automatically retrieves the correct model class from the checkpoint's `config.json` file.
+
+```python
+from diffusers import AutoModel, AutoPipelineForText2Image
+
+unet = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+pipe = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet)
+```
+
+## AutoModel
+
+[[autodoc]] AutoModel
+  - all
+  - from_pretrained
````
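
As a further illustration (not part of this commit), dispatch works for any subfolder that carries a `config.json`; for this repo's `vae` subfolder it should resolve to `AutoencoderKL`:

```python
from diffusers import AutoModel

# AutoModel reads the `_class_name` entry in the subfolder's config.json
# and instantiates that class.
vae = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="vae")
print(type(vae).__name__)  # expected: AutoencoderKL
```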

docs/source/en/api/models/autoencoderkl_allegro.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLAllegro
 
-vae = AutoencoderKLCogVideoX.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
 ```
 
 ## AutoencoderKLAllegro
````
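
The corrected snippet still relies on `torch` being imported; a self-contained version of the fixed example would be:

```python
import torch
from diffusers import AutoencoderKLAllegro

vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```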

docs/source/en/api/models/controlnet_sana.md

Lines changed: 29 additions & 0 deletions

````diff
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SanaControlNetModel
+
+The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+This model was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
+The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
+
+## SanaControlNetModel
+[[autodoc]] SanaControlNetModel
+
+## SanaControlNetOutput
+[[autodoc]] models.controlnets.controlnet_sana.SanaControlNetOutput
````
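
Loading follows the usual `from_pretrained` pattern; a hedged sketch (the checkpoint id below is a placeholder; pick an official one from Efficient-Large-Model's Hub profile):

```python
import torch
from diffusers import SanaControlNetModel

# Placeholder repo id; substitute an official Sana ControlNet checkpoint.
controlnet = SanaControlNetModel.from_pretrained(
    "Efficient-Large-Model/sana-controlnet-checkpoint", torch_dtype=torch.float16
)
```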

docs/source/en/api/models/hidream_image_transformer.md

Lines changed: 30 additions & 0 deletions

````diff
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# HiDreamImageTransformer2DModel
+
+A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import HiDreamImageTransformer2DModel
+
+transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## HiDreamImageTransformer2DModel
+
+[[autodoc]] HiDreamImageTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
````
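
Note the snippet in the new file uses `torch.bfloat16` without importing `torch`; a self-contained version:

```python
import torch
from diffusers import HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
```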

docs/source/en/api/pipelines/aura_flow.md

Lines changed: 17 additions & 0 deletions
````diff
@@ -89,6 +89,23 @@ image = pipeline(prompt).images[0]
 image.save("auraflow.png")
 ```
 
+## Support for `torch.compile()`
+
+AuraFlow can be compiled with `torch.compile()` to speed up inference latency even for different resolutions. First, install PyTorch nightly following the instructions from [here](https://pytorch.org/). The snippet below shows the changes needed to enable this:
+
+```diff
++ torch.fx.experimental._config.use_duck_shape = False
++ pipeline.transformer = torch.compile(
+      pipeline.transformer, fullgraph=True, dynamic=True
+  )
+```
+
+Setting `use_duck_shape` to `False` instructs the compiler not to reuse the same symbolic variable for input sizes that happen to be equal, so a single compiled graph can serve multiple resolutions. For more details, check out [this comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).
+
+This yields speed improvements ranging from 100% at low resolutions to 30% at 1536x1536 resolution.
+
+Thanks to [AstraliteHeart](https://github.com/huggingface/diffusers/pull/11297/), who helped us rewrite the [`AuraFlowTransformer2DModel`] class so that the above works for different resolutions ([PR](https://github.com/huggingface/diffusers/pull/11297/)).
+
 ## AuraFlowPipeline
 
 [[autodoc]] AuraFlowPipeline
````
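
Put together, a self-contained version of the compile setup might look like the sketch below (the model id `fal/AuraFlow` matches the pipeline docs; the prompt is illustrative):

```python
import torch
from diffusers import AuraFlowPipeline

pipeline = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.bfloat16).to("cuda")

# Give equal-sized dimensions distinct symbolic variables so one compiled
# graph can be reused across resolutions.
torch.fx.experimental._config.use_duck_shape = False
pipeline.transformer = torch.compile(pipeline.transformer, fullgraph=True, dynamic=True)

image = pipeline("a watercolor fox in a snowy forest").images[0]
image.save("auraflow_compiled.png")
```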
