Commit f0de830

Merge branch 'main' into improve-lora-fusion-tests
2 parents c610766 + b4be422 commit f0de830

160 files changed

+16386
-1523
lines changed

docker/diffusers-onnxruntime-cpu/Dockerfile

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -28,9 +28,9 @@ ENV PATH="/opt/venv/bin:$PATH"
2828
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
2929
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
3030
python3 -m uv pip install --no-cache-dir \
31-
torch==2.1.2 \
32-
torchvision==0.16.2 \
33-
torchaudio==2.1.2 \
31+
torch \
32+
torchvision \
33+
torchaudio\
3434
onnxruntime \
3535
--extra-index-url https://download.pytorch.org/whl/cpu && \
3636
python3 -m uv pip install --no-cache-dir \

docs/source/en/_toctree.yml

Lines changed: 41 additions & 33 deletions
@@ -175,7 +175,7 @@
175175
title: gguf
176176
- local: quantization/torchao
177177
title: torchao
178-
- local: quantization/quanto
178+
- local: quantization/quanto
179179
title: quanto
180180
title: Quantization Methods
181181
- sections:
@@ -270,16 +270,18 @@
270270
- sections:
271271
- local: api/models/controlnet
272272
title: ControlNetModel
273+
- local: api/models/controlnet_union
274+
title: ControlNetUnionModel
273275
- local: api/models/controlnet_flux
274276
title: FluxControlNetModel
275277
- local: api/models/controlnet_hunyuandit
276278
title: HunyuanDiT2DControlNetModel
279+
- local: api/models/controlnet_sana
280+
title: SanaControlNetModel
277281
- local: api/models/controlnet_sd3
278282
title: SD3ControlNetModel
279283
- local: api/models/controlnet_sparsectrl
280284
title: SparseControlNetModel
281-
- local: api/models/controlnet_union
282-
title: ControlNetUnionModel
283285
title: ControlNets
284286
- sections:
285287
- local: api/models/allegro_transformer3d
@@ -288,30 +290,32 @@
288290
title: AuraFlowTransformer2DModel
289291
- local: api/models/cogvideox_transformer3d
290292
title: CogVideoXTransformer3DModel
291-
- local: api/models/consisid_transformer3d
292-
title: ConsisIDTransformer3DModel
293293
- local: api/models/cogview3plus_transformer2d
294294
title: CogView3PlusTransformer2DModel
295295
- local: api/models/cogview4_transformer2d
296296
title: CogView4Transformer2DModel
297+
- local: api/models/consisid_transformer3d
298+
title: ConsisIDTransformer3DModel
297299
- local: api/models/dit_transformer2d
298300
title: DiTTransformer2DModel
299301
- local: api/models/easyanimate_transformer3d
300302
title: EasyAnimateTransformer3DModel
301303
- local: api/models/flux_transformer
302304
title: FluxTransformer2DModel
305+
- local: api/models/hidream_image_transformer
306+
title: HiDreamImageTransformer2DModel
303307
- local: api/models/hunyuan_transformer2d
304308
title: HunyuanDiT2DModel
305309
- local: api/models/hunyuan_video_transformer_3d
306310
title: HunyuanVideoTransformer3DModel
307311
- local: api/models/latte_transformer3d
308312
title: LatteTransformer3DModel
309-
- local: api/models/lumina_nextdit2d
310-
title: LuminaNextDiT2DModel
311-
- local: api/models/lumina2_transformer2d
312-
title: Lumina2Transformer2DModel
313313
- local: api/models/ltx_video_transformer3d
314314
title: LTXVideoTransformer3DModel
315+
- local: api/models/lumina2_transformer2d
316+
title: Lumina2Transformer2DModel
317+
- local: api/models/lumina_nextdit2d
318+
title: LuminaNextDiT2DModel
315319
- local: api/models/mochi_transformer3d
316320
title: MochiTransformer3DModel
317321
- local: api/models/omnigen_transformer
@@ -320,10 +324,10 @@
320324
title: PixArtTransformer2DModel
321325
- local: api/models/prior_transformer
322326
title: PriorTransformer
323-
- local: api/models/sd3_transformer2d
324-
title: SD3Transformer2DModel
325327
- local: api/models/sana_transformer2d
326328
title: SanaTransformer2DModel
329+
- local: api/models/sd3_transformer2d
330+
title: SD3Transformer2DModel
327331
- local: api/models/stable_audio_transformer
328332
title: StableAudioDiTModel
329333
- local: api/models/transformer2d
@@ -338,10 +342,10 @@
338342
title: StableCascadeUNet
339343
- local: api/models/unet
340344
title: UNet1DModel
341-
- local: api/models/unet2d
342-
title: UNet2DModel
343345
- local: api/models/unet2d-cond
344346
title: UNet2DConditionModel
347+
- local: api/models/unet2d
348+
title: UNet2DModel
345349
- local: api/models/unet3d-cond
346350
title: UNet3DConditionModel
347351
- local: api/models/unet-motion
@@ -350,6 +354,10 @@
350354
title: UViT2DModel
351355
title: UNets
352356
- sections:
357+
- local: api/models/asymmetricautoencoderkl
358+
title: AsymmetricAutoencoderKL
359+
- local: api/models/autoencoder_dc
360+
title: AutoencoderDC
353361
- local: api/models/autoencoderkl
354362
title: AutoencoderKL
355363
- local: api/models/autoencoderkl_allegro
@@ -366,10 +374,6 @@
366374
title: AutoencoderKLMochi
367375
- local: api/models/autoencoder_kl_wan
368376
title: AutoencoderKLWan
369-
- local: api/models/asymmetricautoencoderkl
370-
title: AsymmetricAutoencoderKL
371-
- local: api/models/autoencoder_dc
372-
title: AutoencoderDC
373377
- local: api/models/consistency_decoder_vae
374378
title: ConsistencyDecoderVAE
375379
- local: api/models/autoencoder_oobleck
@@ -422,6 +426,8 @@
422426
title: ControlNet with Stable Diffusion 3
423427
- local: api/pipelines/controlnet_sdxl
424428
title: ControlNet with Stable Diffusion XL
429+
- local: api/pipelines/controlnet_sana
430+
title: ControlNet-Sana
425431
- local: api/pipelines/controlnetxs
426432
title: ControlNet-XS
427433
- local: api/pipelines/controlnetxs_sdxl
@@ -446,6 +452,8 @@
446452
title: Flux
447453
- local: api/pipelines/control_flux_inpaint
448454
title: FluxControlInpaint
455+
- local: api/pipelines/hidream
456+
title: HiDream-I1
449457
- local: api/pipelines/hunyuandit
450458
title: Hunyuan-DiT
451459
- local: api/pipelines/hunyuan_video
@@ -513,40 +521,40 @@
513521
- sections:
514522
- local: api/pipelines/stable_diffusion/overview
515523
title: Overview
516-
- local: api/pipelines/stable_diffusion/text2img
517-
title: Text-to-image
524+
- local: api/pipelines/stable_diffusion/depth2img
525+
title: Depth-to-image
526+
- local: api/pipelines/stable_diffusion/gligen
527+
title: GLIGEN (Grounded Language-to-Image Generation)
528+
- local: api/pipelines/stable_diffusion/image_variation
529+
title: Image variation
518530
- local: api/pipelines/stable_diffusion/img2img
519531
title: Image-to-image
520532
- local: api/pipelines/stable_diffusion/svd
521533
title: Image-to-video
522534
- local: api/pipelines/stable_diffusion/inpaint
523535
title: Inpainting
524-
- local: api/pipelines/stable_diffusion/depth2img
525-
title: Depth-to-image
526-
- local: api/pipelines/stable_diffusion/image_variation
527-
title: Image variation
536+
- local: api/pipelines/stable_diffusion/k_diffusion
537+
title: K-Diffusion
538+
- local: api/pipelines/stable_diffusion/latent_upscale
539+
title: Latent upscaler
540+
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
541+
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
528542
- local: api/pipelines/stable_diffusion/stable_diffusion_safe
529543
title: Safe Stable Diffusion
544+
- local: api/pipelines/stable_diffusion/sdxl_turbo
545+
title: SDXL Turbo
530546
- local: api/pipelines/stable_diffusion/stable_diffusion_2
531547
title: Stable Diffusion 2
532548
- local: api/pipelines/stable_diffusion/stable_diffusion_3
533549
title: Stable Diffusion 3
534550
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
535551
title: Stable Diffusion XL
536-
- local: api/pipelines/stable_diffusion/sdxl_turbo
537-
title: SDXL Turbo
538-
- local: api/pipelines/stable_diffusion/latent_upscale
539-
title: Latent upscaler
540552
- local: api/pipelines/stable_diffusion/upscale
541553
title: Super-resolution
542-
- local: api/pipelines/stable_diffusion/k_diffusion
543-
title: K-Diffusion
544-
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
545-
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
546554
- local: api/pipelines/stable_diffusion/adapter
547555
title: T2I-Adapter
548-
- local: api/pipelines/stable_diffusion/gligen
549-
title: GLIGEN (Grounded Language-to-Image Generation)
556+
- local: api/pipelines/stable_diffusion/text2img
557+
title: Text-to-image
550558
title: Stable Diffusion
551559
- local: api/pipelines/stable_unclip
552560
title: Stable unCLIP
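
The reordering in this hunk appears to put each section's entries in case-insensitive alphabetical order by `title`, with `Overview` kept first in the Stable Diffusion section. A minimal sketch of how such an ordering could be checked follows. The helper name `unsorted_titles` and its `pinned_first` parameter are hypothetical and not part of the repository; the sample titles are copied from the ControlNets section of the diff.

```python
def unsorted_titles(titles, pinned_first=()):
    """Return adjacent title pairs that break case-insensitive alphabetical order.

    `pinned_first` lists titles (for example "Overview") that may stay at the
    top of a section and are skipped by the check.
    """
    candidates = [t for t in titles if t not in pinned_first]
    return [
        (a, b)
        for a, b in zip(candidates, candidates[1:])
        if a.casefold() > b.casefold()
    ]


# Titles from the ControlNets section before and after the change.
before = [
    "ControlNetModel",
    "FluxControlNetModel",
    "HunyuanDiT2DControlNetModel",
    "SD3ControlNetModel",
    "SparseControlNetModel",
    "ControlNetUnionModel",
]
after = [
    "ControlNetModel",
    "ControlNetUnionModel",
    "FluxControlNetModel",
    "HunyuanDiT2DControlNetModel",
    "SanaControlNetModel",
    "SD3ControlNetModel",
    "SparseControlNetModel",
]

print(unsorted_titles(before))  # one out-of-order pair
print(unsorted_titles(after))   # empty list: the section is sorted
```

Comparing with `str.casefold()` is what places `SanaControlNetModel` before `SD3ControlNetModel`; a plain case-sensitive sort would order them the other way.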

docs/source/en/api/loaders/lora.md

Lines changed: 19 additions & 0 deletions
@@ -20,11 +20,15 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
2020
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
2121
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
2222
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
23+
- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
2324
- [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
2425
- [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
2526
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
2627
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
28+
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
29+
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
2730
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
31+
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream).
2832
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more.
2933

3034
<Tip>
@@ -56,6 +60,9 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
5660
## Mochi1LoraLoaderMixin
5761

5862
[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
63+
## AuraFlowLoraLoaderMixin
64+
65+
[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin
5966

6067
## LTXVideoLoraLoaderMixin
6168

@@ -73,10 +80,22 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
7380

7481
[[autodoc]] loaders.lora_pipeline.Lumina2LoraLoaderMixin
7582

83+
## CogView4LoraLoaderMixin
84+
85+
[[autodoc]] loaders.lora_pipeline.CogView4LoraLoaderMixin
86+
87+
## WanLoraLoaderMixin
88+
89+
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
90+
7691
## AmusedLoraLoaderMixin
7792

7893
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
7994

95+
## HiDreamImageLoraLoaderMixin
96+
97+
[[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin
98+
8099
## LoraBaseMixin
81100

82101
[[autodoc]] loaders.lora_base.LoraBaseMixin
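
The `LoraBaseMixin` entry in this file mentions fusing and unfusing LoRAs. As a conceptual illustration only (not the diffusers implementation), fusing is commonly described as folding the low-rank update into the base weight, `W' = W + scale * (B @ A)`, and unfusing as subtracting the same term. The sketch below uses plain Python lists; the function names `matmul`, `fuse`, and `unfuse` and the `scale` parameter are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def fuse(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) as a new matrix."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def unfuse(fused, lora_b, lora_a, scale=1.0):
    """Undo fuse() by subtracting the same low-rank update."""
    return fuse(fused, lora_b, lora_a, scale=-scale)


# A 2x2 base weight with a rank-1 update (B is 2x1, A is 1x2).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.25]]

fused = fuse(weight, lora_b, lora_a, scale=2.0)
restored = unfuse(fused, lora_b, lora_a, scale=2.0)
print(fused)     # [[2.0, 0.5], [2.0, 2.0]]
print(restored)  # [[1.0, 0.0], [0.0, 1.0]]
```

The values here are chosen to be exactly representable in floating point, so the round trip restores the original weight bit for bit. With real weights, a fuse followed by an unfuse generally matches only up to rounding error, so a fusion test would normally compare with a tolerance.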

docs/source/en/api/models/autoencoderkl_allegro.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
1818
```python
1919
from diffusers import AutoencoderKLAllegro
2020

21-
vae = AutoencoderKLCogVideoX.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
21+
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
2222
```
2323

2424
## AutoencoderKLAllegro
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
-->
12+
13+
# SanaControlNetModel
14+
15+
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
16+
17+
The abstract from the paper is:
18+
19+
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
20+
21+
This model was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
22+
The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
23+
24+
## SanaControlNetModel
25+
[[autodoc]] SanaControlNetModel
26+
27+
## SanaControlNetOutput
28+
[[autodoc]] models.controlnets.controlnet_sana.SanaControlNetOutput
29+
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# HiDreamImageTransformer2DModel
13+
14+
A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
15+
16+
The model can be loaded with the following code snippet.
17+
18+
```python
19+
from diffusers import HiDreamImageTransformer2DModel
20+
21+
transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)
22+
```
23+
24+
## HiDreamImageTransformer2DModel
25+
26+
[[autodoc]] HiDreamImageTransformer2DModel
27+
28+
## Transformer2DModelOutput
29+
30+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

docs/source/en/api/pipelines/aura_flow.md

Lines changed: 17 additions & 0 deletions
@@ -89,6 +89,23 @@ image = pipeline(prompt).images[0]
8989
image.save("auraflow.png")
9090
```
9191

92+
## Support for `torch.compile()`
93+
94+
AuraFlow can be compiled with `torch.compile()` to reduce inference latency, even across different resolutions. First, install PyTorch nightly by following the [installation instructions](https://pytorch.org/). The snippet below shows the changes needed to enable this:
95+
96+
```diff
97+
+ torch.fx.experimental._config.use_duck_shape = False
98+
+ pipeline.transformer = torch.compile(
99+
pipeline.transformer, fullgraph=True, dynamic=True
100+
)
101+
```
102+
103+
Setting `use_duck_shape` to `False` tells the compiler not to reuse the same symbolic variable for input sizes that happen to be equal. For more details, check out [this comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).
104+
105+
This yields speed improvements ranging from 100% (at low resolutions) to 30% (at 1536x1536 resolution).
106+
107+
Thanks to [AstraliteHeart](https://github.com/huggingface/diffusers/pull/11297/) who helped us rewrite the [`AuraFlowTransformer2DModel`] class so that the above works for different resolutions ([PR](https://github.com/huggingface/diffusers/pull/11297/)).
108+
92109
## AuraFlowPipeline
93110

94111
[[autodoc]] AuraFlowPipeline
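
The `torch.compile()` changes shown in the `aura_flow.md` hunk above could be assembled into a single helper along the following lines. This is a sketch under stated assumptions rather than the documented recipe: the function name `build_compiled_auraflow`, the default checkpoint id `fal/AuraFlow`, the `torch.float16` dtype, and the `cuda` device are all illustrative choices, and the imports are deferred so the function can be defined on a machine without PyTorch installed.

```python
def build_compiled_auraflow(model_id="fal/AuraFlow", device="cuda"):
    """Load an AuraFlow pipeline and compile its transformer with dynamic shapes.

    The checkpoint id, dtype, and device are assumptions for illustration.
    """
    # Deferred imports: PyTorch and diffusers are only needed when called.
    import torch
    import torch.fx.experimental._config as fx_config
    from diffusers import AuraFlowPipeline

    # Disable duck shaping so equal input sizes get separate symbolic variables.
    fx_config.use_duck_shape = False

    pipeline = AuraFlowPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipeline = pipeline.to(device)

    # Same compile arguments as in the diff snippet above.
    pipeline.transformer = torch.compile(
        pipeline.transformer, fullgraph=True, dynamic=True
    )
    return pipeline
```

Calling `build_compiled_auraflow()` and then `pipeline(prompt).images[0]` would trigger compilation on the first call; running this requires a PyTorch build that supports these options (the docs above suggest a nightly) and a GPU with enough memory for the checkpoint.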
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
-->
12+
13+
# ControlNet
14+
15+
<div class="flex flex-wrap space-x-1">
16+
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
17+
</div>
18+
19+
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
20+
21+
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
22+
23+
The abstract from the paper is:
24+
25+
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
26+
27+
This pipeline was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
28+
The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
29+
30+
## SanaControlNetPipeline
31+
[[autodoc]] SanaControlNetPipeline
32+
- all
33+
- __call__
34+
35+
## SanaPipelineOutput
36+
[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput
