
Commit 59a3ba4

Merge branch 'main' into ltx-rename
2 parents: bfc6197 + 2739241

109 files changed, +11821 −355 lines changed

.github/workflows/nightly_tests.yml

Lines changed: 2 additions & 0 deletions
@@ -357,6 +357,8 @@ jobs:
       config:
         - backend: "bitsandbytes"
           test_location: "bnb"
+        - backend: "gguf"
+          test_location: "gguf"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:

.github/workflows/push_tests.yml

Lines changed: 2 additions & 1 deletion
@@ -165,7 +165,8 @@ jobs:
       group: gcp-ct5lp-hightpu-8t
     container:
       image: diffusers/diffusers-flax-tpu
-      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache defaults:
+      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
+    defaults:
     run:
       shell: bash
     steps:

docs/source/en/_toctree.yml

Lines changed: 14 additions & 0 deletions
@@ -157,6 +157,10 @@
     title: Getting Started
   - local: quantization/bitsandbytes
     title: bitsandbytes
+  - local: quantization/gguf
+    title: gguf
+  - local: quantization/torchao
+    title: torchao
   title: Quantization Methods
 - sections:
   - local: optimization/fp16
@@ -270,6 +274,8 @@
     title: FluxTransformer2DModel
   - local: api/models/hunyuan_transformer2d
     title: HunyuanDiT2DModel
+  - local: api/models/hunyuan_video_transformer_3d
+    title: HunyuanVideoTransformer3DModel
   - local: api/models/latte_transformer3d
     title: LatteTransformer3DModel
   - local: api/models/lumina_nextdit2d
@@ -284,6 +290,8 @@
     title: PriorTransformer
   - local: api/models/sd3_transformer2d
     title: SD3Transformer2DModel
+  - local: api/models/sana_transformer2d
+    title: SanaTransformer2DModel
   - local: api/models/stable_audio_transformer
     title: StableAudioDiTModel
   - local: api/models/transformer2d
@@ -314,6 +322,8 @@
     title: AutoencoderKLAllegro
   - local: api/models/autoencoderkl_cogvideox
     title: AutoencoderKLCogVideoX
+  - local: api/models/autoencoder_kl_hunyuan_video
+    title: AutoencoderKLHunyuanVideo
   - local: api/models/autoencoderkl_ltx_video
     title: AutoencoderKLLTXVideo
   - local: api/models/autoencoderkl_mochi
@@ -392,6 +402,8 @@
     title: Flux
   - local: api/pipelines/hunyuandit
     title: Hunyuan-DiT
+  - local: api/pipelines/hunyuan_video
+    title: HunyuanVideo
   - local: api/pipelines/i2vgenxl
     title: I2VGen-XL
   - local: api/pipelines/pix2pix
@@ -434,6 +446,8 @@
     title: PixArt-α
   - local: api/pipelines/pixart_sigma
     title: PixArt-Σ
+  - local: api/pipelines/sana
+    title: Sana
   - local: api/pipelines/self_attention_guidance
     title: Self-Attention Guidance
   - local: api/pipelines/semantic_stable_diffusion

docs/source/en/api/attnprocessor.md

Lines changed: 104 additions & 11 deletions
@@ -15,40 +15,133 @@ specific language governing permissions and limitations under the License.
 An attention processor is a class for applying different types of attention mechanisms.

 ## AttnProcessor
+
 [[autodoc]] models.attention_processor.AttnProcessor

-## AttnProcessor2_0
 [[autodoc]] models.attention_processor.AttnProcessor2_0

-## AttnAddedKVProcessor
 [[autodoc]] models.attention_processor.AttnAddedKVProcessor

-## AttnAddedKVProcessor2_0
 [[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0

+[[autodoc]] models.attention_processor.AttnProcessorNPU
+
+[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+
+## Allegro
+
+[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0
+
+## AuraFlow
+
+[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0
+
+## CogVideoX
+
+[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0
+
 ## CrossFrameAttnProcessor
+
 [[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor

-## CustomDiffusionAttnProcessor
+## Custom Diffusion
+
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor

-## CustomDiffusionAttnProcessor2_0
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0

-## CustomDiffusionXFormersAttnProcessor
 [[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor

-## FusedAttnProcessor2_0
-[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+## Flux
+
+[[autodoc]] models.attention_processor.FluxAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0
+
+## Hunyuan
+
+[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0
+
+## IdentitySelfAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0
+
+## IP-Adapter
+
+[[autodoc]] models.attention_processor.IPAdapterAttnProcessor
+
+[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0
+
+## JointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.JointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0
+
+## LoRA
+
+[[autodoc]] models.attention_processor.LoRAAttnProcessor
+
+[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
+
+[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
+
+## Lumina-T2X
+
+[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0
+
+## Mochi
+
+[[autodoc]] models.attention_processor.MochiAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0
+
+## Sana
+
+[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0
+
+## Stable Audio
+
+[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0

 ## SlicedAttnProcessor
+
 [[autodoc]] models.attention_processor.SlicedAttnProcessor

-## SlicedAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor

 ## XFormersAttnProcessor
+
 [[autodoc]] models.attention_processor.XFormersAttnProcessor

-## AttnProcessorNPU
-[[autodoc]] models.attention_processor.AttnProcessorNPU
+[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor
+
+## XLAFlashAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0

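To see how these processor classes are used in practice, here is a minimal sketch of inspecting and swapping the attention processors on a UNet; the checkpoint ID is illustrative, and `attn_processors` / `set_attn_processor` are the standard model-level entry points:

```python
import torch
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

# Illustrative checkpoint; any attention-bearing Diffusers model exposes the same API.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

# Inspect which processor class each attention layer currently uses.
print({type(proc).__name__ for proc in unet.attn_processors.values()})

# Swap every layer to the PyTorch 2.0 scaled-dot-product-attention processor.
unet.set_attn_processor(AttnProcessor2_0())
```
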
docs/source/en/api/loaders/lora.md

Lines changed: 15 additions & 0 deletions
@@ -17,6 +17,9 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
 - [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
 - [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
+- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
+- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
+- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

@@ -38,6 +41,18 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

 [[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin

+## FluxLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin
+
+## CogVideoXLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
+
+## Mochi1LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
+
 ## AmusedLoraLoaderMixin

 [[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin

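As a sketch of what the new mixins enable, the snippet below loads a LoRA into a Flux pipeline through `load_lora_weights`, which `FluxLoraLoaderMixin` backs; the LoRA repository name is hypothetical and shown only for illustration:

```python
import torch
from diffusers import FluxPipeline

# FluxPipeline inherits FluxLoraLoaderMixin, so LoRA management happens on the pipeline itself.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

# Hypothetical LoRA repository, used purely to illustrate the call.
pipe.load_lora_weights("your-username/your-flux-lora", adapter_name="style")

pipe.fuse_lora()    # optionally merge the LoRA into the base weights for faster inference
pipe.unfuse_lora()  # undo the merge when the original weights are needed again
```
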
docs/source/en/api/models/autoencoder_dc.md

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,8 @@ The following DCAE models are released and supported in Diffusers.
 | [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
 | [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)

+This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).
+
 Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].

 ```python
docs/source/en/api/models/autoencoder_kl_hunyuan_video.md (new file)

Lines changed: 32 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanVideo

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo

vae = AutoencoderKLHunyuanVideo.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.float16)
```

## AutoencoderKLHunyuanVideo

[[autodoc]] AutoencoderKLHunyuanVideo
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
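
A rough usage sketch, assuming the class follows the standard Diffusers VAE `encode`/`decode` interface; the input shape below is hypothetical, and the real frame/resolution constraints depend on the checkpoint config:

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo

vae = AutoencoderKLHunyuanVideo.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.float16).to("cuda")

# Hypothetical video tensor of shape (batch, channels, frames, height, width).
video = torch.randn(1, 3, 17, 256, 256, dtype=torch.float16, device="cuda")

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # compress frames into the video latent space
    reconstructed = vae.decode(latents).sample        # DecoderOutput.sample holds the decoded frames
```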

docs/source/en/api/models/hunyuan_video_transformer_3d.md (new file)

Lines changed: 30 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanVideoTransformer3DModel

A Diffusion Transformer model for 3D video-like data was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel

transformer = HunyuanVideoTransformer3DModel.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.bfloat16)
```

## HunyuanVideoTransformer3DModel

[[autodoc]] HunyuanVideoTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
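
Once loaded, the generic `ModelMixin` utilities apply to this transformer as to any other Diffusers model; a small sketch (gradient checkpointing support is assumed here):

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel

transformer = HunyuanVideoTransformer3DModel.from_pretrained("tencent/HunyuanVideo", torch_dtype=torch.bfloat16)

# ModelMixin utilities shared by all Diffusers models.
print(f"parameters: {transformer.num_parameters() / 1e9:.2f}B")
transformer.enable_gradient_checkpointing()  # trade extra compute for lower memory during training
```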

docs/source/en/api/models/sana_transformer2d.md (new file)

Lines changed: 34 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# SanaTransformer2DModel

A Diffusion Transformer model for 2D data was introduced in [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://huggingface.co/papers/2410.10629) by Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han from NVIDIA and MIT HAN Lab.

The abstract from the paper is:

*We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include: (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. Code and model will be publicly released.*

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import SanaTransformer2DModel

transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers", subfolder="transformer", torch_dtype=torch.float16)
```

## SanaTransformer2DModel

[[autodoc]] SanaTransformer2DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
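
For context, the transformer documented above is the denoiser inside the Sana text-to-image pipeline; a minimal end-to-end sketch (the prompt and the dtype/device handling are illustrative and may differ per checkpoint):

```python
import torch
from diffusers import SanaPipeline

# Load the full text-to-image stack; the SanaTransformer2DModel is its denoiser component.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon",
    num_inference_steps=20,
).images[0]
image.save("sana.png")
```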
