Commit d46ac3e

Merge branch 'main' into Add-AnyText
2 parents 9c43a65 + 97abdd2 commit d46ac3e

File tree

411 files changed

+20272
-2901
lines changed

.github/workflows/nightly_tests.yml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -265,7 +265,7 @@ jobs:
265265
266266
- name: Run PyTorch CUDA tests
267267
env:
268-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
268+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
269269
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
270270
CUBLAS_WORKSPACE_CONFIG: :16:8
271271
run: |
@@ -505,7 +505,7 @@ jobs:
505505
# shell: arch -arch arm64 bash {0}
506506
# env:
507507
# HF_HOME: /System/Volumes/Data/mnt/cache
508-
# HF_TOKEN: ${{ secrets.HF_TOKEN }}
508+
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
509509
# run: |
510510
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
511511
# --report-log=tests_torch_mps.log \
@@ -561,7 +561,7 @@ jobs:
561561
# shell: arch -arch arm64 bash {0}
562562
# env:
563563
# HF_HOME: /System/Volumes/Data/mnt/cache
564-
# HF_TOKEN: ${{ secrets.HF_TOKEN }}
564+
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
565565
# run: |
566566
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
567567
# --report-log=tests_torch_mps.log \

.github/workflows/push_tests.yml

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,7 @@ jobs:
8383
python utils/print_env.py
8484
- name: PyTorch CUDA checkpoint tests on Ubuntu
8585
env:
86-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
86+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
8787
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
8888
CUBLAS_WORKSPACE_CONFIG: :16:8
8989
run: |
@@ -137,7 +137,7 @@ jobs:
137137
138138
- name: Run PyTorch CUDA tests
139139
env:
140-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
140+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
141141
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
142142
CUBLAS_WORKSPACE_CONFIG: :16:8
143143
run: |
@@ -187,7 +187,7 @@ jobs:
187187
188188
- name: Run Flax TPU tests
189189
env:
190-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
190+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
191191
run: |
192192
python -m pytest -n 0 \
193193
-s -v -k "Flax" \
@@ -235,7 +235,7 @@ jobs:
235235
236236
- name: Run ONNXRuntime CUDA tests
237237
env:
238-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
238+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
239239
run: |
240240
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
241241
-s -v -k "Onnx" \
@@ -283,7 +283,7 @@ jobs:
283283
python utils/print_env.py
284284
- name: Run example tests on GPU
285285
env:
286-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
286+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
287287
RUN_COMPILE: yes
288288
run: |
289289
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
@@ -326,7 +326,7 @@ jobs:
326326
python utils/print_env.py
327327
- name: Run example tests on GPU
328328
env:
329-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
329+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
330330
run: |
331331
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
332332
- name: Failure short reports
@@ -372,7 +372,7 @@ jobs:
372372
373373
- name: Run example tests on GPU
374374
env:
375-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
375+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
376376
run: |
377377
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
378378
python -m uv pip install timm

.github/workflows/release_tests_fast.yml

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -81,7 +81,7 @@ jobs:
8181
python utils/print_env.py
8282
- name: Slow PyTorch CUDA checkpoint tests on Ubuntu
8383
env:
84-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
84+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
8585
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
8686
CUBLAS_WORKSPACE_CONFIG: :16:8
8787
run: |
@@ -135,7 +135,7 @@ jobs:
135135
136136
- name: Run PyTorch CUDA tests
137137
env:
138-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
138+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
139139
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
140140
CUBLAS_WORKSPACE_CONFIG: :16:8
141141
run: |
@@ -186,7 +186,7 @@ jobs:
186186
187187
- name: Run PyTorch CUDA tests
188188
env:
189-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
189+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
190190
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
191191
CUBLAS_WORKSPACE_CONFIG: :16:8
192192
run: |
@@ -241,7 +241,7 @@ jobs:
241241
242242
- name: Run slow Flax TPU tests
243243
env:
244-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
244+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
245245
run: |
246246
python -m pytest -n 0 \
247247
-s -v -k "Flax" \
@@ -289,7 +289,7 @@ jobs:
289289
290290
- name: Run slow ONNXRuntime CUDA tests
291291
env:
292-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
292+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
293293
run: |
294294
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
295295
-s -v -k "Onnx" \
@@ -337,7 +337,7 @@ jobs:
337337
python utils/print_env.py
338338
- name: Run example tests on GPU
339339
env:
340-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
340+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
341341
RUN_COMPILE: yes
342342
run: |
343343
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
@@ -380,7 +380,7 @@ jobs:
380380
python utils/print_env.py
381381
- name: Run example tests on GPU
382382
env:
383-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
383+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
384384
run: |
385385
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
386386
- name: Failure short reports
@@ -426,7 +426,7 @@ jobs:
426426
427427
- name: Run example tests on GPU
428428
env:
429-
HF_TOKEN: ${{ secrets.HF_TOKEN }}
429+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
430430
run: |
431431
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
432432
python -m uv pip install timm

.github/workflows/trufflehog.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,3 +13,6 @@ jobs:
1313
fetch-depth: 0
1414
- name: Secret Scanning
1515
uses: trufflesecurity/trufflehog@main
16+
with:
17+
extra_args: --results=verified,unknown
18+

docs/source/en/_toctree.yml

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,8 @@
7979
- sections:
8080
- local: using-diffusers/cogvideox
8181
title: CogVideoX
82+
- local: using-diffusers/consisid
83+
title: ConsisID
8284
- local: using-diffusers/sdxl
8385
title: Stable Diffusion XL
8486
- local: using-diffusers/sdxl_turbo
@@ -87,6 +89,8 @@
8789
title: Kandinsky
8890
- local: using-diffusers/ip_adapter
8991
title: IP-Adapter
92+
- local: using-diffusers/omnigen
93+
title: OmniGen
9094
- local: using-diffusers/pag
9195
title: PAG
9296
- local: using-diffusers/controlnet
@@ -179,6 +183,8 @@
179183
title: TGATE
180184
- local: optimization/xdit
181185
title: xDiT
186+
- local: optimization/para_attn
187+
title: ParaAttention
182188
- sections:
183189
- local: using-diffusers/stable_diffusion_jax_how_to
184190
title: JAX/Flax
@@ -268,6 +274,8 @@
268274
title: AuraFlowTransformer2DModel
269275
- local: api/models/cogvideox_transformer3d
270276
title: CogVideoXTransformer3DModel
277+
- local: api/models/consisid_transformer3d
278+
title: ConsisIDTransformer3DModel
271279
- local: api/models/cogview3plus_transformer2d
272280
title: CogView3PlusTransformer2DModel
273281
- local: api/models/dit_transformer2d
@@ -282,10 +290,14 @@
282290
title: LatteTransformer3DModel
283291
- local: api/models/lumina_nextdit2d
284292
title: LuminaNextDiT2DModel
293+
- local: api/models/lumina2_transformer2d
294+
title: Lumina2Transformer2DModel
285295
- local: api/models/ltx_video_transformer3d
286296
title: LTXVideoTransformer3DModel
287297
- local: api/models/mochi_transformer3d
288298
title: MochiTransformer3DModel
299+
- local: api/models/omnigen_transformer
300+
title: OmniGenTransformer2DModel
289301
- local: api/models/pixart_transformer2d
290302
title: PixArtTransformer2DModel
291303
- local: api/models/prior_transformer
@@ -370,6 +382,8 @@
370382
title: CogVideoX
371383
- local: api/pipelines/cogview3
372384
title: CogView3
385+
- local: api/pipelines/consisid
386+
title: ConsisID
373387
- local: api/pipelines/consistency_models
374388
title: Consistency Models
375389
- local: api/pipelines/controlnet
@@ -430,6 +444,8 @@
430444
title: LEDITS++
431445
- local: api/pipelines/ltx_video
432446
title: LTXVideo
447+
- local: api/pipelines/lumina2
448+
title: Lumina 2.0
433449
- local: api/pipelines/lumina
434450
title: Lumina-T2X
435451
- local: api/pipelines/marigold
@@ -440,6 +456,8 @@
440456
title: MultiDiffusion
441457
- local: api/pipelines/musicldm
442458
title: MusicLDM
459+
- local: api/pipelines/omnigen
460+
title: OmniGen
443461
- local: api/pipelines/pag
444462
title: PAG
445463
- local: api/pipelines/paint_by_example
@@ -590,6 +608,8 @@
590608
title: Attention Processor
591609
- local: api/activations
592610
title: Custom activation functions
611+
- local: api/cache
612+
title: Caching methods
593613
- local: api/normalization
594614
title: Custom normalization layers
595615
- local: api/utilities

docs/source/en/api/cache.md

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# Caching methods
13+
14+
## Pyramid Attention Broadcast
15+
16+
[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) by Xuanlei Zhao, Xiaolong Jin, Kai Wang, and Yang You.
17+
18+
Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. Attention states change little between successive inference steps. The difference is most prominent in the spatial attention blocks, smaller in the temporal attention blocks, and smallest in the cross attention blocks. Therefore, cross attention computations can be skipped most often, followed by temporal and then spatial attention computations. Combined with other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
19+
20+
Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.
21+
22+
```python
23+
import torch
24+
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
25+
26+
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
27+
pipe.to("cuda")
28+
29+
# Increasing the value of `spatial_attention_timestep_skip_range[0]` or decreasing the value of
30+
# `spatial_attention_timestep_skip_range[1]` will decrease the interval in which pyramid attention
31+
# broadcast is active, leading to slower inference speeds. However, large intervals can lead to
32+
# poorer quality of generated videos.
33+
config = PyramidAttentionBroadcastConfig(
34+
spatial_attention_block_skip_range=2,
35+
spatial_attention_timestep_skip_range=(100, 800),
36+
current_timestep_callback=lambda: pipe.current_timestep,
37+
)
38+
pipe.transformer.enable_cache(config)
39+
```
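
The skip-and-reuse idea described above can be illustrated with a toy, framework-free sketch. This is not the diffusers implementation: the `BroadcastCache` class, its attributes, and `fake_attention` are hypothetical names invented for illustration, and the reuse rule (recompute every `block_skip_range`-th call, and always recompute outside the timestep range) is an assumption modeled on the config parameters shown in the example.

```python
from typing import Callable, List, Optional, Tuple


class BroadcastCache:
    """Toy wrapper (hypothetical) that reuses a cached attention output on some steps."""

    def __init__(self, block_skip_range: int, timestep_skip_range: Tuple[int, int]):
        self.block_skip_range = block_skip_range
        self.timestep_skip_range = timestep_skip_range
        self.iteration = 0  # number of calls seen so far
        self.cached_output: Optional[List[float]] = None
        self.num_computed = 0  # how many times attention was actually computed

    def __call__(
        self,
        attention_fn: Callable[[List[float]], List[float]],
        hidden_states: List[float],
        timestep: int,
    ) -> List[float]:
        low, high = self.timestep_skip_range
        in_range = low < timestep < high
        # Reuse only if a cache exists, the timestep is inside the active
        # interval, and this call is not a scheduled refresh.
        can_reuse = (
            self.cached_output is not None
            and in_range
            and self.iteration % self.block_skip_range != 0
        )
        if can_reuse:
            output = self.cached_output
        else:
            output = attention_fn(hidden_states)
            self.cached_output = output
            self.num_computed += 1
        self.iteration += 1
        return output


def fake_attention(hidden_states: List[float]) -> List[float]:
    # Stand-in for an expensive attention computation.
    return [2.0 * value for value in hidden_states]


cache = BroadcastCache(block_skip_range=2, timestep_skip_range=(100, 800))
for timestep in [900, 700, 500, 300, 50]:
    cache(fake_attention, [1.0, 2.0], timestep)
print(cache.num_computed)  # 3 of the 5 steps computed attention
```

Narrowing the timestep interval in this sketch makes `in_range` false more often, so more steps recompute attention, which mirrors the speed and quality trade-off noted in the comment of the example above.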
40+
41+
### CacheMixin
42+
43+
[[autodoc]] CacheMixin
44+
45+
### PyramidAttentionBroadcastConfig
46+
47+
[[autodoc]] PyramidAttentionBroadcastConfig
48+
49+
[[autodoc]] apply_pyramid_attention_broadcast
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# ConsisIDTransformer3DModel
13+
14+
A Diffusion Transformer model for 3D data from [ConsisID](https://github.com/PKU-YuanGroup/ConsisID), introduced in [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://arxiv.org/pdf/2411.17440) by researchers from Peking University, the University of Rochester, and others.
15+
16+
The model can be loaded with the following code snippet.
17+
18+
```python
19+
from diffusers import ConsisIDTransformer3DModel
20+
import torch
21+
transformer = ConsisIDTransformer3DModel.from_pretrained("BestWishYsh/ConsisID-preview", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
22+
```
23+
24+
## ConsisIDTransformer3DModel
25+
26+
[[autodoc]] ConsisIDTransformer3DModel
27+
28+
## Transformer2DModelOutput
29+
30+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# Lumina2Transformer2DModel
13+
14+
A Diffusion Transformer model for 2D image data, introduced in [Lumina Image 2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0) by Alpha-VLLM.
15+
16+
The model can be loaded with the following code snippet.
17+
18+
```python
19+
from diffusers import Lumina2Transformer2DModel
20+
import torch
21+
transformer = Lumina2Transformer2DModel.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", subfolder="transformer", torch_dtype=torch.bfloat16)
22+
```
23+
24+
## Lumina2Transformer2DModel
25+
26+
[[autodoc]] Lumina2Transformer2DModel
27+
28+
## Transformer2DModelOutput
29+
30+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
-->
12+
13+
# OmniGenTransformer2DModel
14+
15+
A Transformer model that accepts multimodal instructions to generate images for [OmniGen](https://github.com/VectorSpaceLab/OmniGen/).
16+
17+
The abstract from the paper is:
18+
19+
*The emergence of Large Language Models (LLMs) has unified language generation tasks and revolutionized human-machine interaction. However, in the realm of image generation, a unified model capable of handling various tasks within a single framework remains largely unexplored. In this work, we introduce OmniGen, a new diffusion model for unified image generation. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports various downstream tasks, such as image editing, subject-driven generation, and visual conditional generation. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional plugins. Moreover, compared to existing diffusion models, it is more user-friendly and can complete complex tasks end-to-end through instructions without the need for extra intermediate steps, greatly simplifying the image generation workflow. 3) Knowledge Transfer: Benefit from learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model’s reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and we will release our resources at https://github.com/VectorSpaceLab/OmniGen to foster future advancements.*
20+
21+
```python
22+
import torch
23+
from diffusers import OmniGenTransformer2DModel
24+
25+
transformer = OmniGenTransformer2DModel.from_pretrained("Shitao/OmniGen-v1-diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
26+
```
27+
28+
## OmniGenTransformer2DModel
29+
30+
[[autodoc]] OmniGenTransformer2DModel
