Commit 15ed1d1

Merge branch 'main' into attention-dispatcher
2 parents: 45a809a + c934720

608 files changed (+6960, -5388 lines)


.github/workflows/benchmark.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ jobs:
     runs-on:
       group: aws-g6-4xlarge-plus
     container:
-      image: diffusers/diffusers-pytorch-compile-cuda
+      image: diffusers/diffusers-pytorch-cuda
       options: --shm-size "16gb" --ipc host --gpus 0
     steps:
       - name: Checkout diffusers

.github/workflows/build_docker_images.yml

Lines changed: 10 additions & 3 deletions
@@ -38,9 +38,16 @@ jobs:
           token: ${{ secrets.GITHUB_TOKEN }}

       - name: Build Changed Docker Images
+        env:
+          CHANGED_FILES: ${{ steps.file_changes.outputs.all }}
         run: |
-          CHANGED_FILES="${{ steps.file_changes.outputs.all }}"
-          for FILE in $CHANGED_FILES; do
+          echo "$CHANGED_FILES"
+          for FILE in $CHANGED_FILES; do
+            # skip anything that isn't still on disk
+            if [[ ! -f "$FILE" ]]; then
+              echo "Skipping removed file $FILE"
+              continue
+            fi
             if [[ "$FILE" == docker/*Dockerfile ]]; then
               DOCKER_PATH="${FILE%/Dockerfile}"
               DOCKER_TAG=$(basename "$DOCKER_PATH")

@@ -65,7 +72,7 @@ jobs:
         image-name:
           - diffusers-pytorch-cpu
           - diffusers-pytorch-cuda
-          - diffusers-pytorch-compile-cuda
+          - diffusers-pytorch-cuda
           - diffusers-pytorch-xformers-cuda
           - diffusers-pytorch-minimum-cuda
           - diffusers-flax-cpu

.github/workflows/nightly_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -188,7 +188,7 @@ jobs:
       group: aws-g4dn-2xlarge

     container:
-      image: diffusers/diffusers-pytorch-compile-cuda
+      image: diffusers/diffusers-pytorch-cuda
       options: --gpus 0 --shm-size "16gb" --ipc host

     steps:

.github/workflows/pr_tests.yml

Lines changed: 2 additions & 2 deletions
@@ -291,8 +291,8 @@ jobs:
       - name: Failure short reports
         if: ${{ failure() }}
         run: |
-          cat reports/tests_lora_failures_short.txt
-          cat reports/tests_models_lora_failures_short.txt
+          cat reports/tests_peft_main_failures_short.txt
+          cat reports/tests_models_lora_peft_main_failures_short.txt

       - name: Test suite reports artifacts
         if: ${{ always() }}

.github/workflows/push_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -262,7 +262,7 @@ jobs:
       group: aws-g4dn-2xlarge

     container:
-      image: diffusers/diffusers-pytorch-compile-cuda
+      image: diffusers/diffusers-pytorch-cuda
       options: --gpus 0 --shm-size "16gb" --ipc host

     steps:

.github/workflows/release_tests_fast.yml

Lines changed: 1 addition & 1 deletion
@@ -316,7 +316,7 @@ jobs:
       group: aws-g4dn-2xlarge

     container:
-      image: diffusers/diffusers-pytorch-compile-cuda
+      image: diffusers/diffusers-pytorch-cuda
       options: --gpus 0 --shm-size "16gb" --ipc host

     steps:

docker/diffusers-pytorch-compile-cuda/Dockerfile

Lines changed: 0 additions & 50 deletions
This file was deleted.

docs/source/en/_toctree.yml

Lines changed: 3 additions & 7 deletions
@@ -17,8 +17,6 @@
     title: AutoPipeline
   - local: tutorials/basic_training
     title: Train a diffusion model
-  - local: tutorials/fast_diffusion
-    title: Accelerate inference of text-to-image diffusion models
   title: Tutorials
 - sections:
   - local: using-diffusers/loading

@@ -94,8 +92,6 @@
     title: API Reference
   title: Hybrid Inference
 - sections:
-  - local: using-diffusers/cogvideox
-    title: CogVideoX
   - local: using-diffusers/consisid
     title: ConsisID
   - local: using-diffusers/sdxl

@@ -180,10 +176,10 @@
 - sections:
   - local: optimization/fp16
     title: Accelerate inference
+  - local: optimization/cache
+    title: Caching
   - local: optimization/memory
     title: Reduce memory usage
-  - local: optimization/torch2.0
-    title: PyTorch 2.0
   - local: optimization/xformers
     title: xFormers
   - local: optimization/tome

@@ -210,7 +206,7 @@
   - local: optimization/mps
     title: Metal Performance Shaders (MPS)
   - local: optimization/habana
-    title: Habana Gaudi
+    title: Intel Gaudi
   - local: optimization/neuron
     title: AWS Neuron
   title: Optimized hardware

docs/source/en/api/cache.md

Lines changed: 4 additions & 56 deletions
@@ -11,71 +11,19 @@ specific language governing permissions and limitations under the License. -->

 # Caching methods

-## Pyramid Attention Broadcast
+Cache methods speedup diffusion transformers by storing and reusing intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.

-[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) from Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You.
-
-Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. The attention states are not very different between successive inference steps. The most prominent difference is in the spatial attention blocks, not as much in the temporal attention blocks, and finally the least in the cross attention blocks. Therefore, many cross attention computation blocks can be skipped, followed by the temporal and spatial attention blocks. By combining other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
-
-Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.
-
-```python
-import torch
-from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
-
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-# Increasing the value of `spatial_attention_timestep_skip_range[0]` or decreasing the value of
-# `spatial_attention_timestep_skip_range[1]` will decrease the interval in which pyramid attention
-# broadcast is active, leader to slower inference speeds. However, large intervals can lead to
-# poorer quality of generated videos.
-config = PyramidAttentionBroadcastConfig(
-    spatial_attention_block_skip_range=2,
-    spatial_attention_timestep_skip_range=(100, 800),
-    current_timestep_callback=lambda: pipe.current_timestep,
-)
-pipe.transformer.enable_cache(config)
-```
-
-## Faster Cache
-
-[FasterCache](https://huggingface.co/papers/2410.19355) from Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong.
-
-FasterCache is a method that speeds up inference in diffusion transformers by:
-- Reusing attention states between successive inference steps, due to high similarity between them
-- Skipping unconditional branch prediction used in classifier-free guidance by revealing redundancies between unconditional and conditional branch outputs for the same timestep, and therefore approximating the unconditional branch output using the conditional branch output
-
-```python
-import torch
-from diffusers import CogVideoXPipeline, FasterCacheConfig
-
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-config = FasterCacheConfig(
-    spatial_attention_block_skip_range=2,
-    spatial_attention_timestep_skip_range=(-1, 681),
-    current_timestep_callback=lambda: pipe.current_timestep,
-    attention_weight_callback=lambda _: 0.3,
-    unconditional_batch_skip_range=5,
-    unconditional_batch_timestep_skip_range=(-1, 781),
-    tensor_format="BFCHW",
-)
-pipe.transformer.enable_cache(config)
-```
-
-### CacheMixin
+## CacheMixin

 [[autodoc]] CacheMixin

-### PyramidAttentionBroadcastConfig
+## PyramidAttentionBroadcastConfig

 [[autodoc]] PyramidAttentionBroadcastConfig

 [[autodoc]] apply_pyramid_attention_broadcast

-### FasterCacheConfig
+## FasterCacheConfig

 [[autodoc]] FasterCacheConfig

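For readers of this diff, the deleted usage examples (presumably relocated alongside the new optimization/cache page added in `_toctree.yml` above) all reduce to the same pattern: build a cache config and pass it to `enable_cache` on the pipeline's transformer. A minimal sketch of that pattern, mirroring the deleted Pyramid Attention Broadcast snippet; the checkpoint and parameter values are copied from that snippet rather than being new recommendations:

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

# Checkpoint and values taken from the snippet removed above; tune them per model.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,               # reuse cached spatial attention states, recomputing every other step
    spatial_attention_timestep_skip_range=(100, 800),   # caching is only active inside this timestep window
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

`FasterCacheConfig` plugs into the same `pipe.transformer.enable_cache(...)` call, as the second deleted snippet shows.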
docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 1 deletion
@@ -98,4 +98,8 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

 ## LoraBaseMixin

-[[autodoc]] loaders.lora_base.LoraBaseMixin
+[[autodoc]] loaders.lora_base.LoraBaseMixin
+
+## WanLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
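The newly documented `WanLoraLoaderMixin` gives Wan video pipelines the same LoRA entry points (`load_lora_weights` and related helpers) as the other mixins on this page. A minimal sketch, not taken from this commit: the base checkpoint id and the LoRA repository id below are assumptions/placeholders.

```python
import torch
from diffusers import WanPipeline

# Placeholder checkpoint id; substitute the Wan model you actually use.
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# `load_lora_weights` is exposed through WanLoraLoaderMixin; the repo id is hypothetical.
pipe.load_lora_weights("your-username/wan-style-lora", adapter_name="style")

video = pipe(prompt="a cat surfing a wave at sunset").frames[0]
```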
