
Commit 386f8f4

Merge branch 'main' into repr-quant-config
2 parents 22794e6 + 6c7fad7 commit 386f8f4

70 files changed (+4330 −1665 lines)


.github/workflows/nightly_tests.yml

Lines changed: 114 additions & 99 deletions
@@ -13,8 +13,9 @@ env:
   PYTEST_TIMEOUT: 600
   RUN_SLOW: yes
   RUN_NIGHTLY: yes
-  PIPELINE_USAGE_CUTOFF: 5000
+  PIPELINE_USAGE_CUTOFF: 0
   SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+  CONSOLIDATED_REPORT_PATH: consolidated_test_report.md
 
 jobs:
   setup_torch_cuda_pipeline_matrix:
@@ -99,11 +100,6 @@ jobs:
         with:
           name: pipeline_${{ matrix.module }}_test_reports
           path: reports
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
   run_nightly_tests_for_other_torch_modules:
     name: Nightly Torch CUDA Tests
@@ -142,7 +138,6 @@ jobs:
           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
-          RUN_COMPILE: yes
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
             -s -v -k "not Flax and not Onnx" \
@@ -175,12 +170,6 @@ jobs:
           name: torch_${{ matrix.module }}_cuda_test_reports
           path: reports
 
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
   run_torch_compile_tests:
     name: PyTorch Compile CUDA tests
 
@@ -224,12 +213,6 @@ jobs:
           name: torch_compile_test_reports
           path: reports
 
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
   run_big_gpu_torch_tests:
     name: Torch tests on big GPU
     strategy:
@@ -280,12 +263,7 @@ jobs:
         with:
           name: torch_cuda_big_gpu_test_reports
           path: reports
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
+
   torch_minimum_version_cuda_tests:
     name: Torch Minimum Version CUDA Tests
     runs-on:
@@ -342,63 +320,6 @@ jobs:
         with:
           name: torch_minimum_version_cuda_test_reports
           path: reports
-
-  run_flax_tpu_tests:
-    name: Nightly Flax TPU Tests
-    runs-on:
-      group: gcp-ct5lp-hightpu-8t
-    if: github.event_name == 'schedule'
-
-    container:
-      image: diffusers/diffusers-flax-tpu
-      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
-    defaults:
-      run:
-        shell: bash
-    steps:
-      - name: Checkout diffusers
-        uses: actions/checkout@v3
-        with:
-          fetch-depth: 2
-
-      - name: Install dependencies
-        run: |
-          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
-          python -m uv pip install -e [quality,test]
-          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
-          python -m uv pip install pytest-reportlog
-
-      - name: Environment
-        run: python utils/print_env.py
-
-      - name: Run nightly Flax TPU tests
-        env:
-          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
-        run: |
-          python -m pytest -n 0 \
-            -s -v -k "Flax" \
-            --make-reports=tests_flax_tpu \
-            --report-log=tests_flax_tpu.log \
-            tests/
-
-      - name: Failure short reports
-        if: ${{ failure() }}
-        run: |
-          cat reports/tests_flax_tpu_stats.txt
-          cat reports/tests_flax_tpu_failures_short.txt
-
-      - name: Test suite reports artifacts
-        if: ${{ always() }}
-        uses: actions/upload-artifact@v4
-        with:
-          name: flax_tpu_test_reports
-          path: reports
-
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
   run_nightly_onnx_tests:
     name: Nightly ONNXRuntime CUDA tests on Ubuntu
@@ -449,18 +370,12 @@ jobs:
           name: tests_onnx_cuda_reports
           path: reports
 
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
   run_nightly_quantization_tests:
     name: Torch quantization nightly tests
     strategy:
       fail-fast: false
       max-parallel: 2
-      matrix:
+      matrix:
         config:
           - backend: "bitsandbytes"
             test_location: "bnb"
@@ -520,12 +435,7 @@ jobs:
         with:
           name: torch_cuda_${{ matrix.config.backend }}_reports
           path: reports
-      - name: Generate Report and Notify Channel
-        if: always()
-        run: |
-          pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
+
   run_nightly_pipeline_level_quantization_tests:
     name: Torch quantization nightly tests
     strategy:
@@ -574,12 +484,117 @@ jobs:
         with:
           name: torch_cuda_pipeline_level_quant_reports
           path: reports
-      - name: Generate Report and Notify Channel
-        if: always()
+
+  run_flax_tpu_tests:
+    name: Nightly Flax TPU Tests
+    runs-on:
+      group: gcp-ct5lp-hightpu-8t
+    if: github.event_name == 'schedule'
+
+    container:
+      image: diffusers/diffusers-flax-tpu
+      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+
+      - name: Environment
+        run: python utils/print_env.py
+
+      - name: Run nightly Flax TPU tests
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        run: |
+          python -m pytest -n 0 \
+            -s -v -k "Flax" \
+            --make-reports=tests_flax_tpu \
+            --report-log=tests_flax_tpu.log \
+            tests/
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_flax_tpu_stats.txt
+          cat reports/tests_flax_tpu_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: flax_tpu_test_reports
+          path: reports
+
+  generate_consolidated_report:
+    name: Generate Consolidated Test Report
+    needs: [
+      run_nightly_tests_for_torch_pipelines,
+      run_nightly_tests_for_other_torch_modules,
+      run_torch_compile_tests,
+      run_big_gpu_torch_tests,
+      run_nightly_quantization_tests,
+      run_nightly_pipeline_level_quantization_tests,
+      run_nightly_onnx_tests,
+      torch_minimum_version_cuda_tests,
+      run_flax_tpu_tests
+    ]
+    if: always()
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Create reports directory
+        run: mkdir -p combined_reports
+
+      - name: Download all test reports
+        uses: actions/download-artifact@v4
+        with:
+          path: artifacts
+
+      - name: Prepare reports
+        run: |
+          # Move all report files to a single directory for processing
+          find artifacts -name "*.txt" -exec cp {} combined_reports/ \;
+
+      - name: Install dependencies
         run: |
+          pip install -e .[test]
           pip install slack_sdk tabulate
-          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
-
+
+      - name: Generate consolidated report
+        run: |
+          python utils/consolidated_test_report.py \
+            --reports_dir combined_reports \
+            --output_file $CONSOLIDATED_REPORT_PATH \
+            --slack_channel_name diffusers-ci-nightly
+
+      - name: Show consolidated report
+        run: |
+          cat $CONSOLIDATED_REPORT_PATH >> $GITHUB_STEP_SUMMARY
+
+      - name: Upload consolidated report
+        uses: actions/upload-artifact@v4
+        with:
+          name: consolidated_test_report
+          path: ${{ env.CONSOLIDATED_REPORT_PATH }}
+
 # M1 runner currently not well supported
 # TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
 # run_nightly_tests_apple_m1:

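The per-job "Generate Report and Notify Channel" steps are replaced by a single generate_consolidated_report job that downloads every suite's report artifact, copies the *.txt report files into combined_reports, and runs utils/consolidated_test_report.py. That script is not part of this diff; the sketch below is only a hypothetical illustration of the kind of aggregation it might perform. The flag names mirror the workflow invocation above, but the stats-file naming/format and the Slack call are assumptions, not the actual script.

```python
# Hypothetical sketch -- NOT the actual utils/consolidated_test_report.py.
# Assumes each suite leaves a "*_stats.txt" file containing pytest-style
# summary counts such as "10 passed" or "2 failed".
import argparse
import os
import re
from pathlib import Path

from tabulate import tabulate


def parse_stats(path: Path) -> dict:
    """Pull 'N passed' / 'N failed' style counts out of a pytest stats file."""
    text = path.read_text(errors="ignore")
    counts = {key: 0 for key in ("passed", "failed", "skipped", "error")}
    for number, key in re.findall(r"(\d+)\s+(passed|failed|skipped|error)", text):
        counts[key] += int(number)
    return counts


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--reports_dir", required=True)
    parser.add_argument("--output_file", required=True)
    parser.add_argument("--slack_channel_name", default=None)
    args = parser.parse_args()

    # One table row per test suite found in the combined reports directory.
    rows = []
    for stats_file in sorted(Path(args.reports_dir).glob("*_stats.txt")):
        suite = stats_file.name.replace("_stats.txt", "")
        counts = parse_stats(stats_file)
        rows.append([suite, counts["passed"], counts["failed"], counts["skipped"]])

    report = "# Consolidated nightly test report\n\n"
    report += tabulate(rows, headers=["suite", "passed", "failed", "skipped"], tablefmt="github")
    Path(args.output_file).write_text(report)

    # Posting to Slack is optional and only attempted when a token is available.
    if args.slack_channel_name and os.environ.get("SLACK_API_TOKEN"):
        from slack_sdk import WebClient

        client = WebClient(token=os.environ["SLACK_API_TOKEN"])
        client.chat_postMessage(channel=f"#{args.slack_channel_name}", text=report)


if __name__ == "__main__":
    main()
```
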
docs/source/en/_toctree.yml

Lines changed: 2 additions & 2 deletions
@@ -92,8 +92,6 @@
     title: API Reference
   title: Hybrid Inference
 - sections:
-  - local: using-diffusers/cogvideox
-    title: CogVideoX
   - local: using-diffusers/consisid
     title: ConsisID
   - local: using-diffusers/sdxl
@@ -178,6 +176,8 @@
 - sections:
   - local: optimization/fp16
     title: Accelerate inference
+  - local: optimization/cache
+    title: Caching
   - local: optimization/memory
     title: Reduce memory usage
   - local: optimization/xformers

docs/source/en/api/cache.md

Lines changed: 4 additions & 56 deletions
@@ -11,71 +11,19 @@ specific language governing permissions and limitations under the License. -->
 
 # Caching methods
 
-## Pyramid Attention Broadcast
+Cache methods speedup diffusion transformers by storing and reusing intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.
 
-[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) from Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You.
-
-Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. The attention states are not very different between successive inference steps. The most prominent difference is in the spatial attention blocks, not as much in the temporal attention blocks, and finally the least in the cross attention blocks. Therefore, many cross attention computation blocks can be skipped, followed by the temporal and spatial attention blocks. By combining other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
-
-Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.
-
-```python
-import torch
-from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
-
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-# Increasing the value of `spatial_attention_timestep_skip_range[0]` or decreasing the value of
-# `spatial_attention_timestep_skip_range[1]` will decrease the interval in which pyramid attention
-# broadcast is active, leader to slower inference speeds. However, large intervals can lead to
-# poorer quality of generated videos.
-config = PyramidAttentionBroadcastConfig(
-    spatial_attention_block_skip_range=2,
-    spatial_attention_timestep_skip_range=(100, 800),
-    current_timestep_callback=lambda: pipe.current_timestep,
-)
-pipe.transformer.enable_cache(config)
-```
-
-## Faster Cache
-
-[FasterCache](https://huggingface.co/papers/2410.19355) from Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong.
-
-FasterCache is a method that speeds up inference in diffusion transformers by:
-- Reusing attention states between successive inference steps, due to high similarity between them
-- Skipping unconditional branch prediction used in classifier-free guidance by revealing redundancies between unconditional and conditional branch outputs for the same timestep, and therefore approximating the unconditional branch output using the conditional branch output
-
-```python
-import torch
-from diffusers import CogVideoXPipeline, FasterCacheConfig
-
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-config = FasterCacheConfig(
-    spatial_attention_block_skip_range=2,
-    spatial_attention_timestep_skip_range=(-1, 681),
-    current_timestep_callback=lambda: pipe.current_timestep,
-    attention_weight_callback=lambda _: 0.3,
-    unconditional_batch_skip_range=5,
-    unconditional_batch_timestep_skip_range=(-1, 781),
-    tensor_format="BFCHW",
-)
-pipe.transformer.enable_cache(config)
-```
-
-### CacheMixin
+## CacheMixin
 
 [[autodoc]] CacheMixin
 
-### PyramidAttentionBroadcastConfig
+## PyramidAttentionBroadcastConfig
 
 [[autodoc]] PyramidAttentionBroadcastConfig
 
 [[autodoc]] apply_pyramid_attention_broadcast
 
-### FasterCacheConfig
+## FasterCacheConfig
 
 [[autodoc]] FasterCacheConfig
 
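The API page now keeps only the one-line summary and the autodoc entries; the `_toctree.yml` change above adds an `optimization/cache` guide, which presumably hosts the longer walkthroughs. For quick reference, the removed Pyramid Attention Broadcast snippet, condensed to its essentials, showed how a cache config is enabled on a pipeline's transformer:

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Reuse cached spatial attention outputs every other block within the chosen
# timestep window instead of recomputing them at each denoising step.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

FasterCache is enabled the same way, by passing a `FasterCacheConfig` to `pipe.transformer.enable_cache(...)`.
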
docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 1 deletion
@@ -98,4 +98,8 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
 
 ## LoraBaseMixin
 
-[[autodoc]] loaders.lora_base.LoraBaseMixin
+[[autodoc]] loaders.lora_base.LoraBaseMixin
+
+## WanLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin

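The new entry documents `WanLoraLoaderMixin`, the LoRA-loading mixin used by the Wan video pipelines. As a rough usage sketch of what the mixin exposes (the checkpoint and LoRA repository ids below are placeholders, not taken from this commit):

```python
# Illustrative only -- the model and LoRA repository ids are placeholders.
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# WanLoraLoaderMixin provides load_lora_weights() on Wan pipelines.
pipe.load_lora_weights("some-user/some-wan-lora", adapter_name="example")
```
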
0 commit comments
