Commit 8e21d99

Merge branch 'main' into fixes-issue-11060

2 parents b24c6a8 + 54dac3a

110 files changed: +7398 additions, −588 deletions

.github/workflows/pr_tests_gpu.yml

Lines changed: 47 additions & 1 deletion

@@ -28,7 +28,51 @@ env:
   PIPELINE_USAGE_CUTOFF: 1000000000 # set high cutoff so that only always-test pipelines run

 jobs:
+  check_code_quality:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[quality]
+      - name: Check quality
+        run: make quality
+      - name: Check if failure
+        if: ${{ failure() }}
+        run: |
+          echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
+
+  check_repository_consistency:
+    needs: check_code_quality
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[quality]
+      - name: Check repo consistency
+        run: |
+          python utils/check_copies.py
+          python utils/check_dummies.py
+          python utils/check_support_list.py
+          make deps_table_check_updated
+      - name: Check if failure
+        if: ${{ failure() }}
+        run: |
+          echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
+
   setup_torch_cuda_pipeline_matrix:
+    needs: [check_code_quality, check_repository_consistency]
     name: Setup Torch Pipelines CUDA Slow Tests Matrix
     runs-on:
       group: aws-general-8-plus

@@ -133,6 +177,7 @@ jobs:

   torch_cuda_tests:
     name: Torch CUDA Tests
+    needs: [check_code_quality, check_repository_consistency]
     runs-on:
       group: aws-g4dn-2xlarge
     container:

@@ -201,7 +246,7 @@ jobs:

   run_examples_tests:
     name: Examples PyTorch CUDA tests on Ubuntu
-    pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+    needs: [check_code_quality, check_repository_consistency]
     runs-on:
       group: aws-g4dn-2xlarge

@@ -220,6 +265,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
           python -m uv pip install -e [quality,test,training]

       - name: Environment

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions

@@ -496,6 +496,8 @@
       title: PixArt-Σ
     - local: api/pipelines/sana
       title: Sana
+    - local: api/pipelines/sana_sprint
+      title: Sana Sprint
     - local: api/pipelines/self_attention_guidance
       title: Self-Attention Guidance
     - local: api/pipelines/semantic_stable_diffusion

docs/source/en/api/cache.md

Lines changed: 33 additions & 0 deletions

@@ -38,6 +38,33 @@ config = PyramidAttentionBroadcastConfig(
 pipe.transformer.enable_cache(config)
 ```

+## Faster Cache
+
+[FasterCache](https://huggingface.co/papers/2410.19355) from Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong.
+
+FasterCache is a method that speeds up inference in diffusion transformers by:
+- Reusing attention states between successive inference steps, due to the high similarity between them
+- Skipping the unconditional branch prediction used in classifier-free guidance, exploiting the redundancy between unconditional and conditional branch outputs at the same timestep to approximate the unconditional branch output from the conditional one
+
+```python
+import torch
+from diffusers import CogVideoXPipeline, FasterCacheConfig
+
+pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+config = FasterCacheConfig(
+    spatial_attention_block_skip_range=2,
+    spatial_attention_timestep_skip_range=(-1, 681),
+    current_timestep_callback=lambda: pipe.current_timestep,
+    attention_weight_callback=lambda _: 0.3,
+    unconditional_batch_skip_range=5,
+    unconditional_batch_timestep_skip_range=(-1, 781),
+    tensor_format="BFCHW",
+)
+pipe.transformer.enable_cache(config)
+```
+
 ### CacheMixin

 [[autodoc]] CacheMixin

@@ -47,3 +74,9 @@ pipe.transformer.enable_cache(config)
 [[autodoc]] PyramidAttentionBroadcastConfig

 [[autodoc]] apply_pyramid_attention_broadcast
+
+### FasterCacheConfig
+
+[[autodoc]] FasterCacheConfig
+
+[[autodoc]] apply_faster_cache

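For the Faster Cache section above, the same diff also documents a functional entry point, `apply_faster_cache`. A minimal sketch of using it in place of the `enable_cache` method, assuming it is importable from the top-level `diffusers` namespace with the signature `apply_faster_cache(module, config)`:

```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig, apply_faster_cache

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 681),
    current_timestep_callback=lambda: pipe.current_timestep,
    attention_weight_callback=lambda _: 0.3,
    tensor_format="BFCHW",
)

# Assumed equivalent of pipe.transformer.enable_cache(config): hook the
# transformer directly with the FasterCache config.
apply_faster_cache(pipe.transformer, config)
```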
docs/source/en/api/pipelines/hunyuan_video.md

Lines changed: 2 additions & 1 deletion

@@ -50,7 +50,8 @@ The following models are available for the image-to-video pipeline:
 | Model name | Description |
 |:---|:---|
 | [`Skywork/SkyReels-V1-Hunyuan-I2V`](https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V) | Skywork's custom finetune of HunyuanVideo (de-distilled). Performs best at `97x544x960` resolution with `guidance_scale=1.0`, `true_cfg_scale=6.0` and a negative prompt. |
-| [`hunyuanvideo-community/HunyuanVideo-I2V`](https://huggingface.co/hunyuanvideo-community/HunyuanVideo-I2V) | Tencent's official HunyuanVideo I2V model. Performs best at resolutions of 480, 720, 960, 1280. A higher `shift` value when initializing the scheduler is recommended (good values are between 7 and 20). |
+| [`hunyuanvideo-community/HunyuanVideo-I2V-33ch`](https://huggingface.co/hunyuanvideo-community/HunyuanVideo-I2V) | Tencent's official HunyuanVideo 33-channel I2V model. Performs best at resolutions of 480, 720, 960, 1280. A higher `shift` value when initializing the scheduler is recommended (good values are between 7 and 20). |
+| [`hunyuanvideo-community/HunyuanVideo-I2V`](https://huggingface.co/hunyuanvideo-community/HunyuanVideo-I2V) | Tencent's official HunyuanVideo 16-channel I2V model. Performs best at resolutions of 480, 720, 960, 1280. A higher `shift` value when initializing the scheduler is recommended (good values are between 7 and 20). |

 ## Quantization

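Both I2V rows above recommend a higher `shift` value when initializing the scheduler. A minimal sketch of applying that recommendation, assuming the pipeline is `HunyuanVideoImageToVideoPipeline` with a flow-match scheduler whose config accepts `shift` (7.0 is just one point in the quoted 7-20 range):

```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, HunyuanVideoImageToVideoPipeline

# Load the 16-channel I2V model from the table above.
pipe = HunyuanVideoImageToVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-I2V", torch_dtype=torch.bfloat16
)

# Re-create the scheduler with a higher flow shift, per the table's note
# that good values lie between 7 and 20.
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config, shift=7.0
)
```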
docs/source/en/api/pipelines/ltx_video.md

Lines changed: 6 additions & 0 deletions

@@ -196,6 +196,12 @@ export_to_video(video, "ship.mp4", fps=24)
   - all
   - __call__

+## LTXConditionPipeline
+
+[[autodoc]] LTXConditionPipeline
+  - all
+  - __call__
+
 ## LTXPipelineOutput

 [[autodoc]] pipelines.ltx.pipeline_output.LTXPipelineOutput
docs/source/en/api/pipelines/sana_sprint.md

Lines changed: 100 additions & 0 deletions

@@ -0,0 +1,100 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License. -->
+
+# SanaSprintPipeline
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+[SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation](https://huggingface.co/papers/2503.09641) from NVIDIA, MIT HAN Lab, and Hugging Face by Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Enze Xie, Song Han.
+
+The abstract from the paper is:
+
+*This paper presents SANA-Sprint, an efficient diffusion model for ultra-fast text-to-image (T2I) generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4. We introduce three key innovations: (1) We propose a training-free approach that transforms a pre-trained flow-matching model for continuous-time consistency distillation (sCM), eliminating costly training from scratch and achieving high training efficiency. Our hybrid distillation strategy combines sCM with latent adversarial distillation (LADD): sCM ensures alignment with the teacher model, while LADD enhances single-step generation fidelity. (2) SANA-Sprint is a unified step-adaptive model that achieves high-quality generation in 1-4 steps, eliminating step-specific training and improving efficiency. (3) We integrate ControlNet with SANA-Sprint for real-time interactive image generation, enabling instant visual feedback for user interaction. SANA-Sprint establishes a new Pareto frontier in speed-quality tradeoffs, achieving state-of-the-art performance with 7.59 FID and 0.74 GenEval in only 1 step — outperforming FLUX-schnell (7.94 FID / 0.71 GenEval) while being 10× faster (0.1s vs 1.1s on H100). It also achieves 0.1s (T2I) and 0.25s (ControlNet) latency for 1024×1024 images on H100, and 0.31s (T2I) on an RTX 4090, showcasing its exceptional efficiency and potential for AI-powered consumer applications (AIPC). Code and pre-trained models will be open-sourced.*
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+This pipeline was contributed by [lawrence-cj](https://github.com/lawrence-cj), [Shuchen Xue](https://github.com/scxue) and [Enze Xie](https://github.com/xieenze). The original codebase can be found [here](https://github.com/NVlabs/Sana). The original weights can be found under [hf.co/Efficient-Large-Model](https://huggingface.co/Efficient-Large-Model/).
+
+Available models:
+
+| Model | Recommended dtype |
+|:-----:|:-----------------:|
+| [`Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers) | `torch.bfloat16` |
+| [`Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers) | `torch.bfloat16` |
+
+Refer to [this](https://huggingface.co/collections/Efficient-Large-Model/sana-sprint-67d6810d65235085b3b17c76) collection for more information.
+
+Note: The recommended dtype above is for the transformer weights. The text encoder must stay in `torch.bfloat16`, and the VAE weights must stay in `torch.bfloat16` or `torch.float32` for the model to work correctly. Please refer to the inference example below to see how to load the model with the recommended dtype.
+
+
+## Quantization
+
+Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have a varying impact on image quality depending on the model.
+
+Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`SanaSprintPipeline`] for inference with bitsandbytes.
+
+```py
+import torch
+from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SanaTransformer2DModel, SanaSprintPipeline
+from transformers import AutoModel, BitsAndBytesConfig
+
+quant_config = BitsAndBytesConfig(load_in_8bit=True)
+text_encoder_8bit = AutoModel.from_pretrained(
+    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
+    subfolder="text_encoder",
+    quantization_config=quant_config,
+    torch_dtype=torch.bfloat16,
+)
+
+quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
+transformer_8bit = SanaTransformer2DModel.from_pretrained(
+    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
+    subfolder="transformer",
+    quantization_config=quant_config,
+    torch_dtype=torch.bfloat16,
+)
+
+pipeline = SanaSprintPipeline.from_pretrained(
+    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
+    text_encoder=text_encoder_8bit,
+    transformer=transformer_8bit,
+    torch_dtype=torch.bfloat16,
+    device_map="balanced",
+)
+
+prompt = "a tiny astronaut hatching from an egg on the moon"
+image = pipeline(prompt).images[0]
+image.save("sana.png")
+```
+
+## Setting `max_timesteps`
+
+Users can tweak `max_timesteps` to experiment with the visual quality of the generated outputs. The default value was obtained through an inference-time search; see the paper for more details.
+
+## SanaSprintPipeline
+
+[[autodoc]] SanaSprintPipeline
+  - all
+  - __call__
+
+
+## SanaPipelineOutput
+
+[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput
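The dtype note in the new page can be followed at load time. A minimal sketch, assuming the pipeline exposes the standard `vae` component and that casting it to `torch.float32` is the alternative the note permits:

```python
import torch
from diffusers import SanaSprintPipeline

# Transformer weights in the recommended torch.bfloat16.
pipe = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)
# The note allows the VAE in bfloat16 or float32; cast it to float32 here.
pipe.vae.to(torch.float32)
pipe.to("cuda")

image = pipe("a tiny astronaut hatching from an egg on the moon").images[0]
image.save("sana_sprint.png")
```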

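For the `max_timesteps` section, a small sweep illustrates the kind of experiment the page suggests. This assumes `max_timesteps` is accepted as a `__call__` argument of `SanaSprintPipeline`; the values below are hypothetical:

```python
import torch
from diffusers import SanaSprintPipeline

pipe = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a tiny astronaut hatching from an egg on the moon"

# Hypothetical sweep; the default is said to come from an inference-time search.
for max_t in (1.3, 1.45, 1.57):
    image = pipe(prompt, num_inference_steps=2, max_timesteps=max_t).images[0]
    image.save(f"sana_sprint_max_timesteps_{max_t}.png")
```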