
Commit 535a14e

Merge branch 'main' into feat/mag-cache

2 parents 0a05bec + 6290fdf

26 files changed: +1027 −145 lines

docs/source/en/_toctree.yml

Lines changed: 6 additions & 2 deletions
@@ -401,6 +401,8 @@
       title: WanAnimateTransformer3DModel
     - local: api/models/wan_transformer_3d
       title: WanTransformer3DModel
+    - local: api/models/z_image_transformer2d
+      title: ZImageTransformer2DModel
     title: Transformers
   - sections:
     - local: api/models/stable_cascade_unet
@@ -551,6 +553,8 @@
       title: Kandinsky 2.2
     - local: api/pipelines/kandinsky3
       title: Kandinsky 3
+    - local: api/pipelines/kandinsky5_image
+      title: Kandinsky 5.0 Image
     - local: api/pipelines/kolors
       title: Kolors
     - local: api/pipelines/latent_consistency_models
@@ -646,6 +650,8 @@
       title: VisualCloze
     - local: api/pipelines/wuerstchen
       title: Wuerstchen
+    - local: api/pipelines/z_image
+      title: Z-Image
     title: Image
   - sections:
     - local: api/pipelines/allegro
@@ -664,8 +670,6 @@
       title: HunyuanVideo1.5
     - local: api/pipelines/i2vgenxl
       title: I2VGen-XL
-    - local: api/pipelines/kandinsky5_image
-      title: Kandinsky 5.0 Image
     - local: api/pipelines/kandinsky5_video
       title: Kandinsky 5.0 Video
     - local: api/pipelines/latte

docs/source/en/api/cache.md

Lines changed: 6 additions & 0 deletions
@@ -34,3 +34,9 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
 [[autodoc]] FirstBlockCacheConfig
 
 [[autodoc]] apply_first_block_cache
+
+### TaylorSeerCacheConfig
+
+[[autodoc]] TaylorSeerCacheConfig
+
+[[autodoc]] apply_taylorseer_cache
docs/source/en/api/models/z_image_transformer2d.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ZImageTransformer2DModel
+
+A Transformer model for image-like data from [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
+
+## ZImageTransformer2DModel
+
+[[autodoc]] ZImageTransformer2DModel
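
A minimal sketch of loading the new model class on its own (the `subfolder="transformer"` layout is an assumption based on the usual diffusers checkpoint structure, not something this diff confirms):

```python
import torch
from diffusers import ZImageTransformer2DModel

# Load only the transformer weights from the Z-Image-Turbo checkpoint.
# Assumption: the repo stores them under a "transformer" subfolder,
# following the usual diffusers pipeline layout.
transformer = ZImageTransformer2DModel.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
```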

docs/source/en/api/pipelines/kandinsky5_image.md

Lines changed: 5 additions & 1 deletion
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
 
 [Kandinsky 5.0](https://arxiv.org/abs/2511.14993) is a family of diffusion models for Video & Image generation.
 
-Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters)
+Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters).
 
 The model introduces several key innovations:
 - **Latent diffusion pipeline** with **Flow Matching** for improved training stability
@@ -21,10 +21,14 @@ The model introduces several key innovations:
 
 The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.com/kandinskylab/Kandinsky-5).
 
+> [!TIP]
+> Check out the [Kandinsky Lab](https://huggingface.co/kandinskylab) organization on the Hub for the official model checkpoints for text-to-image generation, including pretrained, SFT, no-CFG, and distilled variants.
+
 
 ## Available Models
 
 Kandinsky 5.0 Image Lite:
+
 | model_id | Description | Use Cases |
 |------------|-------------|-----------|
 | [**kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers) | 6B image Supervised Fine-Tuned model | Highest generation quality |
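
Since the page does not yet show a usage snippet, here is a hedged sketch using the generic `DiffusionPipeline` loader (the concrete pipeline class is not named in this diff, so auto-resolution from the checkpoint's `model_index.json` is assumed):

```python
import torch
from diffusers import DiffusionPipeline

# Generic loader: the concrete Kandinsky 5.0 image pipeline class is
# resolved from the checkpoint metadata (assumed to ship a standard
# model_index.json, as the -Diffusers suffix suggests).
pipe = DiffusionPipeline.from_pretrained(
    "kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(prompt="A cat holding a sign that says hello world").images[0]
image.save("kandinsky5_image.png")
```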

docs/source/en/api/pipelines/kandinsky5_video.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.
 ## Available Models
 
 Kandinsky 5.0 T2V Pro:
+
 | model_id | Description | Use Cases |
 |------------|-------------|-----------|
 | **kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers** | 5 second Text-to-Video Pro model | High-quality text-to-video generation |
docs/source/en/api/pipelines/z_image.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Z-Image
+
+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+[Z-Image](https://huggingface.co/papers/2511.22699) is a powerful and highly efficient image generation model with 6B parameters. Currently only one model is available, with two more to be released:
+
+| Model | Hugging Face |
+|---|---|
+| Z-Image-Turbo | https://huggingface.co/Tongyi-MAI/Z-Image-Turbo |
+
+## Z-Image-Turbo
+
+Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably within 16GB of VRAM on consumer devices. It excels at photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
+
+## ZImagePipeline
+
+[[autodoc]] ZImagePipeline
+  - all
+  - __call__
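
A hedged usage sketch for the new pipeline (the call arguments are assumptions modeled on other diffusers text-to-image pipelines; `num_inference_steps=8` mirrors the 8-NFE figure above):

```python
import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# 8 steps mirrors the 8-NFE claim; the argument names are assumptions
# based on other diffusers text-to-image pipelines.
image = pipe(
    prompt="A photorealistic portrait next to a bilingual street sign",
    num_inference_steps=8,
).images[0]
image.save("z_image_turbo.png")
```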

docs/source/en/optimization/cache.md

Lines changed: 31 additions & 0 deletions
@@ -66,4 +66,35 @@ config = FasterCacheConfig(
     tensor_format="BFCHW",
 )
 pipeline.transformer.enable_cache(config)
+```
+
+## TaylorSeer Cache
+
+[TaylorSeer Cache](https://huggingface.co/papers/2503.06923) accelerates diffusion inference by using Taylor series expansions to approximate and cache intermediate activations across denoising steps. The method predicts future outputs based on past computations, reusing them at specified intervals to reduce redundant calculations.
+
+This caching mechanism delivers strong results with minimal additional memory overhead. For a detailed performance analysis, see [our findings here](https://github.com/huggingface/diffusers/pull/12648#issuecomment-3610615080).
+
+To enable TaylorSeer Cache, create a [`TaylorSeerCacheConfig`] and pass it to your pipeline's transformer:
+
+- `cache_interval`: Number of steps to reuse cached outputs before performing a full forward pass
+- `disable_cache_before_step`: Number of initial steps that run full computation to gather data for the approximations
+- `max_order`: Approximation order (in theory, higher values improve quality but increase memory usage; we recommend setting it to `1`)
+
+```python
+import torch
+from diffusers import FluxPipeline, TaylorSeerCacheConfig
+
+pipe = FluxPipeline.from_pretrained(
+    "black-forest-labs/FLUX.1-dev",
+    torch_dtype=torch.bfloat16,
+)
+pipe.to("cuda")
+
+config = TaylorSeerCacheConfig(
+    cache_interval=5,
+    max_order=1,
+    disable_cache_before_step=10,
+    taylor_factors_dtype=torch.bfloat16,
+)
+pipe.transformer.enable_cache(config)
 ```
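
To make the mechanism concrete, here is a small self-contained sketch of the first-order (`max_order=1`) idea behind this kind of cache (illustrative only, not the diffusers implementation): cache a block's output and a finite-difference estimate of its rate of change, then extrapolate during the skipped steps.

```python
import torch

class TaylorExtrapolator:
    """Illustrative first-order Taylor cache, not the diffusers implementation.

    On a "full" step the real output and its finite difference are stored;
    on a "cached" step the output is extrapolated as
    y(t) ≈ y(t0) + (t - t0) * dy/dt.
    """

    def __init__(self, cache_interval: int = 5, warmup_steps: int = 10):
        self.cache_interval = cache_interval
        self.warmup_steps = warmup_steps  # analogous to disable_cache_before_step
        self.prev_output = None
        self.derivative = None
        self.last_full_step = None

    def should_compute(self, step: int) -> bool:
        # Warmup steps always run the real forward pass to gather data.
        if step < self.warmup_steps or self.prev_output is None:
            return True
        return (step - self.last_full_step) >= self.cache_interval

    def update(self, step: int, output: torch.Tensor) -> None:
        # Finite-difference estimate of dy/dt from the last two full steps.
        if self.prev_output is not None:
            dt = step - self.last_full_step
            self.derivative = (output - self.prev_output) / dt
        self.prev_output = output
        self.last_full_step = step

    def predict(self, step: int) -> torch.Tensor:
        # First-order Taylor expansion around the last full step.
        dt = step - self.last_full_step
        if self.derivative is None:
            return self.prev_output
        return self.prev_output + dt * self.derivative
```

A denoising loop would call `should_compute(step)`; when it returns `True` the real forward pass runs and `update` stores its result, otherwise `predict(step)` stands in for the block's output.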

src/diffusers/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -171,11 +171,13 @@
             "FLUX_MAG_RATIOS",
             "PyramidAttentionBroadcastConfig",
             "SmoothedEnergyGuidanceConfig",
+            "TaylorSeerCacheConfig",
             "apply_faster_cache",
             "apply_first_block_cache",
             "apply_layer_skip",
             "apply_mag_cache",
             "apply_pyramid_attention_broadcast",
+            "apply_taylorseer_cache",
         ]
     )
     _import_structure["models"].extend(
@@ -904,11 +906,13 @@
         MagCacheConfig,
         PyramidAttentionBroadcastConfig,
         SmoothedEnergyGuidanceConfig,
+        TaylorSeerCacheConfig,
         apply_faster_cache,
         apply_first_block_cache,
         apply_layer_skip,
         apply_mag_cache,
         apply_pyramid_attention_broadcast,
+        apply_taylorseer_cache,
     )
     from .models import (
         AllegroTransformer3DModel,

src/diffusers/hooks/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -26,3 +26,4 @@
 from .mag_cache import FLUX_MAG_RATIOS, MagCacheConfig, apply_mag_cache
 from .pyramid_attention_broadcast import PyramidAttentionBroadcastConfig, apply_pyramid_attention_broadcast
 from .smoothed_energy_guidance_utils import SmoothedEnergyGuidanceConfig
+from .taylorseer_cache import TaylorSeerCacheConfig, apply_taylorseer_cache
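
These exports suggest a functional entry point alongside the config. A hedged sketch, assuming `apply_taylorseer_cache` mirrors the `(module, config)` signature of the existing `apply_first_block_cache` helper (the signature is not confirmed by this diff):

```python
import torch
from diffusers import FluxPipeline, TaylorSeerCacheConfig, apply_taylorseer_cache

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Same values as the documented enable_cache() example above.
config = TaylorSeerCacheConfig(
    cache_interval=5,
    max_order=1,
    disable_cache_before_step=10,
    taylor_factors_dtype=torch.bfloat16,
)

# Assumption: the functional helper takes (module, config), mirroring
# apply_first_block_cache; pipe.transformer.enable_cache(config) is the
# route shown in the docs hunk above.
apply_taylorseer_cache(pipe.transformer, config)
```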
