
Commit 5afbcce

leffff, asomoza, yiyixuxu, cbensimon, sayakpaul
authored
Kandinsky 5 10 sec (NABLA support) (#12520)
* add transformer pipeline first version * updates * fix 5sec generation * rewrite Kandinsky5T2VPipeline to diffusers style * add multiprompt support * remove prints in pipeline * add nabla attention * Wrap Transformer in Diffusers style * fix license * fix prompt type * add gradient checkpointing and peft support * add usage example * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: Álvaro Somoza <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: Álvaro Somoza <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: Álvaro Somoza <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: Álvaro Somoza <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: Álvaro Somoza <[email protected]> * remove unused imports * add 10 second models support * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * remove no_grad and simplified prompt paddings * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * moved template to __init__ * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * moved sdps inside processor * remove oneline function * remove reset_dtype methods * Transformer: move all methods to forward * separated prompt encoding * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * refactoring * 
Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * refactoring acording to acabbc0 * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py 
Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py Co-authored-by: YiYi Xu <[email protected]> * fixed * style +copies * Update src/diffusers/models/transformers/transformer_kandinsky.py Co-authored-by: Charles <[email protected]> * more * Apply suggestions from code review * add lora loader doc * add compiled Nabla Attention * all needed changes for 10 sec models are added! * add docs * Apply style fixes * update docs * add kandinsky5 to toctree * add tests * fix tests * Apply style fixes * update tests --------- Co-authored-by: Álvaro Somoza <[email protected]> Co-authored-by: YiYi Xu <[email protected]> Co-authored-by: Charles <[email protected]> Co-authored-by: Sayak Paul <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 6d1a648 commit 5afbcce

7 files changed: 468 additions, 2 deletions

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -525,6 +525,8 @@
   title: Kandinsky 2.2
 - local: api/pipelines/kandinsky3
   title: Kandinsky 3
+- local: api/pipelines/kandinsky5
+  title: Kandinsky 5
 - local: api/pipelines/kolors
   title: Kolors
 - local: api/pipelines/latent_consistency_models

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kandinsky 5.0

Kandinsky 5.0 was created by the Kandinsky team: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev, Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova, Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Denis Dimitrov

Kandinsky 5.0 is a family of diffusion models for video and image generation. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem.

The model introduces several key innovations:
- **Latent diffusion pipeline** with **Flow Matching** for improved training stability
- **Diffusion Transformer (DiT)** as the main generative backbone with cross-attention to text embeddings
- Dual text encoding using **Qwen2.5-VL** and **CLIP** for comprehensive text understanding
- **HunyuanVideo 3D VAE** for efficient video encoding and decoding
- **Sparse attention mechanisms** (NABLA) for efficient long-sequence processing

The original codebase can be found at [ai-forever/Kandinsky-5](https://github.com/ai-forever/Kandinsky-5).

> [!TIP]
> Check out the [AI Forever](https://huggingface.co/ai-forever) organization on the Hub for the official model checkpoints for text-to-video generation, including pretrained, SFT, no-CFG, and distilled variants.

## Available Models

Kandinsky 5.0 T2V Lite comes in several variants optimized for different use cases:

| model_id | Description | Use Cases |
|------------|-------------|-----------|
| **ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers** | 5-second supervised fine-tuned (SFT) model | Highest generation quality |
| **ai-forever/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers** | 10-second supervised fine-tuned (SFT) model | Highest generation quality |
| **ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers** | 5-second classifier-free guidance (CFG) distilled model | 2× faster inference |
| **ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers** | 10-second classifier-free guidance (CFG) distilled model | 2× faster inference |
| **ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers** | 5-second model, diffusion-distilled to 16 steps | 6× faster inference, minimal quality loss |
| **ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers** | 10-second model, diffusion-distilled to 16 steps | 6× faster inference, minimal quality loss |
| **ai-forever/Kandinsky-5.0-T2V-Lite-pretrain-5s-Diffusers** | 5-second base pretrained model | Research and fine-tuning |
| **ai-forever/Kandinsky-5.0-T2V-Lite-pretrain-10s-Diffusers** | 10-second base pretrained model | Research and fine-tuning |

Every variant is available in a 5-second and a 10-second version.
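
The checkpoint names in the table follow a regular pattern, so a small helper can build them instead of copying strings by hand. The sketch below is illustrative only: `kandinsky5_model_id`, `VARIANTS`, and `DURATIONS` are hypothetical names, and the naming pattern is inferred from the table rather than from any official API.

```python
# Hypothetical helper: the naming pattern is inferred from the table above.
VARIANTS = ("sft", "nocfg", "distilled16steps", "pretrain")
DURATIONS = (5, 10)


def kandinsky5_model_id(variant: str, seconds: int) -> str:
    """Build a Hub model id for a Kandinsky 5.0 T2V Lite checkpoint."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    if seconds not in DURATIONS:
        raise ValueError(f"unsupported duration {seconds!r}, expected one of {DURATIONS}")
    return f"ai-forever/Kandinsky-5.0-T2V-Lite-{variant}-{seconds}s-Diffusers"


# Enumerate every checkpoint listed in the table.
all_ids = [kandinsky5_model_id(v, s) for v in VARIANTS for s in DURATIONS]
for model_id in all_ids:
    print(model_id)
```

The returned string can be passed to `Kandinsky5T2VPipeline.from_pretrained` in the same way as the literal ids used in the examples on this page.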

## Kandinsky5T2VPipeline

[[autodoc]] Kandinsky5T2VPipeline
    - all
    - __call__

## Usage Examples

### Basic Text-to-Video Generation

```python
import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

# Load the pipeline
model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate video
prompt = "A cat and a dog baking a cake together in a kitchen."
negative_prompt = "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,  # ~5 seconds at 24fps
    num_inference_steps=50,
    guidance_scale=5.0,
).frames[0]

export_to_video(output, "output.mp4", fps=24, quality=9)
```
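
To make the `num_frames` values easier to reason about, the sketch below converts a frame count into clip length and an approximate latent grid size. The compression ratios 4 (temporal) and 8 (spatial) are the fallback values that this commit's pipeline change uses when no VAE is loaded. The `(num_frames - 1) // ratio + 1` rule is an assumption based on the usual causal video VAE convention, and `clip_seconds` and `latent_shape` are hypothetical helpers, not part of the pipeline API.

```python
def clip_seconds(num_frames: int, fps: int = 24) -> float:
    """Clip duration in seconds for a given frame count and frame rate."""
    return num_frames / fps


def latent_shape(num_frames, height, width, temporal_ratio=4, spatial_ratio=8):
    """Approximate latent grid (frames, height, width).

    Assumption: the first frame is encoded on its own and the remaining
    frames are grouped in chunks of `temporal_ratio`.
    """
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    return latent_frames, height // spatial_ratio, width // spatial_ratio


print(round(clip_seconds(121), 2))   # 5.04
print(round(clip_seconds(241), 2))   # 10.04
print(latent_shape(121, 512, 768))   # (31, 64, 96)
print(latent_shape(241, 512, 768))   # (61, 64, 96)
```

Under these assumptions the 10-second setting roughly doubles the number of latent frames, which is consistent with the longer models relying on sparse attention.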

### 10-Second Models

**⚠️ Warning!** All 10-second models should be used with the Flex attention backend and compiled in `max-autotune-no-cudagraphs` mode:

```python
import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

pipe = Kandinsky5T2VPipeline.from_pretrained(
    "ai-forever/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Set the attention backend to Flex
pipe.transformer.set_attention_backend("flex")
# Compile with max-autotune-no-cudagraphs
pipe.transformer.compile(mode="max-autotune-no-cudagraphs", dynamic=True)

prompt = "A cat and a dog baking a cake together in a kitchen."
negative_prompt = "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=241,  # ~10 seconds at 24fps
    num_inference_steps=50,
    guidance_scale=5.0,
).frames[0]

export_to_video(output, "output.mp4", fps=24, quality=9)
```

### Diffusion-Distilled Models

**⚠️ Warning!** All nocfg and diffusion-distilled models should be run without CFG (`guidance_scale=1.0`):

```python
import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

output = pipe(
    prompt="A beautiful sunset over mountains",
    num_inference_steps=16,  # the model is distilled to 16 steps
    guidance_scale=1.0,  # no CFG
).frames[0]

export_to_video(output, "output.mp4", fps=24, quality=9)
```
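
Why `guidance_scale=1.0` amounts to "no CFG" can be seen from the standard classifier-free guidance formula, sketched below on plain Python lists. This is the textbook formulation, assumed here for illustration. It is not copied from the pipeline source, and `apply_cfg` is a hypothetical name.

```python
def apply_cfg(cond, uncond, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy model predictions for a conditional and an unconditional pass.
cond = [0.5, -1.0, 2.0]
uncond = [0.0, 0.5, 1.0]

print(apply_cfg(cond, uncond, 1.0))  # [0.5, -1.0, 2.0], identical to cond
print(apply_cfg(cond, uncond, 5.0))  # [2.5, -7.0, 6.0], pushed away from uncond
```

At a scale of 1.0 the unconditional term cancels out, so the result depends only on the conditional prediction. An implementation can therefore skip the unconditional forward pass, which fits the roughly 2× speedup listed for the nocfg checkpoints.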

## Citation

```bibtex
@misc{kandinsky2025,
    author = {Alexey Letunovskiy and Maria Kovaleva and Ivan Kirillov and Lev Novitskiy and Denis Koposov and
              Dmitrii Mikhailov and Anna Averchenkova and Andrey Shutkin and Julia Agafonova and Olga Kim and
              Anastasiia Kargapoltseva and Nikita Kiselev and Vladimir Arkhipkin and Vladimir Korviakov and
              Nikolai Gerasimenko and Denis Parkhomenko and Anna Dmitrienko and Anastasia Maltseva and
              Kirill Chernyshev and Ilia Vasiliev and Viacheslav Vasilev and Vladimir Polovnikov and
              Yury Kolabushin and Alexander Belykh and Mikhail Mamaev and Anastasia Aliaskina and
              Tatiana Nikulina and Polina Gavrilova and Denis Dimitrov},
    title = {Kandinsky 5.0: A family of diffusion models for Video \& Image generation},
    howpublished = {\url{https://github.com/ai-forever/Kandinsky-5}},
    year = 2025
}
```

src/diffusers/models/transformers/transformer_kandinsky.py

Lines changed: 2 additions & 0 deletions
@@ -324,6 +324,7 @@ def apply_rotary(x, rope):
         sparse_params["sta_mask"],
         thr=sparse_params["P"],
     )
+
 else:
     attn_mask = None
 
@@ -335,6 +336,7 @@ def apply_rotary(x, rope):
     backend=self._attention_backend,
     parallel_config=self._parallel_config,
 )
+
 hidden_states = hidden_states.flatten(-2, -1)
 
 attn_out = attn.out_layer(hidden_states)

src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py

Lines changed: 7 additions & 2 deletions
@@ -173,8 +173,10 @@ def __init__(
         )
         self.prompt_template_encode_start_idx = 129
 
-        self.vae_scale_factor_temporal = vae.config.temporal_compression_ratio
-        self.vae_scale_factor_spatial = vae.config.spatial_compression_ratio
+        self.vae_scale_factor_temporal = (
+            self.vae.config.temporal_compression_ratio if getattr(self, "vae", None) else 4
+        )
+        self.vae_scale_factor_spatial = self.vae.config.spatial_compression_ratio if getattr(self, "vae", None) else 8
         self.video_processor = VideoProcessor(vae_scale_factor=self.vae_scale_factor_spatial)
 
     @staticmethod
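
The hunk above replaces direct access to `vae.config` with a guarded lookup, so the pipeline can be constructed without a VAE. A minimal sketch of that pattern follows. `ToyPipeline` and the stub config are hypothetical stand-ins, and only the `getattr(self, "vae", None)` guard and the fallback values 4 and 8 come from the diff.

```python
from types import SimpleNamespace


class ToyPipeline:
    """Hypothetical stand-in that mirrors only the scale-factor logic."""

    def __init__(self, vae=None):
        self.vae = vae
        # Read the ratios from the VAE config when a VAE is present,
        # otherwise fall back to the defaults used in the diff (4 and 8).
        has_vae = getattr(self, "vae", None)
        self.vae_scale_factor_temporal = self.vae.config.temporal_compression_ratio if has_vae else 4
        self.vae_scale_factor_spatial = self.vae.config.spatial_compression_ratio if has_vae else 8


# Stub VAE with deliberately different ratios so the two paths are distinguishable.
stub_vae = SimpleNamespace(
    config=SimpleNamespace(temporal_compression_ratio=2, spatial_compression_ratio=16)
)

no_vae = ToyPipeline()
with_vae = ToyPipeline(stub_vae)
print(no_vae.vae_scale_factor_temporal, no_vae.vae_scale_factor_spatial)      # 4 8
print(with_vae.vae_scale_factor_temporal, with_vae.vae_scale_factor_spatial)  # 2 16
```

Before this change, passing `vae=None` would have raised an `AttributeError` on `vae.config`. The guard is presumably what lets the new tests build the pipeline from partial components.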
@@ -384,6 +386,9 @@ def encode_prompt(
         device = device or self._execution_device
         dtype = dtype or self.text_encoder.dtype
 
+        if not isinstance(prompt, list):
+            prompt = [prompt]
+
         batch_size = len(prompt)
 
         prompt = [prompt_clean(p) for p in prompt]
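
The three added lines matter because `len()` and iteration treat a bare string as a sequence of characters. The sketch below reproduces the normalization in isolation. `normalize_prompts` and `prompt_clean_stub` are hypothetical names, and the stub only collapses whitespace rather than reproducing the real `prompt_clean`.

```python
def prompt_clean_stub(text: str) -> str:
    """Hypothetical stand-in for prompt_clean: collapse runs of whitespace."""
    return " ".join(text.split())


def normalize_prompts(prompt):
    """Wrap a single prompt in a list, then clean each entry."""
    if not isinstance(prompt, list):
        prompt = [prompt]
    return [prompt_clean_stub(p) for p in prompt]


single = "A cat  baking a cake"
print(len(single))                     # 20, characters rather than prompts
print(len(normalize_prompts(single)))  # 1
print(normalize_prompts(single))       # ['A cat baking a cake']
print(len(normalize_prompts(["first prompt", "second prompt"])))  # 2
```

Without the wrap, `batch_size = len(prompt)` would count characters and the list comprehension would clean the prompt one character at a time.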

tests/pipelines/kandinsky5/__init__.py

Whitespace-only changes.
