Kandinsky 5 10 sec (NABLA support) #12520
Open
leffff wants to merge 101 commits into huggingface:main from leffff:main
+153
−0
Changes from 95 commits
d53f848
add transformer pipeline first version
leffff 7db6093
updates
leffff a0cf07f
fix 5sec generation
leffff 0bd738f
Merge branch 'huggingface:main' into main
leffff c8f3a36
rewrite Kandinsky5T2VPipeline to diffusers style
leffff 86b6c2b
Merge branch 'huggingface:main' into main
leffff 723d149
add multiprompt support
leffff 22e14bd
remove prints in pipeline
leffff 70fa62b
add nabla attention
leffff 07e11b2
Merge branch 'huggingface:main' into main
leffff 45240a7
Wrap Transformer in Diffusers style
leffff 43bd1e8
fix license
leffff f35c279
Merge branch 'huggingface:main' into main
leffff 149fd53
fix prompt type
leffff e3a3e9d
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 7af80e9
add gradient checkpointing and peft support
leffff 04efb19
add usage example
leffff 4aa22f3
Merge branch 'main' into main
leffff 235f0d5
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 88a8eea
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff f52f3b4
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 0190e55
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff d62dffc
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 7084106
remove unused imports
leffff d5dcd94
Merge branch 'huggingface:main' into main
leffff b615d5c
add 10 second models support
leffff 6a0233e
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 588c12a
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 327ab84
remove no_grad and simplified prompt paddings
leffff 9b06afb
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 8fd22c0
merge
leffff 28458d0
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff e7b91ed
merge suggestions
leffff cd3cc61
moved template to __init__
leffff 4450265
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff b9a3be2
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 78a23b9
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 56b90b1
moved sdps inside processor
leffff 600e9d6
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 31a1474
remove oneline function
leffff 894aa98
remove reset_dtype methods
leffff c8be081
Transformer: move all methods to forward
leffff 3ffdf7f
separated prompt encoding
leffff b0e1b86
Merge branch 'main' into main
leffff 9f52335
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff cc46e2d
refactoring
leffff 573b966
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 9672c6b
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 1e597cb
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 900feba
refactoring acording to https://github.com/huggingface/diffusers/comm…
leffff 3839f5e
Merge branch 'main' into main
yiyixuxu 226bbf8
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 9504fb0
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff f0eca08
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff cc74c1e
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff cb915d7
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 9aa3c2e
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff feac8f0
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff d3b9597
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff 693b9aa
Update src/diffusers/models/transformers/transformer_kandinsky.py
leffff e2ed6ec
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 2925447
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff b02ad82
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff dc67c2b
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff d0fc426
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 222ba4c
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 3a49505
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 1e12017
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 5a30079
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 0d96ecf
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff aadafc1
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 54cf03c
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 22c503f
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 211d3dd
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 70cfb9e
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 6e83133
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 7ad87f3
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff bf229af
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff 06afd9b
Update src/diffusers/pipelines/kandinsky5/pipeline_kandinsky.py
leffff e1a635e
fixed
leffff e4856e5
Merge branch 'main' into main
leffff 1bf19f0
style +copies
yiyixuxu 1746f6d
Update src/diffusers/models/transformers/transformer_kandinsky.py
yiyixuxu 5bb1657
more
yiyixuxu a26300f
Apply suggestions from code review
yiyixuxu ecbe522
add lora loader doc
yiyixuxu 11200b4
Merge branch 'huggingface:main' into main
leffff b35445c
add compiled Nabla Attention
leffff 51b078c
Merge branch 'huggingface:main' into main
leffff 4ed2f53
Merge branch 'main' into main
sayakpaul 54e7757
all needed changes for 10 sec models are added!
leffff 939f7d0
Merge branch 'main' of https://github.com/leffff/diffusers
leffff 91133e0
Merge branch 'huggingface:main' into main
leffff 25f2e9c
add docs
leffff e45c036
Merge branch 'huggingface:main' into main
leffff 3bbc232
Apply style fixes
github-actions[bot] e181f13
Merge branch 'huggingface:main' into main
leffff dd6bf39
update docs
leffff add757b
Merge branch 'main' into main
yiyixuxu 5fb528b
add kandinsky5 to toctree
leffff c9c1190
Merge branch 'main' of https://github.com/leffff/diffusers
leffff
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kandinsky 5.0

Kandinsky 5.0 is created by the Kandinsky team: Alexey Letunovskiy, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Dmitrii Mikhailov, Anna Averchenkova, Andrey Shutkin, Julia Agafonova, Olga Kim, Anastasiia Kargapoltseva, Nikita Kiselev, Anna Dmitrienko, Anastasia Maltseva, Kirill Chernyshev, Ilia Vasiliev, Viacheslav Vasilev, Vladimir Polovnikov, Yury Kolabushin, Alexander Belykh, Mikhail Mamaev, Anastasia Aliaskina, Tatiana Nikulina, Polina Gavrilova, Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Denis Dimitrov

Kandinsky 5.0 is a family of diffusion models for video and image generation. Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters). According to the authors, it ranks #1 among open-source models in its class, outperforms larger models, and offers the best understanding of Russian concepts in the open-source ecosystem.

The model introduces several key innovations:
- **Latent diffusion pipeline** with **Flow Matching** for improved training stability
- **Diffusion Transformer (DiT)** as the main generative backbone with cross-attention to text embeddings
- Dual text encoding using **Qwen2.5-VL** and **CLIP** for comprehensive text understanding
- **HunyuanVideo 3D VAE** for efficient video encoding and decoding
- **Sparse attention mechanisms** (NABLA) for efficient long-sequence processing
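
To give a rough feel for how block-sparse attention reduces work on long sequences, the sketch below picks, for each query block, the smallest set of key blocks whose softmax mass reaches a threshold. This is a generic illustration and not the NABLA implementation from this PR: the helper `select_key_blocks`, its `threshold` parameter, and the toy scores are all assumptions made for the example.

```python
import math


def select_key_blocks(block_scores, threshold=0.9):
    """For each query block, keep the fewest key blocks whose softmax mass >= threshold.

    block_scores: list of rows, one per query block, holding coarse
    (block-level) attention logits. Hypothetical helper for illustration only.
    """
    keep = []
    for row in block_scores:
        # Numerically stable softmax over the key blocks of this query block.
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Visit key blocks from most to least probable and stop once the
        # accumulated probability reaches the threshold.
        order = sorted(range(len(row)), key=lambda j: probs[j], reverse=True)
        chosen, mass = [], 0.0
        for j in order:
            chosen.append(j)
            mass += probs[j]
            if mass >= threshold:
                break
        keep.append(sorted(chosen))
    return keep


# Toy block-level logits: 3 query blocks x 4 key blocks.
scores = [
    [8.0, 0.0, 0.0, 0.0],  # one dominant key block
    [0.0, 5.0, 5.0, 0.0],  # two equally strong key blocks
    [1.0, 1.0, 1.0, 1.0],  # uniform scores, nothing can be dropped
]
kept = select_key_blocks(scores, threshold=0.9)
```

Rows with peaked scores keep only a few key blocks, while a uniform row keeps all of them, so the saving depends on how concentrated the attention is.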

The original codebase can be found at [ai-forever/Kandinsky-5](https://github.com/ai-forever/Kandinsky-5).

> [!TIP]
> Check out the [AI Forever](https://huggingface.co/ai-forever) organization on the Hub for the official model checkpoints for text-to-video generation, including pretrained, SFT, no-CFG, and distilled variants.

## Available Models

Kandinsky 5.0 T2V Lite comes in several variants optimized for different use cases:

| Model Type | Description | Use Cases |
|------------|-------------|-----------|
| **SFT** | Supervised Fine-Tuned model | Highest generation quality |
| **no-CFG** | Classifier-Free Guidance distilled | 2× faster inference |
| **Distilled** | Diffusion distilled to 16 steps | 6× faster inference, minimal quality loss |
| **Pretrain** | Base pretrained model | Research and fine-tuning |

All models are available in 5-second and 10-second video generation versions.
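
As a small convenience sketch, the helper below maps a variant to a checkpoint ID and the call settings used in this document's usage examples. Only the `sft` and `distilled16steps` 5-second IDs appear in this document; the naming pattern for other durations is an assumption inferred from those two IDs, and `checkpoint_for` and `VARIANTS` are hypothetical names. Verify any other ID on the Hub before relying on it.

```python
# Variant tokens and settings copied from the two checkpoints used in this document.
VARIANTS = {
    "sft": {"num_inference_steps": 50, "guidance_scale": 5.0},
    "distilled16steps": {"num_inference_steps": 16, "guidance_scale": 1.0},
}


def checkpoint_for(variant, seconds=5):
    """Return (model_id, call_kwargs) for a variant.

    The ID pattern is an assumption inferred from the two 5-second IDs
    shown in this document; it is not a documented naming scheme.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    model_id = f"ai-forever/Kandinsky-5.0-T2V-Lite-{variant}-{seconds}s-Diffusers"
    # Copy the settings so callers can modify them without touching VARIANTS.
    return model_id, dict(VARIANTS[variant])


model_id, call_kwargs = checkpoint_for("distilled16steps")
```

The returned `call_kwargs` can be unpacked into the pipeline call, for example `pipe(prompt=..., **call_kwargs)`.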

## Kandinsky5T2VPipeline

[[autodoc]] Kandinsky5T2VPipeline
  - all
  - __call__

## Usage Examples

### Basic Text-to-Video Generation

```python
import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

# Load the pipeline
model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate video
prompt = "A cat and a dog baking a cake together in a kitchen."
negative_prompt = "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,  # ~5 seconds at 24fps
    num_inference_steps=50,
    guidance_scale=5.0,
).frames[0]

export_to_video(output, "output.mp4", fps=24, quality=9)
```
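
The `num_frames=121` value above corresponds to 5 × 24 + 1. A minimal sketch of that arithmetic follows, assuming frame counts of the form `4 * k + 1` (that is, a 4× temporal stride in the video VAE plus one leading frame). Both the stride and the helper name `frames_for_duration` are assumptions for illustration; check the checkpoint configuration for the actual constraint.

```python
def frames_for_duration(seconds, fps=24, temporal_stride=4):
    """Frame count for a clip of `seconds`, snapped to the form stride * k + 1.

    The stride of 4 is an assumption about the VAE's temporal compression.
    """
    raw = round(seconds * fps)
    # Snap down to a multiple of the stride, then add the leading frame.
    return (raw // temporal_stride) * temporal_stride + 1


five_sec = frames_for_duration(5)   # 121, the value used in the example above
ten_sec = frames_for_duration(10)   # 241 under the same assumption
```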

### Using Different Model Variants

```python
import torch
from diffusers import Kandinsky5T2VPipeline

# For faster generation with distilled model
model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate with fewer steps
output = pipe(
    prompt="A beautiful sunset over mountains",
    num_inference_steps=16,  # Only 16 steps needed for distilled model
    guidance_scale=1.0,
).frames[0]
```
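
One way to read `guidance_scale=1.0` is through the standard classifier-free guidance formula, `uncond + scale * (cond - uncond)`: at a scale of 1.0 the result equals the conditional prediction, so guidance has no effect. The sketch below works this out on toy numbers. Whether `Kandinsky5T2VPipeline` applies exactly this formula, or skips the unconditional pass at scale 1.0, is an assumption not confirmed here; `apply_guidance` is a hypothetical helper.

```python
def apply_guidance(cond, uncond, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, -0.5, 0.25]   # toy prediction conditioned on the prompt
uncond = [0.5, 0.0, 0.25]  # toy prediction for the negative or empty prompt

plain = apply_guidance(cond, uncond, guidance_scale=1.0)   # equals cond
pushed = apply_guidance(cond, uncond, guidance_scale=5.0)  # moves further away from uncond
```

With a scale of 5.0, each component is pushed away from the unconditional prediction by five times the conditional difference, while components where the two predictions agree stay unchanged.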

## Citation

```bibtex
@misc{kandinsky2025,
    author = {Alexey Letunovskiy and Maria Kovaleva and Ivan Kirillov and Lev Novitskiy and Denis Koposov and
              Dmitrii Mikhailov and Anna Averchenkova and Andrey Shutkin and Julia Agafonova and Olga Kim and
              Anastasiia Kargapoltseva and Nikita Kiselev and Vladimir Arkhipkin and Vladimir Korviakov and
              Nikolai Gerasimenko and Denis Parkhomenko and Anna Dmitrienko and Anastasia Maltseva and
              Kirill Chernyshev and Ilia Vasiliev and Viacheslav Vasilev and Vladimir Polovnikov and
              Yury Kolabushin and Alexander Belykh and Mikhail Mamaev and Anastasia Aliaskina and
              Tatiana Nikulina and Polina Gavrilova and Denis Dimitrov},
    title = {Kandinsky 5.0: A family of diffusion models for Video \& Image generation},
    howpublished = {\url{https://github.com/ai-forever/Kandinsky-5}},
    year = 2025
}
```