Kandinsky 5 is finally in Diffusers! #12478
|
Could you please update the PR with test code and some example outputs? |
|
Sure! |
|
The example is here: Just like in Wan:
|
|
Dear @sayakpaul @yiyixuxu @DN6 |
import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

pipe = Kandinsky5T2VPipeline.from_pretrained(
    "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

negative_prompt = [
    "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards",
]
prompt = [
    "A cat and a dog baking a cake together in a kitchen.",
]

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
    num_videos_per_prompt=1,
    # Seed the generator explicitly: torch.Generator(42) would interpret 42 as a device, not a seed.
    generator=torch.Generator().manual_seed(42),
)

output.10.mp4

prompt = [
    "A monkey riding a skateboard",
]

output.10.mp4

prompt = [
    "Several giant wooly mammoths threading through the meadow",
]

output.10.mp4 |
|
Great, thanks for providing the examples! Does the model also do realistic generations? 👀 @linoytsaban @apolinario @asomoza in case you wanna test it? |
|
Yes of course!

A stylish woman struts confidently down a rain-drenched Tokyo street, where vibrant neon signs flicker and pulse with electric color. She wears a sleek black leather jacket over a flowing red dress, paired with polished black boots and a matching black purse. Her sunglasses reflect the glowing cityscape as she moves with a calm, assured demeanor, red lipstick adding a bold contrast to her look. The wet pavement mirrors the dazzling lights, doubling the intensity of the urban glow around her. Pedestrians bustle along the sidewalks, their silhouettes blending into the dynamic, cinematic atmosphere of the neon-lit metropolis.

output.10.mp4

A cinematic movie trailer unfolds with a 30-year-old space man traversing a vast salt desert beneath a brilliant blue sky. He wears a uniquely styled red wool knitted motorcycle helmet, adding an eccentric yet rugged charm to his spacefaring look. As he rides a retro-futuristic vehicle across the shimmering white terrain, the wind kicks up clouds of glittering salt, creating a surreal atmosphere. The scene is captured in a vivid, cinematic style, shot on 35mm film to enhance the nostalgic and dramatic grain. Explosions of color and dynamic camera movements highlight the space man's daring escape from a collapsing alien base in the distance.

output.11.mp4 |
thanks, looks cool! left some suggestions for unused imports
Co-authored-by: Álvaro Somoza <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
|
@yiyixuxu |
|
@leffff just want to let you know that I've been testing the 10s model and I'm really impressed with it, I like it a lot, congrats to the team. Can't wait for when you release the I2V one. kangaroo.mp4 |
|
@asomoza Great! Gonna add them in the next iteration! |
will merge once CI is green!
|
Hurrah!!! |
|
@leffff |
|
Hi! |
|
You need to add a page like: https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/kandinsky.md |
|
Great! Thanks! |
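Yes, you are right. I tried doing pipe.transformer.set_attention_backend("flex") and it almost worked. You see, when I made separate processors, I did this:

import torch
from torch import nn
from torch.nn.attention.flex_attention import flex_attention

# nablaT_v2 is the block-mask helper used by this processor; it is defined elsewhere and not shown in this comment.


class Kandinsky5NablaAttentionProcessor(nn.Module):
    """Custom attention processor for Nabla attention"""

    @torch.compile(mode="max-autotune-no-cudagraphs", dynamic=True)
    def __call__(
        self,
        attn,
        query,
        key,
        value,
        sparse_params=None,
        **kwargs,
    ):
        if sparse_params is None:
            raise ValueError("sparse_params is required for Nabla attention")

        # flex_attention expects (batch, heads, seq_len, head_dim), so swap dims 1 and 2.
        query = query.transpose(1, 2).contiguous()
        key = key.transpose(1, 2).contiguous()
        value = value.transpose(1, 2).contiguous()

        # Build the block mask from the STA mask, using P as the threshold.
        block_mask = nablaT_v2(
            query,
            key,
            sparse_params["sta_mask"],
            thr=sparse_params["P"],
        )

        # Run attention, then restore the original dim order.
        out = (
            flex_attention(query, key, value, block_mask=block_mask)
            .transpose(1, 2)
            .contiguous()
        )
        # Merge the last two dims (heads and head_dim) into one feature dim.
        out = out.flatten(-2, -1)
        return out |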
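To make the transposes in the processor above easier to follow, here is a small shape trace in plain Python, with no tensors involved. It assumes the inputs arrive as (batch, seq_len, heads, head_dim), which the comment does not state but which the `transpose(1, 2)` before `flex_attention` suggests. All function names and the example sizes are hypothetical.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def transpose(shape: Shape, a: int, b: int) -> Shape:
    """Swap two dimensions of a shape tuple, like Tensor.transpose(a, b)."""
    dims = list(shape)
    dims[a], dims[b] = dims[b], dims[a]
    return tuple(dims)


def flatten_last_two(shape: Shape) -> Shape:
    """Merge the last two dimensions, like Tensor.flatten(-2, -1)."""
    return shape[:-2] + (shape[-2] * shape[-1],)


def trace_processor_shapes(batch: int, seq_len: int, heads: int, head_dim: int) -> Shape:
    qkv_in = (batch, seq_len, heads, head_dim)  # assumed input layout
    attn_in = transpose(qkv_in, 1, 2)           # (batch, heads, seq_len, head_dim)
    attn_out = attn_in                          # output keeps the query shape when value shares head_dim
    restored = transpose(attn_out, 1, 2)        # back to (batch, seq_len, heads, head_dim)
    return flatten_last_two(restored)           # (batch, seq_len, heads * head_dim)


# Example sizes are illustrative only.
print(trace_processor_shapes(batch=1, seq_len=1024, heads=16, head_dim=64))
```

The result is one feature vector of size heads * head_dim per token, the usual shape expected by an output projection.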
What do you mean? It didn't work as expected or are we good? 👀 |
|
It worked as expected, yet it's not everything. Flex requires additional compilation. Please see #12520 |
|
I will reply to that PR. |
What does this PR do?
This PR adds Kandinsky5T2VPipeline and Kandinsky5Transformer3DModel, as well as several layer classes needed for the Kandinsky 5.0 Lite T2V model.
@sayakpaul Please review