support Wan-FLF2V #11353

yiyixuxu · 2025-04-17T08:51:33Z

will merge once we upload our checkpoint to an official repo :)

import numpy as np
import torch
import torchvision.transforms.functional as TF
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import CLIPVisionModel

# model_id = "Wan-AI/Wan2.1-FLF2V-14B-720P-diffusers"
# using this for testing for now, will move to official repo
model_id = "YiYiXu/Wan2.1-FLF2V-14B-720P-Diffusers"

image_encoder = CLIPVisionModel.from_pretrained(model_id, subfolder="image_encoder", torch_dtype=torch.float32)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, image_encoder=image_encoder, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")

def aspect_ratio_resize(image, pipe, max_area=720 * 1280):
    aspect_ratio = image.height / image.width
    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    image = image.resize((width, height))
    return image, height, width

def center_crop_resize(image, height, width):
    # Scale the image so it fully covers the target (height, width)
    resize_ratio = max(width / image.width, height / image.height)
    resized_width = round(image.width * resize_ratio)
    resized_height = round(image.height * resize_ratio)
    image = image.resize((resized_width, resized_height))

    # Crop the excess; TF.center_crop expects the size as (height, width)
    image = TF.center_crop(image, [height, width])

    return image, height, width

first_frame, height, width = aspect_ratio_resize(first_frame, pipe)
if last_frame.size != first_frame.size:
    last_frame, _, _ = center_crop_resize(last_frame, height, width)

prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."

output = pipe(
    image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.5
).frames[0]
export_to_video(output, "yiyi_test_7_wan-ff2v.mp4", fps=16)

yiyi_test_5_out.mp4
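
For anyone who wants to sanity-check the frame sizing without loading the 14B checkpoint, here is a minimal sketch of the same arithmetic in plain Python. The helper names `target_size` and `cover_and_crop_box` are made up for illustration, and `mod_value=16` is an assumption (a spatial VAE scale factor of 8 times a patch size of 2); in the script above the value is read from the pipeline.

```python
import math


def target_size(image_width, image_height, max_area=720 * 1280, mod_value=16):
    """Largest (height, width) near max_area that keeps the aspect ratio
    and is a multiple of mod_value (assumed 16 here, see lead-in)."""
    aspect_ratio = image_height / image_width
    height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return height, width


def cover_and_crop_box(image_width, image_height, height, width):
    """Resize so the image covers (height, width), then center the crop.

    Returns the resized (width, height) and the crop box
    (left, top, right, bottom) in resized-image coordinates.
    """
    scale = max(width / image_width, height / image_height)
    resized_width = round(image_width * scale)
    resized_height = round(image_height * scale)
    left = (resized_width - width) // 2
    top = (resized_height - height) // 2
    return (resized_width, resized_height), (left, top, left + width, top + height)


# A 1920x1080 first frame maps to the 720p target.
height, width = target_size(1920, 1080)
print(height, width)  # 720 1280

# A square 1000x1000 last frame is scaled to cover that target, then cropped.
resized, box = cover_and_crop_box(1000, 1000, height, width)
print(resized, box)  # (1280, 1280) (0, 280, 1280, 1000)
```

The point of scaling by the larger of the two ratios is that the last frame always covers the first frame's size, so the center crop never has to pad.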

HuggingFaceDocBuilderDev · 2025-04-17T09:01:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

scripts/convert_wan_to_diffusers.py

a-r-r-o-w

Thanks!

a-r-r-o-w · 2025-04-17T11:08:04Z

src/diffusers/models/transformers/transformer_wan.py

-            encoder_hidden_states_img = encoder_hidden_states[:, :257]
-            encoder_hidden_states = encoder_hidden_states[:, 257:]
+            # 512 is the context length of the text encoder, hardcoded for now
+            image_context_length = encoder_hidden_states.shape[1] - 512
+            encoder_hidden_states_img = encoder_hidden_states[:, :image_context_length]
+            encoder_hidden_states = encoder_hidden_states[:, image_context_length:]


Is this not backwards breaking? 👀

i will test it out :)
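
A rough way to reason about the backward-compatibility question, as a sketch rather than the actual model code: plain lists stand in for the token dimension, and the helper name `split_image_and_text_tokens` is hypothetical. The assumption that the first-and-last-frame case carries 2 * 257 image tokens is mine and is not stated in the diff.

```python
TEXT_CONTEXT_LENGTH = 512  # text-encoder context length, hardcoded in the diff above


def split_image_and_text_tokens(tokens, text_context_length=TEXT_CONTEXT_LENGTH):
    """Split one [image tokens..., text tokens...] sequence into two parts."""
    image_context_length = len(tokens) - text_context_length
    if image_context_length < 0:
        raise ValueError("sequence is shorter than the text context length")
    return tokens[:image_context_length], tokens[image_context_length:]


# Single conditioning image: 257 image tokens, matching the old hardcoded slice.
single = ["img"] * 257 + ["txt"] * 512
img, txt = split_image_and_text_tokens(single)
print(len(img), len(txt))  # 257 512

# First and last frame: assumed to contribute 2 * 257 image tokens.
double = ["img"] * 514 + ["txt"] * 512
img, txt = split_image_and_text_tokens(double)
print(len(img), len(txt))  # 514 512
```

Under these assumptions the new slicing gives the same result as the old `[:, :257]` split whenever the text part is exactly 512 tokens long. If the text sequence has any other length, the boundary shifts, which is the case worth testing.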

src/diffusers/models/transformers/transformer_wan.py

a-r-r-o-w · 2025-04-17T11:14:34Z

LMK if you'd like me to help add the tests and docs!

yiyixuxu · 2025-04-17T11:20:21Z

@a-r-r-o-w sounds good!

docs/source/en/api/pipelines/wan.md

yiyixuxu added 3 commits April 17, 2025 05:02

update transformer

ff6f0e6

pipeine update

c5fbcee

style

2ce7646

yiyixuxu commented Apr 17, 2025

scripts/convert_wan_to_diffusers.py (outdated, resolved)

yiyixuxu added 2 commits April 17, 2025 12:49

mask last frame

80a7df5

Update scripts/convert_wan_to_diffusers.py

0cf4b40

yiyixuxu requested a review from a-r-r-o-w April 17, 2025 10:55

a-r-r-o-w approved these changes Apr 17, 2025

a-r-r-o-w added 2 commits April 17, 2025 14:11

add test

1998a09

update docs

d42faa6

yiyixuxu commented Apr 17, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

yiyixuxu commented Apr 17, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

yiyixuxu commented Apr 17, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

yiyixuxu commented Apr 17, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

yiyixuxu commented Apr 17, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

Apply suggestions from code review

4ffef90

yiyixuxu added the close-to-merge label Apr 17, 2025

yiyixuxu commented Apr 18, 2025

docs/source/en/api/pipelines/wan.md (outdated, resolved)

Update docs/source/en/api/pipelines/wan.md

9431806

yiyixuxu merged commit 0021bfa into main Apr 18, 2025
12 of 15 checks passed

yiyixuxu deleted the wan-last-frame branch April 18, 2025 20:27

yiyixuxu removed the close-to-merge label Apr 18, 2025
