[wan2.2] add 5b i2v #12006

yiyixuxu · 2025-07-29T00:37:07Z

import torch
from diffusers import WanImageToVideoPipeline, AutoencoderKLWan, ModularPipeline
from diffusers.utils import export_to_video


model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
dtype = torch.bfloat16
device = "cuda:2"

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype)
pipe.enable_model_cpu_offload(device=device)

# use default wan image processor to resize and crop the image
image_processor = ModularPipeline.from_pretrained("YiYiXu/WanImageProcessor", trust_remote_code=True)
image = image_processor(
    image="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG",
    max_area=1280*704, output="processed_image")

height, width = image.height, image.width
print(f"height: {height}, width: {width}")
num_frames = 121
num_inference_steps = 50
guidance_scale = 5.0

prompt = "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."

negative_prompt = "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走"

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "yiyi_test_6_ti2v_5b_output.mp4", fps=24)

HuggingFaceDocBuilderDev · 2025-07-29T00:49:16Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

nitinmukesh · 2025-07-29T11:33:45Z

Thank you @yiyixuxu .

One question: is this a fixed value, or does it need to be reversed depending on the orientation of the input image?
max_area=1280*704

portrait=1280*704
landscape=704*1280

a-r-r-o-w

Thanks!

yiyixuxu · 2025-07-30T00:45:55Z

@nitinmukesh
it's actually an int input, so it's a fixed value
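To illustrate why the orientation of the factors does not matter, here is a minimal sketch of how a single integer area budget could be turned into a target size. The helper `fit_to_max_area` and its `multiple=32` default are hypothetical and are not the actual logic of the WanImageProcessor; the sketch only assumes that the aspect ratio of the input is preserved and that both sides are rounded down to a common multiple.

```python
import math


def fit_to_max_area(src_width, src_height, max_area=1280 * 704, multiple=32):
    """Hypothetical helper: choose (width, height) close to the source aspect
    ratio whose product stays within max_area, rounded down to `multiple`."""
    aspect_ratio = src_height / src_width
    height = round(math.sqrt(max_area * aspect_ratio)) // multiple * multiple
    width = round(math.sqrt(max_area / aspect_ratio)) // multiple * multiple
    return width, height


# The budget is one integer, so 1280*704 and 704*1280 are the same value.
assert 1280 * 704 == 704 * 1280

# The orientation of the result follows the input image, not the budget.
print(fit_to_max_area(1920, 1080))  # landscape input
print(fit_to_max_area(1080, 1920))  # portrait input
```

Under these assumptions a landscape input and its portrait counterpart yield mirrored sizes from the same `max_area`, so the value does not need to be swapped per image.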

zhaoyun0071 · 2025-07-30T13:09:08Z

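After the progress bar reaches 100%, VRAM usage suddenly spikes to around 30 GB at the very end, which makes it extremely slow. I am using a 3090 GPU.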

JoeGaffney · 2025-07-31T12:38:52Z

Hey,

With Wan2.1 we were able to pass just an RGB PIL image. With 2.2 I get:

def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
        if self.padding_mode != "zeros":
            return F.conv3d(
                F.pad(
                    input, self._reversed_padding_repeated_twice, mode=self.padding_mode
                ),
                weight,
                bias,
                self.stride,
                _triple(0),
                self.dilation,
                self.groups,
            )
>       return F.conv3d(
            input, weight, bias, self.stride, self.padding, self.dilation, self.groups
        )
E       RuntimeError: Given groups=1, weight of size [160, 12, 3, 3, 3], expected input[1, 3, 3, 258, 258] to have 12 channels, but got 3 channels instead

/opt/conda/lib/python3.11/site-packages/torch/nn/modules/conv.py:720: RuntimeError

Cheers,
Joe

yiyixuxu · 2025-08-01T09:20:30Z

hi @JoeGaffney
there is no code attached so i'm not 100% sure I understand the issue here, but the error might be caused by the fact that with wan 2.2 there is a patchify/unpatchify step

diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py

Line 1182 in 20e0740

x = patchify(x, patch_size=self.config.patch_size)
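The channel mismatch in the traceback is consistent with a skipped patchify step, and the shape arithmetic can be sketched as follows. This is only an illustration, not the implementation in autoencoder_kl_wan.py: the helper `patchified_shape` is hypothetical, and `patch_size=2` is an assumption inferred from the error message (12 expected channels = 3 RGB channels x 2 x 2), not read from the model config.

```python
def patchified_shape(shape, patch_size):
    """Hypothetical helper: shape of a (batch, channels, frames, height, width)
    tensor after folding each patch_size x patch_size spatial patch into channels."""
    batch, channels, frames, height, width = shape
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    return (
        batch,
        channels * patch_size * patch_size,
        frames,
        height // patch_size,
        width // patch_size,
    )


# Input shape reported in the traceback: 3 channels, but the conv expects 12.
raw_shape = (1, 3, 3, 258, 258)
print(patchified_shape(raw_shape, patch_size=2))
```

With the assumed patch size, the 3-channel input from the error becomes a 12-channel tensor at half the spatial resolution, which matches the channel count of the [160, 12, 3, 3, 3] weight.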

chensongkui · 2025-08-05T10:00:55Z

Did you run this code successfully? It fails for me. I ran into some problems, such as a missing image processor and image encoder, which I have solved. Now I have hit a new problem, as shown in the picture above. This seems unreasonable to me; in theory, this code should run successfully.

JoeGaffney · 2025-08-05T10:21:08Z

Hey @yiyixuxu, it was resolved in the other ticket: it was vae.enable_tiling()

nitinmukesh · 2025-08-05T16:00:38Z

@JoeGaffney

Which ticket fixed enable_tiling? I'm getting OOM even after installing diffusers from source.

* add 5b ti2v
* remove a copy
* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
  Co-authored-by: Aryan <[email protected]>
* Apply suggestions from code review

---------

Co-authored-by: Aryan <[email protected]>

ares89 · 2025-08-19T07:47:09Z

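After the progress bar reaches 100%, VRAM usage suddenly spikes to around 30 GB at the very end, which makes it extremely slow. I am using a 3090 GPU.

Have you solved this problem? If so, how did you solve it?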

jerry2102 · 2025-09-15T14:14:30Z

I get this error when I call the pipeline with (width, height) = (1280, 720). The diffusers version is 0.35.1.
Can anyone help me?

from user code:
File "/usr/local/lib/python3.10/site-packages/diffusers/models/transformers/transformer_wan.py", line 478, in forward
norm_hidden_states = (self.norm1(hidden_states.float()) * (1 + scale_msa) + shift_msa).type_as(hidden_states)

dg845 · 2025-10-11T04:50:40Z

For the issue where certain resolutions such as $1280 \times 720$ lead to an error, see #12348 for more details - a mitigation is to ensure that both height and width are multiples of $32$, which is why the $1280 \times 704$ resolution works.
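The mitigation above can be sketched as a small pre-flight check. The helper names `is_aligned` and `snap_resolution` are hypothetical and not part of diffusers; the only assumption taken from the thread is that both sides should be multiples of 32.

```python
def is_aligned(width, height, multiple=32):
    """Hypothetical check: True when both sides are multiples of `multiple`."""
    return width % multiple == 0 and height % multiple == 0


def snap_resolution(width, height, multiple=32):
    """Hypothetical helper: round both sides down to the nearest multiple."""
    if width < multiple or height < multiple:
        raise ValueError("resolution is smaller than the required multiple")
    return width // multiple * multiple, height // multiple * multiple


print(is_aligned(1280, 720))       # 720 is not a multiple of 32
print(is_aligned(1280, 704))       # both sides are multiples of 32
print(snap_resolution(1280, 720))  # rounds the height down
```

Rounding down rather than up keeps the snapped size inside the originally requested area, at the cost of a slightly different aspect ratio.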

add 5b ti2v

613f618

remove a copy

d6b5614

yiyixuxu requested a review from a-r-r-o-w July 29, 2025 08:43

luke14free mentioned this pull request Jul 29, 2025

apply_first_block_cache with Wan 2.2 causes ValueError: No context is set. Please set a context before retrieving the state #12012

Closed

a-r-r-o-w approved these changes Jul 29, 2025

Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

0ba5e3e

Co-authored-by: Aryan <[email protected]>

Apply suggestions from code review

659a272

yiyixuxu mentioned this pull request Jul 30, 2025

wan2.2 i2v FirstBlockCache fix #12013

Merged

6 tasks

yiyixuxu merged commit d8854b8 into main Jul 30, 2025
13 of 15 checks passed

yiyixuxu deleted the wan5bi2v branch July 30, 2025 03:34

okaris mentioned this pull request Jul 31, 2025

Wan 2.2 5b i2v results poor quality compared to official Wan HF Space #12034

Closed

a-r-r-o-w mentioned this pull request Aug 1, 2025

Wan 2.2 VAE forward fails #12039

Closed
