Conversation

@sywangyi (Contributor) commented Oct 21, 2025

@sayakpaul @dg845 please help review. Test script:

import torch
import numpy as np
from diffusers import WanPipeline, AutoencoderKLWan, WanTransformer3DModel, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image

dtype = torch.bfloat16
device = "xpu"
access_token = "hf_xxxxxxxxxxxxxxxxxxxxxx"

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32, token=access_token)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype, token=access_token)
pipe.enable_model_cpu_offload()
print(torch.xpu.max_memory_allocated())
pipe.vae.enable_tiling(
    tile_sample_min_height=480,
    tile_sample_min_width=960,
    tile_sample_stride_height=352,
    tile_sample_stride_width=640,
)
height = 704
width = 1280
num_frames = 20
num_inference_steps = 50
guidance_scale = 5.0


prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
# Standard Wan Chinese negative prompt. Rough translation: vivid colors, overexposed,
# static, blurry details, subtitles, style, artwork, painting, frame, motionless,
# overall grayish, worst quality, low quality, JPEG compression artifacts, ugly,
# incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed,
# disfigured, malformed limbs, fused fingers, static frame, cluttered background,
# three legs, many people in the background, walking backwards.
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "5bit2v_output.mp4", fps=24)
print(torch.xpu.max_memory_allocated())

@sywangyi (Contributor, Author)

CUDA should have a similar issue.

@sywangyi sywangyi changed the title fix crash in tiling mode is enabled fix crash if tiling mode is enabled Oct 21, 2025
@sayakpaul (Member) left a comment

Thanks for your PR.

But before we go on reviewing it, could you please:

  • Include an error trace that you get without the changes from this PR?
  • Include an output with the changes from this PR?
  • Additionally, the changes introduced in this PR seem non-intrusive to me. So, if you add comments to explain those changes, that'd be super nice.

Signed-off-by: Wang, Yi A <[email protected]>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi (Contributor, Author)

Without the change, it crashes like:
Traceback (most recent call last):
  File "/workspace/test.py", line 27, in <module>
    output = pipe(
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/pipelines/wan/pipeline_wan.py", line 645, in __call__
    video = self.vae.decode(latents, return_dict=False)[0]
  File "/workspace/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1248, in decode
    decoded = self._decode(z).sample
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1204, in _decode
    return self.tiled_decode(z, return_dict=return_dict)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1374, in tiled_decode
    decoded = self.decoder(tile, feat_cache=self._feat_map, feat_idx=self._conv_idx)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 892, in forward
    x = up_block(x, feat_cache, feat_idx, first_chunk=first_chunk)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 708, in forward
    x = x + self.avg_shortcut(x_copy, first_chunk=first_chunk)
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2
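For illustration only (plain NumPy, not the actual model code; NumPy raises `ValueError` where torch raises `RuntimeError`): the failure is an elementwise add of two tensors that agree everywhere except one dimension, here the frame/temporal axis, where the shortcut branch produced a different chunk length than the main path. The shapes below are hypothetical stand-ins for the real activations:

```python
import numpy as np

# Hypothetical (B, C, T, H, W) activations: the main path carries T=2 frames
# while the shortcut branch produced T=4 for the same tile, so the add fails.
a = np.zeros((1, 16, 2, 8, 8))
b = np.zeros((1, 16, 4, 8, 8))
try:
    _ = a + b  # broadcasting cannot reconcile 2 vs 4 at axis 2
except ValueError:
    print("shape mismatch:", a.shape[2], "vs", b.shape[2])  # → shape mismatch: 2 vs 4
```

This is the same class of error the traceback reports at non-singleton dimension 2.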

@sywangyi (Contributor, Author)

[screenshot of the diff] These lines aim to fix the crash. However, there's another crash after this one is fixed.

@sayakpaul (Member)
Thanks! What about the outputs? Cc: @asomoza if you wanna help test it out a bit?

@sayakpaul (Member)
> however, there's another crash after this crash is fixed.

So, it doesn't work yet?

@sywangyi (Contributor, Author)
> however, there's another crash after this crash is fixed.
>
> So, it doesn't work yet?

It works. The other crash occurred because patch_size is not considered in tiling mode; in this model it's 2, and this PR fixes that.
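The role of patch_size can be sketched as follows; this is an illustrative rearrangement in NumPy, not the actual diffusers code, and the channel ordering convention is an assumption. With patch_size p=2, tensors carry 3 * p * p = 12 packed channels, and each packed channel triple must be pixel-shuffled back into a p-times-larger spatial grid; tiling math that assumes plain 3-channel frames miscounts the sizes:

```python
import numpy as np

def unpatchify(x: np.ndarray, p: int) -> np.ndarray:
    """Rearrange (C*p*p, H, W) -> (C, H*p, W*p): a pixel-shuffle."""
    cpp, h, w = x.shape
    c = cpp // (p * p)
    x = x.reshape(c, p, p, h, w)       # split channels into (C, p, p)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, p, W, p)
    return x.reshape(c, h * p, w * p)  # merge patch factors into space

# 12 packed channels on a 4x4 grid become 3 channels on an 8x8 grid.
frame = np.arange(12 * 4 * 4, dtype=np.float32).reshape(12, 4, 4)
rgb = unpatchify(frame, p=2)
print(rgb.shape)  # → (3, 8, 8)
```

Any per-tile height/width bookkeeping therefore has to scale by p, which is presumably what the fix accounts for.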

@sywangyi (Contributor, Author)

The second crash looked like:
Traceback (most recent call last):
  File "/workspace/test.py", line 36, in <module>
    export_to_video(output, "5bit2v_output.mp4", fps=24)
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 177, in export_to_video
    return _legacy_export_to_video(video_frames, output_video_path, fps)
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 135, in _legacy_export_to_video
    img = cv2.cvtColor(video_frames[i], cv2.COLOR_RGB2BGR)
cv2.error: OpenCV(4.11.0) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-15:Bad number of channels) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::{anonymous}::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 12
This PR also fixes it.
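Illustrative only: `cv2.cvtColor` with `COLOR_RGB2BGR` accepts 3- or 4-channel images, so frames that still carry the packed 3 * patch_size**2 = 12 channels trip the "Invalid number of channels" error ('scn' is 12). A sanity check of the kind that would catch this before export (the helper name is hypothetical, and cv2 itself is not needed to show the channel-count rule):

```python
import numpy as np

def check_exportable(frame: np.ndarray) -> bool:
    """True if an HxWxC frame satisfies cvtColor's 3- or 4-channel rule."""
    return frame.ndim == 3 and frame.shape[-1] in (3, 4)

packed = np.zeros((480, 960, 12), dtype=np.uint8)  # still 12 packed channels
plain = np.zeros((480, 960, 3), dtype=np.uint8)    # normal RGB frame
print(check_exportable(packed), check_exportable(plain))  # → False True
```

Unpatchifying the decoded tiles back to 3 channels, as the fix presumably does, is what makes the frames pass this rule.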

@sayakpaul (Member)
@sywangyi would you be able to post some outputs after applying the fix?

@asomoza (Member) commented Oct 22, 2025

Tested it with a simple pipe.vae.enable_tiling() over the example code.

In fact, it doesn't work with main, but this PR also doesn't fix it; I still got:

RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2

Edit: I correct myself; I made a silly mistake. This PR does fix the issue for the 5B model. I'll do a comparison with main.

@asomoza (Member) commented Oct 23, 2025

Here they are:

main (without tiling)

5bit2v__main_output.mp4

PR with pipe.vae.enable_tiling()

5bit2v__pr_output.mp4
