Conversation

@sywangyi (Contributor) commented Oct 21, 2025

@sayakpaul @dg845 please help review. Test script:

import torch
import numpy as np
from diffusers import WanPipeline, AutoencoderKLWan, WanTransformer3DModel, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image

dtype = torch.bfloat16
device = "xpu"
access_token = "hf_xxxxxxxxxxxxxxxxxxxxxx"

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32, token=access_token)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=dtype, token=access_token)
pipe.enable_model_cpu_offload()
print(torch.xpu.max_memory_allocated())
pipe.vae.enable_tiling(
    tile_sample_min_height=480,
    tile_sample_min_width=960,
    tile_sample_stride_height=352,
    tile_sample_stride_width=640,
)
height = 704
width = 1280
num_frames = 20
num_inference_steps = 50
guidance_scale = 5.0


prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
# Standard Wan Chinese negative prompt. Rough translation: vivid colors, overexposed,
# static, blurry details, subtitles, style, artwork, painting, frame, motionless,
# overall grayish, worst quality, low quality, JPEG compression artifacts, ugly,
# incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed,
# disfigured, malformed limbs, fused fingers, static frame, cluttered background,
# three legs, many people in the background, walking backwards.
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=num_frames,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
).frames[0]
export_to_video(output, "5bit2v_output.mp4", fps=24)
print(torch.xpu.max_memory_allocated())

@sywangyi (Contributor, Author)

CUDA should have a similar issue.

@sywangyi sywangyi changed the title fix crash in tiling mode is enabled fix crash if tiling mode is enabled Oct 21, 2025
@sayakpaul (Member) left a comment

Thanks for your PR.

But before we go on reviewing it, could you please:

  • Include an error trace that you get without the changes from this PR?
  • Include an output with the changes from this PR?
  • Additionally, the changes introduced in this PR seem non-intrusive to me. So, if you add comments to explain those changes, that'd be super nice.

Signed-off-by: Wang, Yi A <[email protected]>
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sywangyi (Contributor, Author)

Without the change, it crashes like:
Traceback (most recent call last):
  File "/workspace/test.py", line 27, in <module>
    output = pipe(
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/pipelines/wan/pipeline_wan.py", line 645, in __call__
    video = self.vae.decode(latents, return_dict=False)[0]
  File "/workspace/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1248, in decode
    decoded = self._decode(z).sample
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1204, in _decode
    return self.tiled_decode(z, return_dict=return_dict)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 1374, in tiled_decode
    decoded = self.decoder(tile, feat_cache=self._feat_map, feat_idx=self._conv_idx)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 892, in forward
    x = up_block(x, feat_cache, feat_idx, first_chunk=first_chunk)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl_wan.py", line 708, in forward
    x = x + self.avg_shortcut(x_copy, first_chunk=first_chunk)
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2
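For illustration only (plain NumPy, not the actual model code; NumPy raises `ValueError` where torch raises `RuntimeError`): the failure is an elementwise add of two tensors that agree everywhere except one dimension, here the frame/temporal axis, where the shortcut branch produced a different chunk length than the main path. The shapes below are hypothetical stand-ins for the real activations:

```python
import numpy as np

# Hypothetical (B, C, T, H, W) activations: the main path carries T=2 frames
# while the shortcut branch produced T=4 for the same tile, so the add fails.
a = np.zeros((1, 16, 2, 8, 8))
b = np.zeros((1, 16, 4, 8, 8))
try:
    _ = a + b  # broadcasting cannot reconcile 2 vs 4 at axis 2
except ValueError:
    print("shape mismatch:", a.shape[2], "vs", b.shape[2])  # → shape mismatch: 2 vs 4
```

This is the same class of error the traceback reports at non-singleton dimension 2.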

@sywangyi (Contributor, Author)

[screenshot of the diff] These lines aim to fix the crash. However, there's another crash after this one is fixed.

@sayakpaul (Member)
Thanks! What about the outputs? Cc: @asomoza if you wanna help test it out a bit?

@sayakpaul (Member)
> however, there's another crash after this crash is fixed.

So, it doesn't work yet?

@sywangyi (Contributor, Author)
> however, there's another crash after this crash is fixed.
>
> So, it doesn't work yet?

It works. The other crash occurred because patch_size is not considered in tiling mode; in this model it's 2, and this PR fixes that.
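The role of patch_size can be sketched as follows; this is an illustrative rearrangement in NumPy, not the actual diffusers code, and the channel ordering convention is an assumption. With patch_size p=2, tensors carry 3 * p * p = 12 packed channels, and each packed channel triple must be pixel-shuffled back into a p-times-larger spatial grid; tiling math that assumes plain 3-channel frames miscounts the sizes:

```python
import numpy as np

def unpatchify(x: np.ndarray, p: int) -> np.ndarray:
    """Rearrange (C*p*p, H, W) -> (C, H*p, W*p): a pixel-shuffle."""
    cpp, h, w = x.shape
    c = cpp // (p * p)
    x = x.reshape(c, p, p, h, w)       # split channels into (C, p, p)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, p, W, p)
    return x.reshape(c, h * p, w * p)  # merge patch factors into space

# 12 packed channels on a 4x4 grid become 3 channels on an 8x8 grid.
frame = np.arange(12 * 4 * 4, dtype=np.float32).reshape(12, 4, 4)
rgb = unpatchify(frame, p=2)
print(rgb.shape)  # → (3, 8, 8)
```

Any per-tile height/width bookkeeping therefore has to scale by p, which is presumably what the fix accounts for.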

@sywangyi (Contributor, Author)

The second crash looked like:
Traceback (most recent call last):
  File "/workspace/test.py", line 36, in <module>
    export_to_video(output, "5bit2v_output.mp4", fps=24)
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 177, in export_to_video
    return _legacy_export_to_video(video_frames, output_video_path, fps)
  File "/workspace/diffusers/src/diffusers/utils/export_utils.py", line 135, in _legacy_export_to_video
    img = cv2.cvtColor(video_frames[i], cv2.COLOR_RGB2BGR)
cv2.error: OpenCV(4.11.0) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:92: error: (-15:Bad number of channels) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<3, 4>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = cv::impl::{anonymous}::NONE; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'
> Invalid number of channels in input image:
>     'VScn::contains(scn)'
> where
>     'scn' is 12
This PR also fixes it.
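Illustrative only: `cv2.cvtColor` with `COLOR_RGB2BGR` accepts 3- or 4-channel images, so frames that still carry the packed 3 * patch_size**2 = 12 channels trip the "Invalid number of channels" error ('scn' is 12). A sanity check of the kind that would catch this before export (the helper name is hypothetical, and cv2 itself is not needed to show the channel-count rule):

```python
import numpy as np

def check_exportable(frame: np.ndarray) -> bool:
    """True if an HxWxC frame satisfies cvtColor's 3- or 4-channel rule."""
    return frame.ndim == 3 and frame.shape[-1] in (3, 4)

packed = np.zeros((480, 960, 12), dtype=np.uint8)  # still 12 packed channels
plain = np.zeros((480, 960, 3), dtype=np.uint8)    # normal RGB frame
print(check_exportable(packed), check_exportable(plain))  # → False True
```

Unpatchifying the decoded tiles back to 3 channels, as the fix presumably does, is what makes the frames pass this rule.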

@sayakpaul (Member)
@sywangyi would you be able to post some outputs after applying the fix?

@asomoza (Member) commented Oct 22, 2025

Tested it with a simple pipe.vae.enable_tiling() over the example code.

In fact, it doesn't work with main, but this PR also doesn't fix it; I still got:

RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 2

Edit: I correct myself; I made a silly mistake. This PR does fix the issue for the 5B model. I'll do a comparison with main.

@asomoza (Member) commented Oct 23, 2025

Here they are:

main (without tiling)

5bit2v__main_output.mp4

PR with pipe.vae.enable_tiling()

5bit2v__pr_output.mp4
