@alexheretic alexheretic commented Oct 6, 2025

I experience slow VAE performance on my AMD RX 7900 GRE GPU and can usually improve it by opting for the tiled VAE nodes. However, WanImageToVideo performs VAE encoding internally and is currently not configurable, which makes Wan workflows slow for me; see the benchmarks below.

I propose adding an optional vae_tile_size argument to WanImageToVideo (and similar nodes). By default it will be 0, meaning untiled, i.e. behaving exactly as before. If set, the value is used as both the x and y tile size. This gives users like me a way to work around poor untiled Wan VAE encode performance.

As the default behaviour is unchanged, this should be backward compatible.

Alternatives

  • Add new "tiled" variant nodes for wan, e.g. TiledWanImageToVideo.
  • Automatically pick tiled encoding for certain GPUs, e.g. my GPU -> 256x256 tiled encoding.
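The second alternative could be as simple as a lookup from detected GPU architecture to a tile size; a hedged sketch, where the `gfx1100 -> 256` mapping is an assumption drawn from the benchmarks below and `AUTO_TILE_SIZES` is a hypothetical name:

```python
# Hypothetical arch -> tile-size table; gfx1100 (RDNA3, e.g. RX 7900 GRE)
# is the only entry supported by the benchmarks in this PR.
AUTO_TILE_SIZES = {"gfx1100": 256}


def auto_vae_tile_size(arch, default=0):
    """Return a tile size for a known-slow arch, else 0 (untiled)."""
    return AUTO_TILE_SIZES.get(arch, default)
```

The downside of this approach is maintaining the table; the explicit vae_tile_size argument sidesteps that.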

Wan 2.1 VAE benchmarks (480x832 * 81 frames)

System info

MIOPEN_FIND_MODE=FAST

Total VRAM 16368 MB, total RAM 64217 MB
pytorch version: 2.9.0.dev20250827+rocm6.4
AMD arch: gfx1100
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 7900 GRE : native
Using Flash Attention
Python version: 3.12.11 (main, Jun  4 2025, 10:32:37) [GCC 15.1.1 20250425]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.27.7
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16

VAE Encode

Benches show a significant improvement using tiled VAE encoding. On my setup, 256x256 performed best: 589s -> 25s.

Untiled vs 512 vs 384 vs 256 vs 128

2 runs each.
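For anyone reproducing these numbers outside of ComfyUI's own node-timing logs, a minimal wall-clock helper that emits the same `[label]: Ns` format (the helper itself is not part of this PR):

```python
import time


def bench(label, fn, runs=2):
    """Run fn `runs` times, printing per-run wall-clock seconds."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - t0
        times.append(elapsed)
        print(f"[{label}]: {elapsed:.2f}s")
    return times
```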

untiled

Yes really 10 minutes 😞

[WanImageToVideo]: 608.79s
[WanImageToVideo]: 588.72s

tiled 512,512,32,256,8

[WanImageToVideo]: 41.86s
[WanImageToVideo]: 43.68s

tiled 384,384,32,256,8

[WanImageToVideo]: 30.41s
[WanImageToVideo]: 28.89s

tiled 256,256,32,256,8

[WanImageToVideo]: 25.00s
[WanImageToVideo]: 25.35s

tiled 128,128,32,256,8

[WanImageToVideo]: 45.57s
[WanImageToVideo]: 45.31s

VAE Decode

Benches also show a significant improvement using tiled VAE decoding. On my setup, 256x256 performed best.
Note: decoding is already handled by a separate node, so no code changes are required; this is just related and perhaps interesting.

Untiled vs 512 vs 384 vs 256 vs 128

4 runs each (where possible).

untiled

OOM 😢

tiled 512,512,32,124,8

OOM 😢

tiled 384,384,32,124,8

[VAEDecodeTiled]: 73.94s
[VAEDecodeTiled]: 99.03s
[VAEDecodeTiled]: 62.71s
[VAEDecodeTiled]: 66.34s

tiled 256,256,32,124,8

[VAEDecodeTiled]: 60.79s
[VAEDecodeTiled]: 61.21s
[VAEDecodeTiled]: 54.53s
[VAEDecodeTiled]: 47.72s

tiled 128,128,32,124,8

[VAEDecodeTiled]: 72.18s
[VAEDecodeTiled]: 71.70s
[VAEDecodeTiled]: 71.47s
[VAEDecodeTiled]: 71.29s
