WanImageToVideo, WanFirstLastFrameToVideo: Add vae_tile_size optional arg
#10238
+14
−4
I experience slow VAE performance on my AMD RX 7900 GRE GPU and can usually improve this by opting for the tiled VAE nodes. However, `WanImageToVideo` does VAE encoding and is currently not configurable. This makes Wan workflows slow for me; see the benchmarks below.

I propose we add a `vae_tile_size` optional argument to `WanImageToVideo` (and similar nodes). It defaults to `0`, meaning untiled, i.e. the node acts as it did previously. If set, the value is used as both the x and y tile size. This gives users like me a way to work around poor untiled Wan VAE encode performance.

As the default behaviour is unchanged, this should be backward compatible.
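The proposed behaviour can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `encode_start_image` is a hypothetical helper name, and `RecordingVAE` is a stand-in object used only so the example runs on its own. It assumes the real VAE object exposes an `encode` method and an `encode_tiled` method that accepts `tile_x` and `tile_y` keyword arguments.

```python
class RecordingVAE:
    """Stand-in for a VAE object (hypothetical); records which encode path ran."""

    def __init__(self):
        self.calls = []

    def encode(self, pixels):
        # Untiled path: the whole frame is encoded in one pass.
        self.calls.append(("encode", None, None))
        return pixels

    def encode_tiled(self, pixels, tile_x=None, tile_y=None):
        # Tiled path: the frame is encoded in tile_x by tile_y pieces.
        self.calls.append(("encode_tiled", tile_x, tile_y))
        return pixels


def encode_start_image(vae, pixels, vae_tile_size=0):
    """Encode pixels, tiled only when vae_tile_size is greater than 0.

    vae_tile_size == 0 keeps the previous (untiled) behaviour, so existing
    workflows are unaffected. Any positive value is used for both x and y.
    """
    if vae_tile_size > 0:
        return vae.encode_tiled(pixels, tile_x=vae_tile_size, tile_y=vae_tile_size)
    return vae.encode(pixels)
```

With the default of `0` the helper takes the untiled path, which is what keeps the change backward compatible; passing `vae_tile_size=256` would route the same input through the tiled path with 256x256 tiles.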
Alternatives
`TiledWanImageToVideo`.
Wan 2.1 VAE benchmarks (480x832 * 81 frames)
System info
MIOPEN_FIND_MODE=FAST
VAE Encode
Benchmarks show a significant improvement with tiled VAE encoding. On my setup, 256x256 tiles performed best: 589s -> 25s.
Untiled vs 512 vs 384 vs 256 vs 128
2 runs each.
untiled
Yes really 10 minutes 😞
tiled 512,512,32,256,8
tiled 384,384,32,256,8
tiled 256,256,32,256,8
tiled 128,128,32,256,8
VAE Decode
Benchmarks also show a significant improvement with tiled VAE decoding. On my setup, 256x256 tiles performed best.
Note: Decoding is already a separate node, so no code changes are required there; this is just kinda related and perhaps interesting.
Untiled vs 512 vs 384 vs 256 vs 128
4 runs each (where possible).
untiled
OOM 😢
tiled 512,512,32,124,8
OOM 😢
tiled 384,384,32,124,8
tiled 256,256,32,124,8
tiled 128,128,32,124,8