Hello,
Thank you for your work.
I'm trying to use the nodes with the model https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2 following the suggested workflow:
Hunyuan Instruct Loader -> Hunyuan Instruct Generate -> Save Image
with offloading enabled.
I hit an interesting problem: if I offload too much (e.g. block_to_swap ~28, which I increased to test a bigger prompt and higher VRAM consumption during inference), then after inference finishes and it's time to run the VAE, the generate node sees plenty of free VRAM:

```
PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 14.41GB used, 33.58GB free
VRAM after cleanup: 33.6GB free / 48.0GB total
CUDA Out of Memory during generation: Allocation on device
Prompt executed in 183.88 seconds
```

and fails.
If I instead offload less (block_to_swap ~20):

```
PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 33.63GB used, 14.35GB free
VRAM after cleanup: 14.4GB free / 48.0GB total
Low VRAM (14.4GB) - enabling VAE tiling for decode
Generation complete: 130.5s
```

it detects low current VRAM, falls back to the tiled VAE decode path, and succeeds.
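To make the apparent paradox concrete, here is a minimal sketch (not the extension's actual code; the threshold value and parameter names are assumptions) of the free-VRAM heuristic the logs suggest, plus an explicit override that would make the tiling decision deterministic:

```python
# Hypothetical sketch of a VRAM-based VAE tiling decision.
# VAE_TILING_THRESHOLD_GB and force_tiling are assumed names, not the
# extension's real API.

VAE_TILING_THRESHOLD_GB = 16.0  # assumed cutoff between tiled and full decode


def should_tile_vae(free_vram_gb: float, force_tiling=None) -> bool:
    """Decide whether to use tiled VAE decode.

    force_tiling=None  -> heuristic: tile only when free VRAM is low
    force_tiling=True  -> always tile (deterministic, safest)
    force_tiling=False -> never tile (deterministic, fastest)
    """
    if force_tiling is not None:
        return force_tiling
    return free_vram_gb < VAE_TILING_THRESHOLD_GB


# The two runs above: heavy offloading leaves 33.58 GB free, so the
# heuristic skips tiling (and the decode then OOMs); light offloading
# leaves 14.35 GB free, so tiling kicks in and the decode succeeds.
print(should_tile_vae(33.58))  # heuristic says no tiling -> OOM case
print(should_tile_vae(14.35))  # heuristic enables tiling -> success case
```

With an override like `force_tiling`, the decode path would no longer depend on how much VRAM happens to be free after offloading.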
This isn't very impactful, but it could cause problems when testing other modes.
Could you advise how to get deterministic, controllable VAE behaviour?