
Unstable OOM due to automatic load VAE behaviour #22

@Vesemir

Description


Hello,

Thank you for your work.
I'm trying to use the nodes with the model https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2, following the suggested workflow

Hunyuan Instruct Loader -> Hunyuan Instruct Generate -> Save Image

with offloading.

I hit an interesting problem: if I offload too much (e.g. block_to_swap ~28; I increased it to test a bigger prompt and higher VRAM consumption during inference), then after inference finishes and it's time to run the VAE, the generate node decides it has plenty of free VRAM:

PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 14.41GB used, 33.58GB free
  VRAM after cleanup: 33.6GB free / 48.0GB total
CUDA Out of Memory during generation: Allocation on device
Prompt executed in 183.88 seconds

and fails.

If I instead offload less (block_to_swap ~20):

PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 33.63GB used, 14.35GB free
  VRAM after cleanup: 14.4GB free / 48.0GB total
  Low VRAM (14.4GB) - enabling VAE tiling for decode
Generation complete: 130.5s

it detects that free VRAM is currently low, switches to the tiled VAE decode path instead, and succeeds.
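To illustrate the behaviour I'm describing, here is a minimal sketch of what the automatic decision appears to be doing (the function name and the threshold value are my assumptions for illustration, not the node's actual code):

```python
def choose_vae_decode_mode(free_vram_gb, low_vram_threshold_gb=16.0):
    """Sketch of the heuristic seen in the logs: tile the VAE decode
    only when free VRAM at decode time is below some threshold.
    The 16 GB threshold is an assumed value, not taken from the code."""
    return "tiled" if free_vram_gb < low_vram_threshold_gb else "full"

# With heavy offloading (block_to_swap ~28) most VRAM is free at decode
# time, so the heuristic picks a full decode -- which then OOMs:
print(choose_vae_decode_mode(33.58))  # full

# With lighter offloading (block_to_swap ~20) free VRAM looks low, so
# tiling is enabled and the decode succeeds:
print(choose_vae_decode_mode(14.35))  # tiled
```

The counterintuitive part is that offloading *more* makes the decode *less* safe, because the heuristic only looks at free VRAM at that instant, not at what the swapped-out blocks will need.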

This is not a major issue for me, but it could cause problems when testing other modes.

Could you perhaps advise how to get deterministic, controllable VAE behaviour?
