Hello,
Thank you for your work.
I'm trying to use the nodes with the model https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2 following the suggested workflow:
Hunyuan Instruct Loader -> Hunyuan Instruct Generate -> Save Image
with offloading enabled.
I hit an interesting problem: if I offload too much (e.g. block_to_swap ~28, which I increased to test a bigger prompt and higher VRAM consumption during inference), then after inference finishes and it's time to run the VAE, the generate node sees plenty of free VRAM:

```
PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 14.41GB used, 33.58GB free
VRAM after cleanup: 33.6GB free / 48.0GB total
CUDA Out of Memory during generation: Allocation on device
Prompt executed in 183.88 seconds
```

and fails.
If I instead offload less (block_to_swap ~20):

```
PRE-VAE DECODE: Clearing KV cache before decode
Post-generation cleanup: Cleared 0 cache items. VRAM: 33.63GB used, 14.35GB free
VRAM after cleanup: 14.4GB free / 48.0GB total
Low VRAM (14.4GB) - enabling VAE tiling for decode
Generation complete: 130.5s
```

it detects low current VRAM, falls back to the tiled VAE decode path, and succeeds.
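To make the apparent paradox concrete, here is a minimal sketch (not the extension's actual code; the threshold value and parameter names are assumptions) of the free-VRAM heuristic the logs suggest, plus an explicit override that would make the tiling decision deterministic:

```python
# Hypothetical sketch of a VRAM-based VAE tiling decision.
# VAE_TILING_THRESHOLD_GB and force_tiling are assumed names, not the
# extension's real API.

VAE_TILING_THRESHOLD_GB = 16.0  # assumed cutoff between tiled and full decode


def should_tile_vae(free_vram_gb: float, force_tiling=None) -> bool:
    """Decide whether to use tiled VAE decode.

    force_tiling=None  -> heuristic: tile only when free VRAM is low
    force_tiling=True  -> always tile (deterministic, safest)
    force_tiling=False -> never tile (deterministic, fastest)
    """
    if force_tiling is not None:
        return force_tiling
    return free_vram_gb < VAE_TILING_THRESHOLD_GB


# The two runs above: heavy offloading leaves 33.58 GB free, so the
# heuristic skips tiling (and the decode then OOMs); light offloading
# leaves 14.35 GB free, so tiling kicks in and the decode succeeds.
print(should_tile_vae(33.58))  # heuristic says no tiling -> OOM case
print(should_tile_vae(14.35))  # heuristic enables tiling -> success case
```

With an override like `force_tiling`, the decode path would no longer depend on how much VRAM happens to be free after offloading.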
This isn't very impactful, but it could cause problems when testing other modes.
Could you advise how to get deterministic, controllable VAE behaviour?