
Fix FLUX2 Klein load-time VRAM spikes on low-memory GPUs.#726

Open
mhofer1976 wants to merge 1 commit into ostris:main from mhofer1976:fix/flux2-klein-low-vram-loading

Conversation

@mhofer1976

Keep the transformer and Qwen text encoder off CUDA during the initial load/quantization in low-VRAM mode, so that startup avoids a full-model OOM before offloading and quantization can take effect.

Summary

  • avoid moving the full FLUX2 transformer to CUDA before quantization, which caused startup OOM on low-memory GPUs
  • keep the transformer/text-encoder on CPU during low-VRAM model preparation and only move as needed
  • use qtype_te for Klein Qwen text-encoder quantization instead of the transformer qtype
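
The ordering the first two bullets describe can be sketched as follows. This is an illustrative outline, not the repo's actual loader API: `prepare_low_vram` and `fake_quantize` are hypothetical names, and the real code quantizes with the configured qtype rather than a float16 cast.

```python
import torch
from torch import nn

def prepare_low_vram(model: nn.Module, quantize_fn, device: str = "cuda"):
    """Quantize on CPU first, then move only the (smaller) quantized
    weights to the GPU, so the full-precision model never touches VRAM."""
    model.to("cpu")        # never move the full-precision model to CUDA
    quantize_fn(model)     # quantize while the weights still live in RAM
    if torch.cuda.is_available():
        model.to(device)   # only the quantized model is moved to VRAM
    return model

# Toy stand-in: "quantization" here just casts weights to float16.
def fake_quantize(m: nn.Module):
    m.half()

model = prepare_low_vram(nn.Linear(8, 8), fake_quantize)
```

The bug was the inverse ordering: `model.to("cuda")` before quantization, which briefly requires the full-precision footprint on the GPU.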

Test plan

  • Reproduce the prior failure on FLUX2 Klein 9B at the "Loading transformer" step with low-VRAM settings
  • Verify loader no longer calls full-model CUDA move before quantization in the FLUX2 path
  • Verify Klein TE path no longer eagerly loads full TE to CUDA and uses qtype_te
  • Run a full training smoke test on a low-VRAM GPU and confirm model loads and begins training
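
For the smoke test above, a peak-VRAM probe can confirm the fix empirically. This helper is an assumption on my part (it is not in the PR); it uses only standard PyTorch CUDA memory APIs and degrades gracefully on CPU-only machines.

```python
import torch

def peak_vram_during(fn) -> int:
    """Run fn and return the peak CUDA memory allocated (bytes).
    Returns 0 on CPU-only machines. Wrap the model-loading call in fn
    and check the peak stays below the card's capacity."""
    if not torch.cuda.is_available():
        fn()
        return 0
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated()

peak = peak_vram_during(lambda: None)  # substitute the real loader call
```

A load that moves the full-precision transformer to CUDA before quantization would show a peak near the unquantized model size; with this fix it should track the quantized size instead.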

Keep the transformer and Qwen text encoder off CUDA during initial load/quantization in low-VRAM mode so model startup avoids full-model OOM before offloading and quantization can take effect.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mhofer1976 mhofer1976 force-pushed the fix/flux2-klein-low-vram-loading branch from 280fb32 to 02dd161 Compare February 25, 2026 06:01
@inflamously

Omg yes please, anything that fixes these annoying VRAM spikes between sampling and at random steps. I have an RTX 5090 that can run this FLUX 2 Klein 9B without any problems, but the longer the run, the more it spikes.
