Fix FLUX2 Klein load-time VRAM spikes on low-memory GPUs.
In low-VRAM mode, keep the transformer and the Qwen text encoder off CUDA during initial load and quantization. Previously, startup could hit an OOM by placing the full model on the GPU before offloading and quantization had a chance to take effect.
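A minimal sketch of the device-selection idea, not the code in this commit: the names `LoadPlan` and `plan_component_load` and the `low_vram` flag are hypothetical, and the real loader may structure this differently. It only illustrates that in low-VRAM mode both the initial load and the quantization step target the CPU, so CUDA is left to the later offloading logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadPlan:
    """Hypothetical record of where a component is placed during startup."""
    load_device: str      # device the weights land on when first read
    quantize_device: str  # device the quantization step runs on


def plan_component_load(low_vram: bool, cuda_available: bool = True) -> LoadPlan:
    """Pick startup devices for a large component (transformer, text encoder).

    Assumption: in low-VRAM mode the component must never be fully resident
    on the GPU at startup, so both steps stay on the CPU.
    """
    if low_vram or not cuda_available:
        return LoadPlan(load_device="cpu", quantize_device="cpu")
    # With enough VRAM, loading and quantizing directly on the GPU is fine.
    return LoadPlan(load_device="cuda", quantize_device="cuda")


if __name__ == "__main__":
    for name in ("transformer", "text_encoder"):
        print(name, plan_component_load(low_vram=True))
```

The trade-off in this sketch is slower startup in low-VRAM mode, since quantizing on the CPU is typically slower than on the GPU, in exchange for never holding the unquantized model in VRAM.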
Co-authored-by: Cursor <cursoragent@cursor.com>