Flux is a very large model and requires ~50GB of RAM/VRAM to load all the modeling components. Enable some of the optimizations below to lower the memory requirements.

### Group offloading

[Group offloading](../../optimization/memory#group-offloading) lowers VRAM usage by offloading groups of internal layers rather than the whole model or weights. Use [`~hooks.apply_group_offloading`] on each model component of a pipeline. The `offload_type` parameter toggles between block-level and leaf-level offloading. Setting it to `leaf_level` offloads the lowest leaf-level parameters to the CPU instead of offloading at the module level.

On CUDA devices that support asynchronous data streaming, set `use_stream=True` to overlap data transfer and computation to accelerate inference.

> [!TIP]
> It is possible to mix block-level and leaf-level offloading for different components in a pipeline.

```py
import torch
from diffusers import FluxPipeline
from diffusers.hooks import apply_group_offloading

# Load the pipeline first (this setup is elided in the excerpt; the checkpoint
# name below is an assumption based on the canonical Flux checkpoint).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

apply_group_offloading(
    pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.text_encoder_2,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)
apply_group_offloading(
    pipe.vae,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
    use_stream=True,
)

prompt = "A cat wearing sunglasses and working as a lifeguard at a pool."
# Assumed usage (the excerpt ends at the prompt): run the offloaded pipeline.
image = pipe(prompt).images[0]
```
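
As the tip above notes, block-level and leaf-level offloading can be mixed across the components of a pipeline. The sketch below is one illustrative way to do it, assuming the same `pipe` as above; the group size `num_blocks_per_group=2` is an arbitrary example value to tune for your VRAM budget.

```py
import torch
from diffusers.hooks import apply_group_offloading

# Offload the transformer in groups of blocks...
apply_group_offloading(
    pipe.transformer,
    offload_type="block_level",
    num_blocks_per_group=2,  # arbitrary example value
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
)
# ...while the text encoder is offloaded at the leaf level.
apply_group_offloading(
    pipe.text_encoder,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
)
```

Larger `num_blocks_per_group` values keep more of the transformer on the GPU at once, trading memory savings for fewer device transfers.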