Conversation

@tolgacangoz commented Jun 13, 2025

This PR proposes to fix #376.

I have just started adding support for group offloading and plan to complete it.

…n during training

- Added support for group offloading in the training pipeline to reduce GPU memory usage with minimal speed impact (see the sketch after this list).
- Introduced new arguments in `BaseArgs` for enabling group offloading and configuring its parameters.
- Updated relevant model specifications and training classes to handle group offloading.
- Created documentation on memory optimization techniques, including usage instructions for group offloading.
- Added tests to validate the functionality and constraints of group offloading.
- Updated `requirements.txt` to require `diffusers` version 0.33.0 or higher for compatibility.
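
For context, the new arguments are expected to wrap diffusers' group offloading API, which is why `requirements.txt` now pins `diffusers>=0.33.0`. A minimal sketch of that API; the checkpoint and parameter values here are illustrative, not the trainer's defaults:

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers.hooks import apply_group_offloading

# Example checkpoint; any supported transformer works the same way.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Keep weights on CPU and move groups of blocks onto the GPU on demand.
apply_group_offloading(
    transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",  # "leaf_level" saves more memory but runs slower
    num_blocks_per_group=2,      # only meaningful for "block_level"
)
```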
@tolgacangoz marked this pull request as draft June 13, 2025 11:18
Refactors the group offloading utility to handle conditional arguments more cleanly. The `num_blocks_per_group` parameter is now only passed when the `offload_type` is set to `block_level`.
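
A minimal sketch of that conditional handling (the helper name is hypothetical):

```python
from diffusers.hooks import apply_group_offloading

def _apply_offloading(module, onload_device, offload_type, num_blocks_per_group):
    # Only forward num_blocks_per_group when it is valid for the offload type.
    extra_kwargs = {}
    if offload_type == "block_level":
        extra_kwargs["num_blocks_per_group"] = num_blocks_per_group
    apply_group_offloading(
        module,
        onload_device=onload_device,
        offload_type=offload_type,
        **extra_kwargs,
    )
```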

Improves the group offloading integration tests by using real, loadable tiny models instead of extensively mocking the pipeline. This provides a more realistic test environment. The test setup is also updated to better handle different model architectures and system configurations (CUDA vs. CPU).
Expands the group offloading integration test to include HunyuanVideo, CogVideoX, and LTXVideo models.
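
The "real tiny model" approach looks roughly like this; the repo id below is a placeholder, not necessarily the one the tests actually load:

```python
from diffusers import FluxPipeline

# Hypothetical tiny test checkpoint hosted on the Hub.
pipeline = FluxPipeline.from_pretrained("hf-internal-testing/tiny-flux-pipe")
```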

The test is refactored to dynamically determine the correct logger path for patching based on the model specification class. This removes the previous hardcoded path that only supported FLUX models, allowing the test to run against multiple model architectures.
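
The idea can be sketched as follows (the helper is hypothetical, but the mechanism is simply deriving the patch target from the spec class's module):

```python
from unittest import mock

def _logger_patch_target(model_spec_cls):
    # Patch the logger in whichever module defines the model spec,
    # instead of hardcoding the FLUX module path.
    return f"{model_spec_cls.__module__}.logger"

# Inside a test:
# with mock.patch(_logger_patch_target(spec_cls)) as mock_logger:
#     ...
```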
Transitions the trainer offloading tests from `unittest` to `pytest` to enable parametrization and improve test structure.

The refactored tests now use realistic dummy models from the Hub instead of extensive mocking. This allows for more robust validation by initializing actual model and pipeline components.

Test coverage is expanded to include multiple model architectures (Flux, Hunyuan-Video, CogVideoX, LTX-Video) and a wider range of scenarios, such as different offload types, edge cases, and interactions with other memory optimizations.
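
The resulting test shape is roughly the following (the spec names are placeholders for whatever identifiers the trainer uses):

```python
import pytest

MODEL_SPECS = ["flux", "hunyuan_video", "cogvideox", "ltx_video"]

@pytest.mark.parametrize("model_name", MODEL_SPECS)
@pytest.mark.parametrize("offload_type", ["block_level", "leaf_level"])
def test_group_offloading(model_name, offload_type):
    # Body elided: build args for this spec, initialize the trainer,
    # and assert that group offloading is applied without errors.
    ...
```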
Gracefully handle CPU offloading when an accelerator is not present by issuing a warning instead of raising an error. This improves behavior in test environments.
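
The assumed fallback looks something like this (function name hypothetical):

```python
import logging
import torch

logger = logging.getLogger(__name__)

def _resolve_onload_device() -> torch.device:
    # Warn instead of raising so CPU-only environments (e.g., CI runners)
    # can still exercise the offloading code path.
    if torch.cuda.is_available():
        return torch.device("cuda")
    logger.warning("Group offloading requested but no accelerator found; using CPU.")
    return torch.device("cpu")
```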

Update the trainer to correctly initialize models loaded with meta tensors by using `to_empty()` instead of `to()`.
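
This matters because meta tensors have shapes but no storage, so `.to()` cannot copy them; `to_empty()` allocates fresh, uninitialized storage on the target device instead:

```python
import torch
from torch import nn

# Build a module on the meta device: parameters exist without storage.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

# layer.to("cpu") would fail; to_empty() materializes empty storage,
# ready to be filled by a subsequent load_state_dict().
layer = layer.to_empty(device="cpu")
```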

Make offloading tests more robust by skipping them when the required hardware (e.g., CUDA) is unavailable.
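
For example, with the standard pytest pattern:

```python
import pytest
import torch

requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="group offloading test needs a CUDA device"
)

@requires_cuda
def test_group_offloading_with_cuda_stream():
    ...
```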
@tolgacangoz deleted the Add-support-for-Group-Offloading branch October 30, 2025 14:43