Conversation

@tolgacangoz commented Jun 13, 2025

This PR proposes to fix #376.

I have just started adding support for group offloading and plan to complete it.

…n during training

- Added support for group offloading in the training pipeline to reduce GPU memory usage with minimal speed impact (see the sketch after this list).
- Introduced new arguments in `BaseArgs` for enabling group offloading and configuring its parameters.
- Updated relevant model specifications and training classes to handle group offloading.
- Created documentation on memory optimization techniques, including usage instructions for group offloading.
- Added tests to validate the functionality and constraints of group offloading.
- Updated `requirements.txt` to require `diffusers` version 0.33.0 or higher for compatibility.
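
For context, the new arguments are expected to wrap diffusers' group offloading API, which is why `requirements.txt` now pins `diffusers>=0.33.0`. A minimal sketch of that API; the checkpoint and parameter values here are illustrative, not the trainer's defaults:

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers.hooks import apply_group_offloading

# Example checkpoint; any supported transformer works the same way.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Keep weights on CPU and move groups of blocks onto the GPU on demand.
apply_group_offloading(
    transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",  # "leaf_level" saves more memory but runs slower
    num_blocks_per_group=2,      # only meaningful for "block_level"
)
```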
@tolgacangoz marked this pull request as draft June 13, 2025 11:18
Refactors the group offloading utility to handle conditional arguments more cleanly. The `num_blocks_per_group` parameter is now only passed when the `offload_type` is set to `block_level`.
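
A minimal sketch of that conditional handling (the helper name is hypothetical):

```python
from diffusers.hooks import apply_group_offloading

def _apply_offloading(module, onload_device, offload_type, num_blocks_per_group):
    # Only forward num_blocks_per_group when it is valid for the offload type.
    extra_kwargs = {}
    if offload_type == "block_level":
        extra_kwargs["num_blocks_per_group"] = num_blocks_per_group
    apply_group_offloading(
        module,
        onload_device=onload_device,
        offload_type=offload_type,
        **extra_kwargs,
    )
```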

Improves the group offloading integration tests by using real, loadable tiny models instead of extensively mocking the pipeline. This provides a more realistic test environment. The test setup is also updated to better handle different model architectures and system configurations (CUDA vs. CPU).
Expands the group offloading integration test to include HunyuanVideo, CogVideoX, and LTXVideo models.
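
The "real tiny model" approach looks roughly like this; the repo id below is a placeholder, not necessarily the one the tests actually load:

```python
from diffusers import FluxPipeline

# Hypothetical tiny test checkpoint hosted on the Hub.
pipeline = FluxPipeline.from_pretrained("hf-internal-testing/tiny-flux-pipe")
```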

The test is refactored to dynamically determine the correct logger path for patching based on the model specification class. This removes the previous hardcoded path that only supported FLUX models, allowing the test to run against multiple model architectures.
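
The idea can be sketched as follows (the helper is hypothetical, but the mechanism is simply deriving the patch target from the spec class's module):

```python
from unittest import mock

def _logger_patch_target(model_spec_cls):
    # Patch the logger in whichever module defines the model spec,
    # instead of hardcoding the FLUX module path.
    return f"{model_spec_cls.__module__}.logger"

# Inside a test:
# with mock.patch(_logger_patch_target(spec_cls)) as mock_logger:
#     ...
```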
Transitions the trainer offloading tests from `unittest` to `pytest` to enable parametrization and improve test structure.

The refactored tests now use realistic dummy models from the Hub instead of extensive mocking. This allows for more robust validation by initializing actual model and pipeline components.

Test coverage is expanded to include multiple model architectures (Flux, Hunyuan-Video, CogVideoX, LTX-Video) and a wider range of scenarios, such as different offload types, edge cases, and interactions with other memory optimizations.
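
The resulting test shape is roughly the following (the spec names are placeholders for whatever identifiers the trainer uses):

```python
import pytest

MODEL_SPECS = ["flux", "hunyuan_video", "cogvideox", "ltx_video"]

@pytest.mark.parametrize("model_name", MODEL_SPECS)
@pytest.mark.parametrize("offload_type", ["block_level", "leaf_level"])
def test_group_offloading(model_name, offload_type):
    # Body elided: build args for this spec, initialize the trainer,
    # and assert that group offloading is applied without errors.
    ...
```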
Gracefully handle CPU offloading when an accelerator is not present by issuing a warning instead of raising an error. This improves behavior in test environments.
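
The assumed fallback looks something like this (function name hypothetical):

```python
import logging
import torch

logger = logging.getLogger(__name__)

def _resolve_onload_device() -> torch.device:
    # Warn instead of raising so CPU-only environments (e.g., CI runners)
    # can still exercise the offloading code path.
    if torch.cuda.is_available():
        return torch.device("cuda")
    logger.warning("Group offloading requested but no accelerator found; using CPU.")
    return torch.device("cpu")
```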

Update the trainer to correctly initialize models loaded with meta tensors by using `to_empty()` instead of `to()`.
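
This matters because meta tensors have shapes but no storage, so `.to()` cannot copy them; `to_empty()` allocates fresh, uninitialized storage on the target device instead:

```python
import torch
from torch import nn

# Build a module on the meta device: parameters exist without storage.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

# layer.to("cpu") would fail; to_empty() materializes empty storage,
# ready to be filled by a subsequent load_state_dict().
layer = layer.to_empty(device="cpu")
```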

Make offloading tests more robust by skipping them when the required hardware (e.g., CUDA) is unavailable.
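
For example, with the standard pytest pattern:

```python
import pytest
import torch

requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="group offloading test needs a CUDA device"
)

@requires_cuda
def test_group_offloading_with_cuda_stream():
    ...
```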
@tolgacangoz deleted the Add-support-for-Group-Offloading branch October 30, 2025 14:43