Conversation

@SrijanUpadhyay

This commit adds configurations and setup scripts to resolve NCCL timeout
issues during DeepSpeed ZeRO-2 training on H200 GPUs. The changes include:

- Extended NCCL and DeepSpeed timeouts (see the sketch after this list)
- Optimized bucket sizes for gradient communication
- CPU and dataloader optimizations
- System shared memory improvements
- Enhanced debugging capabilities
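
The script contents are not inlined in this thread, so the following is only a rough illustration of what setup_training_env.sh might export for the timeout and debugging items above. The variables shown are real NCCL/PyTorch settings, but their selection and values here are assumptions, not the PR's actual contents:

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- the PR's actual setup_training_env.sh is not shown.

# Fail fast on failed collectives instead of hanging silently.
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1

# Verbose NCCL logging to diagnose stalled or timed-out collectives.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,COLL

# InfiniBand retry timeout exponent (higher tolerates slower fabrics).
export NCCL_IB_TIMEOUT=22

# Keep CPU-side work (dataloader workers, tokenization) from oversubscribing cores.
export OMP_NUM_THREADS=8
```

The collective timeout itself is typically extended where the process group is created (for example via the timeout argument of Accelerate's InitProcessGroupKwargs), not through an environment variable alone.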

The implementation provides:
1. DeepSpeed ZeRO-2 configuration (ds_config_zero2.json, sketched below)
2. Environment setup script (setup_training_env.sh)
3. Accelerate configuration (accelerate_config.yaml)

These changes improve training stability on H200 GPUs with high-resolution
data and aggressive configurations.

When using FreeU with half-precision (torch.float16) models, PyTorch may emit
UserWarnings about experimental ComplexHalf support during FFT operations.
This change locally suppresses that specific warning in the fourier_filter
function to avoid flooding user logs while preserving behavior.

- Added a warnings import
- Added local warning suppression around the fftn/ifftn calls when the dtype is float16
- Suppresses only the specific ComplexHalf experimental warning
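
Roughly, the pattern described above looks like this sketch of a FreeU-style fourier_filter (the filtering body is simplified; only the warnings handling reflects this change):

```python
import warnings

import torch
import torch.fft as fft


def fourier_filter(x_in: torch.Tensor, threshold: int, scale: float) -> torch.Tensor:
    """Scale the low-frequency FFT components of x_in (simplified FreeU filter)."""
    x = x_in
    with warnings.catch_warnings():
        if x.dtype == torch.float16:
            # fftn on float16 inputs yields ComplexHalf tensors, and PyTorch
            # warns on every call that ComplexHalf support is experimental.
            # Suppress only that specific UserWarning, and only locally.
            warnings.filterwarnings(
                "ignore", message="ComplexHalf support is experimental"
            )
        x_freq = fft.fftn(x, dim=(-2, -1))
        x_freq = fft.fftshift(x_freq, dim=(-2, -1))

        # Scale the low-frequency band around the center of the spectrum.
        B, C, H, W = x_freq.shape
        mask = torch.ones((B, C, H, W), device=x.device)
        crow, ccol = H // 2, W // 2
        mask[..., crow - threshold : crow + threshold,
             ccol - threshold : ccol + threshold] = scale
        x_freq = x_freq * mask

        x_freq = fft.ifftshift(x_freq, dim=(-2, -1))
        x_filtered = fft.ifftn(x_freq, dim=(-2, -1)).real

    return x_filtered.to(dtype=x_in.dtype)
```

Because the suppression lives inside warnings.catch_warnings(), the global warning filters are restored on exit, so ComplexHalf warnings raised elsewhere in a user's code are unaffected.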

Development

Successfully merging this pull request may close this issue:

Why use torch.repeat instead of torch.repeat_interleave in train_dreambooth_lora_sdxl
