Bug description
I just adapted my training code to the Lightning framework for convenient DDP model training, but it runs almost 10 times slower than my previous manual torch DDP training (speed shown below). I have no idea what is wrong here. Could anyone help me figure out what might cause this problem and how to fix it?

I have set:

```python
pl.Trainer(
    accelerator='gpu',
    devices=#GPUS,
    strategy='ddp',
    sync_batchnorm=True,
    deterministic=True,
    gradient_clip_val=$CLIP_VALUE,
)
```
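To make the slowdown easier to reproduce and narrow down, here is a minimal self-contained sketch that exercises the same Trainer flags. The toy model, dataset, and the concrete values for `devices` and `gradient_clip_val` are placeholders of mine, not from the original setup:

```python
# Minimal repro sketch; the model, data, and concrete values below are
# placeholder assumptions, not taken from the original training setup.
import os

# deterministic=True enables torch.use_deterministic_algorithms(True);
# cuBLAS then requires this env var, or CUDA matmuls raise a RuntimeError.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Include a BatchNorm layer so sync_batchnorm=True actually has an effect.
        self.net = nn.Sequential(
            nn.Linear(64, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Linear(64, 1)
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    data = TensorDataset(torch.randn(4096, 64), torch.randn(4096, 1))
    loader = DataLoader(data, batch_size=256, num_workers=4)
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,              # placeholder for #GPUS
        strategy="ddp",
        sync_batchnorm=True,
        deterministic=True,
        gradient_clip_val=1.0,  # placeholder for $CLIP_VALUE
        max_epochs=1,
    )
    trainer.fit(ToyModule(), loader)
```

Running this harness and turning `sync_batchnorm=True` and `deterministic=True` off one at a time should show how much of the gap, if any, comes from those two flags rather than from the DDP strategy itself.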
What version are you seeing the problem on?
v2.3
How to reproduce the bug
Error messages and logs
```
# Error messages and logs here please
```
Environment

```
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
```
More info
No response