@1ytic
Hi,
So far I have been able to use the loss with DDP on a single GPU, and it behaves more or less as expected.
But when I use more than one device, the following happens:
- On GPU-0, the loss is calculated properly.
- On GPU-1, the loss is close to zero for every batch.
I checked the input tensors, devices, tensor values, etc., and so far everything seems to be identical between GPU-0 and the other GPUs.