[DeepSpeed] Multi-GPU can't converge #8019
Unanswered
thomas-happify asked this question in DDP / multi-GPU / multi-node
Replies: 2 comments 1 reply
-
@thomas-happify Thanks for your question 😃 Are you able to provide a reproducible example showing this in practice? Without that it may be difficult to say why the performances are different.
1 reply
This comment has been minimized.
-
Shouldn't these two args have similar training results? When I use gpus=2 and accumulate_grad_batches=8, the model can't converge.
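Since the original training code isn't included in the thread, below is a minimal sketch of the comparison the question seems to describe: reaching the same effective batch size either with more GPUs or with more gradient-accumulation steps under the DeepSpeed plugin. The toy model, dataset, per-GPU batch size, and the 16-GPU baseline config are assumptions for illustration only; the Trainer arguments follow the PyTorch Lightning API from around the time of this thread (plugins="deepspeed_stage_2", gpus=...), not necessarily the original poster's setup.

```python
# Minimal sketch (not the original poster's code). Assumes PyTorch Lightning
# ~1.3/1.4, where DeepSpeed is enabled via plugins="deepspeed_stage_2".
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def make_loader(per_gpu_batch_size: int = 16) -> DataLoader:
    # Random toy data; stands in for the real dataset.
    x = torch.randn(4096, 32)
    y = torch.randint(0, 2, (4096,))
    return DataLoader(TensorDataset(x, y), batch_size=per_gpu_batch_size)


def run(gpus: int, accumulate_grad_batches: int) -> None:
    trainer = pl.Trainer(
        gpus=gpus,
        accumulate_grad_batches=accumulate_grad_batches,
        plugins="deepspeed_stage_2",
        precision=16,
        max_epochs=5,
    )
    trainer.fit(ToyModel(), make_loader())


if __name__ == "__main__":
    # Both configurations below target the same effective batch size
    # (gpus * per-GPU batch * accumulation = 256 samples per optimizer step),
    # so one would expect similar convergence behaviour:
    #
    #   run(gpus=16, accumulate_grad_batches=1)  # assumed baseline (GPU count not stated in the question)
    run(gpus=2, accumulate_grad_batches=8)       # the configuration reported as not converging
```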