set with model.no_sync() in lightning training step #10792
Replies: 1 comment 3 replies
-
Yes, Lightning handles it for you. A little more on the context: in your code,

```python
if self.is_ddp:
    # Gradients stay local to each process; DDP's all-reduce is skipped.
    with self.pytorch_model.no_sync():
        self.manual_backward(loss)
else:
    self.manual_backward(loss)
optimizer.zero_grad()
optimizer.step()
```

the gradients will never sync up, and thus your model will end up having different weights on each device right after the first optimizer step.
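For comparison, here is a minimal sketch of how the sync is usually kept intact when pairing SAM with DDP under manual optimization: skip the all-reduce only for the first SAM pass and let the second backward synchronize normally. It assumes the davda54/sam implementation (`first_step`/`second_step`) and that the DDP-wrapped module is reachable via `self.trainer.model`; both are assumptions layered on top of this thread, not something it states.

```python
import contextlib

import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel

from sam import SAM  # assumption: the davda54/sam package providing first_step/second_step


class SAMLitModule(pl.LightningModule):
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model
        # SAM needs two forward/backward passes per batch, so use manual optimization.
        self.automatic_optimization = False

    def _maybe_no_sync(self):
        # Disable DDP's gradient all-reduce only when the module is actually DDP-wrapped.
        wrapped = self.trainer.model  # assumption: the DDP wrapper is reachable here
        if isinstance(wrapped, DistributedDataParallel):
            return wrapped.no_sync()
        return contextlib.nullcontext()

    def training_step(self, batch, batch_idx):
        opt = self.optimizers(use_pl_optimizer=False)
        x, y = batch

        # First SAM pass: gradients stay per-device, no all-reduce needed.
        with self._maybe_no_sync():
            loss = F.cross_entropy(self.model(x), y)
            self.manual_backward(loss)
        opt.first_step(zero_grad=True)

        # Second SAM pass: plain backward, so DDP averages gradients across devices.
        second_loss = F.cross_entropy(self.model(x), y)
        self.manual_backward(second_loss)
        opt.second_step(zero_grad=True)

        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return SAM(self.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)
```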
-
Trying to run some experiments using the SAM optimizer with multiple GPUs and DDP. It is recommended that, when using multiple GPUs, the gradients be computed for each GPU separately; the example code given follows the pattern sketched below.
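A minimal sketch of that recommendation, assuming the davda54/sam implementation of SAM (`first_step`/`second_step` API); the function name and arguments here are illustrative, not taken from the thread:

```python
from torch.nn.parallel import DistributedDataParallel


def sam_train_step(model: DistributedDataParallel, optimizer, loss_fn, x, y):
    """One SAM update where the first pass keeps gradients local to each GPU."""
    # First forward/backward pass: skip DDP's all-reduce so each GPU computes
    # its perturbation from its own gradients.
    with model.no_sync():
        loss_fn(model(x), y).backward()
    optimizer.first_step(zero_grad=True)  # assumption: davda54/sam API

    # Second forward/backward pass: normal backward, DDP averages the gradients.
    loss_fn(model(x), y).backward()
    optimizer.second_step(zero_grad=True)
```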
My question is: is `with model.no_sync()` already handled when using the `ddp` strategy? I found something possibly related in this discussion on using no_sync with DDP. If not, how do I ensure that the model is not syncing the gradients? Do I need to call `with self.hf_model.no_sync():` in the `training_step` before calling `self.manual_backward(loss)`? I have something like the snippet quoted in the reply above.