Is it possible to call dist.all_reduce manually in train_step? #7693
Answered by SeanNaren
sandylaker asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
In my code, I would like to synchronize a tensor across all the GPUs in train_step. Is it possible to call dist.all_reduce manually to do this?
Answered by SeanNaren on May 26, 2021
Replies: 1 comment
Hey @sandylaker! You can use torch.distributed.all_reduce. There is also a reduce method reachable from within the LightningModule, x = self.trainer.accelerator.training_type_plugin.reduce(x); however, it may be better to expose this within Lightning to make it easier to access.
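
Not part of the original answer, but here is a minimal sketch of the torch.distributed.all_reduce route inside training_step. The LitModel architecture, loss, and metric name are illustrative, and the commented-out plugin call reflects the Lightning 1.3-era API mentioned in the answer:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x), y)

        # Synchronize a tensor across all processes: sum it everywhere,
        # then divide by the world size to get the global mean. The guard
        # lets the same code also run in a single, non-distributed process.
        stat = loss.detach().clone()
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(stat, op=dist.ReduceOp.SUM)
            stat /= dist.get_world_size()
        # Equivalent, using the plugin API from the answer (Lightning 1.3 era):
        # stat = self.trainer.accelerator.training_type_plugin.reduce(stat)
        self.log("global_mean_loss", stat)

        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```

Gradients are already averaged across processes by DDP during backward; an explicit all_reduce like this is only needed for extra tensors (metrics, statistics) you want synchronized yourself.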
Answer selected by Borda