Is it possible to call dist.all_reduce manually in train_step? #7693
Answered by SeanNaren
sandylaker asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
In my code, I would like to synchronize a tensor across all the GPUs in train_step. Is it possible to call dist.all_reduce manually to do this?
Answered by SeanNaren on May 26, 2021
Replies: 1 comment
Hey @sandylaker! You can use torch.distributed.all_reduce. There is also a reduce method reachable from within the LightningModule, x = self.trainer.accelerator.training_type_plugin.reduce(x); however, it may be better to expose this within Lightning to make it easier to access.
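
Not part of the original answer, but here is a minimal sketch of the torch.distributed.all_reduce route inside training_step. The LitModel architecture, loss, and metric name are illustrative, and the commented-out plugin call reflects the Lightning 1.3-era API mentioned in the answer:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x), y)

        # Synchronize a tensor across all processes: sum it everywhere,
        # then divide by the world size to get the global mean. The guard
        # lets the same code also run in a single, non-distributed process.
        stat = loss.detach().clone()
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(stat, op=dist.ReduceOp.SUM)
            stat /= dist.get_world_size()
        # Equivalent, using the plugin API from the answer (Lightning 1.3 era):
        # stat = self.trainer.accelerator.training_type_plugin.reduce(stat)
        self.log("global_mean_loss", stat)

        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```

Gradients are already averaged across processes by DDP during backward; an explicit all_reduce like this is only needed for extra tensors (metrics, statistics) you want synchronized yourself.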
Answer selected by Borda