How to sync buffers in multi-gpu training #8545
-
I know that parameters are indirectly synced in multi-GPU training via gradient syncing. But how do I sync buffers that are not updated via gradients? From the DDP docs I see that I can call all_reduce() or all_gather() manually, but what does pytorch-lightning do under the hood? Does it sync buffers automatically, or does it offer an interface for that? Since PL handles single- and multi-GPU training without any changes to the Modules, doing this manually would force ugly code into an otherwise independent module just to check whether training is running in DDP mode.
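For illustration, here is a rough sketch of the kind of DDP-aware code I mean (plain torch.distributed; the buffer name and the averaging reduction are just examples, not anything from Lightning):

```python
import torch
import torch.distributed as dist


class RunningStats(torch.nn.Module):
    """Toy module with a buffer that is updated without gradients."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(1))

    @torch.no_grad()
    def update(self, batch_mean: torch.Tensor) -> None:
        # Local, gradient-free update of the buffer.
        self.running_mean.mul_(0.9).add_(0.1 * batch_mean)
        # The "ugly" part: the module itself has to know whether it is
        # running under DDP and average its buffer across ranks.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(self.running_mean, op=dist.ReduceOp.SUM)
            self.running_mean.div_(dist.get_world_size())
```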
Replies: 1 comment 1 reply
-
Dear @MeteorsHub,
Great question. Lightning doesn't do any magic there. It relies on PyTorch `DistributedDataParallel` to handle multi-GPU training via gradient syncing.
The buffers are synced on start using `_sync_params_and_buffers`: https://github.com/pytorch/pytorch/blob/3e3acf8a9ac005db3094f23bd41a5fbc0c3c154b/torch/nn/parallel/distributed.py#L570
If you need to access this private function for any reason, you can do the following:
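A minimal sketch, assuming the model is already wrapped in `DistributedDataParallel` (the `resync_buffers` helper and the public-API fallback are illustrative, not Lightning internals; the private method's name and signature may differ between PyTorch releases):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def resync_buffers(ddp_model: DistributedDataParallel, src_rank: int = 0) -> None:
    """Make every rank hold rank ``src_rank``'s copy of the module buffers."""
    if hasattr(ddp_model, "_sync_params_and_buffers"):
        # Private DDP helper (the one linked above) used at construction time;
        # it broadcasts the wrapped module's state from the authoritative rank.
        ddp_model._sync_params_and_buffers(authoritative_rank=src_rank)
    else:
        # Public-API fallback: broadcast each registered buffer explicitly.
        for buf in ddp_model.module.buffers():
            dist.broadcast(buf.data, src=src_rank)
```

Called on all ranks (for example at the start of validation, or whenever the buffers drift), this keeps non-gradient state such as running statistics identical everywhere without any DDP-specific branching inside your own module.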