How to sync buffers in multi-gpu training #8545
-
I know that parameters are indirectly synced in multi-GPU training via gradient syncing. But how do I sync buffers that are not updated via gradients? From the DDP docs I see that I can call all_reduce() or all_gather() manually, but what does pytorch-lightning do under the hood? Does it sync buffers automatically, or does it offer an interface for that? Since PL handles single- and multi-GPU training without any changes to the Modules, doing this manually would force ugly code into an otherwise independent module just to check whether training is running in DDP mode.
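For illustration, here is a rough sketch of the kind of DDP-aware code I mean (plain torch.distributed; the buffer name and the averaging reduction are just examples, not anything from Lightning):

```python
import torch
import torch.distributed as dist


class RunningStats(torch.nn.Module):
    """Toy module with a buffer that is updated without gradients."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(1))

    @torch.no_grad()
    def update(self, batch_mean: torch.Tensor) -> None:
        # Local, gradient-free update of the buffer.
        self.running_mean.mul_(0.9).add_(0.1 * batch_mean)
        # The "ugly" part: the module itself has to know whether it is
        # running under DDP and average its buffer across ranks.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(self.running_mean, op=dist.ReduceOp.SUM)
            self.running_mean.div_(dist.get_world_size())
```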
Replies: 1 comment 1 reply
-
Dear @MeteorsHub,
Great question. Lightning doesn't do any magic there. It relies on PyTorch `DistributedDataParallel` to handle multi-GPU training via gradient syncing.
The buffers are synced on start using `_sync_params_and_buffers`: https://github.com/pytorch/pytorch/blob/3e3acf8a9ac005db3094f23bd41a5fbc0c3c154b/torch/nn/parallel/distributed.py#L570
If you need to access this private function for any reason, you can do the following:
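A minimal sketch, assuming the model is already wrapped in `DistributedDataParallel` (the `resync_buffers` helper and the public-API fallback are illustrative, not Lightning internals; the private method's name and signature may differ between PyTorch releases):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def resync_buffers(ddp_model: DistributedDataParallel, src_rank: int = 0) -> None:
    """Make every rank hold rank ``src_rank``'s copy of the module buffers."""
    if hasattr(ddp_model, "_sync_params_and_buffers"):
        # Private DDP helper (the one linked above) used at construction time;
        # it broadcasts the wrapped module's state from the authoritative rank.
        ddp_model._sync_params_and_buffers(authoritative_rank=src_rank)
    else:
        # Public-API fallback: broadcast each registered buffer explicitly.
        for buf in ddp_model.module.buffers():
            dist.broadcast(buf.data, src=src_rank)
```

Called on all ranks (for example at the start of validation, or whenever the buffers drift), this keeps non-gradient state such as running statistics identical everywhere without any DDP-specific branching inside your own module.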