Sync output dir between DDP processes #6176
Unanswered
rubencart asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment · 4 replies
-

I'm looking for a good way to sync my output dir name (which contains a timestamp etc.) between DDP processes. For now, I'm doing something like this:
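A minimal sketch of the pattern I mean, assuming the default process group is already initialized; `torch.distributed.broadcast_object_list` needs a reasonably recent PyTorch (>= 1.8), and the helper name and directory layout are just illustrative:

```python
import os
from datetime import datetime

import torch.distributed as dist


def sync_output_dir(base: str = "outputs") -> str:
    """Create a timestamped run dir on rank 0 and share its name with all ranks."""
    if dist.get_rank() == 0:
        run_dir = os.path.join(base, datetime.now().strftime("%Y%m%d-%H%M%S"))
        os.makedirs(run_dir, exist_ok=True)
    else:
        run_dir = None  # placeholder; overwritten by the broadcast below
    # Unlike dist.send/dist.recv, broadcast_object_list pickles arbitrary
    # Python objects under the hood, so plain strings work.
    holder = [run_dir]
    dist.broadcast_object_list(holder, src=0)
    return holder[0]
```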
Is this OK, or does someone have a better solution?

I've tried to use `torch.distributed.send` and `torch.distributed.recv`, but these only work for tensors. I'm also using the `WandbLogger`, so I have considered having all processes save output to `wandb_logger.experiment.dir`, but that doesn't work because the logger returns a dummy experiment in all but the main process (link).
-

The accelerators also have a broadcast method, which we use to send around stuff like that by byte-encoding it and then use […]
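A sketch of how that broadcast can be called from a `LightningModule` hook. Where the method lives has moved between Lightning versions (e.g. `trainer.accelerator.broadcast` in the 1.x era, `trainer.strategy.broadcast` in recent releases), and the `"outputs"` path here is illustrative:

```python
import os
from datetime import datetime

import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def setup(self, stage=None):
        # Rank 0 decides the timestamped directory name; the broadcast then
        # hands the same string to every other rank.
        run_dir = None
        if self.trainer.is_global_zero:
            run_dir = os.path.join("outputs", datetime.now().strftime("%Y%m%d-%H%M%S"))
            os.makedirs(run_dir, exist_ok=True)
        # In recent Lightning this lives on the strategy; 1.x-era versions
        # exposed it as self.trainer.accelerator.broadcast(...).
        self.run_dir = self.trainer.strategy.broadcast(run_dir, src=0)
```

Under the hood this does the byte-encoding described above: the object is serialized to bytes, broadcast as a tensor, and decoded on the receiving ranks, which is why it works for arbitrary picklable objects rather than only tensors.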