Horovod distributed data #7264
Unanswered
njgre6 asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi there!
I am trying to set up a multi-node Horovod cluster running some proprietary code that is wrapped in a LightningModule. How does the data in a LightningModule get communicated between nodes when the accelerator argument of the Trainer class is set to 'horovod'? Specifically, does each instance in the cluster need a local copy of the data?
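To make the question concrete, here is a minimal sketch of what I assume happens under data-parallel training: each worker selects its own slice of sample indices by rank and loads those samples itself, so no training data would be sent between nodes. This is plain Python for illustration only; `shard_indices`, `num_workers`, and `rank` are hypothetical names, not Horovod or Lightning API, and a real distributed sampler may also shuffle and pad the shards to equal length.

```python
# Sketch only: mimics rank-based index sharding of the kind a distributed
# sampler performs. It does not call Horovod or Lightning.

def shard_indices(dataset_len, num_workers, rank):
    """Return the sample indices that worker `rank` would read (hypothetical helper)."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    # Strided split: worker r takes indices r, r + num_workers, r + 2 * num_workers, ...
    return list(range(rank, dataset_len, num_workers))


if __name__ == "__main__":
    # Assumed setup: 10 samples split across 4 workers.
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
```

If that assumption is right, every node would need to be able to read at least its own shard, either from a local copy or from shared storage.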