I have a question about using MPI. Specifically, the documentation says: "Note that the parallelism mode is data parallelism, so it is not expected to see the training time per batch decreases." When I run a job (with "--mpi-log=workers") on 2 GPUs, I get something like this:
How should I interpret this? As stated above, the training time per batch stays the same with 1 or 2 GPUs. The batch number, here 8100, also appears twice in the log. Is something set up incorrectly in my case, since I don't want to compute the same batch twice, or is this the expected behaviour?
Two ranks are initialized using different random seeds, so their training data in each batch is different.
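To illustrate the idea, here is a minimal sketch (not the library's actual code) of why the same batch number shows up once per rank under data parallelism: every rank keeps the same step counter, but its RNG is seeded with its rank, so it draws a different mini-batch. The seed value, dataset size, and batch size below are made-up illustration values; it assumes mpi4py and NumPy are available.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

base_seed = 1234                                  # hypothetical base seed
rng = np.random.default_rng(base_seed + rank)     # different seed per rank

dataset_size = 10000
batch_size = 32

for step in range(8099, 8102):                    # steps are counted identically on every rank
    # Each rank samples its own, different mini-batch of indices.
    batch_indices = rng.choice(dataset_size, size=batch_size, replace=False)
    print(f"rank {rank}: batch {step}, first indices {batch_indices[:4]}")
```

Running this with e.g. `mpirun -np 2 python script.py`, both ranks print "batch 8100", but the sampled indices differ, so no data is processed twice; the effective batch size per step is the per-rank batch size times the number of ranks. This also matches why the time per batch does not drop with more GPUs: each rank still computes a full batch per step.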