I have a question about using MPI. Specifically, the documentation says: "Note that the parallelism mode is data parallelism, so it is not expected to see the training time per batch decreases." When I run a job (with "--mpi-log=workers") on 2 GPUs, I get something like this:
How should I interpret this? As stated above, the training time per batch stays the same with 1 or 2 GPUs. The batch number, here 8100, also appears twice in the log. Is something set up incorrectly in my case, since I don't want to compute the same batch twice, or is this the expected behaviour?
Two ranks are initialized using different random seeds, so their training data in each batch is different.
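To illustrate the idea, here is a minimal sketch (not the library's actual code) of why the same batch number shows up once per rank under data parallelism: every rank keeps the same step counter, but its RNG is seeded with its rank, so it draws a different mini-batch. The seed value, dataset size, and batch size below are made-up illustration values; it assumes mpi4py and NumPy are available.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

base_seed = 1234                                  # hypothetical base seed
rng = np.random.default_rng(base_seed + rank)     # different seed per rank

dataset_size = 10000
batch_size = 32

for step in range(8099, 8102):                    # steps are counted identically on every rank
    # Each rank samples its own, different mini-batch of indices.
    batch_indices = rng.choice(dataset_size, size=batch_size, replace=False)
    print(f"rank {rank}: batch {step}, first indices {batch_indices[:4]}")
```

Running this with e.g. `mpirun -np 2 python script.py`, both ranks print "batch 8100", but the sampled indices differ, so no data is processed twice; the effective batch size per step is the per-rank batch size times the number of ranks. This also matches why the time per batch does not drop with more GPUs: each rank still computes a full batch per step.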