CPU Parallel Training Unusual Performance #1470

hu-lm · 2022-02-10T23:44:27Z

hu-lm
Feb 10, 2022

Hello all.
I am opening this discussion because, during parallel training, the time per training step became surprisingly longer as I increased the number of nodes assigned to the MPI tasks. Shouldn't more nodes speed up training, so that each step takes less time?

Here is the info about the HPC hardware: Intel Xeon CPU E5-2690 v2 @ 3.00GHz; Mem 124GB; 20 cores per node available; 49 nodes available.
DeePMD-kit version: DeePMD-kit v2.0.4.dev1+g96e623d5; installed with miniconda3 (CPU version).

The dataset is the same in both runs, and the batch size is set to 1 for testing.
With 4 nodes (20 CPU cores each), each training step takes around 35 s.
With 8 nodes (20 CPU cores each), this increases to 56 s.

I have attached the two log files and input scripts, and I really appreciate your help!

log8nodes-20coreseach-btch1.txt
log4nodes-20coreseach-btch1.txt

HPC_Script.txt
input.txt

Answered by njzjz

njzjz · 2022-02-11T22:13:34Z

njzjz
Feb 11, 2022
Maintainer

MPI training applies data parallelism. The time for each step will not be reduced, but the effective (global) batch size will be increased.

The performance of multiple nodes may be limited by your network.
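
To make the data-parallel point concrete, here is a minimal sketch in Python that uses the step times reported in the question. It assumes one worker per node (the actual MPI rank layout is not stated in the post) and that the batch size of 1 applies per worker. The function names are hypothetical and not part of DeePMD-kit.

```python
def throughput(num_workers, per_worker_batch, step_seconds):
    """Samples processed per second under synchronous data parallelism."""
    # Each worker handles its own batch, so the global batch grows with workers.
    global_batch = num_workers * per_worker_batch
    return global_batch / step_seconds


def scaling_efficiency(base, scaled):
    """Observed speedup divided by ideal speedup.

    `base` and `scaled` are (num_workers, per_worker_batch, step_seconds).
    """
    speedup = throughput(*scaled) / throughput(*base)
    ideal = scaled[0] / base[0]
    return speedup / ideal


# Assumption: one worker per node; step times are those reported in the question.
run_4_nodes = (4, 1, 35.0)
run_8_nodes = (8, 1, 56.0)

speedup = throughput(*run_8_nodes) / throughput(*run_4_nodes)
print(f"throughput speedup: {speedup:.3f}")
print(f"scaling efficiency: {scaling_efficiency(run_4_nodes, run_8_nodes):.3f}")
```

Under these assumptions, throughput rises by about 1.25x when going from 4 to 8 nodes even though each step is slower, which is roughly 62.5% of ideal scaling. The ratio does not depend on how many workers run per node, as long as that number is the same in both runs. The shortfall is consistent with communication overhead between nodes.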
