Can we use 4 GPUs and 1/4 of the training steps (`stop_batch`) to reduce training time while achieving the same accuracy? #4949
-
Dear Maintainers,
-
You may increase the amount of first-principles (FP) data consumed in each iteration (i.e., the batch size) to reduce the total number of iterations, and thus the total training cost.
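As a minimal sketch of that suggestion in a DeePMD-kit `input.json` training section (the field names follow the DeePMD-kit input spec; the concrete values and data path are illustrative only, and whether a 4x batch justifies a 1/4 step count must be verified empirically):

```json
{
  "training": {
    "training_data": {
      "systems": ["../data/system1"],
      "batch_size": "auto:128"
    },
    "numb_steps": 250000
  }
}
```

Here `auto:128` asks DeePMD-kit to choose a batch size targeting about 128 atoms per batch, roughly 4x the common `auto:32` default, while `numb_steps` is cut to a quarter of a hypothetical 1,000,000-step baseline.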
Using 4 GPUs effectively increases the batch size by 4 times, but that does not always lead to a 4x faster decay of the error. One may have to test case by case.
For the dpa-1 and se_a descriptors, a smaller batch size (like `auto:32`) is usually preferred, while for dpa-2 and dpa-3 a larger batch size may speed up training.
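To make the arithmetic in this reply concrete, here is a small sketch (the function names are my own, not part of the DeePMD-kit API; the 1/n step reduction is exactly the assumption being questioned, and holds only if the error actually decays n-fold faster at the larger batch):

```python
def effective_batch_size(per_gpu_batch: int, n_gpus: int) -> int:
    """Data-parallel training multiplies the per-GPU batch by the GPU count."""
    return per_gpu_batch * n_gpus

def scaled_steps(single_gpu_steps: int, n_gpus: int) -> int:
    """Naive 1/n step reduction; valid only if convergence speeds up n-fold."""
    return single_gpu_steps // n_gpus

# 4 GPUs with a per-GPU batch of 32 behave like one GPU with batch 128,
# but the accuracy reached in 1/4 the steps must still be checked case by case.
print(effective_batch_size(32, 4))
print(scaled_steps(1_000_000, 4))
```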