Fix: fix a deadlock bug in CUDA version of gint #5210

dzzz2001 · 2024-10-09T07:11:00Z

Background

In some examples, when the device is set to gpu and the number of MPI processes is set too high, the program hangs indefinitely. Investigation showed that the deadlock is caused by an implicit barrier in MPI.

Because of the condition max_atom > 0, some processes do not execute the gint_vl_gpu function. However, gint_vl_gpu calls set_device_by_rank, which contains an implicit MPI barrier, so the processes that do enter gint_vl_gpu wait forever for the ones that skipped it. This PR fixes the bug.
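The failure mode can be illustrated with a small model that does not depend on MPI. This is only a sketch under stated assumptions: barrier_completes, guarded_setup_completes and hoisted_setup_completes are hypothetical names, not functions from the code base, and the "hoisted" variant shows one common way to avoid this class of deadlock, not necessarily the change made in this PR.

```cpp
#include <cstddef>
#include <vector>

// Model of a collective call: a barrier completes only if every rank
// arrives at it. Returns false when at least one rank never arrives,
// which in a real MPI run means the arriving ranks hang.
bool barrier_completes(const std::vector<bool>& rank_arrives)
{
    for (bool arrived : rank_arrives)
    {
        if (!arrived)
        {
            return false;
        }
    }
    return true;
}

// Hypothetical stand-in for the problematic control flow: the collective
// call sits inside the branch guarded by max_atom > 0, so ranks with
// max_atom == 0 never reach it.
bool guarded_setup_completes(const std::vector<int>& max_atom_per_rank)
{
    std::vector<bool> arrives;
    for (int max_atom : max_atom_per_rank)
    {
        arrives.push_back(max_atom > 0);
    }
    return barrier_completes(arrives);
}

// Hypothetical alternative: the collective call is made unconditionally,
// before the max_atom check, so every rank reaches it.
bool hoisted_setup_completes(const std::vector<int>& max_atom_per_rank)
{
    std::vector<bool> arrives(max_atom_per_rank.size(), true);
    return barrier_completes(arrives);
}
```

In this model, per-rank values such as {4, 0, 1} make the guarded variant fail to complete, because the second rank skips the branch, while the hoisted variant completes for any input. This matches the observation that the hang shows up only when the number of MPI processes is high enough that some rank satisfies max_atom == 0.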

fix a deadlock bug

02d70b3

dzzz2001 requested review from goodchong and mohanchen October 9, 2024 07:11

caic99 approved these changes Oct 9, 2024

goodchong approved these changes Oct 10, 2024

mohanchen added the Bugs label (Bugs that only solvable with sufficient knowledge of DFT) Oct 10, 2024

mohanchen approved these changes Oct 10, 2024

mohanchen merged commit 39df3b9 into deepmodeling:develop Oct 10, 2024
14 checks passed

4 participants
