forked from abacusmodeling/abacus-develop
Fix: Fix crash in Debug build with multi-GPU due to forced cudaSetDevice(0) #6498
Merged
mohanchen
merged 1 commit into
deepmodeling:develop
from
wangtianxiang:fix_print_device_info_bug
Sep 10, 2025
…ice(0) Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
Collaborator
Good catch! Thanks a lot for your contribution!
mohanchen
approved these changes
Sep 10, 2025
Wuming-HUST
pushed a commit
to Wuming-HUST/abacus-develop
that referenced
this pull request
Sep 12, 2025
…ice(0) (deepmodeling#6498) Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
mohanchen
added a commit
that referenced
this pull request
Sep 20, 2025
…w. (#6490) * Feature: add DFT-1/2 and shell DFT-1/2, currently only support PW esolver_ks_pw. * Added Sep, Sep_Cell, and VSep to organize the self-energy potential of DFT-1/2 * Added a new effective potential pot_sep for calculating the self-energy potential * Added initialization of the self-energy potential in the esolver_ks_pw control flow * Added the keyword SEP_FILES in the STRU file for reading self-energy potential files * Added the dfthalf_type keyword in INPUT to enable DFT-1/2 and shell DFT-1/2 * Fix: Compilation error in DeepKS unit tests after adding DFT-1/2 * Fix: Add the additional files to Makefile.Objects * Build(deps): Bump actions/setup-python from 5 to 6 (#6492) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Refactor] Move hardware initializer out from esolver code (#6494) * Move hardware initializer out from esolver * Remove useless codes * Remove finalize code out * Feature: support NVTX profiling via timer_enable_nvtx flag (#6495) * Feature: support NVTX profiling via timer_enable_nvtx flag Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Add timer_enable_nvtx section in markdown Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: Use __USE_NVTX macro to avoid NVTX linking errors in tests. Clarify in docs that timer_enable_nvtx parameter only takes effect on CUDA platforms. 
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: Optimize Davidson by fusing operators, offloading CPU computation to GPU, and reducing memory transfers (#6493) * Perf: Optimize Diago_DavSubspace with GPU operators by adding and fusing custom kernels. Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: reduce memory allocation and copy in Diago_DavSubspace::diag_zhegvx Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: Replace loop-based 2D copy and memset with memcpy_2d_op, memset_2d_op Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: use warp reduce instead of shared memory for better efficiency Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix compilation error Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON and switch to nvtx3 headers when CUDA_VERSION >= 12090 (#6497) * Fix: switch to nvtx3 headers when CUDA_VERSION >= 12090 Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix dsp compilation problem (#6499) * Fix: Fix crash in Debug build with multi-GPU due to forced cudaSetDevice(0) (#6498) Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. 
* Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) (#6489) * delete tem Hcontainer to reduce memory usage * simplify the compute code * change DM2D_tmp to dm2d_tmp, use vector instead of new * Update version to 3.9.0.14 (#6504) * Refactor: Remove the GlobalC from sep_cell and vsep_cell * Removed GlobalC::sep_cell and GlobalC::vsep_cell from GlobalC * Integrated sep_cell into UnitCell * Integrated vsep_cell into esolver_ks_pw * Added empty constructors and destructors for Sep_Pot and Sep_Cell to facilitate unit testing compilation --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Critsium <[email protected]> Co-authored-by: Tianxiang Wang <[email protected]> Co-authored-by: zgn-26714 <[email protected]> Co-authored-by: Erjie Wu <[email protected]> Co-authored-by: Mohan Chen <[email protected]>
kluonj
pushed a commit
to kluonj/abacus-develop
that referenced
this pull request
Sep 28, 2025
…ice(0) (deepmodeling#6498) Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd.
kluonj
pushed a commit
to kluonj/abacus-develop
that referenced
this pull request
Sep 28, 2025
…w. (deepmodeling#6490) * Feature: add DFT-1/2 and shell DFT-1/2, currently only support PW esolver_ks_pw. * Added Sep, Sep_Cell, and VSep to organize the self-energy potential of DFT-1/2 * Added a new effective potential pot_sep for calculating the self-energy potential * Added initialization of the self-energy potential in the esolver_ks_pw control flow * Added the keyword SEP_FILES in the STRU file for reading self-energy potential files * Added the dfthalf_type keyword in INPUT to enable DFT-1/2 and shell DFT-1/2 * Fix: Compilation error in DeepKS unit tests after adding DFT-1/2 * Fix: Add the additional files to Makefile.Objects * Build(deps): Bump actions/setup-python from 5 to 6 (deepmodeling#6492) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Refactor] Move hardware initializer out from esolver code (deepmodeling#6494) * Move hardware initializer out from esolver * Remove useless codes * Remove finalize code out * Feature: support NVTX profiling via timer_enable_nvtx flag (deepmodeling#6495) * Feature: support NVTX profiling via timer_enable_nvtx flag Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Add timer_enable_nvtx section in markdown Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: Use __USE_NVTX macro to avoid NVTX linking errors in tests. Clarify in docs that timer_enable_nvtx parameter only takes effect on CUDA platforms. 
Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: Optimize Davidson by fusing operators, offloading CPU computation to GPU, and reducing memory transfers (deepmodeling#6493) * Perf: Optimize Diago_DavSubspace with GPU operators by adding and fusing custom kernels. Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: reduce memory allocation and copy in Diago_DavSubspace::diag_zhegvx Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: Replace loop-based 2D copy and memset with memcpy_2d_op, memset_2d_op Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Perf: use warp reduce instead of shared memory for better efficiency Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix compilation error Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON and switch to nvtx3 headers when CUDA_VERSION >= 12090 (deepmodeling#6497) * Fix: switch to nvtx3 headers when CUDA_VERSION >= 12090 Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix: resolve compile error with USE_ELPA=OFF + BUILD_TESTING=ON Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. * Fix dsp compilation problem (deepmodeling#6499) * Fix: Fix crash in Debug build with multi-GPU due to forced cudaSetDevice(0) (deepmodeling#6498) Signed-off-by:Tianxiang Wang<[email protected]>, Contributed under MetaX Integrated Circuits (Shanghai) Co., Ltd. 
* Removed the temporary variable DMRGint_full when transitioning from 2D block parallelism to serial in Hcontainer(develop) (deepmodeling#6489) * delete tem Hcontainer to reduce memory usage * simplify the compute code * change DM2D_tmp to dm2d_tmp, use vector instead of new * Update version to 3.9.0.14 (deepmodeling#6504) * Refactor: Remove the GlobalC from sep_cell and vsep_cell * Removed GlobalC::sep_cell and GlobalC::vsep_cell from GlobalC * Integrated sep_cell into UnitCell * Integrated vsep_cell into esolver_ks_pw * Added empty constructors and destructors for Sep_Pot and Sep_Cell to facilitate unit testing compilation --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Critsium <[email protected]> Co-authored-by: Tianxiang Wang <[email protected]> Co-authored-by: zgn-26714 <[email protected]> Co-authored-by: Erjie Wu <[email protected]> Co-authored-by: Mohan Chen <[email protected]>
Labels
Bugs
Bugs that only solvable with sufficient knowledge of DFT
GPU & DCU & HPC
GPU and DCU and HPC related any issues
Refactor
Refactor ABACUS codes
🐞 Bug Behavior
When compiling with CMAKE_BUILD_TYPE=Debug and running any GPU test case via mpirun -np 2 abacus on 2 GPUs, the program crashes at a cudaMemset call with cudaErrorInvalidValue: invalid argument.

Additionally, even without CMAKE_BUILD_TYPE=Debug, the program runs without crashing but prints no device information in device.log.

🔍 Root Cause
Through debugging, we found that the constructor Psi<T, Device>::Psi calls base_device::information::print_device_info, which internally invokes cudaSetDevice(0), forcing all MPI ranks to use GPU 0.

This creates a critical inconsistency in multi-GPU runs:
* When print_device_info is called (e.g., for logging), rank 1 is forcibly switched to GPU 0 via cudaSetDevice(0).
* When cudaMemcpy is then called on rank 1, it attempts to set memory that resides on GPU 1 while the current device context is GPU 0, resulting in cudaErrorInvalidValue.

The root issue: a hard-coded cudaSetDevice(0) inside a shared utility function breaks multi-GPU context isolation in MPI environments.

Additionally, in non-Debug builds (Release with -O3), the program does not crash but produces no output in device.log. This is undefined behavior caused by a missing template specialization declaration: print_device_info<DEVICE_GPU> is defined in output_device.cpp but not declared in device.h.
* In Debug builds (-O0), the linker often resolves to the specialization → cudaSetDevice(0) is called → crash.
* In Release builds (-O3), the compiler aggressively inlines/optimizes and often picks the empty primary template → no cudaSetDevice(0) → no crash, but no logging output.

🛠️ Solution
To fix this issue, we apply two key changes:
* Replace cudaSetDevice(0) with cudaGetDevice() in print_device_info
* Declare the print_device_info<DEVICE_GPU> specialization in device.h