Fix bug in dsp compute #6433

Open, wants to merge 13 commits into base: develop

Collaborator

@A-006 A-006 commented Aug 4, 2025

Linked Issue

Fix #6429

What's changed?

  • Fixed a bug that occurred when the code was compiled for DSP but certain functions still used the CPU FFT. In the examples 01_PW/057_PW_SO_IW, 01_PW, 01_PW/105_PW_W90, and 109_PW_PBE0, CPU FFT functions were called even though the device was set to DSP, which caused segmentation faults in the recip2real and real2recip functions. These functions now use the DSP device instead of the CPU device.
image
  • Fixed a bug in zgemm that could cause a segmentation fault.
  • Fixed a bug in the sdft computation, which could appear as shown below:
image

Refactor

  • This update affects seven modules: cal_ldos, cal_mkedf, wf_lcao, wf2rho, wannier90, overlap_pw, and stress_func_exx.
  • DSP acceleration support (6 modules): refactored cal_ldos, cal_mkedf, wf_lcao, wf2rho, wannier90, and overlap_pw to enable DSP-accelerated computation, and implemented RAII-managed accelerator kernel allocation to reduce the number of malloc calls.
  • GPU device support (1 module): added native GPU device support to the stress_func_exx module.
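The RAII-managed allocation mentioned above can be illustrated with a minimal sketch. Everything named here (`DeviceBuffer`, `device_malloc`, `device_free`, `demo_scope`) is hypothetical and is not taken from this PR; `std::malloc` and `std::free` merely stand in for whatever allocation routines the DSP runtime actually provides. The idea is that a buffer is allocated once when the owner object is constructed, reused across many kernel calls, and released automatically when the owner goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical stand-ins for the accelerator allocator. A real DSP build
// would call the vendor's allocation routines instead of malloc/free.
static int g_live_allocations = 0;

inline void* device_malloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr) throw std::bad_alloc();
    ++g_live_allocations;
    return p;
}

inline void device_free(void* p) {
    if (p != nullptr) {
        std::free(p);
        --g_live_allocations;
    }
}

// RAII owner of a device buffer: allocated in the constructor,
// released in the destructor, never copied.
template <typename T>
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t n)
        : data_(static_cast<T*>(device_malloc(n * sizeof(T)))), size_(n) {}
    ~DeviceBuffer() { device_free(data_); }

    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    // Moving transfers ownership and leaves the source empty.
    DeviceBuffer(DeviceBuffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    DeviceBuffer& operator=(DeviceBuffer&& other) noexcept {
        if (this != &other) {
            device_free(data_);
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    T* data() noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

private:
    T* data_;
    std::size_t size_;
};

// Returns {live allocations inside the scope, live allocations after it}.
inline std::pair<int, int> demo_scope() {
    int inside = 0;
    {
        DeviceBuffer<double> work(1024);  // one allocation for the whole loop
        for (int i = 0; i < 100; ++i) {
            work.data()[i] = static_cast<double>(i);  // reuse, no reallocation
        }
        inside = g_live_allocations;
    }  // buffer released here by the destructor
    return {inside, g_live_allocations};
}
```

Deleting the copy operations keeps ownership unique, so a buffer cannot be freed twice; the move operations still allow a buffer to be handed from one owner to another.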

Attention

  • The recip_to_real template function is parameterized on both the floating-point type and the device type, which supports a heterogeneous framework. If device-specific or type-specific requirements arise later, this templated function can be used directly.
    For pw_basis and pw_basis_k, template parameters determine whether a function operates in real or reciprocal space, which makes the working space explicit at the call site and improves readability.
    The main drawback is that calls become more complex and demand more programming expertise. For example, when the FFT function's template parameter is T but the data it operates on at runtime is complex, the programmer must fully understand the context. Type mismatches are caught at compile time, which prevents runtime errors.

  • The DSP appears not to support grids whose number of points is a power of two (2^n grid points).

  • The DSP FFT currently supports only the kpar parallelization scheme; MPI cannot be used to parallelize gamma-only calculations.
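The first point above, a transform templated on both floating-point type and device type, can be sketched as follows. This is an illustrative assumption, not the signature used in this PR: `CpuDevice`, `DspDevice`, `BackendName`, and `recip_to_real_sketch` are hypothetical names, and the loop body is a placeholder copy rather than a real FFT. The sketch only shows how the device tag selects a backend at compile time and how a precision mismatch is rejected by the compiler.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical device tags; the real code base defines its own.
struct CpuDevice {};
struct DspDevice {};

// Maps each device tag to a backend name at compile time.
template <typename Device> struct BackendName;
template <> struct BackendName<CpuDevice> {
    static const char* value() { return "cpu"; }
};
template <> struct BackendName<DspDevice> {
    static const char* value() { return "dsp"; }
};

// Sketch of a transform templated on precision and device.
// The template parameter is the real type FPTYPE, while the data
// pointers are std::complex<FPTYPE>.
template <typename FPTYPE, typename Device>
std::string recip_to_real_sketch(const std::complex<FPTYPE>* in,
                                 std::complex<FPTYPE>* out,
                                 std::size_t n) {
    static_assert(std::is_floating_point<FPTYPE>::value,
                  "FPTYPE must be a floating-point type");
    // Placeholder for the backend FFT call: just copies the data.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i];
    }
    return BackendName<Device>::value();
}

// Helper that runs the sketch with double precision on a given device tag.
template <typename Device>
std::string run_double() {
    std::vector<std::complex<double>> in(4, std::complex<double>(1.0, 0.0));
    std::vector<std::complex<double>> out(4);
    return recip_to_real_sketch<double, Device>(in.data(), out.data(), in.size());
}

// The following would fail to compile, because a pointer to
// std::complex<float> does not convert to const std::complex<double>*:
//   std::vector<std::complex<float>> f(4), g(4);
//   recip_to_real_sketch<double, DspDevice>(f.data(), g.data(), f.size());
```

Calling `run_double<DspDevice>()` selects the DSP backend and `run_double<CpuDevice>()` the CPU one, with no runtime branch on the device.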

@mohanchen mohanchen added the Bugs Bugs that only solvable with sufficient knowledge of DFT label Aug 6, 2025
@Flying-dragon-boxing
Collaborator

stress_func_exx currently doesn't support heterogeneous computing. I'll fix this later.

Development

Successfully merging this pull request may close these issues.

Segment fault on DSP computing
3 participants