@A-006 A-006 commented Jan 23, 2025

ACHIEVEMENT

When ABACUS is compiled with the DSP=ON option, the FFT device is automatically configured to use the DSP. This configuration runs more than 30 times faster than the FT cluster and also scales well. I would like to acknowledge the author of the mt_3d_fft library and thank @Qianruipku for their valuable input. Special thanks also go to @mohanchen for significant support. Because of a check on the computer, I am temporarily adding only the first stage of the DSP FFT.

Attention

1. FUNCTION: This is not the final step in porting DSP computations. Once MT_FFT_LIBRARY, MT_FFT_INCLUDE, and MT_HRHREAD are specified, the DSP can be used to compute FFTs. Simultaneous use of DSP FFT and BLAS is not yet supported; this will be added soon.
2. PERFORMANCE: The DSP FFT does not yet reach its best performance because some memory copies have not been optimized; this will be addressed in the future. Some functions in real_to_recip do not use memcpy or zapy_. These functions could be optimized or moved to the DSP for better performance.
[Image]

What's changed?

  • Implement DSP FFT functionality within the fft_base class.
  • Add the control flow in fft_bundle.
  • Update the CMakeLists.txt configuration to enable compilation of the DSP FFT library.
  • Thanks to @Critsium-xy, the mt-blas part has been added.
  • Currently, the convolution function can allocate and deallocate b_id and thread_id for 2*nbands at a time. The other functions must still load these values one at a time.
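The difference between the two allocation patterns in the last bullet can be sketched as follows. This is only an illustration, not the code in this PR: `DspSlot`, `AllocStats`, `allocate_slots`, `run_batched`, and `run_one_at_a_time` are hypothetical names, and plain ints stand in for the real b_id and thread_id types, which come from the vendor library and are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-transform DSP resources. The real types
// of b_id and thread_id are not shown in this PR, so plain ints are assumed.
struct DspSlot
{
    int b_id;
    int thread_id;
};

// Counts how often slots are set up, so the two patterns can be compared.
struct AllocStats
{
    std::size_t alloc_calls = 0;
    std::size_t slots_created = 0;
};

// Allocate `count` slots in a single call and record it in `stats`.
inline std::vector<DspSlot> allocate_slots(std::size_t count, AllocStats& stats)
{
    assert(count > 0);
    ++stats.alloc_calls;
    stats.slots_created += count;
    std::vector<DspSlot> slots(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        slots[i] = DspSlot{static_cast<int>(i), static_cast<int>(i)};
    }
    return slots;
}

// Batched pattern (assumed for the convolution path): one allocation of
// 2 * nbands slots, all released together when the vector goes out of scope.
inline AllocStats run_batched(std::size_t nbands)
{
    AllocStats stats;
    std::vector<DspSlot> slots = allocate_slots(2 * nbands, stats);
    (void)slots; // the transforms would run here
    return stats;
}

// One-at-a-time pattern (assumed for the other call sites): every transform
// sets up and tears down its own slot.
inline AllocStats run_one_at_a_time(std::size_t ntransforms)
{
    AllocStats stats;
    for (std::size_t i = 0; i < ntransforms; ++i)
    {
        std::vector<DspSlot> slot = allocate_slots(1, stats);
        (void)slot; // a single transform would run here
    }
    return stats;
}
```

Under these assumptions, both patterns create the same number of slots for the same amount of work, but the batched version pays the setup cost once instead of once per transform.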

Attention

  • When compiling the file, ScaLAPACK utilizes the dynamic libraries installed on the computer. Therefore, it is necessary to add the directory path of the ScaLAPACK dynamic libraries.
  • If you intend to use the DSP files, set the directories for MT_THREADS_DIR and MT_BLAS_FFT_DIR.
  • Ensure that kpar is configured and that nx, ny, and nz are set as integer powers of 2.
  • Currently, b_id and thread_id are allocated and destroyed within each fft_3d call. This approach may be revised in future updates. The task has been completed.
  • Ideally, mutable should not be used in fft_dsp. However, b_id needs to be allocated and destroyed inside fft_3d, so b_id and thread_id are declared mutable.
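The power-of-2 requirement on nx, ny, and nz listed above can be checked before a run with a few lines of C++. This is a minimal sketch; `is_power_of_two`, `grid_is_dsp_compatible`, and `next_power_of_two` are hypothetical helper names and are not part of ABACUS or the DSP library.

```cpp
#include <cassert>
#include <cstddef>

// True when n is a positive integer power of two (1, 2, 4, 8, ...).
// A power of two has exactly one bit set, so n & (n - 1) is zero.
constexpr bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: checks all three FFT grid dimensions at once.
constexpr bool grid_is_dsp_compatible(std::size_t nx, std::size_t ny, std::size_t nz)
{
    return is_power_of_two(nx) && is_power_of_two(ny) && is_power_of_two(nz);
}

// Smallest power of two that is >= n, e.g. for rounding a grid size up.
// Intended for realistic grid sizes; it does not guard against overflow.
inline std::size_t next_power_of_two(std::size_t n)
{
    assert(n > 0);
    std::size_t p = 1;
    while (p < n)
    {
        p <<= 1;
    }
    return p;
}
```

For example, a 32 x 64 x 128 grid passes the check, while a grid with ny = 60 does not and would need ny rounded up to 64.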

@mohanchen mohanchen added the "GPU & DCU & HPC" (GPU and DCU and HPC related any issues) and "Features Needed" (The features are indeed needed, and developers should have sophisticated knowledge) labels Jan 26, 2025
@A-006 A-006 marked this pull request as draft February 13, 2025 13:48
@A-006 A-006 marked this pull request as ready for review February 24, 2025 03:50
@mohanchen mohanchen merged commit 8fa625c into deepmodeling:develop Feb 26, 2025
14 checks passed
Fisherd99 pushed a commit to Fisherd99/abacus-BSE that referenced this pull request Mar 31, 2025
* set fft_dsp

* add information in map

* update Global_rank

* update control flow

* [pre-commit.ci lite] apply automatic fixes

* add the fft_dsp in the fft_bundle

* change teh cmake file

* modify back scalapck

* set the dsp ig2ixyz_k_cpu

* modify the pw_basis

* add the namespace

* remove mutable

* fix fft_dsp

* add the convolution and allocate or destroy the b_id

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>