@jieli-matrix

This PR introduces specialized GPU operators to accelerate the sincos computations that bottleneck the force calculations. The implementation targets the most computationally intensive loops in the cal_force_loc and cal_force_ew functions, where ModuleBase::libm::sincos has been identified as the primary CPU hotspot.

Done:

  • Operator interface design
  • CPU reference implementations
  • CUDA/HIP GPU kernels
  • Code Integration and Calling Interface

ToDos:

  • AtomicAdd Optimization
  • Write code comments in English
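
For readers unfamiliar with this kind of loop, here is a minimal sketch of what a CPU reference for a sincos-bound force accumulation might look like. It is not the code from this PR: the `Vec3` type, the function name `accumulate_sin_force`, and the simplified formula (a weighted sum of G * sin(2*pi * G . tau) over plane-wave vectors) are all assumptions made for illustration, and the actual cal_force_loc and cal_force_ew loops differ in detail.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3-component vector; not an ABACUS type.
struct Vec3
{
    double x;
    double y;
    double z;
};

// Illustrative CPU reference (assumed formula, not the PR's operator):
//   f = sum_ig coeff[ig] * g[ig] * sin(2*pi * g[ig] . tau)
// The sin() call inside the loop is the kind of hotspot the PR describes.
Vec3 accumulate_sin_force(const std::vector<Vec3>& gvecs,
                          const std::vector<double>& coeff,
                          const Vec3& tau)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    Vec3 f{0.0, 0.0, 0.0};
    for (std::size_t ig = 0; ig < gvecs.size(); ++ig)
    {
        const Vec3& g = gvecs[ig];
        // Phase of plane wave ig at the atomic position tau.
        const double phase = two_pi * (g.x * tau.x + g.y * tau.y + g.z * tau.z);
        const double w = coeff[ig] * std::sin(phase);
        f.x += w * g.x;
        f.y += w * g.y;
        f.z += w * g.z;
    }
    return f;
}
```

In a GPU version of such a loop, the G-vectors would typically be spread across threads and the partial sums combined with a reduction or atomic additions, which is presumably where the AtomicAdd item in the ToDo list comes in.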

optimization in davidson-subspace algorithm

- add k continuity initialization strategy in planewave basis
- implement heterogeneous computation branching between CPU and DCU
- implement optimized eigenvalue operations for GPU & DCU
- implement optimized preconditioner for GPU & DCU
- implement optimized normalization op for GPU & DCU
@mohanchen added the Refactor and GPU & DCU & HPC labels on Jun 3, 2025
@jieli-matrix
Author

This PR is closed because #6265 provides a cleaner solution.

Labels

GPU & DCU & HPC (GPU and DCU and HPC related any issues), Refactor (Refactor ABACUS codes)
