RT-TDDFT GPU Acceleration: RT-TD now fully supports GPU computation #5773
Conversation
LGTM 👍, a good example showing the possibility of using `Tensor`.
Squashed commits (deepmodeling#5773):

* Phase 1 of RT-TDDFT GPU Acceleration: Rewriting existing code using Tensor
* [pre-commit.ci lite] apply automatic fixes
* Initialize int info in bandenergy.cpp
* Initialize double aa, bb in bandenergy.cpp
* Fix a bug where CopyFrom caused shared data between tensors, using = (assignment operator overload) instead
* RT-TDDFT GPU Acceleration (Phase 2): Adding needed BLAS and LAPACK support for Tensor on CPU and refactoring linear algebra operations in TDDFT
* LAPACK wrapper functions: change const basic-type input parameters from pass-by-reference to pass-by-value
* Did nothing, just formatting esolver.cpp
* Core algorithm: RT-TD now has preliminary support for GPU computation
* Fix GitHub CI CUDA build bug due to deleted variable
* Refactor some files
* Getting ready for gathering MPI processes
* MPI multi-process compatibility
* Fix GitHub CI MPI compilation bug
* Minor fix and refactor
* Initialize double aa, bb and one line for one variable
* Rename bandenergy.cpp to band_energy.cpp and corresponding adjustments
* Fix compile error and change CMakeLists accordingly
* Initialize int naroc
* Initialize MPI related variables: myid, num_procs and root_proc
* Refactor Propagator class implementation into multiple files for better code organization
* Remove all GlobalV::ofs_running from RT-TDDFT core algorithms and pass it as an input parameter instead
* Add assert in some places and optimize redundant index calculations in nested loops

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>

Phase 1: Rewriting existing code using `Tensor` (complete)

This is merely a draft and does not represent the final code. Since `Tensor` can effectively support heterogeneous computing, the goal of the first phase is to rewrite the existing algorithms using `Tensor`. Currently, all memory is still explicitly allocated on the CPU (the parameter of the `Tensor` constructor is `container::DeviceType::CpuDevice`).

Phase 2: Adding needed BLAS and LAPACK support for `Tensor` on CPU and refactoring linear algebra operations in TDDFT (complete)

Key Changes:
- Added `lapack_getrf` and `lapack_getri` in `module_base/module_container/ATen/kernels/lapack.h` to support matrix LU factorization (getrf) and matrix inversion (getri) operations for `Tensor` objects.
- Updated the LAPACK function (`zgetrf_` and `zgetri_`) declarations in `module_base/lapack_connector.h` to comply with standard conventions.
- Refactored linear algebra operations in TDDFT into `Tensor` operations. These linear algebra operations in the `container::kernels` module from `module_base/module_container/ATen` include a `Device` parameter, enabling seamless support for heterogeneous computing (GPU acceleration in future phases).

Phase 3: RT-TDDFT GPU acceleration core algorithm (complete)
Added linear solver interfaces:
- CPU: linear solver (`getrs`) using LAPACK.
- GPU: LU factorization (`getrf`) and linear solver (`getrs`) using cuSOLVER.

Refactored RT-TDDFT I/O and parameters:
- Moved RT-TDDFT input parameters (`td_force_dt`, `td_vext`, `td_vext_dire_case`, `out_dipole`, `out_efield`) from the `Evolve_elec` class to the `PARAM.inp` input interface to simplify template class usage with the `Device` parameter.

Heterogeneous computing support:
- Added a `Device` template parameter to RT-TDDFT core algorithm classes and functions.
- Used memory synchronization operations (`base_device::memory::synchronize_memory_op`) to ensure proper data handling across devices.
- Replaced `BlasConnector::copy` operations with memory synchronization functions.

GPU acceleration for RT-TDDFT:
Phase 4: MPI multi-process compatibility (complete)
- Fixed the `ctx` parameters in memory synchronization operations.
- RT-TDDFT now supports multi-process runs with `device=gpu`.