Commit f16364d

Authored by Flying-dragon-boxing, WHUweiqingzhou, kirk0830, haozhihan, hongriTianqi
Feature: EXX PW GPU Support (#6407)
* Works * adapt to the new container * Turn off USE_PEXSI * Update LibRI to 553c91c * modify include files * namespace-ize * new inputs added * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Modify inputs and update to latest version (#2) * run INPUT.Default() in every process in InputParaTest (#3490) Co-authored-by: kirk0830 <[email protected]> * add blas support for FindLAPACK.cmake (#3497) * more unittest of QO: towards orbital selection (#3499) * Fix: fix bug in mulliken charge calculation (#3503) * fix phase * fix case test * Refactor: namespace Conv_Coulomb_Pot_K (#3446) * Refactor: namespace Conv_Coulomb_Pot_K * Refactor: namespace Conv_Coulomb_Pot_K --------- Co-authored-by: wqzhou <[email protected]> * enable the computation of all zeros in one function call (#3449) Co-authored-by: wqzhou <[email protected]> * replace ios.eof() by ios.good() to avoid meeting badbit and failbit in reading STRU (#3506) * Build: add ccache to accelerate the testing process (#3509) * Build: add ccache to accelerate the testing process * Update test.yml * Update test.yml * Update test.yml * Docs: to avoid the misunderstanding in docs (#3518) * to avoid the misunderstanding in docs * Update docs/quick_start/hands_on.md Co-authored-by: Chun Cai <[email protected]> --------- Co-authored-by: Chun Cai <[email protected]> * Docs: fix a missing depencency in conda build env (#3508) * Feature: Add ENABLE_RAPIDJSON option to control the output of abacus.json (#3519) Add ENABLE_RAPIDJSON option to control the output of abacus.json * Feature: add python wrapper for math sphbes (#3475) * recommit for review * add python wrapper * remove timer since performace tests add * Feature: support segment split in kline mode in KPT file and `out_band` band output precision control, `8` as default (#3493) * add precision control * correct serial version of nscf_band function * fix issue 3482 * update 
unit and integrated test * update document * correct unittest and make compatible with false and true * fix: bug in Autotest.sh when result.ref has no totaltimeref (#3523) * Fix : unit test of module_xc (#3524) * Fix: omit small magnetic moments to avoid numerical instability (#3530) * update deltalambda * avoid numerical error in orbMulP * add constrain on Mi * change case reference value * Fix: fix multiple compiler warnings (#3515) * Fix: add noreturn attribute to warning_quit * Add type conversion * fix string literal * fix small number trunctuation * Fix system call returned value not checked * fix missing braket * Refactor parameter_pool.cpp and parameter_pool.h * remove duplicated return statements * Change WARNING_QUIT occurances in tests * Add warning message to help debug UT * output the default precision flag (#3496) Co-authored-by: kirk0830 <[email protected]> * Build: Improving CMake performance for finding LibXC and ELPA (#3478) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <[email protected]> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <[email protected]> * Update easy_install.md with Makefile-built LibXC supported * Update easy_install.md to include different behavior in different version on finding ELPA --------- Co-authored-by: Chun Cai <[email protected]> * Docs: correct some docs about mp2 smearing method (#3533) * correct some docs about mp2 smearing method * add docs about mv method * Feature : printing band density (#3501) Co-authored-by: wenfei-li <[email protected]> 
Co-authored-by: wqzhou <[email protected]> * add some docs for PR#3501 (#3537) * Feature: enable restart charge density mixing during SCF (#3542) * add a new parameter mixing_restart * do not update rho if iter==mixing_restart * do not update rho if iter==mixing_restart-1 * reset mix and rho_mdata if iter==mixing_restart * fix SCF exit directly since drho=0 if iter=GlobalV::MIXING_RESTART * re-set_mixing in eachiterinit for PW and LCAO * enable SCF restarts in esolver_ks::RUN * add some UnitTests * add some Docs * new inputs added * Update input-main.md (#3551) Solve the format problem mentioned in issue 3543 * Build: fix compatibility issue against toolchain install (#3540) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <[email protected]> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <[email protected]> * Update easy_install.md with Makefile-built LibXC supported * Update easy_install.md to include different behavior in different version on finding ELPA * fix compatibility issue against toolchain * Change default ELPA install routine to old one --------- Co-authored-by: Chun Cai <[email protected]> * Test: Configure performance tests for math libraries (#3511) * add performace test of sphbes functions. 
* fix benchmark cmake errors * add dependencies for docker * update docs * add performance tests for sphbes * add google benchmark * rewrite benchmark tests in fixtures * disable internal testing in benchmark * merge benchmark into integration test --------- Co-authored-by: StarGrys <[email protected]> * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Update hsolver_pw.cpp (#3556) when use_uspp==false, overlap matrix should be E. * Fix: cuda build target (#3276) * Fix: cuda buid target * Update CMakeLists.txt --------- Co-authored-by: Denghui Lu <[email protected]> --------- Co-authored-by: wqzhou <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Zhao Tianqi <[email protected]> Co-authored-by: PeizeLin <[email protected]> Co-authored-by: jinzx10 <[email protected]> Co-authored-by: Chun Cai <[email protected]> Co-authored-by: Peng Xingliang <[email protected]> Co-authored-by: Jie Li <[email protected]> Co-authored-by: Wenfei Li <[email protected]> Co-authored-by: Denghui Lu <[email protected]> Co-authored-by: YI Zeping <[email protected]> Co-authored-by: wenfei-li <[email protected]> Co-authored-by: jingan-181 <[email protected]> Co-authored-by: StarGrys <[email protected]> Co-authored-by: Haozhi Han <[email protected]> * Revert "Modify inputs and update to latest version" * Update FindPEXSI.cmake to fix Comments * Fix CI errors * Fix CI Errors and Merge with Upstream * Resolve Pull Request Reviews * Fix parallel communication related issue * Fix vars in Makefile.vars, add input tests and comments for pexsi vars * Fix nspin > 1 cases * Improvement: take calculated mu as new initial guess, may slightly improve performance * Fix mistakes in the last commit * Fix: params and features - set default pexsi_temp - fix md in pexsi * fix empty lines * Fix: move params to pexsi_solver, rename USE_PEXSI 
to ENABLE_PEXSI * Tests: Modify Dockerfile and GitHub Workflows * Fix: wrong abacus link for dockerfile * Docs: added docs for pexsi inputs * Tests: three tests added for pexsi * Fix unit test issues in input_conv * Very good unit test, making my laptop fan spin * Change default pexsi_npole from 80 to 40 * Place pexsi_EDM in DensityMatrix, set size of pexsi_dm = 1 when GlobalV::NSPIN==4, and add comments for dmToRho * An unit test added for DiagoPexsi * modify for changed gint interface * correct nspin related behaviors * add efermi passthrough * Revert "add efermi passthrough" This reverts commit d7b402d. * commits to resolve conversations related to codes * DM and EDM pointers in pexsi now handled by diagopexsi, and copying h s matrices no longer needed * add pexsi examples * fix pexsi unit test (original version shouldn't run) * add building docs for pexsi * set cxx standard to c++14, which is required in make_unique * Fix: Fix typo related to pexsi * update to PPEXSIDFTDriver2 * default npoints to 1, so single core pexsi will work * Feature: exx operator for pw basis, single kpt * apply pexsi changes(?) * q-e style exx_div * Correct exxdiv * Fix Compile errors * refactor to abandon `pdiagh` * Fix mu_buffer and nspin * HSE examples * Feature: Multi-K exx * Feature: Multi-K exx * Updates with latest * Remove redundant global vars * Update to v3.9.0 * Update to v3.9.0, now code works * Remove Redundant cal_exx_energy in esolver_ks_pw.cpp * Some mess * Minor Fixes * Fix separate loop and screening * Add EXX stress * EXX Energy??? * Multi-K is broken??? * Fix: Multi-K and stress * Feature: ACE for single-K * Feature: ACE should work for multi-K, but not for sure * Feature: ACE works. Next step is ACE energy. 
* Fix: adapt to the latest instruction for variable `conv_esolver` * Reconstruct: move exx_helper to hamilt_pwdft * Refactor: in ESolver_KS_PW, calculate deband in iter_finish, not in hamilt2density * Fix: make files in consistent with upstream * Fix: Now EXX PW doesn't depend on LibRI * Fix: Add input constraints for EXX PW * Fix: Remove redundant mpi barrier * Fix: Clean irrelevant files * Fix: Clean irrelevant files * Feature: add ace flag, exit on using gpu * Refactor: Phase 1 for refactoring exx energy * Feature: now ace calculates energy * Feature: enable exx energy * Fix: fix makefile compilation error * Fix: One minor fix for a segmentation fault * Tests: one integrate test for exx pw, only for verifying whether exx pw works * Revert "Tests: one integrate test for exx pw, only for verifying whether exx pw works" This reverts commit e7b606f. * Fix: EXX PW ACE open only when separate_loop is on * add timer * Feature: Double Grid method of EXX PW * Feature: Double Grid method of EXX PW Stress * Fix: Double Grid method of EXX PW Stress * Feature: add double grid variable * Feature: add double grid variable * Fis: HSE stress * Fix: HSE Stress * Fix: Timer * Fix: Timer * For non mp sampling, disable extrapolation * Modify test * Modify mp * Format * Format * Feature: nspin == 2 scf * Fix: nspin == 2 scf * Docs: EXX PW Docs * Feature: EXX PW for nspin=2 * Docs: EXX PW Docs * Docs: EXX PW Docs * Docs: EXX PW Docs, minor fixes * Refactor * Refactor * Refactor * Refactor * Refactor * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Bump version v3.9.0.7 * Refactor: Remove set kvec funcs in `K_Vectors` * Refactor: Remove final_scf * Refactor: Fix kvecc2d/d2c * Fix: Tests * Fix: Tests * Fix: Tests * Fix: Tests * Refactor: Final? 
* Fix * Fix * Fix * Fix * GPU EXX PW Support * Fix: Compile Error on CUDA > 12.9 * Fix: Compile Error on CUDA > 12.9 * NVTX3 * F***ing new version * Feature: Support linear combination of coulomb_param for EXX PW * Fix: Fix compile issue * F***ing new version * F***ing new version * F***ing new version * Uploading hybrid gauge tddft (#6369) * hybrid gague * update tests * update * update * update * update * update unit test * fix tests * update tests * fix read_wfc * fix catch_properties.sh * fix restart * update gpu test * update tests * fix * fix input_conv * Improve md calculation stress output in running log (#6366) * Improve md calculation stress output in running log * Module_IO Unittest modify * ModuleMD Unittests modify * modify code comment in fire_test.cpp * maintain setprecision(8) for md stress output * Refactor: Remove redundant Input_para from ESolver Class (#6370) * Refactor: Replace PARAM.inp with inp in ESolver classes for consistency * Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency * Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method * Revert "Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method" This reverts commit b1bd0fd. * Revert "Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency" This reverts commit f4f81e3. 
* Fix: Fix memory leak introduced by new gint module (#6375) * fix memory leak * delete copy assignment * refactor Exx_Opt_Orb (#6378) Co-authored-by: linpz <[email protected]> * Add use sw and fix Floating point exception (#6372) * remove float error in sunway * fix ig=0 * add the sw * change the make_dir * unify the gg use * fix compile bug * add init * temporarily remove the sunway define * add the pesduo * fix compile bug * fix bug in the betar * modify the test * Update the output formats of rt-TDDFT (#6381) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * [Refactor] Rename grid to module_grid and genelpa to module_genelpa (#6386) * Rename grid to module_grid * Rename genelpa to module_genelpa * Fix cmake * Update the outputs of geometry relaxation (#6387) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * Feature: support the output of matrix representation of symm_ops (#6390) * Feature: support output the matrix representation of symmetry operation * Feature: support the output of matrix representation of symm_ops * update the document * Feature: Output real space wavefunction and partial charge density when `device=gpu` (#6391) * Fix GPU output of out_pchg and out_wfc_norm, out_wfc_re_im * GPU integrate test is functional again * 
Optimize RT-TDDFT dipole output (#6393) * Perf: support GPU version of cal_force_cc with LCAO basis (#6392) * support GPU version of cal_force_cc with LCAO basis * fix a bug * [Refactor] Move module_lr to source_lcao and add a new folder module_external in source_base (#6388) * Move module_lr to source_lcao * Fix test build * Move blas_connector to module_external * Fix header use * Fix internal header use * A fierce battle with Makefile😡 * Move blacs_connector.h to module_external * Move lapack_connector.h and lapack_wrapper.h to module_external * Fix header usage * Move scalapack_connector.h to module_external * Fix a bug for the output information after relaxation (#6395) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * update some function names, update output A to Angstrom * change eV/A to eV/Angstrom * bump version to 3.9.0.10 (#6397) Co-authored-by: Liang Sun <[email protected]> * Fix: fix exx_gamma_extrapolation error in MPI * Fix: fix exx_gamma_extrapolation error in MPI * Update lapack.cu * Refactor: Use LAPACK interfaces from ATen * Fix: Integrate test * Fix: implement devinfo for potrf * Fix: MPI and Makefile * Fix: get_potential * Fix: ace --------- Co-authored-by: wqzhou <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Zhao Tianqi <[email protected]> Co-authored-by: PeizeLin <[email 
protected]> Co-authored-by: jinzx10 <[email protected]> Co-authored-by: Chun Cai <[email protected]> Co-authored-by: Peng Xingliang <[email protected]> Co-authored-by: Jie Li <[email protected]> Co-authored-by: Wenfei Li <[email protected]> Co-authored-by: Denghui Lu <[email protected]> Co-authored-by: YI Zeping <[email protected]> Co-authored-by: wenfei-li <[email protected]> Co-authored-by: jingan-181 <[email protected]> Co-authored-by: StarGrys <[email protected]> Co-authored-by: Haozhi Han <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: HTZhao <[email protected]> Co-authored-by: lanshuyue <[email protected]> Co-authored-by: Liang Sun <[email protected]> Co-authored-by: dzzz2001 <[email protected]> Co-authored-by: linpeize <[email protected]> Co-authored-by: linpz <[email protected]> Co-authored-by: liiutao <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: Critsium <[email protected]> Co-authored-by: Taoni Bao <[email protected]> Co-authored-by: Chen Nuo <[email protected]>
1 parent 3f7e7a4 commit f16364d

22 files changed, 651 additions and 212 deletions

source/CMakeLists.txt

Lines changed: 8 additions & 0 deletions
@@ -54,6 +54,10 @@ list(APPEND device_srcs
     source_pw/module_pwdft/kernels/vnl_op.cpp
     source_base/kernels/math_ylm_op.cpp
     source_hamilt/module_xc/kernels/xc_functional_op.cpp
+    source_pw/module_pwdft/kernels/cal_density_real_op.cpp
+    source_pw/module_pwdft/kernels/mul_potential_op.cpp
+    source_pw/module_pwdft/kernels/vec_mul_vec_complex_op.cpp
+    source_pw/module_pwdft/kernels/exx_cal_energy_op.cpp
 )

 if(USE_CUDA)
@@ -80,6 +84,10 @@ if(USE_CUDA)
     source_base/kernels/cuda/math_kernel_op.cu
     source_base/kernels/cuda/math_kernel_op_vec.cu
     source_hamilt/module_xc/kernels/cuda/xc_functional_op.cu
+    source_pw/module_pwdft/kernels/cuda/cal_density_real_op.cu
+    source_pw/module_pwdft/kernels/cuda/mul_potential_op.cu
+    source_pw/module_pwdft/kernels/cuda/vec_mul_vec_complex.cu
+    source_pw/module_pwdft/kernels/cuda/exx_cal_energy_op.cu
 )
 endif()

source/Makefile.Objects

Lines changed: 4 additions & 0 deletions
@@ -342,6 +342,10 @@ OBJS_HAMILT=hamilt_pw.o\
     velocity_pw.o\
     radial_proj.o\
     exx_helper.o\
+    vec_mul_vec_complex_op.o\
+    exx_cal_energy_op.o\
+    cal_density_real_op.o\
+    mul_potential_op.o\

 OBJS_HAMILT_OF=kedf_tf.o\
     kedf_vw.o\

source/source_base/module_container/ATen/kernels/cuda/lapack.cu

Lines changed: 0 additions & 2 deletions
@@ -70,8 +70,6 @@ struct lapack_trtri<T, DEVICE_GPU> {
     {
         // TODO: trtri is not implemented in this method yet
         // Cause the trtri in cuSolver is not stable for ABACUS!
-        // But why?! trtri and potri are different routines for different job!
-        // How can BPCG work without using a proper routine?
         cuSolverConnector::trtri(cusolver_handle, uplo, diag, dim, Mat, lda);
         // cuSolverConnector::potri(cusolver_handle, uplo, diag, dim, Mat, lda);
     }

source/source_base/module_container/base/third_party/cusolver.h

Lines changed: 18 additions & 6 deletions
@@ -87,45 +87,57 @@ static inline
 void potrf (cusolverDnHandle_t& cusolver_handle, const char& uplo, const int& n, float * A, const int& lda)
 {
     int lwork;
+    int *info = nullptr;
+    cudaErrcheck(cudaMalloc((void**)&info, 1 * sizeof(int)));
     cusolverErrcheck(cusolverDnSpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, A, n, &lwork));
     float* work;
     cudaErrcheck(cudaMalloc((void**)&work, lwork * sizeof(float)));
     // Perform Cholesky decomposition
-    cusolverErrcheck(cusolverDnSpotrf(cusolver_handle, cublas_fill_mode(uplo), n, A, n, work, lwork, nullptr));
+    cusolverErrcheck(cusolverDnSpotrf(cusolver_handle, cublas_fill_mode(uplo), n, A, n, work, lwork, info));
     cudaErrcheck(cudaFree(work));
+    cudaErrcheck(cudaFree(info));
 }
 static inline
 void potrf (cusolverDnHandle_t& cusolver_handle, const char& uplo, const int& n, double * A, const int& lda)
 {
     int lwork;
+    int *info = nullptr;
+    cudaErrcheck(cudaMalloc((void**)&info, 1 * sizeof(int)));
     cusolverErrcheck(cusolverDnDpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, A, n, &lwork));
     double* work;
     cudaErrcheck(cudaMalloc((void**)&work, lwork * sizeof(double)));
     // Perform Cholesky decomposition
-    cusolverErrcheck(cusolverDnDpotrf(cusolver_handle, cublas_fill_mode(uplo), n, A, n, work, lwork, nullptr));
+    cusolverErrcheck(cusolverDnDpotrf(cusolver_handle, cublas_fill_mode(uplo), n, A, n, work, lwork, info));
     cudaErrcheck(cudaFree(work));
+    cudaErrcheck(cudaFree(info));
 }
 static inline
 void potrf (cusolverDnHandle_t& cusolver_handle, const char& uplo, const int& n, std::complex<float> * A, const int& lda)
 {
     int lwork;
-    cusolverErrcheck(cusolverDnCpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuComplex*>(A), n, &lwork));
+    int *info = nullptr;
+    cudaErrcheck(cudaMalloc((void**)&info, 1 * sizeof(int)));
+    cusolverErrcheck(cusolverDnCpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuComplex*>(A), lda, &lwork));
     cuComplex* work;
     cudaErrcheck(cudaMalloc((void**)&work, lwork * sizeof(cuComplex)));
     // Perform Cholesky decomposition
-    cusolverErrcheck(cusolverDnCpotrf(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuComplex*>(A), n, work, lwork, nullptr));
+    cusolverErrcheck(cusolverDnCpotrf(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuComplex*>(A), lda, work, lwork, info));
     cudaErrcheck(cudaFree(work));
+    cudaErrcheck(cudaFree(info));
 }
 static inline
 void potrf (cusolverDnHandle_t& cusolver_handle, const char& uplo, const int& n, std::complex<double> * A, const int& lda)
 {
     int lwork;
-    cusolverErrcheck(cusolverDnZpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuDoubleComplex*>(A), n, &lwork));
+    int *info = nullptr;
+    cudaErrcheck(cudaMalloc((void**)&info, 1 * sizeof(int)));
+    cusolverErrcheck(cusolverDnZpotrf_bufferSize(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuDoubleComplex*>(A), lda, &lwork));
     cuDoubleComplex* work;
     cudaErrcheck(cudaMalloc((void**)&work, lwork * sizeof(cuDoubleComplex)));
     // Perform Cholesky decomposition
-    cusolverErrcheck(cusolverDnZpotrf(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuDoubleComplex*>(A), n, work, lwork, nullptr));
+    cusolverErrcheck(cusolverDnZpotrf(cusolver_handle, cublas_fill_mode(uplo), n, reinterpret_cast<cuDoubleComplex*>(A), lda, work, lwork, info));
     cudaErrcheck(cudaFree(work));
+    cudaErrcheck(cudaFree(info));
 }
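
The `potrf` changes in this file replace the `nullptr` status argument with a device-allocated `info`, and the two complex overloads now pass `lda` rather than `n` as the leading dimension (the float and double overloads, as shown, still pass `n`). As a rough, CPU-only illustration of what a LAPACK-style Cholesky status value conventionally reports (0 on success, i > 0 when the leading minor of order i is not positive definite) and of why `lda` can differ from `n`, here is a minimal sketch. `cholesky_lower` is a hypothetical helper written for this note, not the cuSOLVER routine; it assumes column-major storage and factors the lower triangle in place.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a LAPACK-style potrf on the lower triangle.
// Storage is column-major: element (i, j) lives at a[i + j * lda], lda >= n.
// Return value mimics the conventional `info` output:
//   0     -> factorization succeeded, L overwrites the lower triangle
//   i > 0 -> the leading minor of order i is not positive definite
int cholesky_lower(std::vector<double>& a, int n, int lda)
{
    for (int j = 0; j < n; ++j)
    {
        // Diagonal entry: A(j,j) minus the squared entries already in row j of L.
        double d = a[j + j * lda];
        for (int k = 0; k < j; ++k)
        {
            d -= a[j + k * lda] * a[j + k * lda];
        }
        if (!(d > 0.0))
        {
            return j + 1; // 1-based order of the failing leading minor
        }
        const double ljj = std::sqrt(d);
        a[j + j * lda] = ljj;

        // Entries below the diagonal in column j.
        for (int i = j + 1; i < n; ++i)
        {
            double s = a[i + j * lda];
            for (int k = 0; k < j; ++k)
            {
                s -= a[i + k * lda] * a[j + k * lda];
            }
            a[i + j * lda] = s / ljj;
        }
    }
    return 0;
}
```

Calling `cholesky_lower` on the 2x2 matrix `{4, 2, 2, 3}` with `lda = 2` returns 0, while `{1, 2, 2, 1}` returns 2 because the full matrix is indefinite. In the diff as shown, `info` is allocated and freed on the device but not copied back to the host, so a caller that wants to act on a non-zero status would still need to read it.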

source/source_basis/module_pw/pw_basis.h

Lines changed: 3 additions & 0 deletions
@@ -432,6 +432,9 @@ class PW_Basis
     void set_device(std::string device_);
     void set_precision(std::string precision_);

+    std::string get_device() const { return device; }
+    std::string get_precision() const { return precision; }
+
 protected:

     std::string device = "cpu"; ///< cpu or gpu
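
The two getters added here expose the protected `device` and `precision` strings to code outside the class. A minimal sketch of how a caller might use such accessors follows. `BasisSettings` and `describe_backend` are hypothetical names made up for this note, not part of `PW_Basis`, and the `"double"` default for `precision` is an assumption, since only the `device` default is visible in the hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the accessor pair; not the real PW_Basis class.
class BasisSettings
{
  public:
    void set_device(std::string device_) { device = device_; }
    void set_precision(std::string precision_) { precision = precision_; }

    // Read-only accessors, mirroring the signatures added in the diff.
    std::string get_device() const { return device; }
    std::string get_precision() const { return precision; }

  protected:
    std::string device = "cpu";       // default shown in the diff
    std::string precision = "double"; // assumed default, not shown in the diff
};

// Hypothetical caller that only needs read access to pick a backend label.
std::string describe_backend(const BasisSettings& basis)
{
    return basis.get_device() + "/" + basis.get_precision();
}
```

Because the getters are `const`, a function like `describe_backend` can take the object by const reference and still query both settings.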

source/source_esolver/esolver_ks_pw.cpp

Lines changed: 1 addition & 0 deletions
@@ -619,6 +619,7 @@ void ESolver_KS_PW<T, Device>::iter_finish(UnitCell& ucell, const int istep, int
     {
         auto start = std::chrono::high_resolution_clock::now();
         exx_helper.set_firstiter(false);
+        exx_helper.op_exx->first_iter = false;
         exx_helper.set_psi(this->kspw_psi);

         conv_esolver = exx_helper.exx_after_converge(iter);

source/source_io/input_conv.cpp

Lines changed: 2 additions & 6 deletions
@@ -488,19 +488,15 @@ void Input_Conv::Convert()
     {
         if (ModuleSymmetry::Symmetry::symm_flag != -1)
         {
-            ModuleBase::WARNING("Input_Conv", "EXX PW works only with symmetry=-1");
-            ModuleSymmetry::Symmetry::symm_flag = -1;
+            ModuleBase::WARNING_QUIT("Input_Conv", "EXX PW works only with symmetry=-1");
+            // ModuleSymmetry::Symmetry::symm_flag = -1;
         }

         if (PARAM.inp.nspin != 1 && PARAM.inp.nspin != 2)
         {
             ModuleBase::WARNING_QUIT("Input_Conv", "EXX PW works only with nspin=1 and 2");
         }

-        if (PARAM.inp.device != "cpu")
-        {
-            ModuleBase::WARNING_QUIT("Input_Conv", "EXX PW works only with device=cpu");
-        }
     }

     //----------------------------------------------------------
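
This hunk changes the EXX PW input checks in two ways: a symmetry setting other than -1 now aborts instead of being silently overridden, and the `device=cpu` restriction is dropped. The resulting rules can be sketched in isolation as below. `ExxPwInputs`, `check_exx_pw_inputs` and `exx_pw_inputs_ok` are hypothetical names for this note, and an exception stands in for `WARNING_QUIT`, which in the real code terminates the run.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical plain struct standing in for the relevant input parameters.
struct ExxPwInputs
{
    int symmetry = -1;
    int nspin = 1;
    std::string device = "cpu";
};

// Sketch of the post-commit rules; throws where the real code calls WARNING_QUIT.
void check_exx_pw_inputs(const ExxPwInputs& inp)
{
    if (inp.symmetry != -1)
    {
        // Previously a warning plus a silent reset to -1; now a hard failure.
        throw std::invalid_argument("EXX PW works only with symmetry=-1");
    }
    if (inp.nspin != 1 && inp.nspin != 2)
    {
        throw std::invalid_argument("EXX PW works only with nspin=1 and 2");
    }
    // No device check remains: device=gpu is accepted after this commit.
}

// Convenience wrapper that reports validity as a bool.
bool exx_pw_inputs_ok(const ExxPwInputs& inp)
{
    try
    {
        check_exx_pw_inputs(inp);
        return true;
    }
    catch (const std::invalid_argument&)
    {
        return false;
    }
}
```

Failing fast on the symmetry setting means the user's input file and the actual run can no longer disagree without notice, at the cost of requiring `symmetry=-1` to be set explicitly.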

source/source_pw/module_pwdft/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -26,6 +26,7 @@ list(APPEND objects
     stress_func_nl.cpp
     stress_func_us.cpp
     stress_func_onsite.cpp
+    stress_func_exx.cpp
     stress_pw.cpp
     VL_in_pw.cpp
     VNL_in_pw.cpp
@@ -47,7 +48,6 @@ add_library(
     module_pwdft
     OBJECT
     ${objects}
-    stress_func_exx.cpp
 )

 if(ENABLE_COVERAGE)
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+#include "source_pw/module_pwdft/kernels/cal_density_real_op.h"
+#include "source_psi/psi.h"
+namespace hamilt
+{
+template <typename T>
+struct cal_density_real_op<T, base_device::DEVICE_CPU>
+{
+    void operator()(const T *in1, const T *in2, T *out, double omega, int nrxx)
+    {
+#ifdef _OPENMP
+#pragma omp parallel for schedule(static)
+#endif
+        for (int ir = 0; ir < nrxx; ir++)
+        {
+            // assert(is_finite(psi_nk_real[ir]));
+            // assert(is_finite(psi_mq_real[ir]));
+            out[ir] = in1[ir] * std::conj(in2[ir]) / static_cast<T>(omega); // Phase e^(i(q-k)r)
+        }
+    }
+
+};
+
+template struct cal_density_real_op<std::complex<float>, base_device::DEVICE_CPU>;
+template struct cal_density_real_op<std::complex<double>, base_device::DEVICE_CPU>;
+}
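
The CPU kernel above computes, for each real-space grid point, the product of one array with the complex conjugate of another, divided by the cell volume `omega`. A standalone sketch of the same element-wise formula, without the ABACUS headers or OpenMP, might look like this. `pair_density` is a hypothetical name chosen for this note; unlike the original, its template parameter is the real scalar type rather than the complex type.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical standalone version of the element-wise operation
//   out[ir] = in1[ir] * conj(in2[ir]) / omega
// Here T is the real scalar type (float or double).
template <typename T>
void pair_density(const std::complex<T>* in1,
                  const std::complex<T>* in2,
                  std::complex<T>* out,
                  double omega,
                  int nrxx)
{
    const T inv_scale = static_cast<T>(omega);
    for (int ir = 0; ir < nrxx; ++ir)
    {
        out[ir] = in1[ir] * std::conj(in2[ir]) / inv_scale;
    }
}
```

For a single point with `in1 = 1 + 2i`, `in2 = 3 - i` and `omega = 2`, the conjugate of `in2` is `3 + i`, the product is `1 + 7i`, and the result is `0.5 + 3.5i`.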
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+#include "source_base/macros.h"
+
+#ifndef CAL_DENSITY_REAL_OP_H
+#define CAL_DENSITY_REAL_OP_H
+namespace hamilt
+{
+template <typename T, typename Device>
+struct cal_density_real_op
+{
+    using Real = typename GetTypeReal<T>::type;
+    void operator()(const T *psi1, const T* psi2, T *out, double omega, int nrxx);
+};
+}
+#endif //CAL_DENSITY_REAL_OP_H
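
The header declares a functor template parameterized on value type and device, and the `.cpp` file supplies a partial specialization for the CPU device plus explicit instantiations. The same pattern, reduced to a toy operation, can be sketched as follows. `DeviceCpu`, `scale_op` and `apply_scale` are hypothetical names for this note; `DeviceCpu` only plays the role that `base_device::DEVICE_CPU` plays in the diff.

```cpp
#include <cassert>

// Hypothetical device tag, standing in for base_device::DEVICE_CPU.
struct DeviceCpu
{
};

// Primary template: declares the interface only; each device supplies its own body.
template <typename T, typename Device>
struct scale_op
{
    void operator()(const T* in, T* out, T factor, int n);
};

// Partial specialization for the CPU tag, analogous to the .cpp file in the diff.
template <typename T>
struct scale_op<T, DeviceCpu>
{
    void operator()(const T* in, T* out, T factor, int n)
    {
        for (int i = 0; i < n; ++i)
        {
            out[i] = in[i] * factor;
        }
    }
};

// Explicit instantiation, so the definition can live in a single translation unit.
template struct scale_op<double, DeviceCpu>;

// Generic caller: the device tag selects the implementation at compile time.
template <typename T, typename Device>
void apply_scale(const T* in, T* out, T factor, int n)
{
    scale_op<T, Device>()(in, out, factor, n);
}
```

With this layout a GPU build can add another specialization in a separate source file without touching the callers, which only name the device tag.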
