Feature: EXX PW GPU Support (deepmodeling#6407)

Flying-dragon-boxing · WHUweiqingzhou · kirk0830 · kluophysics · commit 7a5cdd7e764d · 2025-08-10T08:26:45.000+08:00
* Works * adapt to the new container * Turn off USE_PEXSI * Update LibRI to 553c91c * modify include files * namespace-ize * new inputs added * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Modify inputs and update to latest version (deepmodeling#2) * run INPUT.Default() in every process in InputParaTest (deepmodeling#3490) Co-authored-by: kirk0830 <67682086+kirk0830@users.noreply.github.com> * add blas support for FindLAPACK.cmake (deepmodeling#3497) * more unittest of QO: towards orbital selection (deepmodeling#3499) * Fix: fix bug in mulliken charge calculation (deepmodeling#3503) * fix phase * fix case test * Refactor: namespace Conv_Coulomb_Pot_K (deepmodeling#3446) * Refactor: namespace Conv_Coulomb_Pot_K * Refactor: namespace Conv_Coulomb_Pot_K --------- Co-authored-by: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com> * enable the computation of all zeros in one function call (deepmodeling#3449) Co-authored-by: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com> * replace ios.eof() by ios.good() to avoid meeting badbit and failbit in reading STRU (deepmodeling#3506) * Build: add ccache to accelerate the testing process (deepmodeling#3509) * Build: add ccache to accelerate the testing process * Update test.yml * Update test.yml * Update test.yml * Docs: to avoid the misunderstanding in docs (deepmodeling#3518) * to avoid the misunderstanding in docs * Update docs/quick_start/hands_on.md Co-authored-by: Chun Cai <amoycaic@gmail.com> --------- Co-authored-by: Chun Cai <amoycaic@gmail.com> * Docs: fix a missing depencency in conda build env (deepmodeling#3508) * Feature: Add ENABLE_RAPIDJSON option to control the output of abacus.json (deepmodeling#3519) Add ENABLE_RAPIDJSON option to control the output of abacus.json * Feature: add python wrapper for math sphbes (deepmodeling#3475) * recommit for review * add python wrapper * remove timer since 
performace tests add * Feature: support segment split in kline mode in KPT file and `out_band` band output precision control, `8` as default (deepmodeling#3493) * add precision control * correct serial version of nscf_band function * fix issue 3482 * update unit and integrated test * update document * correct unittest and make compatible with false and true * fix: bug in Autotest.sh when result.ref has no totaltimeref (deepmodeling#3523) * Fix : unit test of module_xc (deepmodeling#3524) * Fix: omit small magnetic moments to avoid numerical instability (deepmodeling#3530) * update deltalambda * avoid numerical error in orbMulP * add constrain on Mi * change case reference value * Fix: fix multiple compiler warnings (deepmodeling#3515) * Fix: add noreturn attribute to warning_quit * Add type conversion * fix string literal * fix small number trunctuation * Fix system call returned value not checked * fix missing braket * Refactor parameter_pool.cpp and parameter_pool.h * remove duplicated return statements * Change WARNING_QUIT occurances in tests * Add warning message to help debug UT * output the default precision flag (deepmodeling#3496) Co-authored-by: kirk0830 <67682086+kirk0830@users.noreply.github.com> * Build: Improving CMake performance for finding LibXC and ELPA (deepmodeling#3478) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <amoycaic@gmail.com> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <amoycaic@gmail.com> * Update easy_install.md with Makefile-built LibXC supported * Update 
easy_install.md to include different behavior in different version on finding ELPA --------- Co-authored-by: Chun Cai <amoycaic@gmail.com> * Docs: correct some docs about mp2 smearing method (deepmodeling#3533) * correct some docs about mp2 smearing method * add docs about mv method * Feature : printing band density (deepmodeling#3501) Co-authored-by: wenfei-li <liwenfei@gmail.com> Co-authored-by: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com> * add some docs for PR#3501 (deepmodeling#3537) * Feature: enable restart charge density mixing during SCF (deepmodeling#3542) * add a new parameter mixing_restart * do not update rho if iter==mixing_restart * do not update rho if iter==mixing_restart-1 * reset mix and rho_mdata if iter==mixing_restart * fix SCF exit directly since drho=0 if iter=GlobalV::MIXING_RESTART * re-set_mixing in eachiterinit for PW and LCAO * enable SCF restarts in esolver_ks::RUN * add some UnitTests * add some Docs * new inputs added * Update input-main.md (deepmodeling#3551) Solve the format problem mentioned in issue 3543 * Build: fix compatibility issue against toolchain install (deepmodeling#3540) * Fix for finding LibXC and ELPA * For compatibility to previous routines * syntax fix for FindELPA.cmake * Update cmake/FindELPA.cmake Co-authored-by: Chun Cai <amoycaic@gmail.com> * Using CMake interface as default for finding LibXC * update docs * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * fix for FindLibxc: changing imcompatible if statement * update docs for installing pkg-config * Update FindLibxc.cmake * Update FindLibxc.cmake * remove previous LibXC routine in CMakeLists.txt Co-authored-by: Chun Cai <amoycaic@gmail.com> * Update easy_install.md with Makefile-built LibXC supported * Update easy_install.md to include different behavior in different version on finding ELPA * fix compatibility issue against toolchain * Change default ELPA install routine to old one 
--------- Co-authored-by: Chun Cai <amoycaic@gmail.com> * Test: Configure performance tests for math libraries (deepmodeling#3511) * add performace test of sphbes functions. * fix benchmark cmake errors * add dependencies for docker * update docs * add performance tests for sphbes * add google benchmark * rewrite benchmark tests in fixtures * disable internal testing in benchmark * merge benchmark into integration test --------- Co-authored-by: StarGrys <771582678@qq.com> * Configure Makefile Compiling, fix typos * Fix Makefile Intel toolchains compile errors * Fix even more PEXSI related Makefile compiling issues * Update hsolver_pw.cpp (deepmodeling#3556) when use_uspp==false, overlap matrix should be E. * Fix: cuda build target (deepmodeling#3276) * Fix: cuda buid target * Update CMakeLists.txt --------- Co-authored-by: Denghui Lu <denghuilu@pku.edu.cn> --------- Co-authored-by: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com> Co-authored-by: kirk0830 <67682086+kirk0830@users.noreply.github.com> Co-authored-by: Haozhi Han <haozhi.han@outlook.com> Co-authored-by: Zhao Tianqi <hongriTianqi@users.noreply.github.com> Co-authored-by: PeizeLin <78645006+PeizeLin@users.noreply.github.com> Co-authored-by: jinzx10 <jzx016@hotmail.com> Co-authored-by: Chun Cai <amoycaic@gmail.com> Co-authored-by: Peng Xingliang <91927439+pxlxingliang@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Wenfei Li <38569667+wenfei-li@users.noreply.github.com> Co-authored-by: Denghui Lu <denghuilu@pku.edu.cn> Co-authored-by: YI Zeping <18586016708@163.com> Co-authored-by: wenfei-li <liwenfei@gmail.com> Co-authored-by: jingan-181 <78459531+jingan-181@users.noreply.github.com> Co-authored-by: StarGrys <771582678@qq.com> Co-authored-by: Haozhi Han <haozhi.han@stu.pku.edu.cn> * Revert "Modify inputs and update to latest version" * Update FindPEXSI.cmake to fix Comments * Fix CI errors * Fix CI Errors and Merge with Upstream * 
Resolve Pull Request Reviews * Fix parallel communication related issue * Fix vars in Makefile.vars, add input tests and comments for pexsi vars * Fix nspin > 1 cases * Improvement: take calculated mu as new initial guess, may slightly improve performance * Fix mistakes in the last commit * Fix: params and features - set default pexsi_temp - fix md in pexsi * fix empty lines * Fix: move params to pexsi_solver, rename USE_PEXSI to ENABLE_PEXSI * Tests: Modify Dockerfile and GitHub Workflows * Fix: wrong abacus link for dockerfile * Docs: added docs for pexsi inputs * Tests: three tests added for pexsi * Fix unit test issues in input_conv * Very good unit test, making my laptop fan spin * Change default pexsi_npole from 80 to 40 * Place pexsi_EDM in DensityMatrix, set size of pexsi_dm = 1 when GlobalV::NSPIN==4, and add comments for dmToRho * An unit test added for DiagoPexsi * modify for changed gint interface * correct nspin related behaviors * add efermi passthrough * Revert "add efermi passthrough" This reverts commit d7b402d. * commits to resolve conversations related to codes * DM and EDM pointers in pexsi now handled by diagopexsi, and copying h s matrices no longer needed * add pexsi examples * fix pexsi unit test (original version shouldn't run) * add building docs for pexsi * set cxx standard to c++14, which is required in make_unique * Fix: Fix typo related to pexsi * update to PPEXSIDFTDriver2 * default npoints to 1, so single core pexsi will work * Feature: exx operator for pw basis, single kpt * apply pexsi changes(?) * q-e style exx_div * Correct exxdiv * Fix Compile errors * refactor to abandon `pdiagh` * Fix mu_buffer and nspin * HSE examples * Feature: Multi-K exx * Feature: Multi-K exx * Updates with latest * Remove redundant global vars * Update to v3.9.0 * Update to v3.9.0, now code works * Remove Redundant cal_exx_energy in esolver_ks_pw.cpp * Some mess * Minor Fixes * Fix separate loop and screening * Add EXX stress * EXX Energy??? 
* Multi-K is broken??? * Fix: Multi-K and stress * Feature: ACE for single-K * Feature: ACE should work for multi-K, but not for sure * Feature: ACE works. Next step is ACE energy. * Fix: adapt to the latest instruction for variable `conv_esolver` * Reconstruct: move exx_helper to hamilt_pwdft * Refactor: in ESolver_KS_PW, calculate deband in iter_finish, not in hamilt2density * Fix: make files in consistent with upstream * Fix: Now EXX PW doesn't depend on LibRI * Fix: Add input constraints for EXX PW * Fix: Remove redundant mpi barrier * Fix: Clean irrelevant files * Fix: Clean irrelevant files * Feature: add ace flag, exit on using gpu * Refactor: Phase 1 for refactoring exx energy * Feature: now ace calculates energy * Feature: enable exx energy * Fix: fix makefile compilation error * Fix: One minor fix for a segmentation fault * Tests: one integrate test for exx pw, only for verifying whether exx pw works * Revert "Tests: one integrate test for exx pw, only for verifying whether exx pw works" This reverts commit e7b606f. * Fix: EXX PW ACE open only when separate_loop is on * add timer * Feature: Double Grid method of EXX PW * Feature: Double Grid method of EXX PW Stress * Fix: Double Grid method of EXX PW Stress * Feature: add double grid variable * Feature: add double grid variable * Fis: HSE stress * Fix: HSE Stress * Fix: Timer * Fix: Timer * For non mp sampling, disable extrapolation * Modify test * Modify mp * Format * Format * Feature: nspin == 2 scf * Fix: nspin == 2 scf * Docs: EXX PW Docs * Feature: EXX PW for nspin=2 * Docs: EXX PW Docs * Docs: EXX PW Docs * Docs: EXX PW Docs, minor fixes * Refactor * Refactor * Refactor * Refactor * Refactor * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Bump version v3.9.0.7 * Refactor: Remove set kvec funcs in `K_Vectors` * Refactor: Remove final_scf * Refactor: Fix kvecc2d/d2c * Fix: Tests * Fix: Tests * Fix: Tests * Fix: Tests * Refactor: Final? 
* Fix * Fix * Fix * Fix * GPU EXX PW Support * Fix: Compile Error on CUDA > 12.9 * Fix: Compile Error on CUDA > 12.9 * NVTX3 * F***ing new version * Feature: Support linear combination of coulomb_param for EXX PW * Fix: Fix compile issue * F***ing new version * F***ing new version * F***ing new version * Uploading hybrid gauge tddft (deepmodeling#6369) * hybrid gague * update tests * update * update * update * update * update unit test * fix tests * update tests * fix read_wfc * fix catch_properties.sh * fix restart * update gpu test * update tests * fix * fix input_conv * Improve md calculation stress output in running log (deepmodeling#6366) * Improve md calculation stress output in running log * Module_IO Unittest modify * ModuleMD Unittests modify * modify code comment in fire_test.cpp * maintain setprecision(8) for md stress output * Refactor: Remove redundant Input_para from ESolver Class (deepmodeling#6370) * Refactor: Replace PARAM.inp with inp in ESolver classes for consistency * Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency * Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method * Revert "Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method" This reverts commit b1bd0fd. * Revert "Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency" This reverts commit f4f81e3. 
* Fix: Fix memory leak introduced by new gint module (deepmodeling#6375) * fix memory leak * delete copy assignment * refactor Exx_Opt_Orb (deepmodeling#6378) Co-authored-by: linpz <linpz@mail.ustc.edu.cn> * Add use sw and fix Floating point exception (deepmodeling#6372) * remove float error in sunway * fix ig=0 * add the sw * change the make_dir * unify the gg use * fix compile bug * add init * temporarily remove the sunway define * add the pesduo * fix compile bug * fix bug in the betar * modify the test * Update the output formats of rt-TDDFT (deepmodeling#6381) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * [Refactor] Rename grid to module_grid and genelpa to module_genelpa (deepmodeling#6386) * Rename grid to module_grid * Rename genelpa to module_genelpa * Fix cmake * Update the outputs of geometry relaxation (deepmodeling#6387) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * Feature: support the output of matrix representation of symm_ops (deepmodeling#6390) * Feature: support output the matrix representation of symmetry operation * Feature: support the output of matrix representation of symm_ops * update the document * Feature: Output real space wavefunction and partial charge density when `device=gpu` (deepmodeling#6391) * 
Fix GPU output of out_pchg and out_wfc_norm, out_wfc_re_im * GPU integrate test is functional again * Optimize RT-TDDFT dipole output (deepmodeling#6393) * Perf: support GPU version of cal_force_cc with LCAO basis (deepmodeling#6392) * support GPU version of cal_force_cc with LCAO basis * fix a bug * [Refactor] Move module_lr to source_lcao and add a new folder module_external in source_base (deepmodeling#6388) * Move module_lr to source_lcao * Fix test build * Move blas_connector to module_external * Fix header use * Fix internal header use * A fierce battle with Makefile😡 * Move blacs_connector.h to module_external * Move lapack_connector.h and lapack_wrapper.h to module_external * Fix header usage * Move scalapack_connector.h to module_external * Fix a bug for the output information after relaxation (deepmodeling#6395) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * update some function names, update output A to Angstrom * change eV/A to eV/Angstrom * bump version to 3.9.0.10 (deepmodeling#6397) Co-authored-by: Liang Sun <50293369+sunliang98@users.noreply.github.com> * Fix: fix exx_gamma_extrapolation error in MPI * Fix: fix exx_gamma_extrapolation error in MPI * Update lapack.cu * Refactor: Use LAPACK interfaces from ATen * Fix: Integrate test * Fix: implement devinfo for potrf * Fix: MPI and Makefile * Fix: get_potential * Fix: ace --------- Co-authored-by: wqzhou 
<33364058+WHUweiqingzhou@users.noreply.github.com> Co-authored-by: kirk0830 <67682086+kirk0830@users.noreply.github.com> Co-authored-by: Haozhi Han <haozhi.han@outlook.com> Co-authored-by: Zhao Tianqi <hongriTianqi@users.noreply.github.com> Co-authored-by: PeizeLin <78645006+PeizeLin@users.noreply.github.com> Co-authored-by: jinzx10 <jzx016@hotmail.com> Co-authored-by: Chun Cai <amoycaic@gmail.com> Co-authored-by: Peng Xingliang <91927439+pxlxingliang@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Wenfei Li <38569667+wenfei-li@users.noreply.github.com> Co-authored-by: Denghui Lu <denghuilu@pku.edu.cn> Co-authored-by: YI Zeping <18586016708@163.com> Co-authored-by: wenfei-li <liwenfei@gmail.com> Co-authored-by: jingan-181 <78459531+jingan-181@users.noreply.github.com> Co-authored-by: StarGrys <771582678@qq.com> Co-authored-by: Haozhi Han <haozhi.han@stu.pku.edu.cn> Co-authored-by: Mohan Chen <mohan.chen.chen.mohan@gmail.com> Co-authored-by: HTZhao <104255052+ESROAMER@users.noreply.github.com> Co-authored-by: lanshuyue <140165754+lanshuyue@users.noreply.github.com> Co-authored-by: Liang Sun <50293369+sunliang98@users.noreply.github.com> Co-authored-by: dzzz2001 <153698752+dzzz2001@users.noreply.github.com> Co-authored-by: linpeize <linpeize2024@163.com> Co-authored-by: linpz <linpz@mail.ustc.edu.cn> Co-authored-by: liiutao <74701833+A-006@users.noreply.github.com> Co-authored-by: Mohan Chen <mohanchen@pku.edu.cn> Co-authored-by: Critsium <tsfxwbbzxy@163.com> Co-authored-by: Taoni Bao <baotaoni@pku.edu.cn> Co-authored-by: Chen Nuo <49788094+Cstandardlib@users.noreply.github.com>
diff --git a/source/source_base/module_container/ATen/kernels/cuda/lapack.cu b/source/source_base/module_container/ATen/kernels/cuda/lapack.cu
@@ -70,8 +70,6 @@ struct lapack_trtri<T, DEVICE_GPU> {
     {
         // TODO: trtri is not implemented in this method yet
         // Cause the trtri in cuSolver is not stable for ABACUS!
-        // But why?! trtri and potri are different routines for different job! 
-        // How can BPCG work without using a proper routine? 
         cuSolverConnector::trtri(cusolver_handle, uplo, diag, dim, Mat, lda);
         // cuSolverConnector::potri(cusolver_handle, uplo, diag, dim, Mat, lda);
     }
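
The comments removed in this hunk pointed out that `trtri` and `potri` are different routines for different jobs. As a rough illustration of that distinction only, the plain C++ sketch below mimics what the LAPACK-style operations compute: `trtri` inverts a triangular matrix, whereas `potri` takes a Cholesky factor `L` of `A = L L^T` and returns the inverse of `A`. The names `Matrix`, `invert_lower_triangular`, `invert_from_cholesky`, and `max_abs_diff` are hypothetical helpers introduced here; they are not part of ABACUS, ATen, or cuSOLVER, and the sketch makes no claim about how `cuSolverConnector` is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense n x n matrix in a flat vector (illustrative type, not from ABACUS).
using Matrix = std::vector<double>;

// trtri-like operation: inverse of a non-singular lower-triangular matrix L.
// Solves L * X = I column by column using forward substitution.
Matrix invert_lower_triangular(const Matrix& L, std::size_t n)
{
    assert(L.size() == n * n);
    Matrix X(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
    {
        assert(L[j * n + j] != 0.0); // a zero diagonal entry means L is singular
        X[j * n + j] = 1.0 / L[j * n + j];
        for (std::size_t i = j + 1; i < n; ++i)
        {
            double sum = 0.0;
            for (std::size_t k = j; k < i; ++k)
            {
                sum += L[i * n + k] * X[k * n + j];
            }
            X[i * n + j] = -sum / L[i * n + i];
        }
    }
    return X;
}

// potri-like operation: given the Cholesky factor L of A = L * L^T,
// return A^{-1} = L^{-T} * L^{-1}. The result is generally NOT equal to L^{-1}.
Matrix invert_from_cholesky(const Matrix& L, std::size_t n)
{
    const Matrix X = invert_lower_triangular(L, n);
    Matrix Ainv(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
            {
                sum += X[k * n + i] * X[k * n + j]; // (X^T X)_{ij}
            }
            Ainv[i * n + j] = sum;
        }
    }
    return Ainv;
}

// Largest element-wise absolute difference between two equally sized matrices.
double max_abs_diff(const Matrix& A, const Matrix& B)
{
    assert(A.size() == B.size());
    double d = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i)
    {
        d = std::fmax(d, std::fabs(A[i] - B[i]));
    }
    return d;
}
```

For `L = [[2, 0], [1, 3]]`, `invert_lower_triangular` yields `[[1/2, 0], [-1/6, 1/3]]`, while `invert_from_cholesky` yields the inverse of `A = [[4, 2], [2, 10]]`, namely `[[5/18, -1/18], [-1/18, 1/9]]`. The two results differ, so under these assumptions one routine cannot stand in for the other without changing what the caller receives.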
