Commit a4f4678

Flying-dragon-boxing, mohanchen, WHUweiqingzhou, ESROAMER, lanshuyue authored
Fix: EXX PW occupation number issue in nspin==2 && support for CUDA-Aware MPI macro (#6750)
* correct nspin related behaviors * add efermi passthrough * Revert "add efermi passthrough" This reverts commit d7b402d. * commits to resolve conversations related to codes * DM and EDM pointers in pexsi now handled by diagopexsi, and copying h s matrices no longer needed * add pexsi examples * fix pexsi unit test (original version shouldn't run) * add building docs for pexsi * set cxx standard to c++14, which is required in make_unique * Fix: Fix typo related to pexsi * update to PPEXSIDFTDriver2 * default npoints to 1, so single core pexsi will work * Feature: exx operator for pw basis, single kpt * apply pexsi changes(?) * q-e style exx_div * Correct exxdiv * Fix Compile errors * refactor to abandon `pdiagh` * Fix mu_buffer and nspin * HSE examples * Feature: Multi-K exx * Feature: Multi-K exx * Updates with latest * Remove redundant global vars * Update to v3.9.0 * Update to v3.9.0, now code works * Remove Redundant cal_exx_energy in esolver_ks_pw.cpp * Some mess * Minor Fixes * Fix separate loop and screening * Add EXX stress * EXX Energy??? * Multi-K is broken??? * Fix: Multi-K and stress * Feature: ACE for single-K * Feature: ACE should work for multi-K, but not for sure * Feature: ACE works. Next step is ACE energy. 
* Fix: adapt to the latest instruction for variable `conv_esolver` * Reconstruct: move exx_helper to hamilt_pwdft * Refactor: in ESolver_KS_PW, calculate deband in iter_finish, not in hamilt2density * Fix: make files in consistent with upstream * Fix: Now EXX PW doesn't depend on LibRI * Fix: Add input constraints for EXX PW * Fix: Remove redundant mpi barrier * Fix: Clean irrelevant files * Fix: Clean irrelevant files * Feature: add ace flag, exit on using gpu * Refactor: Phase 1 for refactoring exx energy * Feature: now ace calculates energy * Feature: enable exx energy * Fix: fix makefile compilation error * Fix: One minor fix for a segmentation fault * Tests: one integrate test for exx pw, only for verifying whether exx pw works * Revert "Tests: one integrate test for exx pw, only for verifying whether exx pw works" This reverts commit e7b606f. * Fix: EXX PW ACE open only when separate_loop is on * add timer * Feature: Double Grid method of EXX PW * Feature: Double Grid method of EXX PW Stress * Fix: Double Grid method of EXX PW Stress * Feature: add double grid variable * Feature: add double grid variable * Fis: HSE stress * Fix: HSE Stress * Fix: Timer * Fix: Timer * For non mp sampling, disable extrapolation * Modify test * Modify mp * Format * Format * Feature: nspin == 2 scf * Fix: nspin == 2 scf * Docs: EXX PW Docs * Feature: EXX PW for nspin=2 * Docs: EXX PW Docs * Docs: EXX PW Docs * Docs: EXX PW Docs, minor fixes * Refactor * Refactor * Refactor * Refactor * Refactor * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Bump version v3.9.0.7 * Refactor: Remove set kvec funcs in `K_Vectors` * Refactor: Remove final_scf * Refactor: Fix kvecc2d/d2c * Fix: Tests * Fix: Tests * Fix: Tests * Fix: Tests * Refactor: Final? 
* Fix * Fix * Fix * Fix * GPU EXX PW Support * Fix: Compile Error on CUDA > 12.9 * Fix: Compile Error on CUDA > 12.9 * NVTX3 * F***ing new version * Feature: Support linear combination of coulomb_param for EXX PW * Fix: Fix compile issue * F***ing new version * F***ing new version * F***ing new version * Uploading hybrid gauge tddft (#6369) * hybrid gague * update tests * update * update * update * update * update unit test * fix tests * update tests * fix read_wfc * fix catch_properties.sh * fix restart * update gpu test * update tests * fix * fix input_conv * Improve md calculation stress output in running log (#6366) * Improve md calculation stress output in running log * Module_IO Unittest modify * ModuleMD Unittests modify * modify code comment in fire_test.cpp * maintain setprecision(8) for md stress output * Refactor: Remove redundant Input_para from ESolver Class (#6370) * Refactor: Replace PARAM.inp with inp in ESolver classes for consistency * Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency * Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method * Revert "Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method" This reverts commit b1bd0fd. * Revert "Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency" This reverts commit f4f81e3. 
* Fix: Fix memory leak introduced by new gint module (#6375) * fix memory leak * delete copy assignment * refactor Exx_Opt_Orb (#6378) Co-authored-by: linpz <[email protected]> * Add use sw and fix Floating point exception (#6372) * remove float error in sunway * fix ig=0 * add the sw * change the make_dir * unify the gg use * fix compile bug * add init * temporarily remove the sunway define * add the pesduo * fix compile bug * fix bug in the betar * modify the test * Update the output formats of rt-TDDFT (#6381) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * [Refactor] Rename grid to module_grid and genelpa to module_genelpa (#6386) * Rename grid to module_grid * Rename genelpa to module_genelpa * Fix cmake * Update the outputs of geometry relaxation (#6387) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * Feature: support the output of matrix representation of symm_ops (#6390) * Feature: support output the matrix representation of symmetry operation * Feature: support the output of matrix representation of symm_ops * update the document * Feature: Output real space wavefunction and partial charge density when `device=gpu` (#6391) * Fix GPU output of out_pchg and out_wfc_norm, out_wfc_re_im * GPU integrate test is functional again * 
Optimize RT-TDDFT dipole output (#6393) * Perf: support GPU version of cal_force_cc with LCAO basis (#6392) * support GPU version of cal_force_cc with LCAO basis * fix a bug * [Refactor] Move module_lr to source_lcao and add a new folder module_external in source_base (#6388) * Move module_lr to source_lcao * Fix test build * Move blas_connector to module_external * Fix header use * Fix internal header use * A fierce battle with Makefile😡 * Move blacs_connector.h to module_external * Move lapack_connector.h and lapack_wrapper.h to module_external * Fix header usage * Move scalapack_connector.h to module_external * Fix a bug for the output information after relaxation (#6395) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * update some function names, update output A to Angstrom * change eV/A to eV/Angstrom * bump version to 3.9.0.10 (#6397) Co-authored-by: Liang Sun <[email protected]> * Fix: fix exx_gamma_extrapolation error in MPI * Fix: fix exx_gamma_extrapolation error in MPI * Update lapack.cu * Refactor: Use LAPACK interfaces from ATen * Fix: Integrate test * Fix: implement devinfo for potrf * Fix: MPI and Makefile * Fix: get_potential * Fix: ace * Refactor * Refactor * Refactor * Refactor * Refactor * Fix: conv * Revert "Fix: conv" This reverts commit d2da506. 
* Fix: conv * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Refactor * Refactor * Refactor * Refactor * Refactor * Mod * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * EXX KPAR WORKS * EXX KPAR WORKS Alternative * Fix GPU, but so ugly... * Undo cuda aware mpi * Undo cuda aware mpi * Revert "Undo cuda aware mpi" This reverts commit a8d71b2. * EXX KPAR WORKS on NSPIN=2 * Fix without MPI * Fix header * Separate ecut for exx pw * Documents related * Documents related * EXX KPAR BUG FIX * EXX KPAR BUG FIX * CUDA-Aware MPI * Fix: nspin=2 --------- Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: wqzhou <[email protected]> Co-authored-by: HTZhao <[email protected]> Co-authored-by: lanshuyue <[email protected]> Co-authored-by: Liang Sun <[email protected]> Co-authored-by: dzzz2001 <[email protected]> Co-authored-by: linpeize <[email protected]> Co-authored-by: linpz <[email protected]> Co-authored-by: liiutao <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: Critsium <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Taoni Bao <[email protected]> Co-authored-by: Chen Nuo <[email protected]>
1 parent 414f59a commit a4f4678

3 files changed: +56 -9 lines changed

source/source_pw/module_pwdft/operator_pw/exx_pw_ace.cpp

Lines changed: 21 additions & 1 deletion
@@ -182,7 +182,27 @@ void OperatorEXXPW<T, Device>::construct_ace() const
 // if (iq == 0)
 // std::cout << "Bcast psi_mq_real" << std::endl;
 #ifdef __MPI
+#ifdef __CUDA_MPI
 MPI_Bcast(psi_mq_real, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+#else
+if (PARAM.inp.device == "cpu")
+{
+    MPI_Bcast(psi_mq_real, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+}
+else if (PARAM.inp.device == "gpu")
+{
+    // need to copy to cpu first
+    T* psi_mq_real_cpu = new T[wfcpw->nrxx];
+    syncmem_complex_d2c_op()(psi_mq_real_cpu, psi_mq_real, wfcpw->nrxx);
+    MPI_Bcast(psi_mq_real_cpu, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+    syncmem_complex_c2d_op()(psi_mq_real, psi_mq_real_cpu, wfcpw->nrxx);
+    delete[] psi_mq_real_cpu;
+}
+else
+{
+    ModuleBase::WARNING_QUIT("OperatorEXXPW", "construct_ace: unknown device");
+}
+#endif
 #endif
 
 } // end of iq
@@ -287,7 +307,7 @@ template <typename T, typename Device>
 double OperatorEXXPW<T, Device>::cal_exx_energy_ace(psi::Psi<T, Device>* ppsi_) const
 {
     double Eexx = 0;
-
+    int nspin_fac = PARAM.inp.nspin == 2 ? 2 : 1;
     psi::Psi<T, Device> psi_ = *ppsi_;
     int* ik_ = const_cast<int*>(&this->ik);
     int ik_save = this->ik;
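
Note on the `__CUDA_MPI` fallback: without CUDA-aware MPI, the GPU branch stages the wavefunction through a temporary host buffer before and after `MPI_Bcast`. The same block is added in both `exx_pw_ace.cpp` and `op_exx_pw.cpp`. A minimal sketch of how that pattern could be expressed once is given here. It is an illustration only: `staged_bcast` and `demo_staged_bcast` are hypothetical names, the three callables stand in for `syncmem_complex_d2c_op`, `MPI_Bcast`, and `syncmem_complex_c2d_op`, and a `std::vector` replaces the raw `new[]`/`delete[]` pair so the buffer is released on every exit path.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the codebase: broadcast a device-resident
// buffer through a temporary host buffer when MPI cannot read device memory.
template <typename T, typename D2H, typename Bcast, typename H2D>
void staged_bcast(T* dev_buf, std::size_t n, D2H d2h, Bcast bcast, H2D h2d)
{
    std::vector<T> host(n);        // RAII host staging buffer
    d2h(host.data(), dev_buf, n);  // device -> host
    bcast(host.data(), n);         // collective operates on host memory
    h2d(dev_buf, host.data(), n);  // host -> device
}

// Stand-in demo: "device" memory is an ordinary array, and the fake broadcast
// overwrites the buffer with the root's data, as a non-root rank would see it.
inline std::vector<std::complex<double>> demo_staged_bcast()
{
    using C = std::complex<double>;
    const std::vector<C> root_data = {C(1.0, 2.0), C(3.0, 4.0)};
    std::vector<C> device(root_data.size(), C(0.0, 0.0));

    auto copy = [](C* dst, const C* src, std::size_t n) { std::copy(src, src + n, dst); };
    auto fake_bcast = [&root_data](C* buf, std::size_t n) {
        std::copy(root_data.begin(), root_data.begin() + n, buf);
    };

    staged_bcast(device.data(), device.size(), copy, fake_bcast, copy);
    return device;
}
```

In a real build the callables would wrap the actual device copies and `MPI_Bcast(host, n, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD)`; that wiring is left out here because it depends on the project's headers.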

source/source_pw/module_pwdft/operator_pw/exx_pw_pot.cpp

Lines changed: 1 addition & 1 deletion
@@ -392,7 +392,7 @@ double exx_divergence(Conv_Coulomb_Pot_K::Coulomb_Type coulomb_type,
 double ucell_omega)
 {
     double exx_div = 0;
-
+    // return exx_div;
     double nqs_half1 = 0.5 * kv->nmp[0];
     double nqs_half2 = 0.5 * kv->nmp[1];
     double nqs_half3 = 0.5 * kv->nmp[2];

source/source_pw/module_pwdft/operator_pw/op_exx_pw.cpp

Lines changed: 34 additions & 7 deletions
@@ -174,6 +174,9 @@ void OperatorEXXPW<T, Device>::act(const int nbands,
 const bool is_first_node) const
 {
     if (first_iter) return;
+    // std::cout << cal_exx_energy_ace(&psi) << " EXX energy" << std::endl;
+    // MPI_Abort(MPI_COMM_WORLD, 0);
+    // return;
 
     if (is_first_node)
     {
@@ -213,6 +216,8 @@ void OperatorEXXPW<T, Device>::act_op(const int nbands,
 // for (auto iq: q_points)
 // std::cout << iq << ", ";
 // std::cout << std::endl;
+int nk_fac = PARAM.inp.nspin == 2 ? 2 : 1;
+int nk = wfcpw->nks / nk_fac;
 
 // ik fixed here, select band n
 for (int n_iband = 0; n_iband < nbands; n_iband++)
@@ -226,7 +231,7 @@ void OperatorEXXPW<T, Device>::act_op(const int nbands,
 Real nqs = q_points.size();
 for (int iq: q_points)
 {
-    get_exx_potential<Real, Device>(kv, wfcpw, rhopw_dev, pot, tpiba, gamma_extrapolation, ucell->omega, this->ik, iq);
+    get_exx_potential<Real, Device>(kv, wfcpw, rhopw_dev, pot, tpiba, gamma_extrapolation, ucell->omega, this->ik, iq % nk);
     for (int m_iband = 0; m_iband < psi.get_nbands(); m_iband++)
     {
         // double wg_mqb_real = GlobalC::exx_helper.wg(iq, m_iband);
@@ -259,7 +264,6 @@ void OperatorEXXPW<T, Device>::act_op(const int nbands,
 }
 
 T wk_iq = kv->wk[iq];
-T wk_ik = kv->wk[this->ik];
 
 T tmp_scalar = wg_mqb / wk_iq / nqs;
 axpy_complex_op()(wfcpw->nrxx,
@@ -318,6 +322,10 @@ void OperatorEXXPW<T, Device>::act_op_kpar(const int nbands,
 // decide which pool does the iq belong to
 int iq_pool = kv->para_k.whichpool[iq];
 int iq_loc = iq - kv->para_k.startk_pool[iq_pool];
+if (ispin == 1)
+{
+    iq_loc += wfcpw->nks / nspin_fac;
+}
 
 for (int m_iband = 0; m_iband < psi.get_nbands(); m_iband++)
 {
@@ -339,11 +347,30 @@ void OperatorEXXPW<T, Device>::act_op_kpar(const int nbands,
 // send
 }
 #ifdef __MPI
+#ifdef __CUDA_MPI
 MPI_Bcast(psi_mq_real, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+#else
+if (PARAM.inp.device == "cpu")
+{
+    MPI_Bcast(psi_mq_real, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+}
+else if (PARAM.inp.device == "gpu")
+{
+    // need to copy to cpu first
+    T* psi_mq_real_cpu = new T[wfcpw->nrxx];
+    syncmem_complex_d2c_op()(psi_mq_real_cpu, psi_mq_real, wfcpw->nrxx);
+    MPI_Bcast(psi_mq_real_cpu, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
+    syncmem_complex_c2d_op()(psi_mq_real, psi_mq_real_cpu, wfcpw->nrxx);
+    delete[] psi_mq_real_cpu;
+}
+else
+{
+    ModuleBase::WARNING_QUIT("OperatorEXXPW", "construct_ace: unknown device");
+}
+#endif
 #endif
 for (int n_iband = 0; n_iband < nbands; n_iband++)
 {
-    double wg_nkb = (*wg)(this->ik, n_iband);
     const T* psi_nk = tmpsi_in + n_iband * nbasis;
     // retrieve \psi_nk in real space
     wfcpw->recip_to_real(ctx, psi_nk, psi_nk_real, this->ik);
@@ -369,9 +396,8 @@ void OperatorEXXPW<T, Device>::act_op_kpar(const int nbands,
 
 Real wk_iq = kv->wk[iq];
 Real wk_ik = kv->wk[this->ik];
-// std::cout << "wk_iq: " << wk_iq << " wk_ik: " << wk_ik << std::endl;
 
-Real tmp_scalar = wg_mqb / wk_ik / nqs;
+Real tmp_scalar = wg_mqb / wk_ik / nqs; // wk_ik works for now, but wrong for symmetry.
 
 T* h_psi_nk = tmhpsi + n_iband * nbasis;
 Real hybrid_alpha = GlobalC::exx_info.info_global.hybrid_alpha;
@@ -569,7 +595,8 @@ double OperatorEXXPW<T, Device>::cal_exx_energy_op(psi::Psi<T, Device> *ppsi_) c
 
 for (int iq: q_points)
 {
-    get_exx_potential<Real, Device>(kv, wfcpw, rhopw_dev, pot, tpiba, gamma_extrapolation, ucell->omega, ik, iq);
+    int nk = wfcpw->nks / nk_fac;
+    get_exx_potential<Real, Device>(kv, wfcpw, rhopw_dev, pot, tpiba, gamma_extrapolation, ucell->omega, ik, iq % nk);
     for (int m_iband = 0; m_iband < psi.get_nbands(); m_iband++)
     {
         // double wg_f = GlobalC::exx_helper.wg(iq, m_iband);
@@ -589,7 +616,7 @@ double OperatorEXXPW<T, Device>::cal_exx_energy_op(psi::Psi<T, Device> *ppsi_) c
 
 int nks = wfcpw->nks;
 int npw = rhopw_dev->npw;
-int nk = nks / nk_fac;
+// int nk = nks / nk_fac;
 Eexx_ik_real += exx_cal_energy_op<T, Device>()(density_recip, pot, wg_iqb_real / nqs * wg_ikb_real / kv->wk[ik], npw);
 
 } // m_iband
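
Note on the `nspin == 2` indexing: the `iq % nk` arguments and the `iq_loc += wfcpw->nks / nspin_fac` offset suggest that the k-point list holds a spin-up block followed by a spin-down block of equal length. The sketch below illustrates that mapping under this assumption. The struct and function names are hypothetical and do not appear in the codebase.

```cpp
#include <cassert>

// Hypothetical illustration of the index layout implied by the diff:
// for nspin == 2, indices [0, nk) are spin-up and [nk, 2*nk) are spin-down.
struct SpinKIndex
{
    int spatial; // index of the k-point within one spin block
    int spin;    // 0 for spin-up, 1 for spin-down
};

// Number of distinct k-points per spin channel.
inline int nk_per_spin(int nks, int nspin)
{
    const int nk_fac = (nspin == 2) ? 2 : 1;
    return nks / nk_fac;
}

// Split a combined index into (spatial k-point, spin channel).
inline SpinKIndex split_index(int ik, int nks, int nspin)
{
    const int nk = nk_per_spin(nks, nspin);
    return SpinKIndex{ik % nk, ik / nk};
}
```

With 8 stored k-points and `nspin == 2`, index 5 would map to spatial k-point 1 in the spin-down block, which is the value the patched `get_exx_potential` calls receive through `iq % nk`.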
