Skip to content

Commit ff59418

Browse files
Flying-dragon-boxingmohanchenWHUweiqingzhouESROAMERlanshuyue
authored andcommitted
Feature: EXX PW supports k-point parallelism (deepmodeling#6648)
* Very good unit test, making my laptop fan spin * Change default pexsi_npole from 80 to 40 * Place pexsi_EDM in DensityMatrix, set size of pexsi_dm = 1 when GlobalV::NSPIN==4, and add comments for dmToRho * An unit test added for DiagoPexsi * modify for changed gint interface * correct nspin related behaviors * add efermi passthrough * Revert "add efermi passthrough" This reverts commit d7b402d. * commits to resolve conversations related to codes * DM and EDM pointers in pexsi now handled by diagopexsi, and copying h s matrices no longer needed * add pexsi examples * fix pexsi unit test (original version shouldn't run) * add building docs for pexsi * set cxx standard to c++14, which is required in make_unique * Fix: Fix typo related to pexsi * update to PPEXSIDFTDriver2 * default npoints to 1, so single core pexsi will work * Feature: exx operator for pw basis, single kpt * apply pexsi changes(?) * q-e style exx_div * Correct exxdiv * Fix Compile errors * refactor to abandon `pdiagh` * Fix mu_buffer and nspin * HSE examples * Feature: Multi-K exx * Feature: Multi-K exx * Updates with latest * Remove redundant global vars * Update to v3.9.0 * Update to v3.9.0, now code works * Remove Redundant cal_exx_energy in esolver_ks_pw.cpp * Some mess * Minor Fixes * Fix separate loop and screening * Add EXX stress * EXX Energy??? * Multi-K is broken??? * Fix: Multi-K and stress * Feature: ACE for single-K * Feature: ACE should work for multi-K, but not for sure * Feature: ACE works. Next step is ACE energy. * Fix: adapt to the latest instruction for variable `conv_esolver` * Reconstruct: move exx_helper to hamilt_pwdft * Refactor: in ESolver_KS_PW, calculate deband in iter_finish, not in hamilt2density * Fix: make files in consistent with upstream * Fix: Now EXX PW doesn't depend on LibRI * Fix: Add input constraints for EXX PW * Fix: Remove redundant mpi barrier * Fix: Clean irrelevant files * Fix: Clean irrelevant files * Feature: add ace flag, exit on using gpu * Refactor: Phase 1 for refactoring exx energy * Feature: now ace calculates energy * Feature: enable exx energy * Fix: fix makefile compilation error * Fix: One minor fix for a segmentation fault * Tests: one integrate test for exx pw, only for verifying whether exx pw works * Revert "Tests: one integrate test for exx pw, only for verifying whether exx pw works" This reverts commit e7b606f. * Fix: EXX PW ACE open only when separate_loop is on * add timer * Feature: Double Grid method of EXX PW * Feature: Double Grid method of EXX PW Stress * Fix: Double Grid method of EXX PW Stress * Feature: add double grid variable * Feature: add double grid variable * Fis: HSE stress * Fix: HSE Stress * Fix: Timer * Fix: Timer * For non mp sampling, disable extrapolation * Modify test * Modify mp * Format * Format * Feature: nspin == 2 scf * Fix: nspin == 2 scf * Docs: EXX PW Docs * Feature: EXX PW for nspin=2 * Docs: EXX PW Docs * Docs: EXX PW Docs * Docs: EXX PW Docs, minor fixes * Refactor * Refactor * Refactor * Refactor * Refactor * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Refactor: fix unit test * Bump version v3.9.0.7 * Refactor: Remove set kvec funcs in `K_Vectors` * Refactor: Remove final_scf * Refactor: Fix kvecc2d/d2c * Fix: Tests * Fix: Tests * Fix: Tests * Fix: Tests * Refactor: Final? * Fix * Fix * Fix * Fix * GPU EXX PW Support * Fix: Compile Error on CUDA > 12.9 * Fix: Compile Error on CUDA > 12.9 * NVTX3 * F***ing new version * Feature: Support linear combination of coulomb_param for EXX PW * Fix: Fix compile issue * F***ing new version * F***ing new version * F***ing new version * Uploading hybrid gauge tddft (deepmodeling#6369) * hybrid gague * update tests * update * update * update * update * update unit test * fix tests * update tests * fix read_wfc * fix catch_properties.sh * fix restart * update gpu test * update tests * fix * fix input_conv * Improve md calculation stress output in running log (deepmodeling#6366) * Improve md calculation stress output in running log * Module_IO Unittest modify * ModuleMD Unittests modify * modify code comment in fire_test.cpp * maintain setprecision(8) for md stress output * Refactor: Remove redundant Input_para from ESolver Class (deepmodeling#6370) * Refactor: Replace PARAM.inp with inp in ESolver classes for consistency * Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency * Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method * Revert "Refactor: Use PARAM.inp.scf_ene_thr in ESolver_KS_LCAO iter_finish method" This reverts commit b1bd0fd. * Revert "Refactor: Replace local input parameters with PARAM.inp in ESolver classes for consistency" This reverts commit f4f81e3. * Fix: Fix memory leak introduced by new gint module (deepmodeling#6375) * fix memory leak * delete copy assignment * refactor Exx_Opt_Orb (deepmodeling#6378) Co-authored-by: linpz <[email protected]> * Add use sw and fix Floating point exception (deepmodeling#6372) * remove float error in sunway * fix ig=0 * add the sw * change the make_dir * unify the gg use * fix compile bug * add init * temporarily remove the sunway define * add the pesduo * fix compile bug * fix bug in the betar * modify the test * Update the output formats of rt-TDDFT (deepmodeling#6381) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * [Refactor] Rename grid to module_grid and genelpa to module_genelpa (deepmodeling#6386) * Rename grid to module_grid * Rename genelpa to module_genelpa * Fix cmake * Update the outputs of geometry relaxation (deepmodeling#6387) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * Feature: support the output of matrix representation of symm_ops (deepmodeling#6390) * Feature: support output the matrix representation of symmetry operation * Feature: support the output of matrix representation of symm_ops * update the document * Feature: Output real space wavefunction and partial charge density when `device=gpu` (deepmodeling#6391) * Fix GPU output of out_pchg and out_wfc_norm, out_wfc_re_im * GPU integrate test is functional again * Optimize RT-TDDFT dipole output (deepmodeling#6393) * Perf: support GPU version of cal_force_cc with LCAO basis (deepmodeling#6392) * support GPU version of cal_force_cc with LCAO basis * fix a bug * [Refactor] Move module_lr to source_lcao and add a new folder module_external in source_base (deepmodeling#6388) * Move module_lr to source_lcao * Fix test build * Move blas_connector to module_external * Fix header use * Fix internal header use * A fierce battle with Makefile😡 * Move blacs_connector.h to module_external * Move lapack_connector.h and lapack_wrapper.h to module_external * Fix header usage * Move scalapack_connector.h to module_external * Fix a bug for the output information after relaxation (deepmodeling#6395) * update the output formats of rt-TDDFT * update the output formats of rt-TDDFT * fix a bug * update initialized velocities * found some output information is still lacking in MD module * update output information * remove some global variables in relax_driver * update outputs * update relaxation outputs * update relaxation output messages * update tests of print info * fix a test * fix cg outputs * udpate cg test * update relax tests * update LCAO output stress format * change update_cell.cpp algorithm, when the ion move is larger than the cell length, it is fine to proceed the relaxation calculations * fix tests for unitcells * update cell * update some function names, update output A to Angstrom * change eV/A to eV/Angstrom * bump version to 3.9.0.10 (deepmodeling#6397) Co-authored-by: Liang Sun <[email protected]> * Fix: fix exx_gamma_extrapolation error in MPI * Fix: fix exx_gamma_extrapolation error in MPI * Update lapack.cu * Refactor: Use LAPACK interfaces from ATen * Fix: Integrate test * Fix: implement devinfo for potrf * Fix: MPI and Makefile * Fix: get_potential * Fix: ace * Refactor * Refactor * Refactor * Refactor * Refactor * Fix: conv * Revert "Fix: conv" This reverts commit d2da506. * Fix: conv * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Fix: conv hard code thr for now * Refactor * Refactor * Refactor * Refactor * Refactor * Mod * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * Begin EXX KPAR * EXX KPAR WORKS * EXX KPAR WORKS Alternative * Fix GPU, but so ugly... * Undo cuda aware mpi * Undo cuda aware mpi * Revert "Undo cuda aware mpi" This reverts commit a8d71b2. * EXX KPAR WORKS on NSPIN=2 * Fix without MPI * Fix header --------- Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: wqzhou <[email protected]> Co-authored-by: HTZhao <[email protected]> Co-authored-by: lanshuyue <[email protected]> Co-authored-by: Liang Sun <[email protected]> Co-authored-by: dzzz2001 <[email protected]> Co-authored-by: linpeize <[email protected]> Co-authored-by: linpz <[email protected]> Co-authored-by: liiutao <[email protected]> Co-authored-by: Mohan Chen <[email protected]> Co-authored-by: Critsium <[email protected]> Co-authored-by: kirk0830 <[email protected]> Co-authored-by: Taoni Bao <[email protected]> Co-authored-by: Chen Nuo <[email protected]>
1 parent 4d57ca0 commit ff59418

File tree

4 files changed

+332
-123
lines changed

4 files changed

+332
-123
lines changed

source/source_pw/module_pwdft/operator_pw/exx_pw_ace.cpp

Lines changed: 159 additions & 88 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,6 @@
11
#include "op_exx_pw.h"
2+
#include "source_base/parallel_comm.h"
3+
#include "source_io/module_parameter/parameter.h"
24

35
namespace hamilt
46
{
@@ -60,10 +62,6 @@ void OperatorEXXPW<T, Device>::act_op_ace(const int nbands,
6062
nbasis
6163
);
6264

63-
64-
// // negative sign, add to hpsi
65-
// vec_add_vec_complex_op()(this->ctx, nbands * nbasis, tmhpsi, hpsi, -1, tmhpsi, 1);
66-
// delmem_complex_op()(hpsi);
6765
delmem_complex_op()(Xi_psi);
6866
ModuleBase::timer::tick("OperatorEXXPW", "act_op_ace");
6967

@@ -72,13 +70,12 @@ void OperatorEXXPW<T, Device>::act_op_ace(const int nbands,
7270
template <typename T, typename Device>
7371
void OperatorEXXPW<T, Device>::construct_ace() const
7472
{
75-
// int nkb = p_exx_helper->psi.get_nbands() * p_exx_helper->psi.get_nk();
7673
int nbands = psi.get_nbands();
7774
int nbasis = psi.get_nbasis();
7875
int nk = psi.get_nk();
7976

77+
int* ik_ = const_cast<int*>(&this->ik);
8078
int ik_save = this->ik;
81-
int * ik_ = const_cast<int*>(&this->ik);
8279

8380
T intermediate_one = 1.0, intermediate_zero = 0.0;
8481

@@ -116,93 +113,167 @@ void OperatorEXXPW<T, Device>::construct_ace() const
116113
if (first_iter) return;
117114
ModuleBase::timer::tick("OperatorEXXPW", "construct_ace");
118115

119-
for (int ik = 0; ik < nk; ik++)
116+
int nk_max = kv->para_k.get_max_nks_pool();
117+
int nspin_fac = PARAM.inp.nspin == 2 ? 2 : 1;
118+
for (int ispin = 0; ispin < nspin_fac; ispin++)
120119
{
121-
int npwk = wfcpw->npwk[ik];
122-
123-
T* Xi_ace = Xi_ace_k[ik];
124-
psi.fix_kb(ik, 0);
125-
T* p_psi = psi.get_pointer();
126-
127-
setmem_complex_op()(h_psi_ace, 0, nbands * nbasis);
120+
for (int ik0 = 0; ik0 < nk_max; ik0++)
121+
{
122+
int ik = ik0 + ispin * nk_max;
123+
int npwk = wfcpw->npwk[ik];
124+
125+
T* Xi_ace = Xi_ace_k[ik];
126+
psi.fix_kb(ik, 0);
127+
T* p_psi = psi.get_pointer();
128+
129+
setmem_complex_op()(h_psi_ace, 0, nbands * nbasis);
130+
131+
setmem_complex_op()(h_psi_recip, 0, wfcpw->npwk_max);
132+
setmem_complex_op()(h_psi_real, 0, rhopw_dev->nrxx);
133+
setmem_complex_op()(density_real, 0, rhopw_dev->nrxx);
134+
setmem_complex_op()(density_recip, 0, rhopw_dev->npw);
135+
setmem_complex_op()(psi_nk_real, 0, wfcpw->nrxx);
136+
setmem_complex_op()(psi_mq_real, 0, wfcpw->nrxx);
137+
int nqs = kv->get_nkstot_full();
138+
139+
bool skip_ik = false;
140+
if (ik >= wfcpw->nks)
141+
{
142+
skip_ik = true;
143+
}
144+
if (skip_ik)
145+
{
146+
// ik fixed here, select band n
147+
for (int iq0 = 0; iq0 < nqs; iq0++)
148+
{
149+
int iq = iq0 + ik;
150+
// for \psi_nk, get the pw of iq and band m
151+
get_exx_potential<Real, Device>(kv, wfcpw, rhopw_dev, pot, tpiba, gamma_extrapolation, ucell->omega, ik, iq);
152+
153+
// decide which pool does the iq belong to
154+
int iq_pool = kv->para_k.whichpool[iq0];
155+
int iq_loc = iq - kv->para_k.startk_pool[iq_pool];
156+
157+
for (int m_iband = 0; m_iband < psi.get_nbands(); m_iband++)
158+
{
159+
double wg_mqb = 0;
160+
bool skip = false;
161+
if (iq_pool == GlobalV::MY_POOL)
162+
{
163+
wg_mqb = (*wg)(iq_loc, m_iband);
164+
}
165+
#ifdef __MPI
166+
MPI_Bcast(&wg_mqb, 1, MPI_DOUBLE, kv->para_k.get_startpro_pool(iq_pool), MPI_COMM_WORLD);
167+
#endif
168+
if (wg_mqb < 1e-12)
169+
continue;
170+
171+
if (iq_pool == GlobalV::MY_POOL)
172+
{
173+
const T* psi_mq = get_pw(m_iband, iq_loc);
174+
wfcpw->recip_to_real(ctx, psi_mq, psi_mq_real, iq_loc);
175+
// send
176+
}
177+
// if (iq == 0)
178+
// std::cout << "Bcast psi_mq_real" << std::endl;
179+
#ifdef __MPI
180+
MPI_Bcast(psi_mq_real, wfcpw->nrxx, MPI_DOUBLE_COMPLEX, iq_pool, KP_WORLD);
181+
#endif
128182

129-
*ik_ = ik;
130-
131-
act_op(
132-
nbands,
133-
nbasis,
134-
1,
135-
p_psi,
136-
h_psi_ace,
137-
nbasis,
138-
false
139-
);
140-
141-
// psi_h_psi_ace = psi^\dagger * h_psi_ace
142-
// p_exx_helper->psi.fix_kb(0, 0);
143-
gemm_complex_op()('C',
144-
'N',
145-
nbands,
146-
nbands,
147-
npwk,
148-
&intermediate_one,
149-
p_psi,
150-
nbasis,
151-
h_psi_ace,
152-
nbasis,
153-
&intermediate_zero,
154-
psi_h_psi_ace,
155-
nbands);
156-
157-
// reduction of psi_h_psi_ace, due to distributed memory
158-
Parallel_Reduce::reduce_pool(psi_h_psi_ace, nbands * nbands);
159-
160-
T intermediate_minus_one = -1.0;
161-
axpy_complex_op()(nbands * nbands,
162-
&intermediate_minus_one,
163-
psi_h_psi_ace,
164-
1,
165-
L_ace,
166-
1);
167-
168-
169-
int info = 0;
170-
char up = 'U', lo = 'L';
171-
172-
lapack_potrf()(lo, nbands, L_ace, nbands);
173-
174-
// expand for-loop
175-
for (int i = 0; i < nbands; ++i) {
176-
setmem_complex_op()(L_ace + i * nbands, 0, i);
183+
} // end of iq
184+
185+
}
186+
}
187+
else
188+
{
189+
*ik_ = ik;
190+
act_op_kpar(nbands, nbasis, 1, p_psi, h_psi_ace, nbasis, false);
191+
// psi_h_psi_ace = psi^\dagger * h_psi_ace
192+
// p_exx_helper->psi.fix_kb(0, 0);
193+
gemm_complex_op()('C',
194+
'N',
195+
nbands,
196+
nbands,
197+
npwk,
198+
&intermediate_one,
199+
p_psi,
200+
nbasis,
201+
h_psi_ace,
202+
nbasis,
203+
&intermediate_zero,
204+
psi_h_psi_ace,
205+
nbands);
206+
207+
// reduction of psi_h_psi_ace, due to distributed memory
208+
Parallel_Reduce::reduce_pool(psi_h_psi_ace, nbands * nbands);
209+
210+
T intermediate_minus_one = -1.0;
211+
axpy_complex_op()(nbands * nbands,
212+
&intermediate_minus_one,
213+
psi_h_psi_ace,
214+
1,
215+
L_ace,
216+
1);
217+
218+
219+
int info = 0;
220+
char up = 'U', lo = 'L';
221+
222+
// for (int i = 0; i < nbands; ++i)
223+
// {
224+
// for (int j = 0; j < nbands; ++j)
225+
// {
226+
// // std::cout << L_ace[i * nbands + j]. << " ";
227+
// if (L_ace[i * nbands + j].imag() >= 0.0)
228+
// {
229+
// std::cout << L_ace[i * nbands + j].real() << "+" << L_ace[i * nbands + j].imag() << "im ";
230+
// }
231+
// else
232+
// {
233+
// std::cout << L_ace[i * nbands + j].real() << L_ace[i * nbands + j].imag() << "im ";
234+
// }
235+
// }
236+
// std::cout << ";" << std::endl;
237+
// }
238+
// MPI_Barrier(MPI_COMM_WORLD);
239+
// MPI_Abort(MPI_COMM_WORLD, 0);
240+
241+
lapack_potrf()(lo, nbands, L_ace, nbands);
242+
243+
// expand for-loop
244+
for (int i = 0; i < nbands; ++i) {
245+
setmem_complex_op()(L_ace + i * nbands, 0, i);
246+
}
247+
248+
// L_ace inv in place
249+
char non = 'N';
250+
lapack_trtri()(lo, non, nbands, L_ace, nbands);
251+
252+
// Xi_ace = L_ace^-1 * h_psi_ace^dagger
253+
gemm_complex_op()('N',
254+
'C',
255+
nbands,
256+
npwk,
257+
nbands,
258+
&intermediate_one,
259+
L_ace,
260+
nbands,
261+
h_psi_ace,
262+
nbasis,
263+
&intermediate_zero,
264+
Xi_ace,
265+
nbands);
266+
267+
// clear mem
268+
setmem_complex_op()(h_psi_ace, 0, nbands * nbasis);
269+
setmem_complex_op()(psi_h_psi_ace, 0, nbands * nbands);
270+
setmem_complex_op()(L_ace, 0, nbands * nbands);
271+
}
177272
}
178-
179-
// L_ace inv in place
180-
char non = 'N';
181-
lapack_trtri()(lo, non, nbands, L_ace, nbands);
182-
183-
// Xi_ace = L_ace^-1 * h_psi_ace^dagger
184-
gemm_complex_op()('N',
185-
'C',
186-
nbands,
187-
npwk,
188-
nbands,
189-
&intermediate_one,
190-
L_ace,
191-
nbands,
192-
h_psi_ace,
193-
nbasis,
194-
&intermediate_zero,
195-
Xi_ace,
196-
nbands);
197-
198-
// clear mem
199-
setmem_complex_op()(h_psi_ace, 0, nbands * nbasis);
200-
setmem_complex_op()(psi_h_psi_ace, 0, nbands * nbands);
201-
setmem_complex_op()(L_ace, 0, nbands * nbands);
202-
203273
}
204274

205275
*ik_ = ik_save;
276+
206277
ModuleBase::timer::tick("OperatorEXXPW", "construct_ace");
207278

208279
}
@@ -234,7 +305,7 @@ double OperatorEXXPW<T, Device>::cal_exx_energy_ace(psi::Psi<T, Device>* ppsi_)
234305
}
235306
}
236307

237-
Parallel_Reduce::reduce_pool(Eexx);
308+
Parallel_Reduce::reduce_all(Eexx);
238309
*ik_ = ik_save;
239310
return Eexx;
240311
}

source/source_pw/module_pwdft/operator_pw/exx_pw_pot.cpp

Lines changed: 28 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,20 @@ void get_exx_potential(const K_Vectors* kv,
3030
// fill zero
3131
setmem_real_cpu_op()(pot_cpu, 0, npw);
3232

33+
std::vector<ModuleBase::Vector3<double>> qvec_c, qvec_d;
34+
#ifdef __MPI
35+
kv->para_k.gatherkvec(kv->kvec_c, qvec_c);
36+
kv->para_k.gatherkvec(kv->kvec_d, qvec_d);
37+
#else
38+
qvec_c = kv->kvec_c;
39+
qvec_d = kv->kvec_d;
40+
#endif
41+
42+
if (ik > nks)
43+
{
44+
return;
45+
}
46+
3347
// calculate Fock pot
3448
auto param_fock = GlobalC::exx_info.info_global.coulomb_param[Conv_Coulomb_Pot_K::Coulomb_Type::Fock];
3549
for (int i = 0; i < param_fock.size(); i++)
@@ -39,8 +53,8 @@ void get_exx_potential(const K_Vectors* kv,
3953
double alpha = std::stod(param["alpha"]);
4054
const ModuleBase::Vector3<double> k_c = wfcpw->kvec_c[ik];
4155
const ModuleBase::Vector3<double> k_d = wfcpw->kvec_d[ik];
42-
const ModuleBase::Vector3<double> q_c = wfcpw->kvec_c[iq];
43-
const ModuleBase::Vector3<double> q_d = wfcpw->kvec_d[iq];
56+
const ModuleBase::Vector3<double> q_c = qvec_c[iq];
57+
const ModuleBase::Vector3<double> q_d = qvec_d[iq];
4458

4559
#ifdef _OPENMP
4660
#pragma omp parallel for schedule(static)
@@ -109,8 +123,8 @@ void get_exx_potential(const K_Vectors* kv,
109123
ucell_omega);
110124
const ModuleBase::Vector3<double> k_c = wfcpw->kvec_c[ik];
111125
const ModuleBase::Vector3<double> k_d = wfcpw->kvec_d[ik];
112-
const ModuleBase::Vector3<double> q_c = wfcpw->kvec_c[iq];
113-
const ModuleBase::Vector3<double> q_d = wfcpw->kvec_d[iq];
126+
const ModuleBase::Vector3<double> q_c = qvec_c[iq];
127+
const ModuleBase::Vector3<double> q_d = qvec_d[iq];
114128

115129
#ifdef _OPENMP
116130
#pragma omp parallel for schedule(static)
@@ -146,6 +160,10 @@ void get_exx_potential(const K_Vectors* kv,
146160
// const int ig_kq = ik * nks * npw + iq * npw + ig;
147161

148162
Real gg = (k_c - q_c + rhopw_dev->gcar[ig]).norm2() * tpiba2;
163+
// if (ig == 0 && GlobalV::MY_RANK==1)
164+
// {
165+
// printf("k-q+G: %f %f %f\n", (k_c - q_c + rhopw_dev->gcar[ig])[0], (k_c - q_c + rhopw_dev->gcar[ig])[1], (k_c - q_c + rhopw_dev->gcar[ig])[2]);
166+
// }
149167
// if (kqgcar2 > 1e-12) // vasp uses 1/40 of the smallest (k spacing)**2
150168
if (gg >= 1e-8)
151169
{
@@ -388,7 +406,7 @@ double exx_divergence(Conv_Coulomb_Pot_K::Coulomb_Type coulomb_type,
388406

389407
// this is the \sum_q F(q) part
390408
// temporarily for all k points, should be replaced to q points later
391-
for (int ik = 0; ik < wfcpw->nks; ik++)
409+
for (int ik = 0; ik < wfcpw->nks / nk_fac; ik++)
392410
{
393411
const ModuleBase::Vector3<double> k_c = wfcpw->kvec_c[ik];
394412
const ModuleBase::Vector3<double> k_d = wfcpw->kvec_d[ik];
@@ -437,7 +455,7 @@ double exx_divergence(Conv_Coulomb_Pot_K::Coulomb_Type coulomb_type,
437455
}
438456
}
439457

440-
Parallel_Reduce::reduce_pool(div);
458+
Parallel_Reduce::reduce_all(div);
441459
// std::cout << "EXX div: " << div << std::endl;
442460

443461
// if (PARAM.inp.dft_functional == "hse")
@@ -454,8 +472,8 @@ double exx_divergence(Conv_Coulomb_Pot_K::Coulomb_Type coulomb_type,
454472
}
455473
}
456474

457-
div *= ModuleBase::e2 * ModuleBase::FOUR_PI / tpiba2 / wfcpw->nks;
458-
// std::cout << "div: " << div << std::endl;
475+
div *= ModuleBase::e2 * ModuleBase::FOUR_PI / tpiba2 / kv->get_nkstot_full();
476+
// std::cout << "div: " << div << std::endl;
459477

460478
// numerically value the mean value of F(q) in the reciprocal space
461479
// This means we need to calculate the average of F(q) in the first brillouin zone
@@ -481,9 +499,9 @@ double exx_divergence(Conv_Coulomb_Pot_K::Coulomb_Type coulomb_type,
481499
aa += 1.0 / std::sqrt(alpha * ModuleBase::PI);
482500

483501
div -= ModuleBase::e2 * ucell_omega * aa;
484-
exx_div = div * wfcpw->nks / nk_fac;
502+
exx_div = div * kv->get_nkstot_full();
485503
// exx_div = 0;
486-
// std::cout << "EXX divergence: " << exx_div << std::endl;
504+
// std::cout << "EXX divergence: " << exx_div << std::endl;
487505

488506
return exx_div;
489507
}

0 commit comments

Comments
 (0)