Commit 50fe6d7

merge the latest abacus
2 parents 7061eb7 + 27075c1 commit 50fe6d7

File tree

36 files changed

+2057
-390
lines changed

docs/advanced/input_files/input-main.md

Lines changed: 14 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -243,8 +243,8 @@
243243
- [exx\_opt\_orb\_ecut](#exx_opt_orb_ecut)
244244
- [exx\_opt\_orb\_tolerence](#exx_opt_orb_tolerence)
245245
- [exx\_real\_number](#exx_real_number)
246-
- [exx\_symmetry\_realspace](#exx_symmetry_realspace)
247246
- [rpa\_ccp\_rmesh\_times](#rpa_ccp_rmesh_times)
247+
- [exx\_symmetry\_realspace](#exx_symmetry_realspace)
248248
- [out\_ri\_cv](#out_ri_cv)
249249
- [Molecular dynamics](#molecular-dynamics)
250250
- [md\_type](#md_type)
@@ -273,6 +273,9 @@
273273
- [lj\_epsilon](#lj_epsilon)
274274
- [lj\_sigma](#lj_sigma)
275275
- [pot\_file](#pot_file)
276+
- [dp\_rescaling](#dp_rescaling)
277+
- [dp\_fparam](#dp_fparam)
278+
- [dp\_aparam](#dp_aparam)
276279
- [msst\_direction](#msst_direction)
277280
- [msst\_vel](#msst_vel)
278281
- [msst\_vis](#msst_vis)
@@ -422,11 +425,12 @@
422425
- [nocc](#nocc)
423426
- [nvirt](#nvirt)
424427
- [lr\_nstates](#lr_nstates)
428+
- [lr\_unrestricted](#lr_unrestricted)
425429
- [abs\_wavelen\_range](#abs_wavelen_range)
426430
- [out\_wfc\_lr](#out_wfc_lr)
427431
- [abs\_broadening](#abs_broadening)
428432
- [ri\_hartree\_benchmark](#ri_hartree_benchmark)
429-
- [aims_nbasis](#aims_nbasis)
433+
- [aims\_nbasis](#aims_nbasis)
430434
- [Reduced Density Matrix Functional Theory](#Reduced-Density-Matrix-Functional-Theory)
431435
- [rdmft](#rdmft)
432436
- [rdmft\_power\_alpha](#rdmft_power_alpha)
@@ -2911,46 +2915,38 @@ These variables are used to control vdW-corrected related parameters.
29112915
- **Type**: String
29122916
- **Description**: Specifies the method used for Van der Waals (VdW) correction. Available options are:
29132917
- `d2`: [Grimme's D2](https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.20495) dispersion correction method
2914-
- `d3_0`: [Grimme's DFT-D3(0)](https://aip.scitation.org/doi/10.1063/1.3382344) dispersion correction method
2915-
- `d3_bj`: [Grimme's DFTD3(BJ)](https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.21759) dispersion correction method
2918+
- `d3_0`: [Grimme's DFT-D3(0)](https://aip.scitation.org/doi/10.1063/1.3382344) dispersion correction method (zero-damping)
2919+
- `d3_bj`: [Grimme's DFTD3(BJ)](https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.21759) dispersion correction method (BJ-damping)
29162920
- `none`: no vdW correction
29172921
- **Default**: none
2922+
- **Note**: Since version 3.8.3 (and in several earlier development versions), ABACUS can set the DFT-D3 parameters automatically for common functionals. To use this feature, specify the parameter `dft_functional` explicitly (see [dft_functional](#dft_functional) for details); otherwise the automatic setup will abort with an error message like `cannot find DFT-D3 parameter for XC(***)`. Any values set manually for `vdw_s6`, `vdw_s8`, `vdw_a1`, and `vdw_a2` override the built-in parameters.
2923+
- **Special**: The wB97 (omega-B97) functional family needs special settings. To use wB97X-D3BJ, set `dft_functional` to `HYB_GGA_WB97X_V` and `vdw_method` to `d3_bj`. To use wB97X-D3, set `dft_functional` to `HYB_GGA_WB97X_D3` and `vdw_method` to `d3_0`.
29182924

29192925
### vdw_s6
29202926

29212927
- **Type**: Real
29222928
- **Availability**: `vdw_method` is set to `d2`, `d3_0`, or `d3_bj`
2923-
- **Description**: This scale factor is used to optimize the interaction energy deviations in van der Waals (vdW) corrected calculations. The recommended values of this parameter are dependent on the chosen vdW correction method and the DFT functional being used. For DFT-D2, the recommended values are 0.75 (PBE), 1.2 (BLYP), 1.05 (B-P86), 1.0 (TPSS), and 1.05 (B3LYP). For DFT-D3, recommended values with different DFT functionals can be found on the [here](https://www.chemiebn.uni-bonn.de/pctc/mulliken-center/software/dft-d3/dft-d3). The default value of this parameter in ABACUS is set to be the recommended value for PBE.
2929+
- **Description**: This scale factor is used to optimize the interaction energy deviations in van der Waals (vdW) corrected calculations. The recommended values of this parameter depend on the chosen vdW correction method and the DFT functional being used. For DFT-D2, the recommended values are 0.75 (PBE), 1.2 (BLYP), 1.05 (B-P86), 1.0 (TPSS), and 1.05 (B3LYP); if the parameter is not set, the PBE value is used. For DFT-D3, recommended values for different DFT functionals can be found [here](https://github.com/dftd3/simple-dftd3/blob/main/assets/parameters.toml); if the parameter is not set, ABACUS looks up the value in its built-in dataset based on the `dft_functional` keyword. A user-set value overrides the looked-up value.
29242930
- **Default**:
29252931
- 0.75: if `vdw_method` is set to `d2`
2926-
- 1.0: if `vdw_method` is set to `d3_0` or `d3_bj`
29272932

29282933
### vdw_s8
29292934

29302935
- **Type**: Real
29312936
- **Availability**: `vdw_method` is set to `d3_0` or `d3_bj`
2932-
- **Description**: This scale factor is relevant for D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter with different DFT functionals can be found on the [webpage](https://www.chemiebn.uni-bonn.de/pctc/mulliken-center/software/dft-d3/dft-d3). The default value of this parameter in ABACUS is set to be the recommended value for PBE.
2933-
- **Default**:
2934-
- 0.722: if `vdw_method` is set to `d3_0`
2935-
- 0.7875: if `vdw_method` is set to `d3_bj`
2937+
- **Description**: This scale factor is relevant for the D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter for different DFT functionals can be found on this [webpage](https://github.com/dftd3/simple-dftd3/blob/main/assets/parameters.toml). If the parameter is not set, ABACUS looks up the value in its built-in dataset based on the `dft_functional` keyword. A user-set value overrides the looked-up value.
29362938

29372939
### vdw_a1
29382940

29392941
- **Type**: Real
29402942
- **Availability**: `vdw_method` is set to `d3_0` or `d3_bj`
2941-
- **Description**: This damping function parameter is relevant for D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter with different DFT functionals can be found on the [webpage](https://www.chemiebn.uni-bonn.de/pctc/mulliken-center/software/dft-d3/dft-d3). The default value of this parameter in ABACUS is set to be the recommended value for PBE.
2942-
- **Default**:
2943-
- 1.217: if `vdw_method` is set to `d3_0`
2944-
- 0.4289: if `vdw_method` is set to `d3_bj`
2943+
- **Description**: This damping function parameter is relevant for the D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter for different DFT functionals can be found on this [webpage](https://github.com/dftd3/simple-dftd3/blob/main/assets/parameters.toml). If the parameter is not set, ABACUS looks up the value in its built-in dataset based on the `dft_functional` keyword. A user-set value overrides the looked-up value.
29452944

29462945
### vdw_a2
29472946

29482947
- **Type**: Real
29492948
- **Availability**: `vdw_method` is set to `d3_0` or `d3_bj`
2950-
- **Description**: This damping function parameter is only relevant for D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter with different DFT functionals can be found on the [webpage](https://www.chemiebn.uni-bonn.de/pctc/mulliken-center/software/dft-d3/dft-d3). The default value of this parameter in ABACUS is set to be the recommended value for PBE.
2951-
- **Default**:
2952-
- 1.0: if `vdw_method` is set to `d3_0`
2953-
- 4.4407: if `vdw_method` is set to `d3_bj`
2949+
- **Description**: This damping function parameter is only relevant for the D3(0) and D3(BJ) van der Waals (vdW) correction methods. The recommended values of this parameter for different DFT functionals can be found on this [webpage](https://github.com/dftd3/simple-dftd3/blob/main/assets/parameters.toml). If the parameter is not set, ABACUS looks up the value in its built-in dataset based on the `dft_functional` keyword. A user-set value overrides the looked-up value.
29542950

29552951
### vdw_d
29562952
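
Reviewer note: the documented behavior for `vdw_s6`, `vdw_s8`, `vdw_a1`, and `vdw_a2` (look up built-in values by `dft_functional`, let user-set values override them, fail when the functional is unknown) can be sketched as below. This is a minimal illustration, not the ABACUS implementation: `D3Params`, `resolve_d3_params`, the key format, and the map of user overrides are hypothetical names, and the table holds only the PBE values that the removed documentation lines listed as defaults.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical container for the four DFT-D3 parameters.
struct D3Params
{
    double s6;
    double s8;
    double a1;
    double a2;
};

// Illustrative resolution step: start from a built-in table keyed by
// "<functional>/<vdw_method>", then apply any user-set values on top.
// `user_set` maps an INPUT keyword (e.g. "vdw_s8") to the value the user gave.
inline D3Params resolve_d3_params(const std::string& functional,
                                  const std::string& vdw_method,
                                  const std::map<std::string, double>& user_set)
{
    // Assumption: only PBE is tabulated here, using the former documented defaults.
    static const std::map<std::string, D3Params> builtin = {
        {"PBE/d3_0", {1.0, 0.722, 1.217, 1.0}},
        {"PBE/d3_bj", {1.0, 0.7875, 0.4289, 4.4407}},
    };

    const auto it = builtin.find(functional + "/" + vdw_method);
    if (it == builtin.end())
    {
        // Mirrors the error text quoted in the Note for vdw_method.
        throw std::runtime_error("cannot find DFT-D3 parameter for XC(" + functional + ")");
    }

    D3Params p = it->second;
    // A user-set value overrides the looked-up value.
    auto apply = [&user_set](const char* key, double& value) {
        const auto u = user_set.find(key);
        if (u != user_set.end())
        {
            value = u->second;
        }
    };
    apply("vdw_s6", p.s6);
    apply("vdw_s8", p.s8);
    apply("vdw_a1", p.a1);
    apply("vdw_a2", p.a2);
    return p;
}
```

In this sketch an unknown functional throws even if the user supplied all four values; whether ABACUS behaves the same way in that case is not stated in the diff.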

source/module_base/blas_connector.cpp

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -93,7 +93,7 @@ void BlasConnector::gemm(const char transa, const char transb, const int m, cons
9393
}
9494
#ifdef __DSP
9595
else if (device_type == base_device::AbacusDevice_t::DspDevice){
96-
sgemm_mt_(&transb, &transa, &n, &m, &k,
96+
sgemm_mth_(&transb, &transa, &n, &m, &k,
9797
&alpha, b, &ldb, a, &lda,
9898
&beta, c, &ldc, GlobalV::MY_RANK);
9999
}
@@ -111,7 +111,7 @@ void BlasConnector::gemm(const char transa, const char transb, const int m, cons
111111
}
112112
#ifdef __DSP
113113
else if (device_type == base_device::AbacusDevice_t::DspDevice){
114-
dgemm_mt_(&transb, &transa, &n, &m, &k,
114+
dgemm_mth_(&transb, &transa, &n, &m, &k,
115115
&alpha, b, &ldb, a, &lda,
116116
&beta, c, &ldc, GlobalV::MY_RANK);
117117
}
@@ -129,7 +129,7 @@ void BlasConnector::gemm(const char transa, const char transb, const int m, cons
129129
}
130130
#ifdef __DSP
131131
else if (device_type == base_device::AbacusDevice_t::DspDevice) {
132-
cgemm_mt_(&transb, &transa, &n, &m, &k,
132+
cgemm_mth_(&transb, &transa, &n, &m, &k,
133133
&alpha, b, &ldb, a, &lda,
134134
&beta, c, &ldc, GlobalV::MY_RANK);
135135
}
@@ -147,7 +147,7 @@ void BlasConnector::gemm(const char transa, const char transb, const int m, cons
147147
}
148148
#ifdef __DSP
149149
else if (device_type == base_device::AbacusDevice_t::DspDevice) {
150-
zgemm_mt_(&transb, &transa, &n, &m, &k,
150+
zgemm_mth_(&transb, &transa, &n, &m, &k,
151151
&alpha, b, &ldb, a, &lda,
152152
&beta, c, &ldc, GlobalV::MY_RANK);
153153
}
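
Reviewer note: each DSP branch above passes the operands in swapped order (`transb, transa, n, m, k, ..., b, ldb, a, lda`). This looks like the usual way of driving a column-major GEMM from row-major data: a row-major `m x n` matrix has the same memory layout as its column-major `n x m` transpose, so `C = A * B` can be computed as `C^T = B^T * A^T`. A minimal sketch of that identity follows, with a naive reference routine standing in for the BLAS or DSP call. `gemm_colmajor` and `gemm_rowmajor` are hypothetical names, and transposes and `alpha`/`beta` scaling are left out for brevity.

```cpp
#include <cassert>
#include <vector>

// Naive column-major GEMM: C (m x n) = A (m x k) * B (k x n).
// Element (i, j) of a matrix with leading dimension ld lives at [i + j * ld].
// Stand-in for a Fortran-style routine; not part of ABACUS.
inline void gemm_colmajor(int m, int n, int k,
                          const double* a, int lda,
                          const double* b, int ldb,
                          double* c, int ldc)
{
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
            {
                sum += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = sum;
        }
    }
}

// Row-major wrapper: C (m x n) = A (m x k) * B (k x n), all stored row-major.
// Swapping the operands and the m/n extents makes the column-major routine
// compute C^T = B^T * A^T, which is exactly row-major C in memory.
inline void gemm_rowmajor(int m, int n, int k,
                          const double* a, int lda,
                          const double* b, int ldb,
                          double* c, int ldc)
{
    gemm_colmajor(n, m, k, b, ldb, a, lda, c, ldc);
}
```

For row-major inputs the leading dimensions are the row lengths, for example `lda = k`, `ldb = n`, and `ldc = n` for densely packed matrices.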

source/module_base/kernels/dsp/dsp_connector.h

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,10 @@
22
#define DSP_CONNECTOR_H
33
#ifdef __DSP
44

5+
#include "module_base/module_device/device.h"
6+
#include "module_base/module_device/memory_op.h"
7+
#include "module_hsolver/diag_comm_info.h"
8+
59
// Base dsp functions
610
void dspInitHandle(int id);
711
void dspDestoryHandle(int id);
@@ -62,5 +66,66 @@ void cgemm_mth_(const char *transa, const char *transb,
6266

6367
//#define zgemm_ zgemm_mt
6468

69+
// The following are DSP utilities. They may be moved to other files if this file gets too large.
70+
71+
template <typename T>
72+
void dsp_dav_subspace_reduce(T* hcc, T* scc, int nbase, int nbase_x, int notconv, MPI_Comm diag_comm){
73+
74+
using syncmem_complex_op = base_device::memory::synchronize_memory_op<T, base_device::DEVICE_CPU, base_device::DEVICE_CPU>;
75+
76+
auto* swap = new T[notconv * nbase_x];
77+
auto* target = new T[notconv * nbase_x];
78+
syncmem_complex_op()(cpu_ctx, cpu_ctx, swap, hcc + nbase * nbase_x, notconv * nbase_x);
79+
if (base_device::get_current_precision(swap) == "single")
80+
{
81+
MPI_Reduce(swap,
82+
target,
83+
notconv * nbase_x,
84+
MPI_COMPLEX,
85+
MPI_SUM,
86+
0,
87+
diag_comm);
88+
}
89+
else
90+
{
91+
MPI_Reduce(swap,
92+
target,
93+
notconv * nbase_x,
94+
MPI_DOUBLE_COMPLEX,
95+
MPI_SUM,
96+
0,
97+
diag_comm);
98+
}
99+
100+
syncmem_complex_op()(cpu_ctx, cpu_ctx, hcc + nbase * nbase_x, target, notconv * nbase_x);
101+
syncmem_complex_op()(cpu_ctx, cpu_ctx, swap, scc + nbase * nbase_x, notconv * nbase_x);
102+
103+
if (base_device::get_current_precision(swap) == "single")
104+
{
105+
MPI_Reduce(swap,
106+
target,
107+
notconv * nbase_x,
108+
MPI_COMPLEX,
109+
MPI_SUM,
110+
0,
111+
diag_comm);
112+
}
113+
else
114+
{
115+
MPI_Reduce(swap,
116+
target,
117+
notconv * nbase_x,
118+
MPI_DOUBLE_COMPLEX,
119+
MPI_SUM,
120+
0,
121+
diag_comm);
122+
}
123+
124+
syncmem_complex_op()(cpu_ctx, cpu_ctx, scc + nbase * nbase_x, target, notconv * nbase_x);
125+
delete[] swap;
126+
delete[] target;
127+
}
128+
129+
65130
#endif
66131
#endif

source/module_base/module_device/memory_op.cpp

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -346,5 +346,57 @@ template struct delete_memory_op<std::complex<float>, base_device::DEVICE_GPU>;
346346
template struct delete_memory_op<std::complex<double>, base_device::DEVICE_GPU>;
347347
#endif
348348

349+
#ifdef __DSP
350+
351+
template <typename FPTYPE>
352+
struct resize_memory_op_mt<FPTYPE, base_device::DEVICE_CPU>
353+
{
354+
void operator()(const base_device::DEVICE_CPU* dev, FPTYPE*& arr, const size_t size, const char* record_in)
355+
{
356+
if (arr != nullptr)
357+
{
358+
free_ht(arr);
359+
}
360+
arr = (FPTYPE*)malloc_ht(sizeof(FPTYPE) * size, GlobalV::MY_RANK);
361+
std::string record_string;
362+
if (record_in != nullptr)
363+
{
364+
record_string = record_in;
365+
}
366+
else
367+
{
368+
record_string = "no_record";
369+
}
370+
371+
if (record_string != "no_record")
372+
{
373+
ModuleBase::Memory::record(record_string, sizeof(FPTYPE) * size);
374+
}
375+
}
376+
};
377+
378+
template <typename FPTYPE>
379+
struct delete_memory_op_mt<FPTYPE, base_device::DEVICE_CPU>
380+
{
381+
void operator()(const base_device::DEVICE_CPU* dev, FPTYPE* arr)
382+
{
383+
free_ht(arr);
384+
}
385+
};
386+
387+
388+
template struct resize_memory_op_mt<int, base_device::DEVICE_CPU>;
389+
template struct resize_memory_op_mt<float, base_device::DEVICE_CPU>;
390+
template struct resize_memory_op_mt<double, base_device::DEVICE_CPU>;
391+
template struct resize_memory_op_mt<std::complex<float>, base_device::DEVICE_CPU>;
392+
template struct resize_memory_op_mt<std::complex<double>, base_device::DEVICE_CPU>;
393+
394+
template struct delete_memory_op_mt<int, base_device::DEVICE_CPU>;
395+
template struct delete_memory_op_mt<float, base_device::DEVICE_CPU>;
396+
template struct delete_memory_op_mt<double, base_device::DEVICE_CPU>;
397+
template struct delete_memory_op_mt<std::complex<float>, base_device::DEVICE_CPU>;
398+
template struct delete_memory_op_mt<std::complex<double>, base_device::DEVICE_CPU>;
399+
#endif
400+
349401
} // namespace memory
350402
} // namespace base_device

source/module_base/module_device/memory_op.h

Lines changed: 30 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -146,6 +146,36 @@ struct delete_memory_op<FPTYPE, base_device::DEVICE_GPU>
146146
};
147147
#endif // __CUDA || __UT_USE_CUDA || __ROCM || __UT_USE_ROCM
148148

149+
#ifdef __DSP
150+
151+
template <typename FPTYPE, typename Device>
152+
struct resize_memory_op_mt
153+
{
154+
/// @brief Allocate memory for a given pointer. Note this op will free the pointer first.
155+
///
156+
/// Input Parameters
157+
/// \param dev : the type of computing device
158+
/// \param size : array size
159+
/// \param record_in : label for memory record
160+
///
161+
/// Output Parameters
162+
/// \param arr : allocated array
163+
void operator()(const Device* dev, FPTYPE*& arr, const size_t size, const char* record_in = nullptr);
164+
};
165+
166+
template <typename FPTYPE, typename Device>
167+
struct delete_memory_op_mt
168+
{
169+
/// @brief free memory for multi-device
170+
///
171+
/// Input Parameters
172+
/// \param dev : the type of computing device
173+
/// \param arr : the input array
174+
void operator()(const Device* dev, FPTYPE* arr);
175+
};
176+
177+
#endif // __DSP
178+
149179
} // end of namespace memory
150180
} // end of namespace base_device
151181

@@ -233,5 +263,4 @@ using castmem_z2c_d2h_op = base_device::memory::
233263

234264
static base_device::DEVICE_CPU* cpu_ctx = {};
235265
static base_device::DEVICE_GPU* gpu_ctx = {};
236-
237266
#endif // MODULE_DEVICE_MEMORY_H_
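
Reviewer note: `resize_memory_op_mt` is documented as freeing the pointer before allocating, so it reallocates without preserving the old contents. The sketch below illustrates that contract with standard allocation only. `demo_malloc` and `demo_free` are hypothetical stand-ins for `malloc_ht` and `free_ht`, whose real behavior is not shown in this diff, and `demo_resize_op` and `demo_delete_op` are illustrative names rather than ABACUS types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumed stand-ins for the DSP allocator calls used in the diff.
inline void* demo_malloc(std::size_t bytes)
{
    return std::malloc(bytes);
}

inline void demo_free(void* p)
{
    std::free(p);
}

template <typename T>
struct demo_resize_op
{
    // Frees any existing buffer, then allocates room for `size` elements.
    // Old contents are discarded, unlike std::realloc.
    void operator()(T*& arr, std::size_t size) const
    {
        if (arr != nullptr)
        {
            demo_free(arr);
        }
        arr = static_cast<T*>(demo_malloc(sizeof(T) * size));
    }
};

template <typename T>
struct demo_delete_op
{
    // Releases the buffer and resets the caller's pointer, so a later
    // resize cannot free the same block twice.
    void operator()(T*& arr) const
    {
        demo_free(arr);
        arr = nullptr;
    }
};
```

One difference from the diff is deliberate: `delete_memory_op_mt` takes the pointer by value, so the caller's pointer keeps its old address after the call, and passing it to `resize_memory_op_mt` afterwards would free it a second time. Taking the pointer by reference and resetting it, as in this sketch, is one way to avoid that; callers of the real op may instead reset the pointer themselves.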

source/module_base/module_device/types.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ namespace base_device
66

77
struct DEVICE_CPU;
88
struct DEVICE_GPU;
9+
struct DEVICE_DSP;
910

1011
enum AbacusDevice_t
1112
{

source/module_esolver/esolver_ks_lcao.cpp

Lines changed: 4 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -973,76 +973,11 @@ void ESolver_KS_LCAO<TK, TR>::iter_finish(int& iter)
973973
if( GlobalC::exx_info.info_global.cal_exx && this->conv_esolver ) one_step_exx = true;
974974

975975
// 3) save exx matrix
976-
int two_level_step = GlobalC::exx_info.info_ri.real_number ? this->exd->two_level_step : this->exc->two_level_step;
977-
978-
if (GlobalC::restart.info_save.save_H && two_level_step > 0
979-
&& (!GlobalC::exx_info.info_global.separate_loop || iter == 1)) // to avoid saving the same value repeatedly
980-
{
981-
////////// for Add_Hexx_Type::k
982-
/*
983-
hamilt::HS_Matrix_K<TK> Hexxk_save(&this->pv, 1);
984-
for (int ik = 0; ik < this->kv.get_nks(); ++ik) {
985-
Hexxk_save.set_zero_hk();
986-
987-
hamilt::OperatorEXX<hamilt::OperatorLCAO<TK, TR>> opexx_save(&Hexxk_save,
988-
nullptr,
989-
this->kv);
990-
991-
opexx_save.contributeHk(ik);
992-
993-
GlobalC::restart.save_disk("Hexx",
994-
ik,
995-
this->pv.get_local_size(),
996-
Hexxk_save.get_hk());
997-
}*/
998-
////////// for Add_Hexx_Type:R
999-
const std::string& restart_HR_path = GlobalC::restart.folder + "HexxR" + std::to_string(GlobalV::MY_RANK);
1000-
if (GlobalC::exx_info.info_ri.real_number)
1001-
{
1002-
ModuleIO::write_Hexxs_csr(restart_HR_path, GlobalC::ucell, this->exd->get_Hexxs());
1003-
}
1004-
else
1005-
{
1006-
ModuleIO::write_Hexxs_csr(restart_HR_path, GlobalC::ucell, this->exc->get_Hexxs());
1007-
}
1008-
if (GlobalV::MY_RANK == 0)
1009-
{
1010-
GlobalC::restart.save_disk("Eexx", 0, 1, &this->pelec->f_en.exx);
1011-
}
1012-
}
1013-
1014-
if (GlobalC::exx_info.info_global.cal_exx && this->conv_esolver)
976+
if (GlobalC::exx_info.info_global.cal_exx)
1015977
{
1016-
// Kerker mixing does not work for the density matrix.
1017-
// In the separate loop case, it can still work in the subsequent inner loops where Hexx(DM) is fixed.
1018-
// In the non-separate loop case where Hexx(DM) is updated in every iteration of the 2nd loop, it should be
1019-
// closed.
1020-
if (!GlobalC::exx_info.info_global.separate_loop)
1021-
{
1022-
this->p_chgmix->close_kerker_gg0();
1023-
}
1024-
if (GlobalC::exx_info.info_ri.real_number)
1025-
{
1026-
this->conv_esolver = this->exd->exx_after_converge(
1027-
*this->p_hamilt,
1028-
*dynamic_cast<const elecstate::ElecStateLCAO<TK>*>(this->pelec)->get_DM(),
1029-
this->kv,
1030-
PARAM.inp.nspin,
1031-
iter,
1032-
this->pelec->f_en.etot,
1033-
this->scf_ene_thr);
1034-
}
1035-
else
1036-
{
1037-
this->conv_esolver = this->exc->exx_after_converge(
1038-
*this->p_hamilt,
1039-
*dynamic_cast<const elecstate::ElecStateLCAO<TK>*>(this->pelec)->get_DM(),
1040-
this->kv,
1041-
PARAM.inp.nspin,
1042-
iter,
1043-
this->pelec->f_en.etot,
1044-
this->scf_ene_thr);
1045-
}
978+
GlobalC::exx_info.info_ri.real_number ?
979+
this->exd->exx_iter_finish(this->kv, GlobalC::ucell, *this->p_hamilt, *this->pelec, *this->p_chgmix, this->scf_ene_thr, iter, this->conv_esolver) :
980+
this->exc->exx_iter_finish(this->kv, GlobalC::ucell, *this->p_hamilt, *this->pelec, *this->p_chgmix, this->scf_ene_thr, iter, this->conv_esolver);
1046981
}
1047982
#endif
1048983
