
Commit 31fcd02

Merge branch 'develop' into Exx_Opt_Orb
2 parents 232cb5d + 6b81902 commit 31fcd02


197 files changed (+2071, -1974 lines)

.github/workflows/coverage.yml

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ jobs:
 apt update && apt install -y lcov
 - name: Building
 run: |
-cmake -B build -DENABLE_DEEPKS=ON -DENABLE_LIBXC=ON -DBUILD_TESTING=ON -DENABLE_COVERAGE=ON
+cmake -B build -DBUILD_TESTING=ON -DENABLE_DEEPKS=ON -DENABLE_LIBXC=ON -DENABLE_LIBRI=ON -DENABLE_PAW=ON -DENABLE_GOOGLEBENCH=ON -DENABLE_RAPIDJSON=ON
 cmake --build build -j`nproc`
 cmake --install build
 - name: Testing

docs/advanced/acceleration/cuda.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ The ABACUS program will automatically determine whether the current ELPA support
 ## Run with the GPU support by editing the INPUT script:

 In `INPUT` file we need to set the input parameter [device](../input_files/input-main.md#device) to `gpu`. If this parameter is not set, ABACUS will try to determine if there are available GPUs.
-- Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver` and `elpa` is supported on GPU.
+- Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver`, `cusolvermp` and `elpa` is supported on GPU.
 - **multi-card**: ABACUS allows for multi-GPU acceleration. If you have multiple GPU cards, you can run ABACUS with several MPI processes, and each process will utilize one GPU card. For example, the command `mpirun -n 2 abacus` will by default launch two GPUs for computation. If you only have one card, this command will only start one GPU.

 ## Examples

docs/advanced/input_files/input-main.md

Lines changed: 9 additions & 1 deletion
@@ -933,6 +933,8 @@ calculations.
 - **genelpa**: This method should be used if you choose localized orbitals.
 - **scalapack_gvx**: Scalapack can also be used for localized orbitals.
 - **cusolver**: This method needs building with CUDA and at least one gpu is available.
+- **cusolvermp**: This method supports multi-GPU acceleration and needs building with CUDA。 Note that when using cusolvermp, you should set the number of MPI processes to be equal to the number of GPUs.
+- **elpa**: The ELPA solver supports both CPU and GPU. By setting the `device` to GPU, you can launch the ELPA solver with GPU acceleration (provided that you have installed a GPU-supported version of ELPA, which requires you to manually compile and install ELPA, and the ABACUS should be compiled with -DUSE_ELPA=ON and -DUSE_CUDA=ON). The ELPA solver also supports multi-GPU acceleration.

 If you set ks_solver=`genelpa` for basis_type=`pw`, the program will be stopped with an error message:

@@ -941,7 +943,13 @@ calculations.
 ```

 Then the user has to correct the input file and restart the calculation.
-- **Default**: cg (plane-wave basis), or genelpa (localized atomic orbital basis, if compiling option `USE_ELPA` has been set),lapack (localized atomic orbital basis, if compiling option `ENABLE_MPI` has not been set), scalapack_gvx, (localized atomic orbital basis, if compiling option `USE_ELPA` has not been set and if compiling option `ENABLE_MPI` has been set)
+- **Default**:
+- **PW basis**: cg.
+- **LCAO basis**:
+- genelpa (if compiling option `USE_ELPA` has been set)
+- lapack (if compiling option `ENABLE_MPI` has not been set)
+- scalapack_gvx (if compiling option `USE_ELPA` has not been set and compiling option `ENABLE_MPI` has been set)
+- cusolver (if compiling option `USE_CUDA` has been set)

 ### nbands

docs/advanced/install.md

Lines changed: 2 additions & 2 deletions
@@ -93,9 +93,9 @@ cmake -B build -DUSE_CUDA=1 -DCMAKE_CUDA_COMPILER=${path to cuda toolkit}/bin/nv

 ## Build math library from source

-> Note: This flag is **enabled by default**. It will get better performance than the standard implementation on `gcc` and `clang`. But it **will be disabled** when using `Intel Compiler` since the math functions will get wrong results and the performance is also unexpectly poor.
+> Note: We recommend using the latest available compiler sets, since they offer faster implementations of math functions.

-To build math functions from source code, instead of using c++ standard implementation, define `USE_ABACUS_LIBM` flag.
+This flag is disabled by default. To build math functions from source code, define `USE_ABACUS_LIBM` flag. It is expected to get a better performance on legacy versions of `gcc` and `clang`.

 Currently supported math functions:
 `sin`, `cos`, `sincos`, `exp`, `cexp`

python/pyabacus/src/py_diago_dav_subspace.hpp

Lines changed: 5 additions & 7 deletions
@@ -113,23 +113,21 @@ class PyDiagoDavSubspace
 auto hpsi_func = [mm_op] (
 std::complex<double> *psi_in,
 std::complex<double> *hpsi_out,
-const int nband_in,
-const int nbasis_in,
-const int band_index1,
-const int band_index2
+const int ld_psi,
+const int nvec
 ) {
 // Note: numpy's py::array_t is row-major, but
 // our raw pointer-array is column-major
-py::array_t<std::complex<double>, py::array::f_style> psi({nbasis_in, band_index2 - band_index1 + 1});
+py::array_t<std::complex<double>, py::array::f_style> psi({ld_psi, nvec});
 py::buffer_info psi_buf = psi.request();
 std::complex<double>* psi_ptr = static_cast<std::complex<double>*>(psi_buf.ptr);
-std::copy(psi_in + band_index1 * nbasis_in, psi_in + (band_index2 + 1) * nbasis_in, psi_ptr);
+std::copy(psi_in, psi_in + nvec * ld_psi, psi_ptr);

 py::array_t<std::complex<double>, py::array::f_style> hpsi = mm_op(psi);

 py::buffer_info hpsi_buf = hpsi.request();
 std::complex<double>* hpsi_ptr = static_cast<std::complex<double>*>(hpsi_buf.ptr);
-std::copy(hpsi_ptr, hpsi_ptr + (band_index2 - band_index1 + 1) * nbasis_in, hpsi_out);
+std::copy(hpsi_ptr, hpsi_ptr + nvec * ld_psi, hpsi_out);
 };

 obj = std::make_unique<hsolver::Diago_DavSubspace<std::complex<double>, base_device::DEVICE_CPU>>(
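The new `hpsi_func` signature replaces the band-range arguments with a leading dimension `ld_psi` and a column count `nvec`, and the callback now copies the first `nvec * ld_psi` elements it receives. The C++ sketch below illustrates why a single copy suffices for column-major storage: each of the `nvec` columns occupies `ld_psi` consecutive elements, so the whole block is contiguous. The helpers `at_col_major`, `copy_block` and `copy_old` are hypothetical (not ABACUS or pyabacus functions), and the mapping from the old arguments (pointer offset `band_index1 * nbasis_in`, `ld_psi = nbasis_in`, `nvec = band_index2 - band_index1 + 1`) is an assumption about how an old-style call could be translated, not a description of the actual call sites.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: element (row, col) of a column-major block whose
// columns start ld_psi elements apart.
inline cplx at_col_major(const cplx* psi, int ld_psi, int row, int col)
{
    assert(row >= 0 && row < ld_psi && col >= 0);
    return psi[static_cast<std::size_t>(col) * ld_psi + row];
}

// Hypothetical stand-in for the copy in the new hpsi_func: the nvec columns
// are contiguous, so one std::copy of nvec * ld_psi elements moves the block.
inline std::vector<cplx> copy_block(const cplx* psi_in, int ld_psi, int nvec)
{
    const std::size_t n = static_cast<std::size_t>(ld_psi) * static_cast<std::size_t>(nvec);
    std::vector<cplx> out(n);
    std::copy(psi_in, psi_in + n, out.begin());
    return out;
}

// Hypothetical stand-in for the removed copy: columns band_index1..band_index2
// (inclusive) of a column-major matrix with nbasis_in rows.
inline std::vector<cplx> copy_old(const cplx* psi_in, int nbasis_in,
                                  int band_index1, int band_index2)
{
    assert(band_index1 >= 0 && band_index2 >= band_index1);
    const std::size_t first = static_cast<std::size_t>(band_index1) * nbasis_in;
    const std::size_t last = static_cast<std::size_t>(band_index2 + 1) * nbasis_in;
    std::vector<cplx> out(last - first);
    std::copy(psi_in + first, psi_in + last, out.begin());
    return out;
}
```

For a column-major matrix `m` with 3 rows and 4 columns, `copy_old(m, 3, 1, 2)` and `copy_block(m + 3, 3, 2)` return the same six elements (columns 1 and 2), which is the equivalence the sketch is meant to show.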

python/pyabacus/src/py_diago_david.hpp

Lines changed: 5 additions & 8 deletions
@@ -111,30 +111,27 @@ class PyDiagoDavid
 auto hpsi_func = [mm_op] (
 std::complex<double> *psi_in,
 std::complex<double> *hpsi_out,
-const int nband_in,
-const int nbasis_in,
-const int band_index1,
-const int band_index2
+const int ld_psi,
+const int nvec
 ) {
 // Note: numpy's py::array_t is row-major, but
 // our raw pointer-array is column-major
-py::array_t<std::complex<double>, py::array::f_style> psi({nbasis_in, band_index2 - band_index1 + 1});
+py::array_t<std::complex<double>, py::array::f_style> psi({ld_psi, nvec});
 py::buffer_info psi_buf = psi.request();
 std::complex<double>* psi_ptr = static_cast<std::complex<double>*>(psi_buf.ptr);
-std::copy(psi_in + band_index1 * nbasis_in, psi_in + (band_index2 + 1) * nbasis_in, psi_ptr);
+std::copy(psi_in, psi_in + nvec * ld_psi, psi_ptr);

 py::array_t<std::complex<double>, py::array::f_style> hpsi = mm_op(psi);

 py::buffer_info hpsi_buf = hpsi.request();
 std::complex<double>* hpsi_ptr = static_cast<std::complex<double>*>(hpsi_buf.ptr);
-std::copy(hpsi_ptr, hpsi_ptr + (band_index2 - band_index1 + 1) * nbasis_in, hpsi_out);
+std::copy(hpsi_ptr, hpsi_ptr + nvec * ld_psi, hpsi_out);
 };

 auto spsi_func = [this] (
 const std::complex<double> *psi_in,
 std::complex<double> *spsi_out,
 const int nrow,
-const int npw,
 const int nbands
 ) {
 syncmem_op()(this->ctx, this->ctx, spsi_out, psi_in, static_cast<size_t>(nbands * nrow));
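The visible body of `spsi_func` copies `nbands * nrow` elements of `psi_in` into `spsi_out` through `syncmem_op`, which amounts to applying an identity overlap matrix S; in this hunk the commit only drops the `npw` parameter. A CPU-only sketch of that behaviour follows. `spsi_identity` is a hypothetical name, and `std::copy` stands in for `syncmem_op`, which in the real code also takes device contexts. One small design difference: the sketch widens each factor to `std::size_t` before multiplying, whereas the shown code computes `nbands * nrow` in `int` and casts afterwards, which could overflow for very large blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical CPU-only counterpart of spsi_func: with S = I, S|psi> is a
// plain copy of nbands columns of nrow elements each (column-major).
inline void spsi_identity(const std::complex<double>* psi_in,
                          std::complex<double>* spsi_out,
                          int nrow, int nbands)
{
    assert(nrow >= 0 && nbands >= 0);
    // Widen before multiplying so a large nrow * nbands cannot overflow int.
    const std::size_t n = static_cast<std::size_t>(nrow) * static_cast<std::size_t>(nbands);
    std::copy(psi_in, psi_in + n, spsi_out);
}
```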

source/driver.cpp

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ void Driver::init()

 // (3) output information
 time_t time_finish = std::time(nullptr);
-Print_Info::print_time(time_start, time_finish);
+ModuleIO::print_time(time_start, time_finish);

 // (4) close all of the running logs
 ModuleBase::Global_File::close_all_log(GlobalV::MY_RANK, PARAM.inp.out_alllog,PARAM.inp.calculation);

source/module_base/global_variable.cpp

Lines changed: 1 addition & 29 deletions
@@ -12,24 +12,6 @@
 #include <vector>
 namespace GlobalV
 {
-
-//----------------------------------------------------------
-// EXPLAIN : Basic Global Variables
-// In practice calculation, these values are set in
-// input.cpp.
-//----------------------------------------------------------
-int NBANDS = 0;
-int NLOCAL = 0; // total number of local basis.
-
-double nupdown = 0.0;
-
-bool use_uspp = false;
-std::string KS_SOLVER = "cg"; // xiaohui add 2013-09-01
-double SEARCH_RADIUS = -1.0;
-
-int NB2D = 1;
-
-
 //----------------------------------------------------------
 // EXPLAIN : Parallel information
 //----------------------------------------------------------
@@ -52,21 +34,11 @@ int GRANK = MY_RANK;
 int GSIZE = DSIZE;

 //----------------------------------------------------------
-// EXPLAIN : The input file name and directory
+// EXPLAIN : ofstream for output
 //----------------------------------------------------------
 std::ofstream ofs_running;
 std::ofstream ofs_warning;
 std::ofstream ofs_info; // output math lib info
 std::ofstream ofs_device; // output device info

-
-//==========================================================
-// device flags added by denghui
-//==========================================================
-std::string device_flag = "unknown";
-
-double nelec = 0;
-
-
-// on-site orbitals
 } // namespace GlobalV

source/module_base/global_variable.h

Lines changed: 0 additions & 44 deletions
@@ -13,25 +13,6 @@

 namespace GlobalV
 {
-//==========================================================
-// EXPLAIN : Basic Global Variables
-//==========================================================
-
-extern int NBANDS;
-extern int NLOCAL; // 1.1 // mohan add 2009-05-29
-
-extern double nupdown;
-extern bool use_uspp;
-
-extern std::string KS_SOLVER; // xiaohui add 2013-09-01
-extern double SEARCH_RADIUS; // 11.1 // mohan add 2011-03-10
-
-
-extern int NB2D; // 16.5 dividsion of 2D_matrix.
-
-// pw, 2: real drho for lcao
-
-
 //========================================================================
 // EXPLAIN : Parallel information
 // GLOBAL VARIABLES :
@@ -84,30 +65,5 @@ extern std::ofstream ofs_running;
 extern std::ofstream ofs_warning;
 extern std::ofstream ofs_info;
 extern std::ofstream ofs_device;
-
-
-// mixing parameters
-
-//==========================================================
-// device flags added by denghui
-//==========================================================
-extern std::string device_flag;
-//==========================================================
-// precision flags added by denghui
-//==========================================================
-
-// "out_chg" elec step.
-/// @brief method to initialize wavefunction
-/// @author kirk0830, 20230920
-/// @brief whether use the new psi initializer to initialize psi
-/// @author ykhuang, 20230920
-
-extern double nelec;
-
-// Deltaspin related
-
-// Quasiatomic orbital related
-
-// radius of on-site orbitals
 } // namespace GlobalV
 #endif

source/module_base/module_device/device.cpp

Lines changed: 1 addition & 2 deletions
@@ -148,13 +148,12 @@ int set_device_by_rank(const MPI_Comm mpi_comm) {
 #endif

 std::string get_device_flag(const std::string &device,
-const std::string &ks_solver,
 const std::string &basis_type) {
 if (device == "cpu") {
 return "cpu"; // no extra checks required
 }
 std::string error_message;
-if (device != "" and device != "gpu")
+if (device != "auto" and device != "gpu")
 {
 error_message += "Parameter \"device\" can only be set to \"cpu\" or \"gpu\"!";
 ModuleBase::WARNING_QUIT("device", error_message);
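After this change the visible validation in `get_device_flag` accepts `"cpu"`, `"auto"` and `"gpu"`; an empty string, which the removed line accepted, now falls through to the error branch. The sketch below restates only that visible check as a standalone function. `check_device_flag` is hypothetical, it throws `std::invalid_argument` where the real code calls `ModuleBase::WARNING_QUIT`, and it returns `"auto"` or `"gpu"` unchanged because the rest of `get_device_flag` (including how `basis_type` is used) is not part of this diff. The sketch's error text also lists `"auto"`, whereas the message shown in the diff mentions only `"cpu"` and `"gpu"`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical restatement of the device check visible in the diff:
// "cpu" short-circuits; anything other than "auto" or "gpu" is rejected.
inline std::string check_device_flag(const std::string& device)
{
    if (device == "cpu") {
        return "cpu"; // no extra checks required
    }
    if (device != "auto" && device != "gpu") {
        // Stand-in for ModuleBase::WARNING_QUIT in the real code.
        throw std::invalid_argument(
            "Parameter \"device\" can only be set to \"cpu\", \"gpu\" or \"auto\"!");
    }
    // The real function continues from here; that logic is not in the diff.
    return device;
}
```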
