
Commit bd474b2

Merge branch 'develop' of https://github.com/deepmodeling/abacus-develop into develop
2 parents 998b19f + 72b1d7c commit bd474b2

File tree

224 files changed, +5361 −2619 lines



docs/advanced/acceleration/cuda.md

Lines changed: 4 additions & 2 deletions
@@ -29,12 +29,14 @@ To compile and use ABACUS in CUDA mode, you currently need to have an NVIDIA GPU

Check the [Advanced Installation Options](https://abacus-rtd.readthedocs.io/en/latest/advanced/install.html#build-with-cuda-support) for the installation of CUDA version support.

- When the compilation parameter USE_ELPA is ON (which is the default value) and USE_CUDA is also set to ON, the ELPA library needs to [enable GPU support](https://github.com/marekandreas/elpa/blob/master/documentation/INSTALL.md) at compile time.
+ Setting both USE_ELPA and USE_CUDA to ON does not automatically enable ELPA to run on GPUs: GPU support must be [enabled when ELPA itself is compiled](https://github.com/marekandreas/elpa/blob/master/documentation/INSTALL.md).
+
+ The ABACUS program automatically determines whether the current ELPA supports GPU from the elpa/elpa_configured_options.h header file; users can also inspect this header to check the GPU support of ELPA in their environment. ELPA introduced the new API elpa_setup_gpu in version 2023.11.001, so to enable ELPA GPU in ABACUS the ELPA version must be 2023.11.001 or later.

## Run with the GPU support by editing the INPUT script:

In `INPUT` file we need to set the input parameter [device](../input_files/input-main.md#device) to `gpu`. If this parameter is not set, ABACUS will try to determine if there are available GPUs.
- - Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver` and `elpa` is supported on GPU.
+ - Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver`, `cusolvermp` and `elpa` are supported on GPU.
- **multi-card**: ABACUS allows for multi-GPU acceleration. If you have multiple GPU cards, you can run ABACUS with several MPI processes, and each process will utilize one GPU card. For example, the command `mpirun -n 2 abacus` will by default launch two GPUs for computation. If you only have one card, this command will only start one GPU.
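For illustration only (an editor's sketch, not part of this commit), a minimal `INPUT` fragment requesting a GPU run with the LCAO basis and the ELPA solver could look as follows; every keyword except `device` and `ks_solver` is an assumption about a typical SCF setup rather than something taken from this diff:

```
INPUT_PARAMETERS
calculation  scf
basis_type   lcao
device       gpu
ks_solver    elpa
```

Launching with `mpirun -n 2 abacus` would then, as described above, use two GPU cards if they are available.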
## Examples

docs/advanced/input_files/input-main.md

Lines changed: 29 additions & 2 deletions
@@ -161,6 +161,7 @@
- [nbands\_istate](#nbands_istate)
- [bands\_to\_print](#bands_to_print)
- [if\_separate\_k](#if_separate_k)
+ - [out\_elf](#out_elf)
- [Density of states](#density-of-states)
- [dos\_edelta\_ev](#dos_edelta_ev)
- [dos\_sigma](#dos_sigma)
@@ -666,6 +667,7 @@ These variables are used to control parameters related to input files.
- **Type**: String
- **Description**: the name of the structure file
  - Containing various information about atom species, including pseudopotential files, local orbitals files, cell information, atom positions, and whether atoms should be allowed to move.
+ - When [calculation](#calculation) is set to `md` and [md_restart](#md_restart) is set to `true`, this keyword will NOT work.
  - Refer to [Doc](https://github.com/deepmodeling/abacus-develop/blob/develop/docs/advanced/input_files/stru.md)
- **Default**: STRU
@@ -931,6 +933,8 @@ calculations.
- **genelpa**: This method should be used if you choose localized orbitals.
- **scalapack_gvx**: Scalapack can also be used for localized orbitals.
- **cusolver**: This method needs building with CUDA and at least one gpu is available.
+ - **cusolvermp**: This method supports multi-GPU acceleration and needs building with CUDA. Note that when using cusolvermp, you should set the number of MPI processes equal to the number of GPUs.
+ - **elpa**: The ELPA solver supports both CPU and GPU. By setting `device` to `gpu`, you can launch the ELPA solver with GPU acceleration (provided that you have installed a GPU-supported version of ELPA, which requires compiling and installing ELPA manually; ABACUS should also be compiled with -DUSE_ELPA=ON and -DUSE_CUDA=ON). The ELPA solver also supports multi-GPU acceleration.

If you set ks_solver=`genelpa` for basis_type=`pw`, the program will be stopped with an error message:

@@ -939,7 +943,13 @@ calculations.
```

Then the user has to correct the input file and restart the calculation.
- - **Default**: cg (plane-wave basis), or genelpa (localized atomic orbital basis, if compiling option `USE_ELPA` has been set),lapack (localized atomic orbital basis, if compiling option `ENABLE_MPI` has not been set), scalapack_gvx, (localized atomic orbital basis, if compiling option `USE_ELPA` has not been set and if compiling option `ENABLE_MPI` has been set)
+ - **Default**:
+   - **PW basis**: cg
+   - **LCAO basis**:
+     - genelpa (if compiling option `USE_ELPA` has been set)
+     - lapack (if compiling option `ENABLE_MPI` has not been set)
+     - scalapack_gvx (if compiling option `USE_ELPA` has not been set and compiling option `ENABLE_MPI` has been set)
+     - cusolver (if compiling option `USE_CUDA` has been set)
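As an editor's illustration of the GPU-related entries above (not part of this commit), an LCAO calculation using the new `cusolvermp` solver on two GPUs could be set up as follows, keeping the number of MPI processes equal to the number of GPUs; all values besides `ks_solver` and `device` are assumed for the example:

```
INPUT_PARAMETERS
calculation  scf
basis_type   lcao
device       gpu
ks_solver    cusolvermp
```

Such a run would then be launched with `mpirun -n 2 abacus` when two GPU cards are available.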
### nbands

@@ -1520,7 +1530,7 @@ These variables are used to control the output of properties.
- **Type**: Integer \[Integer\](optional)
- **Description**:
  The first integer controls whether to output the charge density on real space grids:
-   - 1. Output the charge density (in Bohr^-3) on real space grids into the density files in the folder `OUT.${suffix}`. The files are named as:
+   - 1: Output the charge density (in Bohr^-3) on real space grids into the density files in the folder `OUT.${suffix}`. The files are named as:
    - nspin = 1: SPIN1_CHG.cube;
    - nspin = 2: SPIN1_CHG.cube, and SPIN2_CHG.cube;
    - nspin = 4: SPIN1_CHG.cube, SPIN2_CHG.cube, SPIN3_CHG.cube, and SPIN4_CHG.cube.
@@ -1800,6 +1810,23 @@
- **Description**: Specifies whether to write the partial charge densities for all k-points to individual files or merge them. **Warning**: Enabling symmetry may produce incorrect results due to incorrect k-point weights. Therefore, when calculating partial charge densities, it is strongly recommended to set `symmetry = -1`.
- **Default**: false

+ ### out_elf
+
+ - **Type**: Integer \[Integer\](optional)
+ - **Availability**: Only for Kohn-Sham DFT and Orbital Free DFT.
+ - **Description**: Whether to output the electron localization function (ELF) in the folder `OUT.${suffix}`. The files are named as
+   - nspin = 1:
+     - ELF.cube: ${\rm{ELF}} = \frac{1}{1+\chi^2}$, $\chi = \frac{\frac{1}{2}\sum_{i}{f_i |\nabla\psi_{i}|^2} - \frac{|\nabla\rho|^2}{8\rho}}{\frac{3}{10}(3\pi^2)^{2/3}\rho^{5/3}}$;
+   - nspin = 2:
+     - ELF_SPIN1.cube, ELF_SPIN2.cube: ${\rm{ELF}}_\sigma = \frac{1}{1+\chi_\sigma^2}$, $\chi_\sigma = \frac{\frac{1}{2}\sum_{i}{f_i |\nabla\psi_{i,\sigma}|^2} - \frac{|\nabla\rho_\sigma|^2}{8\rho_\sigma}}{\frac{3}{10}(6\pi^2)^{2/3}\rho_\sigma^{5/3}}$;
+     - ELF.cube: ${\rm{ELF}} = \frac{1}{1+\chi^2}$, $\chi = \frac{\frac{1}{2}\sum_{i,\sigma}{f_i |\nabla\psi_{i,\sigma}|^2} - \sum_{\sigma}{\frac{|\nabla\rho_\sigma|^2}{8\rho_\sigma}}}{\sum_{\sigma}{\frac{3}{10}(6\pi^2)^{2/3}\rho_\sigma^{5/3}}}$;
+
+   The second integer controls the precision of the kinetic energy density output; if it is not given, `3` is used by default. For restarting from this file and for other calculations that require high precision, `10` is recommended.
+
+   ---
+   In molecular dynamics calculations, the output frequency is controlled by [out_interval](#out_interval).
+ - **Default**: 0 3
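As a reading aid for the nspin = 1 expression above (an editor's sketch, not code from ABACUS or from this commit), the ELF can be evaluated on a real-space grid once the density `rho`, the magnitude of its gradient `grad_rho`, and the positive kinetic energy density `tau` = 1/2 Σ_i f_i |∇ψ_i|² are available:

```python
import numpy as np

def elf_nspin1(rho, grad_rho, tau):
    """Illustrative electron localization function for nspin = 1 (atomic units).

    rho      : electron density on the grid
    grad_rho : |nabla rho| on the grid
    tau      : 1/2 * sum_i f_i |nabla psi_i|^2 on the grid
    """
    # von Weizsaecker term |nabla rho|^2 / (8 rho)
    tau_w = grad_rho**2 / (8.0 * rho)
    # Thomas-Fermi kinetic energy density (3/10) (3 pi^2)^(2/3) rho^(5/3)
    tau_tf = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0) * rho ** (5.0 / 3.0)
    chi = (tau - tau_w) / tau_tf
    return 1.0 / (1.0 + chi**2)
```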
[back to top](#full-list-of-input-keywords)

## Density of states

python/pyabacus/src/py_diago_dav_subspace.hpp

Lines changed: 9 additions & 8 deletions
@@ -110,23 +110,24 @@ class PyDiagoDavSubspace
        bool scf_type,
        hsolver::diag_comm_info comm_info
    ) {
-       auto hpsi_func = [mm_op] (std::complex<double> *hpsi_out,
-                                 std::complex<double> *psi_in, const int nband_in,
-                                 const int nbasis_in, const int band_index1,
-                                 const int band_index2)
-       {
+       auto hpsi_func = [mm_op] (
+           std::complex<double> *psi_in,
+           std::complex<double> *hpsi_out,
+           const int ld_psi,
+           const int nvec
+       ) {
            // Note: numpy's py::array_t is row-major, but
            // our raw pointer-array is column-major
-           py::array_t<std::complex<double>, py::array::f_style> psi({nbasis_in, band_index2 - band_index1 + 1});
+           py::array_t<std::complex<double>, py::array::f_style> psi({ld_psi, nvec});
            py::buffer_info psi_buf = psi.request();
            std::complex<double>* psi_ptr = static_cast<std::complex<double>*>(psi_buf.ptr);
-           std::copy(psi_in + band_index1 * nbasis_in, psi_in + (band_index2 + 1) * nbasis_in, psi_ptr);
+           std::copy(psi_in, psi_in + nvec * ld_psi, psi_ptr);

            py::array_t<std::complex<double>, py::array::f_style> hpsi = mm_op(psi);

            py::buffer_info hpsi_buf = hpsi.request();
            std::complex<double>* hpsi_ptr = static_cast<std::complex<double>*>(hpsi_buf.ptr);
-           std::copy(hpsi_ptr, hpsi_ptr + (band_index2 - band_index1 + 1) * nbasis_in, hpsi_out);
+           std::copy(hpsi_ptr, hpsi_ptr + nvec * ld_psi, hpsi_out);
        };

        obj = std::make_unique<hsolver::Diago_DavSubspace<std::complex<double>, base_device::DEVICE_CPU>>(

python/pyabacus/src/py_diago_david.hpp

Lines changed: 7 additions & 9 deletions
@@ -109,25 +109,23 @@ class PyDiagoDavid
        hsolver::diag_comm_info comm_info
    ) {
        auto hpsi_func = [mm_op] (
-           std::complex<double> *hpsi_out,
-           std::complex<double> *psi_in,
-           const int nband_in,
-           const int nbasis_in,
-           const int band_index1,
-           const int band_index2
+           std::complex<double> *psi_in,
+           std::complex<double> *hpsi_out,
+           const int ld_psi,
+           const int nvec
        ) {
            // Note: numpy's py::array_t is row-major, but
            // our raw pointer-array is column-major
-           py::array_t<std::complex<double>, py::array::f_style> psi({nbasis_in, band_index2 - band_index1 + 1});
+           py::array_t<std::complex<double>, py::array::f_style> psi({ld_psi, nvec});
            py::buffer_info psi_buf = psi.request();
            std::complex<double>* psi_ptr = static_cast<std::complex<double>*>(psi_buf.ptr);
-           std::copy(psi_in + band_index1 * nbasis_in, psi_in + (band_index2 + 1) * nbasis_in, psi_ptr);
+           std::copy(psi_in, psi_in + nvec * ld_psi, psi_ptr);

            py::array_t<std::complex<double>, py::array::f_style> hpsi = mm_op(psi);

            py::buffer_info hpsi_buf = hpsi.request();
            std::complex<double>* hpsi_ptr = static_cast<std::complex<double>*>(hpsi_buf.ptr);
-           std::copy(hpsi_ptr, hpsi_ptr + (band_index2 - band_index1 + 1) * nbasis_in, hpsi_out);
+           std::copy(hpsi_ptr, hpsi_ptr + nvec * ld_psi, hpsi_out);
        };

        auto spsi_func = [this] (

python/pyabacus/src/pyabacus/hsolver/_hsolver.py

Lines changed: 16 additions & 14 deletions
@@ -16,7 +16,7 @@ def rank(self) -> int: ...
    def nproc(self) -> int: ...

def dav_subspace(
-   mm_op: Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
+   mvv_op: Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
    init_v: NDArray[np.complex128],
    dim: int,
    num_eigs: int,
@@ -32,9 +32,10 @@

    Parameters
    ----------
-   mm_op : Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
-       The operator to be diagonalized, which is a function that takes a matrix as input
-       and returns a matrix mv_op(X) = H * X as output.
+   mvv_op : Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
+       The operator to be diagonalized, which is a function that takes a set of
+       vectors X = [x1, ..., xN] as input and returns a matrix (vector block)
+       mvv_op(X) = H * X ([Hx1, ..., HxN]) as output.
    init_v : NDArray[np.complex128]
        The initial guess for the eigenvectors.
    dim : int
@@ -68,8 +69,8 @@
    v : NDArray[np.complex128]
        The eigenvectors corresponding to the eigenvalues.
    """
-   if not callable(mm_op):
-       raise TypeError("mm_op must be a callable object.")
+   if not callable(mvv_op):
+       raise TypeError("mvv_op must be a callable object.")

    if is_occupied is None:
        is_occupied = [True] * num_eigs
@@ -86,7 +87,7 @@
    assert dav_ndim * num_eigs < dim * comm_info.nproc, "dav_ndim * num_eigs must be less than dim * comm_info.nproc."

    _ = _diago_obj_dav_subspace.diag(
-       mm_op,
+       mvv_op,
        pre_condition,
        dav_ndim,
        tol,
@@ -103,7 +104,7 @@
    return e, v

def davidson(
-   mm_op: Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
+   mvv_op: Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
    init_v: NDArray[np.complex128],
    dim: int,
    num_eigs: int,
@@ -119,9 +120,10 @@

    Parameters
    ----------
-   mm_op : Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
-       The operator to be diagonalized, which is a function that takes a matrix as input
-       and returns a matrix mv_op(X) = H * X as output.
+   mvv_op : Callable[[NDArray[np.complex128]], NDArray[np.complex128]],
+       The operator to be diagonalized, which is a function that takes a set of
+       vectors X = [x1, ..., xN] as input and returns a matrix (vector block)
+       mvv_op(X) = H * X ([Hx1, ..., HxN]) as output.
    init_v : NDArray[np.complex128]
        The initial guess for the eigenvectors.
    dim : int
@@ -146,8 +148,8 @@
    v : NDArray[np.complex128]
        The eigenvectors corresponding to the eigenvalues.
    """
-   if not callable(mm_op):
-       raise TypeError("mm_op must be a callable object.")
+   if not callable(mvv_op):
+       raise TypeError("mvv_op must be a callable object.")

    if init_v.ndim != 1 or init_v.dtype != np.complex128:
        init_v = init_v.flatten().astype(np.complex128, order='C')
@@ -159,7 +161,7 @@
    comm_info = hsolver.diag_comm_info(0, 1)

    _ = _diago_obj_dav_subspace.diag(
-       mm_op,
+       mvv_op,
        pre_condition,
        dav_ndim,
        tol,
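To show how the renamed `mvv_op` argument is meant to be used from Python, here is an editor's usage sketch (not part of this commit): it builds a small Hermitian matrix, wraps it in a block matrix-vector product, and passes it to `hsolver.dav_subspace`. The keyword names `pre_condition`, `dav_ndim` and `tol` are taken from the call shown above, but their exact defaults and ordering in the public signature are assumptions.

```python
import numpy as np
from pyabacus import hsolver

n, k = 64, 4  # basis size and number of requested eigenpairs

# Small Hermitian test operator.
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2

def mvv_op(X):
    # X is a block of trial vectors [x1, ..., xN]; return [Hx1, ..., HxN].
    return H @ X

# Flattened complex initial guess for the k eigenvectors.
v0 = (rng.standard_normal(n * k) + 1j * rng.standard_normal(n * k)).astype(np.complex128)

e, v = hsolver.dav_subspace(
    mvv_op, v0, n, k,
    pre_condition=np.ones(n),  # assumed keyword; mirrors the positional argument above
    dav_ndim=8,
    tol=1e-8,
)
print(e)  # approximate lowest k eigenvalues of H
```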

source/Makefile.Objects

Lines changed: 10 additions & 1 deletion
@@ -213,6 +213,7 @@ OBJS_ELECSTAT=elecstate.o\
    elecstate_print.o\
    elecstate_pw.o\
    elecstate_pw_sdft.o\
+   elecstate_pw_cal_tau.o\
    elecstate_op.o\
    efield.o\
    gatefield.o\
@@ -226,6 +227,7 @@ OBJS_ELECSTAT=elecstate.o\

OBJS_ELECSTAT_LCAO=elecstate_lcao.o\
    elecstate_lcao_tddft.o\
+   elecstate_lcao_cal_tau.o\
    density_matrix.o\
    cal_dm_psi.o\

@@ -454,7 +456,12 @@ OBJS_XC=xc_functional.o\
    xc_functional_gradcorr.o\
    xc_functional_wrapper_xc.o\
    xc_functional_wrapper_gcxc.o\
-   xc_functional_wrapper_tauxc.o\
+   xc_functional_libxc.o\
+   xc_functional_libxc_tools.o\
+   xc_functional_libxc_vxc.o\
+   xc_functional_libxc_wrapper_xc.o\
+   xc_functional_libxc_wrapper_gcxc.o\
+   xc_functional_libxc_wrapper_tauxc.o\
    xc_funct_exch_lda.o\
    xc_funct_corr_lda.o\
    xc_funct_exch_gga.o\
@@ -496,6 +503,7 @@ OBJS_IO=input_conv.o\
    winput.o\
    write_cube.o\
    write_elecstat_pot.o\
+   write_elf.o\
    write_dipole.o\
    td_current_io.o\
    write_wfc_r.o\
@@ -523,6 +531,7 @@ OBJS_IO=input_conv.o\
    read_input_item_other.o\
    read_input_item_output.o\
    read_set_globalv.o\
+   orb_io.o\

OBJS_IO_LCAO=cal_r_overlap_R.o\
    write_orb_info.o\

source/driver_run.cpp

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ void Driver::driver_run() {

    // the life of ucell should begin here, mohan 2024-05-12
    // delete ucell as a GlobalC in near future
-   GlobalC::ucell.setup_cell(PARAM.inp.stru_file, GlobalV::ofs_running);
+   GlobalC::ucell.setup_cell(PARAM.globalv.global_in_stru, GlobalV::ofs_running);
    Check_Atomic_Stru::check_atomic_stru(GlobalC::ucell,
                                         PARAM.inp.min_dist_coef);

source/module_base/global_variable.cpp

Lines changed: 0 additions & 3 deletions
@@ -21,7 +21,6 @@ namespace GlobalV
int NBANDS = 0;
int NLOCAL = 0; // total number of local basis.

- int NSPIN = 1; // LDA
double nupdown = 0.0;

bool use_uspp = false;
@@ -55,8 +54,6 @@ int GSIZE = DSIZE;
//----------------------------------------------------------
// EXPLAIN : The input file name and directory
//----------------------------------------------------------
- std::string stru_file = "STRU";
-
std::ofstream ofs_running;
std::ofstream ofs_warning;
std::ofstream ofs_info; // output math lib info

source/module_base/global_variable.h

Lines changed: 0 additions & 3 deletions
@@ -20,8 +20,6 @@ namespace GlobalV
extern int NBANDS;
extern int NLOCAL; // 1.1 // mohan add 2009-05-29

-
- extern int NSPIN; // 7
extern double nupdown;
extern bool use_uspp;

@@ -80,7 +78,6 @@ extern int KPAR_LCAO;
// NAME : ofs_running( contain information during runnnig)
// NAME : ofs_warning( contain warning information, including error)
//==========================================================
- extern std::string stru_file;
// extern std::string global_pseudo_type; // mohan add 2013-05-20 (xiaohui add
// 2013-06-23)
extern std::ofstream ofs_running;
