deepmodeling
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 1 addition & 0 deletions
diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md
Lines changed: 4 additions & 4 deletions
diff --git a/source/Makefile.Objects b/source/Makefile.Objects
Lines changed: 3 additions & 6 deletions
diff --git a/source/driver.cpp b/source/driver.cpp
Lines changed: 1 addition & 1 deletion
diff --git a/source/driver.h b/source/driver.h
Lines changed: 3 additions & 1 deletion
diff --git a/source/driver_run.cpp b/source/driver_run.cpp
Lines changed: 12 additions & 13 deletions
diff --git a/source/module_base/blas_connector.h b/source/module_base/blas_connector.h
Lines changed: 15 additions & 1 deletion
diff --git a/source/module_base/grid/batch.cpp b/source/module_base/grid/batch.cpp
Lines changed: 123 additions & 0 deletions
diff --git a/source/module_base/grid/batch.h b/source/module_base/grid/batch.h
Lines changed: 54 additions & 0 deletions
diff --git a/source/module_base/grid/test/CMakeLists.txt b/source/module_base/grid/test/CMakeLists.txt
Lines changed: 7 additions & 0 deletions
@@ -709,6 +709,7 @@ target_link_libraries(
   hamilt_stodft
   psi
   psi_initializer
+  psi_overall_init
   esolver
   vdw
   device
 
@@ -989,7 +989,7 @@ calculations.
 
 - **Type**: String
 - **Description**: In our package, the XC functional can either be set explicitly using the `dft_functional` keyword in `INPUT` file. If `dft_functional` is not specified, ABACUS will use the xc functional indicated in the pseudopotential file.
-  On the other hand, if dft_functional is specified, it will overwrite the functional from pseudopotentials and performs calculation with whichever functional the user prefers. We further offer two ways of supplying exchange-correlation functional. The first is using 'short-hand' names such as 'LDA', 'PBE', 'SCAN'. A complete list of 'short-hand' expressions can be found in [the source code](../../../source/module_hamilt_general/module_xc/xc_functional.cpp). The other way is only available when ***compiling with LIBXC***, and it allows for supplying exchange-correlation functionals as combinations of LIBXC keywords for functional components, joined by a plus sign, for example, 'dft_functional='LDA_X_1D_EXPONENTIAL+LDA_C_1D_CSC'. The list of LIBXC keywords can be found on its [website](https://www.tddft.org/programs/libxc/functionals/). In this way, **we support all the LDA,GGA and mGGA functionals provided by LIBXC**.
+  On the other hand, if dft_functional is specified, it will overwrite the functional from pseudopotentials and performs calculation with whichever functional the user prefers. We further offer two ways of supplying exchange-correlation functional. The first is using 'short-hand' names such as 'LDA', 'PBE', 'SCAN'. A complete list of 'short-hand' expressions can be found in [the source code](../../../source/module_hamilt_general/module_xc/xc_functional.cpp). The other way is only available when ***compiling with LIBXC***, and it allows for supplying exchange-correlation functionals as combinations of LIBXC keywords for functional components, joined by a plus sign, for example, dft_functional='LDA_X_1D_EXPONENTIAL+LDA_C_1D_CSC'. The list of LIBXC keywords can be found on its [website](https://libxc.gitlab.io/functionals/). In this way, **we support all the LDA,GGA and mGGA functionals provided by LIBXC**.
 
   Furthermore, the old INPUT parameter exx_hybrid_type for hybrid functionals has been absorbed into dft_functional. Options are `hf` (pure Hartree-Fock), `pbe0`(PBE0), `hse` (Note: in order to use HSE functional, LIBXC is required). Note also that HSE has been tested while PBE0 has NOT been fully tested yet, and the maximum CPU cores for running exx in parallel is $N(N+1)/2$, with N being the number of atoms. And forces for hybrid functionals are not supported yet.
 
@@ -1389,7 +1389,7 @@ These variables are used to control the geometry relaxation.
 ### relax_nmax
 
 - **Type**: Integer
-- **Description**: The maximal number of ionic iteration steps, the minimum value is 1.
+- **Description**: The maximal number of ionic iteration steps. If set to 0, the code performs a quick "dry run", stopping just after initialization. This is useful to check for input correctness and to have the summary printed.
 - **Default**: 1 for SCF, 50 for relax and cell-relax calculations
 
 ### relax_cg_thr
@@ -1760,14 +1760,14 @@ The band (KS orbital) energy for each (k-point, spin, band) will be printed in t
 
 - **Type**: Boolean
 - **Availability**: Numerical atomic orbital basis
-- **Description**: Whether to print Hamiltonian matrices H(R)/density matrics DM(R) in npz format. This feature does not work for gamma-only calculations. Currently only intended for internal usage.
+- **Description**: Whether to print Hamiltonian matrices $H(R)$/density matrices $DM(R)$ in npz format. This feature does not work for gamma-only calculations. Currently only intended for internal usage.
 - **Default**: False
 
 ### dm_to_rho
 
 - **Type**: Boolean
 - **Availability**: Numerical atomic orbital basis
-- **Description**: Reads density matrix DM(R) in npz format and creates electron density on grids. This feature does not work for gamma-only calculations. Only supports serial calculations. Currently only intended for internal usage.
+- **Description**: Reads density matrix $DM(R)$ in npz format and creates electron density on grids. This feature does not work for gamma-only calculations. Only supports serial calculations. Currently only intended for internal usage.
 - **Default**: False
 
 ### out_app_flag
 
@@ -226,6 +226,7 @@ OBJS_ELECSTAT=elecstate.o\
     H_Hartree_pw.o\
     H_TDDFT_pw.o\
     pot_xc.o\
+    cal_ux.o\
 
 OBJS_ELECSTAT_LCAO=elecstate_lcao.o\
       elecstate_lcao_cal_tau.o\
@@ -245,12 +246,10 @@ OBJS_ESOLVER=esolver.o\
     esolver_of.o\
     esolver_of_tool.o\
     esolver_of_interface.o\
-    pw_init_globalc.o\
     pw_others.o\
 
 OBJS_ESOLVER_LCAO=esolver_ks_lcao.o\
       esolver_ks_lcao_tddft.o\
-      dpks_cal_e_delta_band.o\
       lcao_before_scf.o\
       esolver_gets.o\
       lcao_others.o\
@@ -403,9 +402,7 @@ OBJS_PSI_INITIALIZER=psi_initializer.o\
                      psi_initializer_nao.o\
                      psi_initializer_nao_random.o\
 
-OBJS_PW=fft.o\
-    fft_bundle.o\
-    fft_base.o\
+OBJS_PW=fft_bundle.o\
     fft_cpu.o\
     pw_basis.o\
     pw_basis_k.o\
@@ -668,7 +665,7 @@ OBJS_SRCPW=H_Ewald_pw.o\
     symmetry_rhog.o\
     wavefunc.o\
     wf_atomic.o\
-    wfinit.o\
+    psi_init.o\
     elecond.o\
     sto_tool.o\
     sto_elecond.o\
 
@@ -183,7 +183,7 @@ void Driver::atomic_world()
     //--------------------------------------------------
 
     // where the actual stuff is done
-    this->driver_run();
+    this->driver_run(GlobalC::ucell);
 
     ModuleBase::timer::finish(GlobalV::ofs_running);
     ModuleBase::Memory::print_all(GlobalV::ofs_running);
 
@@ -1,6 +1,8 @@
 #ifndef DRIVER_H
 #define DRIVER_H
 
+#include "module_cell/unitcell.h"
+
 class Driver
 {
   public:
@@ -34,7 +36,7 @@ class Driver
     void atomic_world();
 
     // the actual calculations
-    void driver_run();
+    void driver_run(UnitCell& ucell);
 };
 
 #endif
@@ -24,7 +24,8 @@
  * the configuration-changing subroutine takes force and stress and updates the
  * configuration
  */
-void Driver::driver_run() {
+void Driver::driver_run(UnitCell& ucell)
+{
     ModuleBase::TITLE("Driver", "driver_line");
     ModuleBase::timer::tick("Driver", "driver_line");
 
@@ -39,37 +40,35 @@ void Driver::driver_run() {
 #endif
 
     // the life of ucell should begin here, mohan 2024-05-12
-    // delete ucell as a GlobalC in near future
-    GlobalC::ucell.setup_cell(PARAM.globalv.global_in_stru, GlobalV::ofs_running);
-    Check_Atomic_Stru::check_atomic_stru(GlobalC::ucell,
-                                         PARAM.inp.min_dist_coef);
+    ucell.setup_cell(PARAM.globalv.global_in_stru, GlobalV::ofs_running);
+    Check_Atomic_Stru::check_atomic_stru(ucell, PARAM.inp.min_dist_coef);
 
     //! 2: initialize the ESolver (depends on a set-up ucell after `setup_cell`)
-    ModuleESolver::ESolver* p_esolver = ModuleESolver::init_esolver(PARAM.inp, GlobalC::ucell);
+    ModuleESolver::ESolver* p_esolver = ModuleESolver::init_esolver(PARAM.inp, ucell);
 
     //! 3: initialize Esolver and fill json-structure
-    p_esolver->before_all_runners(PARAM.inp, GlobalC::ucell);
+    p_esolver->before_all_runners(ucell, PARAM.inp);
 
     // this Json part should be moved to before_all_runners, mohan 2024-05-12
 #ifdef __RAPIDJSON
-    Json::gen_stru_wrapper(&GlobalC::ucell);
+    Json::gen_stru_wrapper(&ucell);
 #endif
 
     const std::string cal_type = PARAM.inp.calculation;
 
     //! 4: different types of calculations
     if (cal_type == "md")
     {
-        Run_MD::md_line(GlobalC::ucell, p_esolver, PARAM);
+        Run_MD::md_line(ucell, p_esolver, PARAM);
     }
     else if (cal_type == "scf" || cal_type == "relax" || cal_type == "cell-relax" || cal_type == "nscf")
     {
         Relax_Driver rl_driver;
-        rl_driver.relax_driver(p_esolver);
+        rl_driver.relax_driver(p_esolver, ucell);
     }
     else if (cal_type == "get_S")
     {
-        p_esolver->runner(0, GlobalC::ucell);
+        p_esolver->runner(ucell, 0);
     }
     else
     {
@@ -79,11 +78,11 @@ void Driver::driver_run() {
         //! test_neighbour(LCAO),
         //! gen_bessel(PW), et al.
         const int istep = 0;
-        p_esolver->others(istep);
+        p_esolver->others(ucell, istep);
     }
 
     //! 5: clean up esolver
-    p_esolver->after_all_runners();
+    p_esolver->after_all_runners(ucell);
 
     ModuleESolver::clean_esolver(p_esolver);
 
 
@@ -39,6 +39,20 @@ extern "C"
 	double dnrm2_( const int *n, const double *X, const int *incX );
 	double dznrm2_( const int *n, const std::complex<double> *X, const int *incX );
 
+    // symmetric rank-k update
+    void dsyrk_(
+        const char* uplo,
+        const char* trans,
+        const int* n,
+        const int* k,
+        const double* alpha,
+        const double* a,
+        const int* lda,
+        const double* beta,
+        double* c,
+        const int* ldc
+    );
+
 	// level 2: matrix-vector operations, O(n^2) data and O(n^2) work.
 	void sgemv_(const char*const transa, const int*const m, const int*const n,
 		const float*const alpha, const float*const a, const int*const lda, const float*const x, const int*const incx,
@@ -267,4 +281,4 @@ void zgemv_i(const char *trans,
 */
 
 #endif // GATHER_INFO
-#endif // BLAS_CONNECTOR_H
+#endif // BLAS_CONNECTOR_H
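For readers unfamiliar with the newly declared `dsyrk_`, the following is a minimal reference sketch of what the routine computes when called with `uplo = "U"` and `trans = "N"`, which is how `batch.cpp` uses it: `C := alpha * A * A^T + beta * C`, with column-major storage and only the upper triangle of `C` written. The name `syrk_upper_ref` is hypothetical and is not part of ABACUS or BLAS; unlike an optimized BLAS, this loop also reads `C` when `beta` is zero.

```cpp
#include <cassert>
#include <vector>

// Reference loop for C := alpha * A * A^T + beta * C, where A is n x k in
// column-major order and only the upper triangle of C is updated.
// syrk_upper_ref is a hypothetical helper for illustration only.
void syrk_upper_ref(int n, int k, double alpha, const double* a, int lda,
                    double beta, double* c, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i <= j; ++i) {
            double sum = 0.0;
            for (int l = 0; l < k; ++l) {
                // element (i, l) of A times element (j, l) of A
                sum += a[i + l * lda] * a[j + l * lda];
            }
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```

In `_maxmin_divide`, `n` is 3 and `k` is the number of selected points, so the result is the 3x3 matrix `R*R^T` whose upper triangle is then passed to `dsyev_`.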
@@ -0,0 +1,123 @@
+#include "module_base/grid/batch.h"
+
+#include <algorithm>
+#include <cassert>
+#include <iterator>
+
+#include "module_base/blas_connector.h"
+#include "module_base/lapack_connector.h"
+
+namespace {
+
+/**
+ * @brief Divide a set of points into two subsets by the "MaxMin" algorithm.
+ *
+ * This function divides a given set of grid points by a cut plane
+ * {x|n^T*(x-c) = 0} where the normal vector n and the point c are
+ * determined by the "MaxMin" problem:
+ *
+ *      max min sum_{i=1}^{m} [n^T * (r[idx[i]] - c)]^2
+ *       n   c
+ *
+ * here r[j] = (grid[3*j], grid[3*j+1], grid[3*j+2]) is the position of
+ * the j-th point.
+ *
+ * It can be shown that the optimal c is the centroid of the points, and
+ * the optimal n is the eigenvector corresponding to the largest eigenvalue
+ * of the matrix R*R^T, where the i-th column of R is r[idx[i]] - c.
+ *
+ * @param[in]       grid    Coordinates of all grid points.
+ *                          grid[3*j], grid[3*j+1], grid[3*j+2] are the
+ *                          x, y, z coordinates of the j-th point.
+ * @param[in,out]   idx     Indices of the selected points within grid.
+ *                          On return, idx will be rearranged such that
+ *                          points belonging to the same subset have their
+ *                          indices placed together.
+ * @param[in]       m       Number of selected points (length of idx).
+ *
+ * @return The number of points in the first subset within idx.
+ *
+ */
+int _maxmin_divide(const double* grid, int* idx, int m) {
+    assert(m > 1);
+    if (m == 2) {
+        return 1;
+    }
+
+    std::vector<double> centroid(3, 0.0);
+    for (int i = 0; i < m; ++i) {
+        int j = idx[i];
+        centroid[0] += grid[3*j    ];
+        centroid[1] += grid[3*j + 1];
+        centroid[2] += grid[3*j + 2];
+    }
+    centroid[0] /= m;
+    centroid[1] /= m;
+    centroid[2] /= m;
+
+    // positions w.r.t. the centroid
+    std::vector<double> R(3*m, 0.0);
+    for (int i = 0; i < m; ++i) {
+        int j = idx[i];
+        R[3*i    ] = grid[3*j    ] - centroid[0];
+        R[3*i + 1] = grid[3*j + 1] - centroid[1];
+        R[3*i + 2] = grid[3*j + 2] - centroid[2];
+    }
+
+    // The normal vector of the cut plane is taken to be the eigenvector
+    // corresponding to the largest eigenvalue of the 3x3 matrix A = R*R^T.
+    std::vector<double> A(9, 0.0);
+    int i3 = 3, i1 = 1;
+    double d0 = 0.0, d1 = 1.0;
+    dsyrk_("U", "N", &i3, &m, &d1, R.data(), &i3, &d0, A.data(), &i3);
+
+    int info = 0, lwork = 102 /* determined by a work space query */;
+    std::vector<double> e(3), work(lwork);
+    dsyev_("V", "U", &i3, A.data(), &i3, e.data(), work.data(), &lwork, &info);
+    double* n = A.data() + 6; // normal vector of the cut plane
+
+    // Rearrange the indices to put points in each subset together by
+    // examining the signed distances of points to the cut plane (R^T*n).
+    std::vector<double> dist(m);
+    dgemv_("T", &i3, &m, &d1, R.data(), &i3, n, &i1, &d0, dist.data(), &i1);
+
+    int *head = idx;
+    std::reverse_iterator<int*> tail(idx + m), rend(idx);
+    auto is_negative = [&dist, &idx](int& j) { return dist[&j - idx] < 0; };
+    while ( ( head = std::find_if(head, idx + m, is_negative) ) <
+            ( tail = std::find_if_not(tail, rend, is_negative) ).base() ) {
+        std::swap(*head, *tail);
+        std::swap(dist[head - idx], dist[tail.base() - idx - 1]);
+        ++head;
+        ++tail;
+    }
+
+    return head - idx;
+}
+
+} // end of anonymous namespace
+
+
+std::vector<int> Grid::Batch::maxmin(
+    const double* grid,
+    int* idx,
+    int m,
+    int m_thr
+) {
+    if (m <= m_thr) {
+        return std::vector<int>{0};
+    }
+
+    int m_left = _maxmin_divide(grid, idx, m);
+
+    std::vector<int> left = maxmin(grid, idx, m_left, m_thr);
+    std::vector<int> right = maxmin(grid, idx + m_left, m - m_left, m_thr);
+    std::for_each(right.begin(), right.end(),
+        [m_left](int& x) { x += m_left; }
+    );
+
+    left.insert(left.end(), right.begin(), right.end());
+    return left;
+}
+
+
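The divide step above can be sketched without BLAS or LAPACK, which may help when reading the pointer-based partition loop. This is only an illustration under stated assumptions: `divide_sketch` is a hypothetical name, the dominant eigenvector is approximated by power iteration rather than obtained from `dsyev_`, and `std::partition` replaces the hand-written head/tail swap. The start vector and iteration count are arbitrary choices, and power iteration can converge slowly when the two largest eigenvalues are close.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the divide step: same cut-plane idea, but the
// eigenvector of the largest eigenvalue is approximated by power iteration.
int divide_sketch(const double* grid, int* idx, int m)
{
    assert(m > 1);

    // centroid of the selected points
    std::array<double, 3> c = {0.0, 0.0, 0.0};
    for (int i = 0; i < m; ++i) {
        for (int d = 0; d < 3; ++d) {
            c[d] += grid[3 * idx[i] + d] / m;
        }
    }

    // A = R * R^T, where the columns of R are positions relative to c
    double A[3][3] = {};
    for (int i = 0; i < m; ++i) {
        for (int p = 0; p < 3; ++p) {
            for (int q = 0; q < 3; ++q) {
                A[p][q] += (grid[3 * idx[i] + p] - c[p])
                         * (grid[3 * idx[i] + q] - c[q]);
            }
        }
    }

    // power iteration for the dominant eigenvector (assumed start vector)
    std::array<double, 3> n = {1.0, 0.7, 0.3};
    for (int it = 0; it < 200; ++it) {
        std::array<double, 3> v = {0.0, 0.0, 0.0};
        for (int p = 0; p < 3; ++p) {
            for (int q = 0; q < 3; ++q) {
                v[p] += A[p][q] * n[q];
            }
        }
        const double norm = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        if (norm == 0.0) {
            break; // all points coincide, so no meaningful cut plane exists
        }
        for (int p = 0; p < 3; ++p) {
            n[p] = v[p] / norm;
        }
    }

    // points with non-negative signed distance to the plane are placed first
    auto is_front = [&](int j) {
        double dist = 0.0;
        for (int d = 0; d < 3; ++d) {
            dist += n[d] * (grid[3 * j + d] - c[d]);
        }
        return dist >= 0.0;
    };
    return static_cast<int>(std::partition(idx, idx + m, is_front) - idx);
}
```

If every selected point has the same coordinates, all signed distances are zero and the whole set lands in the first subset; a recursive caller would then need its own guard against repeating the same division.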
@@ -0,0 +1,54 @@
+#ifndef GRID_BATCH_H
+#define GRID_BATCH_H
+
+#include <vector>
+
+namespace Grid {
+namespace Batch {
+
+/**
+ * @brief Divide a set of points into batches by the "MaxMin" algorithm.
+ *
+ * This function recursively uses cut planes to divide grid points into
+ * two subsets using the "MaxMin" algorithm, until the number of points
+ * in each subset (batch) is no more than m_thr.
+ *
+ * @param[in]       grid        Coordinates of all grid points.
+ *                              grid[3*j], grid[3*j+1], grid[3*j+2] are
+ *                              the x, y, z coordinates of the j-th point.
+ * @param[in,out]   idx         Indices of the initial set within grid.
+ *                              On return, idx will be rearranged such
+ *                              that points belonging to the same batch
+ *                              have their indices placed together.
+ * @param[in]       m           Number of points in the initial set.
+ *                              (length of idx)
+ * @param[in]       m_thr       Size limit of a batch.
+ *
+ * @return          Indices (for idx) of the first point in each batch.
+ *
+ * For example, given grid (illustrated by their indices) located as follows:
+ *
+ *      0  1  16          2  3  18
+ *      4  5  17            6  7
+ *
+ *
+ *      8  9             20 10 11
+ *     12 13 19           14 15
+ *
+ * a possible outcome with m_thr = 4 and idx(in) = {0, 1, 2, ..., 15}
+ * (idx may correspond to a subset of grid and does not have to be sorted,
+ * but it must not contain duplicates) is:
+ *
+ * idx(out): 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15
+ * return  : {0, 4, 8, 12}
+ *
+ * which means the selected set (labeled 0-15) is divided into 4 batches:
+ * {0, 1, 4, 5}, {8, 9, 12, 13}, {2, 3, 6, 7}, {10, 11, 14, 15}.
+ *
+ */
+std::vector<int> maxmin(const double* grid, int* idx, int m, int m_thr);
+
+} // end of namespace Batch
+} // end of namespace Grid
+
+#endif
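Since `maxmin` returns only the starting offset of each batch, a caller has to pair each offset with the next one (or with `m` for the last batch) to obtain the batch extents. The helper below is a small sketch of that bookkeeping; `batch_ranges` is a hypothetical name and is not part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: convert the start offsets returned by maxmin into
// half-open ranges [begin, end) over idx, where m is the length of idx.
std::vector<std::pair<int, int>> batch_ranges(const std::vector<int>& starts, int m)
{
    std::vector<std::pair<int, int>> ranges;
    for (std::size_t k = 0; k < starts.size(); ++k) {
        // the last batch ends at m; every other batch ends where the next begins
        const int end = (k + 1 < starts.size()) ? starts[k + 1] : m;
        ranges.emplace_back(starts[k], end);
    }
    return ranges;
}
```

With the example from the header comment (`return: {0, 4, 8, 12}` and `m = 16`), this yields the ranges `[0, 4)`, `[4, 8)`, `[8, 12)` and `[12, 16)`, which index into the rearranged `idx`.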
@@ -20,3 +20,10 @@ AddTest(
   ../radial.cpp
   ../delley.cpp
 )
+
+AddTest(
+  TARGET test_batch
+  SOURCES test_batch.cpp
+  ../batch.cpp
+  LIBS ${math_libs}
+)
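The contents of `test_batch.cpp` are not shown in this diff. As a rough idea of the invariants such a test could assert, the sketch below checks that the rearranged `idx` is a permutation of the input and that no batch exceeds `m_thr`. The function `is_valid_batching` is hypothetical and makes no claim about what the actual test does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check (not taken from test_batch.cpp): returns true if
// idx_out is a permutation of idx_in, the first batch starts at offset 0,
// and every batch is non-empty with at most m_thr points.
bool is_valid_batching(std::vector<int> idx_in, std::vector<int> idx_out,
                       const std::vector<int>& starts, int m_thr)
{
    if (starts.empty() || starts.front() != 0) {
        return false;
    }
    const int m = static_cast<int>(idx_out.size());
    for (std::size_t k = 0; k < starts.size(); ++k) {
        const int end = (k + 1 < starts.size()) ? starts[k + 1] : m;
        if (end <= starts[k] || end - starts[k] > m_thr) {
            return false;
        }
    }
    // both vectors are passed by value, so sorting does not affect the caller
    std::sort(idx_in.begin(), idx_in.end());
    std::sort(idx_out.begin(), idx_out.end());
    return idx_in == idx_out;
}
```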