deepmodeling
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 5 additions & 0 deletions
diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md
Lines changed: 8 additions & 0 deletions
diff --git a/docs/advanced/opt.md b/docs/advanced/opt.md
Lines changed: 3 additions & 1 deletion
diff --git a/source/module_base/module_device/device.h b/source/module_base/module_device/device.h
Lines changed: 1 addition & 1 deletion
diff --git a/source/module_cell/module_neighbor/sltk_grid_driver.cpp b/source/module_cell/module_neighbor/sltk_grid_driver.cpp
Lines changed: 5 additions & 7 deletions
diff --git a/source/module_cell/module_neighbor/sltk_grid_driver.h b/source/module_cell/module_neighbor/sltk_grid_driver.h
Lines changed: 1 addition & 1 deletion
diff --git a/source/module_cell/setup_nonlocal.cpp b/source/module_cell/setup_nonlocal.cpp
Lines changed: 1 addition & 1 deletion
diff --git a/source/module_elecstate/elecstate.cpp b/source/module_elecstate/elecstate.cpp
Lines changed: 3 additions & 2 deletions
diff --git a/source/module_elecstate/elecstate.h b/source/module_elecstate/elecstate.h
Lines changed: 5 additions & 2 deletions
diff --git a/source/module_elecstate/elecstate_energy.cpp b/source/module_elecstate/elecstate_energy.cpp
Lines changed: 2 additions & 2 deletions
@@ -40,6 +40,7 @@ option(ENABLE_CNPY "Enable cnpy usage." OFF)
 option(ENABLE_PEXSI "Enable support for PEXSI." OFF)
 option(ENABLE_CUSOLVERMP "Enable cusolvermp." OFF)
 option(USE_DSP "Enable DSP usage." OFF)
+option(USE_CUDA_ON_DCU "Enable CUDA on DCU" OFF)
 
 # enable json support
 if(ENABLE_RAPIDJSON)
@@ -126,6 +127,10 @@ if (USE_DSP)
   set(ABACUS_BIN_NAME abacus_dsp)
 endif()
 
+if (USE_CUDA_ON_DCU)
+  add_compile_definitions(__CUDA_ON_DCU)
+endif()
+
 list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
 
 if(ENABLE_COVERAGE)
 
@@ -37,6 +37,7 @@
     - [ndx, ndy, ndz](#ndx-ndy-ndz)
     - [pw\_seed](#pw_seed)
     - [pw\_diag\_thr](#pw_diag_thr)
+    - [diago\_smooth\_ethr](#diago_smooth_ethr)
     - [pw\_diag\_nmax](#pw_diag_nmax)
     - [pw\_diag\_ndim](#pw_diag_ndim)
     - [erf\_ecut](#erf_ecut)
@@ -777,6 +778,12 @@ These variables are used to control the plane wave related parameters.
 - **Description**: Only used when you use `ks_solver = cg/dav/dav_subspace/bpcg`. It indicates the threshold for the first electronic iteration, from the second iteration the pw_diag_thr will be updated automatically. **For nscf calculations with planewave basis set, pw_diag_thr should be <= 1e-3.**
 - **Default**: 0.01
 
+### diago_smooth_ethr
+
+- **Type**: bool
+- **Description**: If `TRUE`, the diagonalization methods apply the smooth threshold strategy, which uses a larger threshold (10e-5) for the empty states. This strategy should not affect the total energy, forces, or other ground-state properties, but it improves computational efficiency. If `FALSE`, the smooth threshold strategy is not applied.
+- **Default**: false
+
 ### pw_diag_nmax
 
 - **Type**: Integer
@@ -1375,6 +1382,7 @@ These variables are used to control the geometry relaxation.
 - **Description**: The methods to do geometry optimization.
   - cg: using the conjugate gradient (CG) algorithm. Note that there are two implementations of the conjugate gradient (CG) method, see [relax_new](#relax_new).
   - bfgs: using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm.
+  - bfgs_trad: using the traditional Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. 
   - cg_bfgs: using the CG method for the initial steps, and switching to BFGS method when the force convergence is smaller than [relax_cg_thr](#relax_cg_thr).
   - sd: using the steepest descent (SD) algorithm.
   - fire: the Fast Inertial Relaxation Engine method (FIRE), a kind of molecular-dynamics-based relaxation algorithm, is implemented in the molecular dynamics (MD) module. The algorithm can be used by setting [calculation](#calculation) to `md` and [md_type](#md_type) to `fire`. Also ionic velocities should be set in this case. See [fire](../md.md#fire) for more details.
 
@@ -22,7 +22,9 @@ In the nested procedure mentioned above, we used CG method to perform cell relax
 
 The [BFGS method](https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm) is a quasi-Newton method for solving nonlinear optimization problem. It belongs to the class of quasi-Newton method where the Hessian matrix is approximated during the optimization process. If the initial point is not far from the extrema, BFGS tends to work better than gradient-based methods.
 
-In ABACUS, we implemented the BFGS method for doing fixed-cell structural relaxation.
+An alternative, traditional BFGS implementation is available through the keyword 'bfgs_trad'. Like 'bfgs', it is a quasi-Newton method that substitutes an approximate matrix B for the Hessian matrix. The main difference is that 'bfgs' updates the inverse of B directly, while 'bfgs_trad' updates B itself and obtains its inverse by computing the eigenvalues of B and taking their reciprocals. The two methods are mathematically equivalent, but in some cases 'bfgs_trad' performs better.
+
+In ABACUS, the BFGS method is implemented for fixed-cell structural relaxation. Users can choose which BFGS implementation to use by specifying either 'bfgs' or 'bfgs_trad' as the relaxation method.
 
 ### SD method
 
 
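
The equivalence claimed in the `opt.md` hunk above can be checked numerically. The following is a minimal C++ sketch for a 2x2 problem, not the ABACUS implementation: the names `Vec2`, `Mat2`, `update_B`, `update_H`, `inverse_via_eigen` and `max_abs_diff` are hypothetical and chosen only for illustration. One path applies the standard BFGS update to the Hessian approximation B and inverts it through its eigenvalues; the other applies the standard BFGS update to the inverse H directly.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Hypothetical 2x2 helpers for illustration only; not ABACUS code.
using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

double dot(const Vec2& a, const Vec2& b) { return a[0] * b[0] + a[1] * b[1]; }

Vec2 mul(const Mat2& m, const Vec2& v)
{
    Vec2 out = {m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]};
    return out;
}

// Standard BFGS update of the Hessian approximation B:
//   B' = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s)
Mat2 update_B(const Mat2& B, const Vec2& s, const Vec2& y)
{
    const Vec2 Bs = mul(B, s);
    const double sBs = dot(s, Bs);
    const double ys = dot(y, s);
    Mat2 out = B;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            out[i][j] += -Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys;
    return out;
}

// Standard BFGS update of the inverse H = B^{-1} (H symmetric):
//   H' = (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s)
Mat2 update_H(const Mat2& H, const Vec2& s, const Vec2& y)
{
    const double rho = 1.0 / dot(y, s);
    const Vec2 Hy = mul(H, y);
    const double yHy = dot(y, Hy);
    Mat2 out = H;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            out[i][j] += -rho * (s[i] * Hy[j] + Hy[i] * s[j])
                         + (rho * rho * yHy + rho) * s[i] * s[j];
    return out;
}

// Invert a symmetric positive-definite 2x2 matrix through its eigenvalues:
//   B^{-1} = sum_k (1 / lambda_k) v_k v_k^T
Mat2 inverse_via_eigen(const Mat2& B)
{
    const double a = B[0][0], b = B[0][1], d = B[1][1];
    const double mean = 0.5 * (a + d);
    const double r = std::sqrt(0.25 * (a - d) * (a - d) + b * b);
    const double theta = 0.5 * std::atan2(2.0 * b, a - d);
    const double c = std::cos(theta), sn = std::sin(theta);
    const double inv1 = 1.0 / (mean + r); // eigenvector (c, sn)
    const double inv2 = 1.0 / (mean - r); // eigenvector (-sn, c)
    Mat2 out{};
    out[0][0] = inv1 * c * c + inv2 * sn * sn;
    out[0][1] = (inv1 - inv2) * c * sn;
    out[1][0] = out[0][1];
    out[1][1] = inv1 * sn * sn + inv2 * c * c;
    return out;
}

// Largest element-wise absolute difference between two matrices.
double max_abs_diff(const Mat2& p, const Mat2& q)
{
    double m = 0.0;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            m = std::max(m, std::fabs(p[i][j] - q[i][j]));
    return m;
}
```

Starting from B = H = I with a step `s` and gradient change `y` satisfying `dot(y, s) > 0`, `max_abs_diff(inverse_via_eigen(update_B(B, s, y)), update_H(H, s, y))` should be at round-off level. Any practical difference between the two variants therefore comes from numerical behaviour, not from the mathematics.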
@@ -86,7 +86,7 @@ void record_device_memory(const Device* dev, std::ofstream& ofs_device, std::str
  * @brief for compatibility with __CUDA_ARCH__ 600 and earlier
  *
  */
-#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600 && !defined(__CUDA_ON_DCU)
 static __inline__ __device__ double atomicAdd(double* address, double val)
 {
     unsigned long long int* address_as_ull = (unsigned long long int*)address;
 
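
The function guarded in the `device.h` hunk above begins like the well-known compare-and-swap fallback for double-precision `atomicAdd` on devices older than compute capability 6.0. The extra `!defined(__CUDA_ON_DCU)` condition presumably keeps this fallback from being defined on DCU builds, where it would clash with an existing definition. As a host-side analogue only (standard C++, not CUDA, and with the hypothetical name `atomic_add_sketch`), the same retry-loop technique looks like this:

```cpp
#include <atomic>

// Hypothetical host-side analogue of a CAS-based atomicAdd for doubles.
// Returns the value stored before the addition, as CUDA's atomicAdd does.
double atomic_add_sketch(std::atomic<double>& target, double val)
{
    double old_val = target.load();
    // On failure, compare_exchange_weak reloads old_val with the current
    // contents, so the sum is recomputed before the next attempt.
    while (!target.compare_exchange_weak(old_val, old_val + val))
    {
    }
    return old_val;
}
```

The loop only commits `old_val + val` if no other thread changed the value in between, which is the same guarantee the device-side fallback obtains by comparing the bit pattern reinterpreted as `unsigned long long int`.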
@@ -21,13 +21,11 @@ Grid_Driver::~Grid_Driver()
 {
 }
 
-
-void Grid_Driver::Find_atom(
-	const UnitCell &ucell, 
-	const ModuleBase::Vector3<double> &cartesian_pos, 
-	const int &ntype, 
-	const int &nnumber,
-	AdjacentAtomInfo *adjs)
+void Grid_Driver::Find_atom(const UnitCell& ucell,
+                            const ModuleBase::Vector3<double>& cartesian_pos,
+                            const int& ntype,
+                            const int& nnumber,
+                            AdjacentAtomInfo* adjs) const
 {
 	ModuleBase::timer::tick("Grid_Driver","Find_atom");
 //	std::cout << "lenght in Find atom = " << atomlink[offset].fatom.getAdjacentSet()->getLength() << std::endl;
 
@@ -70,7 +70,7 @@ class Grid_Driver : public Grid
                    const ModuleBase::Vector3<double>& cartesian_posi,
                    const int& ntype,
                    const int& nnumber,
-                   AdjacentAtomInfo* adjs = nullptr);
+                   AdjacentAtomInfo* adjs = nullptr) const;
 
     //==========================================================
     // EXPLAIN : The adjacent information for the input
 
@@ -34,7 +34,7 @@ void InfoNonlocal::Set_NonLocal(const int& it,
     ModuleBase::TITLE("InfoNonlocal", "Set_NonLocal");
 
     // set a pointer
-    // Atom* atom = &GlobalC::ucell.atoms[it];
+    // Atom* atom = &ucell.atoms[it];
 
     // get the number of non-local projectors
     n_projectors = atom->ncpp.nbeta;
 
@@ -207,6 +207,7 @@ void ElecState::calEBand()
 
 
 void ElecState::init_scf(const int istep, 
+                         const UnitCell& ucell,
                          const ModuleBase::ComplexMatrix& strucfac, 
                          const bool* numeric,
                          ModuleSymmetry::Symmetry& symm, 
@@ -215,7 +216,7 @@ void ElecState::init_scf(const int istep,
     //! core correction potential.
     if (!PARAM.inp.use_paw)
     {
-        this->charge->set_rho_core(strucfac, numeric);
+        this->charge->set_rho_core(ucell, strucfac, numeric);
     }
     else
     {
@@ -226,7 +227,7 @@ void ElecState::init_scf(const int istep,
     // choose charge density from ionic step 0.
     if (istep == 0)
     {
-        this->charge->init_rho(this->eferm, strucfac, symm, (const void*)this->klist, wfcpw);
+        this->charge->init_rho(this->eferm, ucell, strucfac, symm, (const void*)this->klist, wfcpw);
         this->charge->check_rho(); // check the rho
     }
 
 
@@ -104,11 +104,13 @@ class ElecState
      * @brief Init rho_core, init rho, renormalize rho, init pot
      * 
      * @param istep i-th step
+     * @param ucell unit cell
      * @param strucfac structure factor
      * @param symm symmetry
      * @param wfcpw PW basis for wave function if needed
      */
     void init_scf(const int istep,
+                  const UnitCell& ucell,
                   const ModuleBase::ComplexMatrix& strucfac,
                   const bool* numeric,
                   ModuleSymmetry::Symmetry& symm,
@@ -126,7 +128,7 @@ class ElecState
     void cal_bandgap();
     void cal_bandgap_updw();
 
-    double cal_delta_eband() const;
+    double cal_delta_eband(const UnitCell& ucell) const;
     double cal_delta_escf() const;
 
     ModuleBase::matrix vnew;
@@ -171,7 +173,8 @@ class ElecState
     ModuleBase::matrix wg;  ///< occupation weight for each k-point and band
 
   public: // print something. See elecstate_print.cpp
-    void print_etot(const bool converged,
+    void print_etot(const Magnetism& magnet,
+                    const bool converged,
                     const int& iter,
                     const double& scf_thr,
                     const double& scf_thr_kin,
 
@@ -90,7 +90,7 @@ void ElecState::cal_bandgap_updw()
 }
 
 /// @brief calculate deband
-double ElecState::cal_delta_eband() const
+double ElecState::cal_delta_eband(const UnitCell& ucell) const
 {
     // out potentials from potential mixing
     // total energy and band energy corrections
@@ -109,7 +109,7 @@ double ElecState::cal_delta_eband() const
     {
         ModuleBase::matrix v_xc;
         const std::tuple<double, double, ModuleBase::matrix> etxc_vtxc_v
-            = XC_Functional::v_xc(this->charge->nrxx, this->charge, &GlobalC::ucell);
+            = XC_Functional::v_xc(this->charge->nrxx, this->charge, &ucell);
         v_xc = std::get<2>(etxc_vtxc_v);
 
         for (int ir = 0; ir < this->charge->rhopw->nrxx; ir++)
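
The `elecstate` hunks above all follow one pattern: a function that used to read the global `GlobalC::ucell` now receives the unit cell as a `const UnitCell&` argument. The sketch below shows that pattern in isolation. Everything in it is a hypothetical stand-in (`UnitCellSketch`, `global_sketch`, `total_charge_global`, `total_charge`); the real `UnitCell` and `ElecState` classes are much larger.

```cpp
#include <vector>

// Hypothetical stand-in for UnitCell; for illustration only.
struct UnitCellSketch
{
    std::vector<double> atom_charges;
};

// Hypothetical stand-in for a global instance such as GlobalC::ucell.
namespace global_sketch
{
UnitCellSketch ucell;
}

// Before: the dependency on global state is hidden inside the body.
double total_charge_global()
{
    double sum = 0.0;
    for (double q : global_sketch::ucell.atom_charges)
    {
        sum += q;
    }
    return sum;
}

// After: the dependency is an explicit, read-only parameter.
double total_charge(const UnitCellSketch& ucell)
{
    double sum = 0.0;
    for (double q : ucell.atom_charges)
    {
        sum += q;
    }
    return sum;
}
```

With the explicit parameter, a caller or a unit test can pass any cell it constructs locally, and the `const` reference documents that the function does not modify it.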