deepmodeling
diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md
Lines changed: 3 additions & 5 deletions
diff --git a/docs/quick_start/input.md b/docs/quick_start/input.md
Lines changed: 18 additions & 17 deletions
diff --git a/examples/lr-tddft/lcao_H2O/INPUT b/examples/lr-tddft/lcao_H2O/INPUT
Lines changed: 3 additions & 0 deletions
diff --git a/examples/lr-tddft/lcao_Si2/INPUT b/examples/lr-tddft/lcao_Si2/INPUT
Lines changed: 4 additions & 1 deletion
diff --git a/source/Makefile.Objects b/source/Makefile.Objects
Lines changed: 7 additions & 5 deletions
diff --git a/source/module_base/blas_connector.cpp b/source/module_base/blas_connector.cpp
Lines changed: 142 additions & 1 deletion
@@ -1561,10 +1561,8 @@ These variables are used to control the output of properties.
 ### out_freq_elec
 
 - **Type**: Integer
-- **Description**: The output frequency of the charge density (controlled by [out_chg](#out_chg)), wavefunction (controlled by [out_wfc_pw](#out_wfc_pw) or [out_wfc_r](#out_wfc_r)), and density matrix of localized orbitals (controlled by [out_dm](#out_dm)).
-  - \>0: Output them every `out_freq_elec` iteration numbers in electronic iterations.
-  - 0: Output them when the electronic iteration is converged or reaches the maximal iteration number.
-- **Default**: 0
+- **Description**: Output the charge density (binary format only, controlled by [out_chg](#out_chg)) and the wavefunction (controlled by [out_wfc_pw](#out_wfc_pw) or [out_wfc_r](#out_wfc_r)) every `out_freq_elec` electronic iterations. Note that they are always output when the calculation converges or reaches the maximum number of iterations, [scf_nmax](#scf_nmax).
+- **Default**: [scf_nmax](#scf_nmax)
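
The revised `out_freq_elec` rule can be read as a small predicate. The sketch below is only an illustration under stated assumptions: the function name `should_output` and its parameters are hypothetical and not taken from the ABACUS source, and electronic iterations are assumed to be numbered from 1.

```cpp
#include <cassert>

// Hypothetical helper (not ABACUS code): decide whether the charge density
// and wavefunction should be written after electronic iteration `iter`.
// Assumption: `iter` is 1-based.
bool should_output(int iter, int out_freq_elec, int scf_nmax, bool converged)
{
    // Always write once the SCF loop ends, either by convergence or by
    // reaching the iteration limit.
    if (converged || iter >= scf_nmax)
    {
        return true;
    }
    // Otherwise write every `out_freq_elec` iterations.
    return out_freq_elec > 0 && iter % out_freq_elec == 0;
}
```

Under this reading, the new default `out_freq_elec = scf_nmax` would produce output only at the end of the SCF loop.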
 
 ### out_chg
 
@@ -2060,7 +2058,7 @@ Warning: this function is not robust enough for the current version. Please try
 - **Type**: int
 - **Availability**: numerical atomic orbital basis
 - **Description**: Include V_delta label for DeePKS training. When `deepks_out_labels` is true and `deepks_v_delta` > 0, ABACUS will output h_base.npy, v_delta.npy and h_tot.npy(h_tot=h_base+v_delta). 
-  Meanwhile, when `deepks_v_delta` equals 1, ABACUS will also output v_delta_precalc.npy, which is used to calculate V_delta during DeePKS training. However, when the number of atoms grows, the size of v_delta_precalc.npy will be very large. In this case, it's recommended to set `deepks_v_delta` as 2, and ABACUS will output psialpha.npy and grad_evdm.npy but not v_delta_precalc.npy. These two files are small and can be used to calculate v_delta_precalc in the procedure of training DeePKS.
+  Meanwhile, when `deepks_v_delta` equals 1, ABACUS will also output v_delta_precalc.npy, which is used to calculate V_delta during DeePKS training. However, when the number of atoms grows, the size of v_delta_precalc.npy will be very large. In this case, it's recommended to set `deepks_v_delta` as 2, and ABACUS will output phialpha.npy and grad_evdm.npy but not v_delta_precalc.npy. These two files are small and can be used to calculate v_delta_precalc in the procedure of training DeePKS.
 - **Default**: 0
 
 ### deepks_out_unittest
 
@@ -8,17 +8,17 @@ The `INPUT` file contains parameters that control the type of calculation as wel
 
 Below is an example `INPUT` file with some of the most important parameters that need to be set:
 
-```
+```plaintext
 INPUT_PARAMETERS
 suffix                  MgO
 ntype                   2
 pseudo_dir              ./
-orbital_dir		./
-ecutwfc                 100             # Rydberg
-scf_thr                 1e-4		# Rydberg
-basis_type              lcao            
-calculation             scf		# this is the key parameter telling abacus to do a scf calculation
-out_chg			True
+orbital_dir             ./
+ecutwfc                 100  # in Rydberg
+scf_thr                 1e-4 # Rydberg
+basis_type              lcao
+calculation             scf  # this is the key parameter telling abacus to do a scf calculation
+out_chg                 True
 ```
 
 The parameter list always starts with key word `INPUT_PARAMETERS`. Any content before `INPUT_PARAMETERS` will be ignored.
@@ -40,22 +40,23 @@ In the above example, the meanings of the parameters are:
 - `ntype` : how many types of elements in the unit cell
 - `pseudo_dir` : the directory where pseudopotential files are provided
 - `orbital_dir` : the directory where orbital files are provided
-- `ecutwfc` : the plane-wave energy cutoff for the wave function expansion (UNIT: Rydberg)    
-- `scf_thr` : the threshold for the convergence of charge density (UNIT: Rydberg)    
+- `ecutwfc` : the plane-wave energy cutoff for the wave function expansion (UNIT: Rydberg)
+- `scf_thr` : the threshold for the convergence of charge density (UNIT: Rydberg)
 - `basis_type` : the type of basis set for expanding the electronic wave functions
 - `calculation` : the type of calculation to be performed by ABACUS
-- `out_chg` : if true, output thee charge density oon real space grid
+- `out_chg` : if true, output the charge density on real space grid
 
 For a complete list of input parameters, please consult this [instruction](../advanced/input_files/input-main.md).
 
-> **Note:** Users cannot change the filename “INPUT” to other names. Boolean paramerters such as `out_chg` can be set by using `True` and `False`, `1` and `0`, or `T` and `F`. It is case insensitive so that other preferences such as `true` and `false`, `TRUE` and `FALSE`, and `t` and `f` for setting boolean values are also supported.
+> **Note:** Users cannot change the filename “INPUT” to other names. Boolean parameters such as `out_chg` can be set by using `True` and `False`, `1` and `0`, or `T` and `F`. The values are case-insensitive, so other spellings such as `true` and `false`, `TRUE` and `FALSE`, and `t` and `f` are also supported. For `out_chg` specifically, the option `-1` is also available; it turns off the binary charge-density checkpoint (otherwise always dumped in `OUT.{suffix}`, in a file whose name ends with `CHARGE-DENSITY.restart`). Some output-controlling parameters also accept a second value that sets the output precision, e.g., `out_chg True 8` outputs the charge density on the real-space grid with 8 digits after the decimal point.
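
The boolean spellings listed in the note can be summarized with a short parsing sketch. This is not the ABACUS input parser: `parse_bool` is a hypothetical name introduced here for illustration, and the special `-1` value of `out_chg` and the optional precision field are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the ABACUS parser): map the spellings from the
// note (True/False, T/F, 1/0, in any letter case) to a C++ bool.
bool parse_bool(std::string value)
{
    // Lower-case the input so the comparison is case-insensitive.
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    if (value == "true" || value == "t" || value == "1")
    {
        return true;
    }
    if (value == "false" || value == "f" || value == "0")
    {
        return false;
    }
    // Anything else is rejected rather than silently treated as false.
    throw std::invalid_argument("not a boolean value: " + value);
}
```

Rejecting unknown spellings with an exception is a design choice of this sketch; how ABACUS itself reports invalid values is not shown in this diff.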
 
 ## *STRU*
 
-The structure file contains structural information about the system, e.g., lattice constant, lattice vectors, and positions of the atoms within a unit cell. The positions can be given either in direct or Cartesian coordinates. 
+The structure file contains structural information about the system, e.g., lattice constant, lattice vectors, and positions of the atoms within a unit cell. The positions can be given either in direct or Cartesian coordinates.
 
 An example of the `STRU` file is given as follows :
-```
+
+```plaintext
 #This is the atom file containing all the information
 #about the lattice structure.
 
@@ -68,7 +69,7 @@ Mg_gga_8au_100Ry_4s2p1d.orb
 O_gga_8au_100Ry_2s2p1d.orb
 
 LATTICE_CONSTANT
-1.8897259886 		# 1.8897259886 Bohr =  1.0 Angstrom
+1.8897259886 # 1.8897259886 Bohr =  1.0 Angstrom
 
 LATTICE_VECTORS
 4.25648 0.00000 0.00000  
@@ -100,9 +101,10 @@ For a more detailed description of STRU file, please consult [here](../advanced/
 ## *KPT*
 
 This file contains information of the kpoint grid setting for the Brillouin zone sampling.
-    
+
 An example of the `KPT` file is given below:
-```
+
+```plaintext
 K_POINTS
 0 
 Gamma
@@ -111,7 +113,6 @@ Gamma
 
 > **Note:** users may choose a different name for their k-point file using keyword `kpoint_file`
 
-
 For a more detailed description, please consult [here](../advanced/input_files/kpt.md).
 
 - The pseudopotential files
 
@@ -6,6 +6,7 @@ orbital_dir                 ../../../tests/PP_ORB
 calculation             scf
 nbands                23
 symmetry               	-1
+nspin                   2 
 
 #Parameters (2.Iteration)
 ecutwfc                  60 ###Energy cutoff needs to be tested to ensure your calculation is reliable.[1]
@@ -30,6 +31,7 @@ xc_kernel lda
 lr_solver dav
 lr_thr 1e-2
 pw_diag_ndim 2
+# lr_unrestricted 1  ### use this to run a TDUKS calculation for closed-shell systems (open-shell systems always use TDUKS)
 
 esolver_type ks-lr
 out_alllog	1
@@ -39,6 +41,7 @@ out_alllog	1
 nvirt 19
 abs_wavelen_range  40 180
 abs_broadening 0.01
+abs_gauge length
 
 ### [1] Energy cutoff determines the quality of numerical quadratures in your calculations.
 ###     So it is strongly recommended to test whether your result (such as converged SCF energies) is
 
@@ -5,7 +5,8 @@ pseudo_dir              ../../../tests/PP_ORB
 orbital_dir                 ../../../tests/PP_ORB
 calculation             scf
 nbands                23
-symmetry               	0
+symmetry               	-1
+nspin                    2
 
 #Parameters (2.Iteration)
 ecutwfc                  60 ###Energy cutoff needs to be tested to ensure your calculation is reliable.[1]
@@ -37,6 +38,8 @@ out_alllog	1
 
 nvirt 19
 abs_wavelen_range  100 175
+abs_broadening 0.01 # in Ry
+abs_gauge velocity ### velocity gauge is recommended for periodic systems
 
 
 ### [1] Energy cutoff determines the quality of numerical quadratures in your calculations.
 
@@ -187,20 +187,21 @@ OBJS_CELL=atom_pseudo.o\
     klist.o\
     cell_index.o\
     check_atomic_stru.o\
+    update_cell.o\
+    bcast_cell.o\
 
 OBJS_DEEPKS=LCAO_deepks.o\
-        deepks_fgamma.o\
-        deepks_fk.o\
-        LCAO_deepks_odelta.o\
+        deepks_force.o\
+        deepks_orbital.o\
         LCAO_deepks_io.o\
         LCAO_deepks_mpi.o\
         LCAO_deepks_pdm.o\
-        LCAO_deepks_psialpha.o\
+        LCAO_deepks_phialpha.o\
         LCAO_deepks_torch.o\
         LCAO_deepks_vdelta.o\
         deepks_hmat.o\
         LCAO_deepks_interface.o\
-        orbital_precalc.o\
+        deepks_orbpre.o\
         cal_gdmx.o\
         cal_gedm.o\
         cal_gvx.o\
@@ -731,6 +732,7 @@ OBJS_TENSOR=tensor.o\
     xc_kernel.o\
     pot_hxc_lrtd.o\
     lr_spectrum.o\
+    lr_spectrum_velocity.o\
     hamilt_casida.o\
     esolver_lrtd_lcao.o\
 
 
@@ -82,6 +82,7 @@ double BlasConnector::dot( const int n, const double *X, const int incX, const d
 }
 
 // C = a * A.? * B.? + b * C
+// Row-Major part
 void BlasConnector::gemm(const char transa, const char transb, const int m, const int n, const int k,
 	const float alpha, const float *a, const int lda, const float *b, const int ldb,
 	const float beta, float *c, const int ldc, base_device::AbacusDevice_t device_type)
@@ -154,6 +155,147 @@ void BlasConnector::gemm(const char transa, const char transb, const int m, cons
 	#endif
 }
 
+// Col-Major part
+void BlasConnector::gemm_cm(const char transa, const char transb, const int m, const int n, const int k,
+	const float alpha, const float *a, const int lda, const float *b, const int ldb,
+	const float beta, float *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		sgemm_(&transa, &transb, &m, &n, &k,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+	#ifdef __DSP
+	else if (device_type == base_device::AbacusDevice_t::DspDevice){
+		sgemm_mth_(&transa, &transb, &m, &n, &k,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc, GlobalV::MY_RANK);
+	}
+	#endif
+}
+
+void BlasConnector::gemm_cm(const char transa, const char transb, const int m, const int n, const int k,
+	const double alpha, const double *a, const int lda, const double *b, const int ldb,
+	const double beta, double *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		dgemm_(&transa, &transb, &m, &n, &k,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+	#ifdef __DSP
+	else if (device_type == base_device::AbacusDevice_t::DspDevice){
+		dgemm_mth_(&transa, &transb, &m, &n, &k,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc, GlobalV::MY_RANK);
+	}
+	#endif
+}
+
+void BlasConnector::gemm_cm(const char transa, const char transb, const int m, const int n, const int k,
+    const std::complex<float> alpha, const std::complex<float> *a, const int lda, const std::complex<float> *b, const int ldb,
+    const std::complex<float> beta, std::complex<float> *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+    	cgemm_(&transa, &transb, &m, &n, &k,
+        &alpha, a, &lda, b, &ldb,
+        &beta, c, &ldc);
+	}
+	#ifdef __DSP
+	else if (device_type == base_device::AbacusDevice_t::DspDevice) {
+    	cgemm_mth_(&transa, &transb, &m, &n, &k,
+        &alpha, a, &lda, b, &ldb,
+        &beta, c, &ldc, GlobalV::MY_RANK);
+	}
+	#endif
+}
+
+void BlasConnector::gemm_cm(const char transa, const char transb, const int m, const int n, const int k,
+	const std::complex<double> alpha, const std::complex<double> *a, const int lda, const std::complex<double> *b, const int ldb,
+	const std::complex<double> beta, std::complex<double> *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		zgemm_(&transa, &transb, &m, &n, &k,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+	#ifdef __DSP
+	else if (device_type == base_device::AbacusDevice_t::DspDevice) {
+    	zgemm_mth_(&transa, &transb, &m, &n, &k,
+        &alpha, a, &lda, b, &ldb,
+        &beta, c, &ldc, GlobalV::MY_RANK);
+	}
+	#endif
+}
+
+// Symm and Hemm part. Only col-major is supported.
+
+void BlasConnector::symm_cm(const char side, const char uplo, const int m, const int n,
+	const float alpha, const float *a, const int lda, const float *b, const int ldb,
+	const float beta, float *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		ssymm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
+void BlasConnector::symm_cm(const char side, const char uplo, const int m, const int n,
+	const double alpha, const double *a, const int lda, const double *b, const int ldb,
+	const double beta, double *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		dsymm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
+void BlasConnector::symm_cm(const char side, const char uplo, const int m, const int n,
+    const std::complex<float> alpha, const std::complex<float> *a, const int lda, const std::complex<float> *b, const int ldb,
+    const std::complex<float> beta, std::complex<float> *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+    	csymm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
+void BlasConnector::symm_cm(const char side, const char uplo, const int m, const int n,
+	const std::complex<double> alpha, const std::complex<double> *a, const int lda, const std::complex<double> *b, const int ldb,
+	const std::complex<double> beta, std::complex<double> *c, const int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		zsymm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
+void BlasConnector::hemm_cm(char side, char uplo, int m, int n,
+    std::complex<float> alpha, std::complex<float> *a, int lda, std::complex<float> *b, int ldb,
+    std::complex<float> beta, std::complex<float> *c, int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+    	chemm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
+void BlasConnector::hemm_cm(char side, char uplo, int m, int n,
+	std::complex<double> alpha, std::complex<double> *a, int lda, std::complex<double> *b, int ldb,
+	std::complex<double> beta, std::complex<double> *c, int ldc, base_device::AbacusDevice_t device_type)
+{
+	if (device_type == base_device::AbacusDevice_t::CpuDevice) {
+		zhemm_(&side, &uplo, &m, &n,
+		&alpha, a, &lda, b, &ldb,
+		&beta, c, &ldc);
+	}
+}
+
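
The split into a "Row-Major part" and a "Col-Major part" rests on a standard identity: a row-major m x n buffer is the same memory as a column-major n x m buffer holding the transpose, so a row-major product C = A * B can be obtained from a column-major routine by computing C^T = B^T * A^T, that is, by swapping the operands and the m/n dimensions. The sketch below illustrates this with a plain-loop stand-in for a Fortran BLAS call. The names `gemm_cm_ref` and `gemm_rm_ref` are hypothetical, transposition flags are omitted for brevity, and whether the row-major `gemm` wrapper in this file follows exactly this convention is an assumption, since its body is not part of this diff.

```cpp
#include <cassert>
#include <vector>

// Reference column-major GEMM without transposes: C = alpha * A * B + beta * C.
// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
// Stand-in for a Fortran BLAS call; not the BlasConnector implementation.
void gemm_cm_ref(int m, int n, int k, double alpha,
                 const double* a, int lda, const double* b, int ldb,
                 double beta, double* c, int ldc)
{
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
            {
                sum += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}

// Row-major GEMM built on the column-major routine by swapping the operands
// and the m/n dimensions, which computes C^T = B^T * A^T in column-major.
void gemm_rm_ref(int m, int n, int k, double alpha,
                 const double* a, int lda, const double* b, int ldb,
                 double beta, double* c, int ldc)
{
    gemm_cm_ref(n, m, k, alpha, b, ldb, a, lda, beta, c, ldc);
}
```

The same buffers therefore give different results depending on which layout the caller intends, which is why a column-major entry point such as `gemm_cm` forwards its arguments unchanged.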
 void BlasConnector::gemv(const char trans, const int m, const int n,
     const float alpha, const float* A, const int lda, const float* X, const int incx,
     const float beta, float* Y, const int incy, base_device::AbacusDevice_t device_type)
@@ -190,7 +332,6 @@ void BlasConnector::gemv(const char trans, const int m, const int n,
 }
 }
 
-
 // out = ||x||_2
 float BlasConnector::nrm2( const int n, const float *X, const int incX, base_device::AbacusDevice_t device_type )
 {