deepmodeling
diff --git a/.gitignore b/.gitignore
Lines changed: 0 additions & 1 deletion
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 4 additions & 1 deletion
diff --git a/docs/advanced/acceleration/cuda.md b/docs/advanced/acceleration/cuda.md
Lines changed: 7 additions & 2 deletions
diff --git a/docs/advanced/input_files/input-main.md b/docs/advanced/input_files/input-main.md
Lines changed: 24 additions & 3 deletions
diff --git a/docs/advanced/md.md b/docs/advanced/md.md
Lines changed: 1 addition & 50 deletions
diff --git a/source/Makefile.Objects b/source/Makefile.Objects
Lines changed: 3 additions & 2 deletions
diff --git a/source/driver.cpp b/source/driver.cpp
Lines changed: 1 addition & 1 deletion
diff --git a/source/driver_run.cpp b/source/driver_run.cpp
Lines changed: 2 additions & 2 deletions
diff --git a/source/module_base/global_file.cpp b/source/module_base/global_file.cpp
Lines changed: 3 additions & 3 deletions
diff --git a/source/module_base/global_file.h b/source/module_base/global_file.h
Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,3 @@ time.json
 __pycache__
 abacus.json
 *.npy
-
@@ -492,7 +492,10 @@ if (ENABLE_CNPY)
     include_directories(${cnpy_INCLUDE_DIR})
   endif()
   include_directories(${cnpy_SOURCE_DIR})
-  target_link_libraries(${ABACUS_BIN_NAME} cnpy)
+  
+  # find ZLIB and link
+  find_package(ZLIB REQUIRED)
+  target_link_libraries(${ABACUS_BIN_NAME} cnpy ZLIB::ZLIB)
   add_compile_definitions(__USECNPY)
 endif()
 
 
@@ -29,11 +29,13 @@ To compile and use ABACUS in CUDA mode, you currently need to have an NVIDIA GPU
 
 Check the [Advanced Installation Options](https://abacus-rtd.readthedocs.io/en/latest/advanced/install.html#build-with-cuda-support) for the installation of CUDA version support.
 
+When the compilation parameter USE_ELPA is ON (which is the default value) and USE_CUDA is also set to ON, the ELPA library needs to [enable GPU support](https://github.com/marekandreas/elpa/blob/master/documentation/INSTALL.md) at compile time.
+
 ## Run with the GPU support by editing the INPUT script:
 
 In `INPUT` file we need to set the input parameter [device](../input_files/input-main.md#device) to `gpu`. If this parameter is not set, ABACUS will try to determine if there are available GPUs.
-- Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver` is supported on GPU.
-- **multi-card**: ABACUS allows for multi-GPU acceleration. If you have multiple GPU cards, you can run ABACUS with several MPI processes, and each process will utilize one GPU card. For example, the command `mpirun -n 2 abacus` will by default launch two GPUs for computation. If you only have one card, this command will only start one GPU.
+- Set `ks_solver`: For the PW basis, CG, BPCG and Davidson methods are supported on GPU; set the input parameter [ks_solver](../input_files/input-main.md#ks_solver) to `cg`, `bpcg` or `dav`. For the LCAO basis, `cusolver` and `elpa` are supported on GPU.
+- **multi-card**: ABACUS allows for multi-GPU acceleration. If you have multiple GPU cards, you can run ABACUS with several MPI processes, and each process will utilize one GPU card. For example, the command `mpirun -n 2 abacus` will by default launch two GPUs for computation. If you only have one card, this command will only start one GPU. 
 
 ## Examples
 We provides [examples](https://github.com/deepmodeling/abacus-develop/tree/develop/examples/gpu) of gpu calculations.
@@ -42,3 +44,6 @@ We provides [examples](https://github.com/deepmodeling/abacus-develop/tree/devel
 PW basis:
 - Only k point parallelization is supported, so the input keyword `kpar` will be set to match the number of MPI tasks automatically.
 - By default, CUDA architectures 60, 70, 75, 80, 86, and 89 are compiled (if supported). It can be overriden using the CMake variable [`CMAKE_CUDA_ARCHITECTURES`](https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html) or the environmental variable [`CUDAARCHS`](https://cmake.org/cmake/help/latest/envvar/CUDAARCHS.html).
+LCAO basis:
+- Unless there is a specific reason, avoid using multiple GPUs, as it can be slower than using a single GPU. This is because the generalized eigenvalue solution of the LCAO basis set will incur additional communication overhead when calculated on multiple cards. When the memory limit of a GPU card makes it insufficient to complete the task, it is recommended to use multiple cards for calculation.
+- When using elpa on GPUs, some ELPA internal logs will be output.
@@ -1515,17 +1515,17 @@ These variables are used to control the output of properties.
 
 - **Type**: Integer \[Integer\](optional)
 - **Description**: 
-  
   The first integer controls whether to output the charge density on real space grids:
   - 1. Output the charge density (in Bohr^-3) on real space grids into the density files in the folder `OUT.${suffix}`. The files are named as:
     - nspin = 1: SPIN1_CHG.cube;
     - nspin = 2: SPIN1_CHG.cube, and SPIN2_CHG.cube;
     - nspin = 4: SPIN1_CHG.cube, SPIN2_CHG.cube, SPIN3_CHG.cube, and SPIN4_CHG.cube.
-  - 2. On top of 1, also output the initial charge density. The files are named as:
+  - 2: On top of 1, also output the initial charge density. The files are named as:
     - nspin = 1: SPIN1_CHG_INI.cube
     - nspin = 2: SPIN1_CHG_INI.cube, and SPIN2_CHG_INI.cube;
     - nspin = 4: SPIN1_CHG_INI.cube, SPIN2_CHG_INI.cube, SPIN3_CHG_INI.cube, and SPIN4_CHG_INI.cube.
-  
+  - -1: disable the charge density auto-back-up file `{suffix}-CHARGE-DENSITY.restart`, useful for large systems.
+    
   The second integer controls the precision of the charge density output, if not given, will use `3` as default. For purpose restarting from this file and other high-precision involved calculation, recommend to use `10`.
 
   ---
@@ -2658,6 +2658,27 @@ These variables are used to control molecular dynamics calculations. For more in
 - **Description**: The filename of DP potential files, see [md.md](../md.md#dpmd) in detail.
 - **Default**: graph.pb
 
+### dp_rescaling
+
+- **Type**: Real
+- **Availability**: [esolver_type](#esolver_type) = `dp`.
+- **Description**: Rescaling factor to use a temperature-dependent DP. Energy, stress and force calculated by DP will be multiplied by this factor.
+- **Default**: 1.0
+
+### dp_fparam
+
+- **Type**: Real
+- **Availability**: [esolver_type](#esolver_type) = `dp`.
+- **Description**: The frame parameter for the DP potential. The array size must be dim_fparam; all frames are assumed to use the same fparam.
+- **Default**: {}
+
+### dp_aparam
+
+- **Type**: Real
+- **Availability**: [esolver_type](#esolver_type) = `dp`.
+- **Description**: The atomic parameter for the DP potential. The array size can be either (1) natoms x dim_aparam, in which case all frames are assumed to use the same aparam; or (2) dim_aparam, in which case all frames and all atoms are assumed to use the same aparam.
+- **Default**: {}
+
 ### msst_direction
 
 - **Type**: Integer
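
The two array sizes accepted by `dp_aparam` in the hunk above can be illustrated with a short sketch. The helper `expand_aparam` is hypothetical and is not taken from ABACUS or DeePMD-kit; it only assumes that a per-atom layout of `natoms x dim_aparam` values is wanted in the end, and shows how a shorter `dim_aparam` array could be broadcast to every atom.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of ABACUS or DeePMD-kit): normalise a
// user-supplied aparam array to the per-atom layout natoms x dim_aparam.
std::vector<double> expand_aparam(const std::vector<double>& aparam,
                                  const std::size_t natoms,
                                  const std::size_t dim_aparam)
{
    // Case (1): one block of dim_aparam values per atom is already given.
    if (aparam.size() == natoms * dim_aparam)
    {
        return aparam;
    }
    // Case (2): a single block of dim_aparam values is shared by all atoms.
    if (aparam.size() == dim_aparam)
    {
        std::vector<double> expanded;
        expanded.reserve(natoms * dim_aparam);
        for (std::size_t iat = 0; iat < natoms; ++iat)
        {
            expanded.insert(expanded.end(), aparam.begin(), aparam.end());
        }
        return expanded;
    }
    // Any other size matches neither documented layout.
    throw std::invalid_argument("aparam size must be natoms*dim_aparam or dim_aparam");
}
```

With `natoms = 3` and `dim_aparam = 2`, passing two values would yield six values in which the same pair repeats for each atom, while any other length is rejected.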
 
@@ -87,53 +87,4 @@ ABACUS performs the [Multi-Scale Shock Technique (MSST) integration](https://jou
 Compiling ABACUS with [DeePMD-kit](https://github.com/deepmodeling/deepmd-kit), MD calculations based on machine learning DP model is enabled.
 
 To employ DPMD calculations, [esolver_type](./input_files/input-main.md#esolver_type) should be set to `dp`.
-And the filename of DP model is specified by keyword [pot_file](./input_files/input-main.md#pot_file).
-
-First, we can find whether contains keyword `type_map` in the DP model through the shell command:
-```bash
-strings Al-SCAN.pb | grep type_map
-```
-
-```json
-{"model": {"type_map": ["Al"], "descriptor": {"type": "se_e2_a", "sel": [150], "rcut_smth": 0.5, "rcut": 6.0, "neuron": [25, 50, 100], "resnet_dt": false, "axis_neuron": 16, "seed": 1, "activation_function": "tanh", "type_one_side": false, "precision": "default", "trainable": true, "exclude_types": [], "set_davg_zero": false}, "fitting_net": {"neuron": [240, 240, 240], "resnet_dt": true, "seed": 1, "type": "ener", "numb_fparam": 0, "numb_aparam": 0, "activation_function": "tanh", "precision": "default", "trainable": true, "rcond": 0.001, "atom_ener": []}, "data_stat_nbatch": 10, "data_stat_protect": 0.01}, "learning_rate": {"type": "exp", "decay_steps": 5000, "start_lr": 0.001, "stop_lr": 3.51e-08, "scale_by_worker": "linear"}, "loss": {"type": "ener", "start_pref_e": 0.02, "limit_pref_e": 1, "start_pref_f": 1000, "limit_pref_f": 1, "start_pref_v": 0, "limit_pref_v": 0, "start_pref_ae": 0.0, "limit_pref_ae": 0.0, "start_pref_pf": 0.0, "limit_pref_pf": 0.0, "enable_atom_ener_coeff": false}, "training": {"training_data": {"systems": ["../deepmd_data/"], "batch_size": "auto", "set_prefix": "set", "auto_prob": "prob_sys_size", "sys_probs": null}, "validation_data": {"systems": ["../deepmd_validation"], "batch_size": 1, "numb_btch": 3, "set_prefix": "set", "auto_prob": "prob_sys_size", "sys_probs": null}, "numb_steps": 1000000, "seed": 10, "disp_file": "lcurve.out", "disp_freq": 100, "save_freq": 1000, "save_ckpt": "model.ckpt", "disp_training": true, "time_training": true, "profiling": false, "profiling_file": "timeline.json", "enable_profiler": false, "tensorboard": false, "tensorboard_log_dir": "log", "tensorboard_freq": 1}}
-```
-
-If the keyword `type_map` is found, ABACUS will match the atom types between `STRU` and DP model.
-
-Otherwise, all atom types must be specified in the `STRU` in the order consistent with that of the DP model, even if the number of atoms is zero!
-
-For example, there is a Al-Cu-Mg ternary-alloy DP model, but the simulated cell is a Al-Cu binary alloy. Then the `STRU` should be written as follows:
-
-```
-ATOMIC_SPECIES
-Al  26.982
-Cu  63.546
-Mg  24.305
-
-LATTICE_CONSTANT
-1.889727000000
-
-LATTICE_VECTORS
-4.0  0.0  0.0
-0.0  4.0  0.0
-0.0  0.0  4.0
-
-ATOMIC_POSITIONS
-Cartesian
-
-Al
-0
-2
-0.0  0.0  0.0
-0.5  0.5  0.0
-
-Cu
-0
-2
-0.5  0.0  0.5
-0.0  0.5  0.5
-
-Mg
-0
-0
-```
+And the filename of DP model is specified by keyword [pot_file](./input_files/input-main.md#pot_file).
@@ -183,7 +183,6 @@ OBJS_CELL=atom_pseudo.o\
     klist.o\
     cell_index.o\
     check_atomic_stru.o\
-    print_cif.o\
 
 OBJS_DEEPKS=LCAO_deepks.o\
         deepks_fgamma.o\
@@ -347,6 +346,7 @@ OBJS_HSOLVER_LCAO=hsolver_lcao.o\
       diago_scalapack.o\
       diago_lapack.o\
       diago_elpa.o\
+      diago_elpa_native.o\
       elpa_new.o\
       elpa_new_real.o\
       elpa_new_complex.o\
@@ -468,6 +468,7 @@ OBJS_IO=input_conv.o\
     write_dos_pw.o\
     nscf_band.o\
     cal_dos.o\
+    cif_io.o\
     dos_nao.o\
     numerical_descriptor.o\
     numerical_basis.o\
@@ -476,7 +477,7 @@ OBJS_IO=input_conv.o\
     print_info.o\
     read_cube.o\
     read_rho.o\
-    read_rhog.o\
+    rhog_io.o\
     read_exit_file.o\
     read_wfc_pw.o\
     restart.o\
 
@@ -44,7 +44,7 @@ void Driver::init()
     Print_Info::print_time(time_start, time_finish);
 
     // (4) close all of the running logs
-    ModuleBase::Global_File::close_all_log(GlobalV::MY_RANK, PARAM.inp.out_alllog);
+    ModuleBase::Global_File::close_all_log(GlobalV::MY_RANK, PARAM.inp.out_alllog,PARAM.inp.calculation);
 
     // (5) output the json file
     // Json::create_Json(&GlobalC::ucell.symm,GlobalC::ucell.atoms,&INPUT);
 
@@ -32,7 +32,7 @@ void Driver::driver_run() {
 
     // this warning should not be here, mohan 2024-05-22
 #ifndef __LCAO
-    if (GlobalV::BASIS_TYPE == "lcao_in_pw" || GlobalV::BASIS_TYPE == "lcao") {
+    if (PARAM.inp.basis_type == "lcao_in_pw" || PARAM.inp.basis_type == "lcao") {
         ModuleBase::WARNING_QUIT("driver",
                                  "to use LCAO basis, compile with __LCAO");
     }
@@ -55,7 +55,7 @@ void Driver::driver_run() {
     Json::gen_stru_wrapper(&GlobalC::ucell);
 #endif
 
-    const std::string cal_type = GlobalV::CALCULATION;
+    const std::string cal_type = PARAM.inp.calculation;
 
     //! 4: different types of calculations
     if (cal_type == "md")
 
@@ -243,7 +243,7 @@ void ModuleBase::Global_File::close_log( std::ofstream &ofs,const std::string &f
     return;
 }
 
-void ModuleBase::Global_File::close_all_log(const int rank, const bool out_alllog)
+void ModuleBase::Global_File::close_all_log(const int rank, const bool out_alllog,const std::string &calculation)
 {
 //----------------------------------------------------------
 // USE GLOBAL VARIABLES :
@@ -258,7 +258,7 @@ void ModuleBase::Global_File::close_all_log(const int rank, const bool out_alllo
     std::stringstream ss;
 	if(out_alllog)
 	{
-    	ss << "running_" << GlobalV::CALCULATION << "_cpu" << rank << ".log";
+    	ss << "running_" << calculation << "_cpu" << rank << ".log";
     	close_log(GlobalV::ofs_running,ss.str());
         #if defined(__CUDA) || defined(__ROCM)
         close_log(GlobalV::ofs_device, "device" + std::to_string(rank));
@@ -268,7 +268,7 @@ void ModuleBase::Global_File::close_all_log(const int rank, const bool out_alllo
 	{
 		if(rank==0)
 		{
-    		ss << "running_" << GlobalV::CALCULATION << ".log";
+    		ss << "running_" << calculation << ".log";
     		close_log(GlobalV::ofs_running,ss.str());
             #if defined(__CUDA) || defined(__ROCM)
             close_log(GlobalV::ofs_device, "device");
 
@@ -30,7 +30,7 @@ namespace Global_File
 	void make_dir_atom(const std::string &label);
 	void open_log ( std::ofstream &ofs, const std::string &fn, const std::string &calculation, const bool &restart);
 	void close_log( std::ofstream &ofs, const std::string &fn);
-	void close_all_log(const int rank, const bool out_alllog = false);
+	void close_all_log(const int rank, const bool out_alllog = false,const std::string &calculation = "md");
 
     /**
      * @brief delete tmperary files
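
For reviewers, the effect of passing `calculation` explicitly instead of reading `GlobalV::CALCULATION` can be checked in isolation. The sketch below is a standalone approximation: `running_log_name` is a hypothetical helper, not an ABACUS function, and it reproduces only the file-name construction shown in the `close_all_log` hunks above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of ABACUS): builds the running-log file name
// the same way the patched close_all_log does, with no global state involved.
std::string running_log_name(const int rank,
                             const bool out_alllog,
                             const std::string& calculation)
{
    std::stringstream ss;
    if (out_alllog)
    {
        // Every rank owns a log: running_<calculation>_cpu<rank>.log
        ss << "running_" << calculation << "_cpu" << rank << ".log";
    }
    else if (rank == 0)
    {
        // Only rank 0 owns a log: running_<calculation>.log
        ss << "running_" << calculation << ".log";
    }
    // An empty string means this rank has no running log to close.
    return ss.str();
}
```

Because the name depends only on the arguments, the function can be unit-tested without initialising `GlobalV`. Note that the header in this patch gives `calculation` a default of `"md"`, so a caller that omits the argument would get an `md` log name regardless of the actual calculation type.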