The current version of deepks-kit is hard to use and maintain. Below are some problems and suggestions for improvement.
New functions for users
- Transition of final model. When the training process finishes for deepks+abacus, there should be a transition from `model.pth` to `model.pth` in the final step, since the final model is the one most frequently used in later work.
- Visualization of the SCF process. In deepks-abacus, the vast majority of the time in the whole iteration is spent in the SCF process. However, one can only see the overall progress of the iteration by watching the `RECORD` file, and the `tag_0_finished` files are only generated in the init step, which makes it quite difficult to check the training process. Accordingly, there should be a convenient way to stop a run and restart it at any point.
- MPI/OpenMP parallelization of deepks-kit runs.
- Check of data files in `.npy`. There should be a test that checks the size of each npy file at the very beginning of a run, which would reduce the tedious checking that users currently do themselves.
- A function to automatically split a whole dataset into a training set and a test set. Currently, users are required to prepare separate npy files for training and testing, which adds extra work. It would be better to add a function and an input parameter so that users can split the dataset the way they prefer directly in deepks-kit.
- Update of docs. Both user docs and developer docs should be updated.
- Compact input file. There are too many input files, and the parameter list is too long. Users typically need to modify only a few parameters in practice, so the reference input file should keep only the necessary parameters, with the complete parameter list and explanations moved to the user documentation.
- Dependency update. The current deepks-kit does not support the newest versions of ruamel-yaml and numpy.
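
To make the npy check and the dataset split requested above more concrete, here is a rough sketch of what such helpers could look like. This is only an illustration, not existing deepks-kit code: the function names `check_npy_frames` and `split_indices`, the `test_ratio` and `seed` parameters, and the assumption that the first axis of every file indexes frames are all hypothetical.

```python
from pathlib import Path

import numpy as np


def check_npy_frames(data_dir, names):
    """Return the common number of frames, or raise ValueError on a mismatch.

    Assumes (hypothetically) that axis 0 of every file indexes frames.
    """
    data_dir = Path(data_dir)
    counts = {}
    for name in names:
        # mmap_mode="r" reads only the header and maps the data lazily,
        # so large files are not loaded into memory just to get the shape.
        counts[name] = np.load(data_dir / name, mmap_mode="r").shape[0]
    if len(set(counts.values())) > 1:
        raise ValueError(f"inconsistent frame counts: {counts}")
    return next(iter(counts.values()))


def split_indices(n_frames, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) as sorted, disjoint index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_frames)
    n_test = int(round(n_frames * test_ratio))
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])
```

A caller could run `check_npy_frames` once at startup on whatever files the system folder contains, then index each loaded array with the two index arrays, so that users keep a single set of npy files instead of preparing separate training and test copies.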
Refactor suggestions
- File structure optimization. At present, the outermost structure is relatively clear, but individual files contain too many functions, which makes them very long and inconvenient to maintain. It is recommended to separate utils folders and files based on functionality. (For example, train.py contains all training-related functions and classes; the class implementations should be split out into separate files such as evaluator.py.)
- Independent default value files. Currently, function implementations and default value settings (variables named in capitals) are written together in different files. It would be better to combine all default value lists into one file, which would make them easier to maintain.
- Make the purpose and usage of some functions clearer. Some functions combine multiple behaviors that are selected only by the type of the input parameters, which makes it difficult to tell which behavior is used at a given call site. One example is check_share_folder(); such functions need to be rethought for a more reasonable implementation.
- Simplify some functions. Some functions, such as gather_stats_abacus(), are long but repeat similar operations; it would be better to simplify them.
- Add the necessary comments and headers.
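
As one possible shape for the single default-value file suggested above, the sketch below keeps all defaults in one dictionary and merges a short user config on top of it. Everything here is hypothetical: the `DEFAULTS` keys and values are placeholders rather than deepks-kit's real settings, and `with_defaults` is an illustrative name.

```python
from copy import deepcopy

# Hypothetical defaults: keys and values are placeholders for illustration,
# not the actual deepks-kit parameters.
DEFAULTS = {
    "train": {"n_epoch": 1000, "start_lr": 1e-3},
    "scf": {"conv_tol": 1e-6},
}


def with_defaults(user_cfg, defaults=DEFAULTS):
    """Return a new config: defaults overridden recursively by user_cfg."""
    merged = deepcopy(defaults)
    for key, value in user_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged
```

With a merge like this, a user input file would only need to list the parameters that differ from the defaults, e.g. `with_defaults({"train": {"n_epoch": 10}})`, which also fits the compact input file idea.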
Bugs
- Support for pyscf. Neither the master branch nor the develop branch supports the newest pyscf. Whether to continue supporting pyscf should be considered.