---
title: "DP Tutorial 1: How to Set Up DeePMD-kit Training in 5 Minutes?"
date: 2021-06-12
category: tutorial
---

DeePMD-kit is a software package that implements Deep Potential.
There is a lot of information about it on the Internet, but there are few tutorials for newcomers, and the official guide is rather long.
Today, I'll take 5 minutes to get you started with DeePMD-kit.

Let's take a look at the training process of DeePMD-kit:

{% mermaid graph LR %}
A[Prepare data] --> B[Training]
B --> C[Freeze the model]
{% endmermaid %}

What? Only three steps?
Yes, it's that simple.
Preparing data means converting the results of DFT calculations into a format that DeePMD-kit can recognize.
Training means training a Deep Potential model with DeePMD-kit on the data prepared in the previous step.
Finally, we freeze the restart (checkpoint) files produced during training into a model; in other words, we extract the neural network parameters into a single file for subsequent use.
I believe you can't wait to get started. Let's go!

The DeePMD-kit data format is described in the [official documentation](https://deepmd.readthedocs.io/), but it may look complex.
Don't worry: let me introduce a data processing tool, dpdata!
With it, a single line of Python is enough to process the data.
So easy!

```py
import dpdata
# Read frames from the VASP OUTCAR and write DeePMD-kit npy data, 200 frames per set
dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar').to('deepmd/npy', 'data', set_size=200)
```

In this example, we converted the VASP results stored in `OUTCAR` into the DeePMD-kit data format and saved them to a directory named `data`,
where `npy` is NumPy's binary array format, which DeePMD-kit reads for training.
Suppose `OUTCAR` stores 1000 frames of a molecular dynamics trajectory; then there will be 1000 data points after conversion.
`set_size=200` means these 1000 points will be divided into 5 subsets of 200 points each, named `data/set.000` to `data/set.004`.
By default, DeePMD-kit treats the last set as the test set, so `data/set.000` to `data/set.003` form the training set and `data/set.004` is the test set.
If there is only one set, it is used as both the training set and the test set (of course, such a test set is meaningless).
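
To make this bookkeeping concrete, here is a small Python sketch of the splitting rule described above: frames are chunked into consecutive sets of `set_size`, and the last set becomes the test set. The helper names `split_into_sets` and `train_test_split` are hypothetical, and the assumption that any remainder goes into a smaller final set is mine; this only mirrors the behavior and is not code from dpdata or DeePMD-kit.

```python
import math


def split_into_sets(n_frames: int, set_size: int) -> list[tuple[str, int]]:
    """Return (set name, frame count) pairs, e.g. ("set.000", 200).

    Hypothetical helper: assumes consecutive chunks of `set_size` frames,
    with any remainder going into a smaller final set.
    """
    n_sets = math.ceil(n_frames / set_size)
    sets = []
    for i in range(n_sets):
        count = min(set_size, n_frames - i * set_size)
        sets.append((f"set.{i:03d}", count))
    return sets


def train_test_split(set_names: list[str]) -> tuple[list[str], list[str]]:
    """The last set is the test set; a single set plays both roles."""
    if len(set_names) == 1:
        return set_names, set_names
    return set_names[:-1], set_names[-1:]


sets = split_into_sets(1000, 200)
train, test = train_test_split([name for name, _ in sets])
print("training sets:", train)
print("test set:", test)
```

With 1000 frames and `set_size=200`, this yields four training sets and `set.004` as the test set, matching the description above.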

To start DeePMD-kit training, we need to prepare an input script.
Do you still remember the fear of being dominated by the INCAR file?
Don't worry, configuring DeePMD-kit is much easier than configuring VASP.
First, let's download an example and save it as `input.json`:

```sh
wget https://raw.githubusercontent.com/deepmodeling/deepmd-kit/v1.3.3/examples/water/train/water_se_a.json -O input.json
```

One strength of DeePMD-kit is that the same training parameters are suitable for many different systems, so we only need to slightly modify `input.json` to start training.
Here is the first parameter to modify:

```json
"type_map": ["O", "H"],
```

In DeePMD-kit data, each atom type is numbered with an integer starting from 0.
This parameter gives an element name to each type in that numbering.
Here, we can copy it from the content of `data/type_map.raw`.
For example,

```json
"type_map": ["A", "B", "C"],
```
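
If you would rather not copy the names by hand, the sketch below reads `data/type_map.raw` and prints a line ready to paste into `input.json`. It assumes the file lists one element name per line; `read_type_map` and `type_map_entry` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_type_map(path: str = "data/type_map.raw") -> list[str]:
    """Read element names from type_map.raw (assumed: one name per line)."""
    return Path(path).read_text().split()


def type_map_entry(names: list[str]) -> str:
    """Format the names as a "type_map" line for input.json."""
    return f'"type_map": {json.dumps(names)},'


# Only runs if the dpdata output from the previous step is present
if Path("data/type_map.raw").exists():
    print(type_map_entry(read_type_map()))
```

For a file containing `A`, `B`, and `C`, this prints the same `type_map` line as the example above.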

Next, we modify the neighbor search parameter:

```json
"sel": [46, 92],
```

Each number in this list gives the maximum number of neighbors of the corresponding type that an atom can have within the cutoff radius.
For example, `46` means an atom has at most 46 `O` (type `0`) neighbors.
Since our elements are now `A`, `B`, and `C`, this parameter also needs to be modified, with one entry per type.
What if you don't know the maximum number of neighbors?
You can roughly estimate it from the density of the system, or simply try a number.
If it is not big enough, DeePMD-kit will tell you.
Here we change it to

```json
"sel": [64, 64, 64],
```
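
The density-based estimate mentioned above can be sketched as follows: the expected number of neighbors of one type inside a sphere of radius `rcut` is that type's number density times the sphere volume (4/3)·π·rcut³, padded by a safety margin because the local density fluctuates. The cutoff of 6 Å, the margin of 1.5, and the helper name `estimate_sel` are assumptions for illustration; use the `rcut` value from your own `input.json`.

```python
import math


def estimate_sel(number_densities: list[float], rcut: float, margin: float = 1.5) -> list[int]:
    """Estimate `sel` per atom type from number densities (atoms per cubic angstrom).

    Expected neighbors of a type within rcut = density * (4/3) * pi * rcut**3;
    the margin pads this estimate to cover local density fluctuations.
    """
    sphere_volume = 4.0 / 3.0 * math.pi * rcut**3
    return [math.ceil(margin * rho * sphere_volume) for rho in number_densities]


# Liquid water at about 1 g/cm^3 holds ~0.0334 molecules per cubic angstrom,
# i.e. ~0.0334 O atoms and ~0.0668 H atoms per cubic angstrom.
rho_o = 0.0334
rho_h = 2 * rho_o
print(estimate_sel([rho_o, rho_h], rcut=6.0))
```

For liquid water this gives `[46, 91]`, close to the `[46, 92]` in the example file; for your own system, plug in the number density of each element.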

In addition, we need to change

```json
"systems": ["../data/"],
```

to

```json
"systems": ["./data/"],
```

This is because dpdata wrote the data to `./data/`, i.e. the `data` directory in the current directory.
Here, I'd like to explain the definition of a data system.
DeePMD-kit considers that frames with the same number of atoms and the same elements can form a system.
Our data is generated from a single molecular dynamics simulation and meets this condition, so all frames can go into one system.
That is exactly what dpdata did.
If the data cannot be put into one system, multiple systems must be given as a list.
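
The edits so far can also be scripted. The sketch below loads `input.json`, sets `type_map`, `sel`, and `systems`, and writes the file back. It assumes the key layout of the v1.3.3 water example (`model.type_map`, `model.descriptor.sel`, `training.systems`); if your file nests these differently, adjust the paths. `patch_config` and `patch_file` are hypothetical helper names.

```python
import json


def patch_config(config: dict, type_map: list[str], sel: list[int], systems: list[str]) -> dict:
    """Apply the type_map / sel / systems edits to a loaded input dict.

    Key paths are assumed from the DeePMD-kit v1.3.3 water example.
    """
    if len(sel) != len(type_map):
        raise ValueError("sel needs exactly one entry per element in type_map")
    config["model"]["type_map"] = type_map
    config["model"]["descriptor"]["sel"] = sel
    config["training"]["systems"] = systems
    return config


def patch_file(path: str = "input.json") -> None:
    """Hypothetical helper: patch input.json in place with this tutorial's values."""
    with open(path) as f:
        config = json.load(f)
    patch_config(config, ["A", "B", "C"], [64, 64, 64], ["./data/"])
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


# Demo on a trimmed-down stand-in for the downloaded example
example = {
    "model": {"type_map": ["O", "H"], "descriptor": {"sel": [46, 92]}},
    "training": {"systems": ["../data/"]},
}
print(json.dumps(patch_config(example, ["A", "B", "C"], [64, 64, 64], ["./data/"])))
```

Call `patch_file()` in the directory containing `input.json`; the same pattern extends to `stop_batch` and `batch_size` under `training`. The length check catches the easy mistake of changing `type_map` without updating `sel`.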

Finally, we will probably want to modify two more parameters:

```json
"stop_batch": 1000000,
"batch_size": 1,
```

`stop_batch` is the number of training steps of the stochastic gradient descent (SGD) method used in deep learning, and `batch_size` is the number of frames in the mini-batch used at each step.
If we want to reduce `stop_batch` and use the batch size that DeePMD-kit recommends, we can use

```json
"stop_batch": 500000,
"batch_size": "auto",
```
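
As for `"auto"`, the DeePMD-kit documentation describes it as choosing the batch size so that `batch_size` times the number of atoms in a frame is no less than 32. Here is a sketch of that rule, assuming exactly that description; `auto_batch_size` is a hypothetical name, not DeePMD-kit's internal function.

```python
import math


def auto_batch_size(natoms: int, rule: int = 32) -> int:
    """Smallest batch size such that batch_size * natoms >= rule."""
    if natoms <= 0:
        raise ValueError("natoms must be positive")
    return math.ceil(rule / natoms)


# A 192-atom frame already exceeds 32 atoms, so one frame per batch suffices,
# while a 10-atom molecule needs 4 frames per batch.
for natoms in (192, 10):
    print(natoms, "atoms ->", auto_batch_size(natoms))
```

In other words, systems with fewer atoms per frame get larger batches, which keeps the number of atoms seen per step roughly constant.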

Now we have successfully prepared an input file! To start training, we execute

```sh
dp train input.json
```

and wait for the results. During training, we can check `lcurve.out` to watch the errors decrease.
In this file, columns 4 and 5 are the test and training errors of the energy (normalized by the number of atoms), and columns 6 and 7 are the test and training errors of the forces.
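
To check progress without reading the file by hand, the sketch below parses `lcurve.out` and reports the latest errors. It assumes the column layout described above (column 1 is the step, columns 4 to 7 are the energy and force errors) and that header lines start with `#`; `read_lcurve` and `latest_errors` are hypothetical helper names.

```python
import os


def read_lcurve(path: str = "lcurve.out") -> list[list[float]]:
    """Parse numeric rows of lcurve.out, skipping blank and '#' header lines."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                rows.append([float(x) for x in line.split()])
    return rows


def latest_errors(rows: list[list[float]]) -> dict[str, float]:
    """Name the errors in the last row (1-based columns 4-7 are indices 3-6)."""
    last = rows[-1]
    return {
        "step": last[0],
        "energy_test": last[3],
        "energy_train": last[4],
        "force_test": last[5],
        "force_train": last[6],
    }


# Only runs while (or after) training has produced lcurve.out
if os.path.exists("lcurve.out"):
    print(latest_errors(read_lcurve()))
```

Running it periodically during training shows whether the test errors are still decreasing or have leveled off.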

After training, we can use the following command to freeze the model:

```sh
dp freeze
```

The default filename of the output model is `frozen_model.pb`. And so we have a DP model, whether good or bad.
As for the reliability of this model and how to use it, I will give you a detailed tutorial in the next post.