
Commit 7f8cdfb

add tutorial1
1 parent c7522bd commit 7f8cdfb

1 file changed

+136
-0
lines changed

source/_posts/tutorial1.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: "DP Tutorial 1: How to Setup a DeePMD-kit Training within 5 Minutes?"
date: 2021-06-12
category: tutorial
---

DeePMD-kit is a software package that implements Deep Potential.
There is a lot of information about it on the Internet, but there are few tutorials for newcomers, and the official guide is too long.
Today, I'll take 5 minutes to get you started with DeePMD-kit.

Let's take a look at the training process of DeePMD-kit:

{% mermaid graph LR %}
A[Prepare data] --> B[Training]
B --> C[Freeze the model]
{% endmermaid %}

What? Only three steps?
Yes, it's that simple.
Preparing data means converting the computational results of DFT into data that can be recognized by the DeePMD-kit.
Training means training a Deep Potential model using the DeePMD-kit with the data prepared in the previous step.
Finally, we freeze the restart file written during training into a model; in other words, we extract the neural network parameters into a file for subsequent use.
I believe you can't wait to get started. Let's go!

The data format of the DeePMD-kit is introduced in the [official document](https://deepmd.readthedocs.io/), but it seems complex.
Don't worry, I'd like to introduce a data processing tool: dpdata!
You can process the data with a single call in a Python script.
So easy!

```py
import dpdata
dpdata.LabeledSystem('OUTCAR').to('deepmd/npy', 'data', set_size=200)
```

In this example, we converted the computational results of VASP stored in `OUTCAR` to the data format of the DeePMD-kit and saved them into a directory named `data`,
where `npy` is the binary array format of NumPy, which is required by the DeePMD-kit training.
Suppose `OUTCAR` stores a molecular dynamics trajectory of 1000 frames; then there will be 1000 data points after conversion.
`set_size=200` means these 1000 points will be divided into 5 subsets, named `data/set.000` to `data/set.004`.
The size of each set is 200.
Of these 5 sets, `data/set.000` to `data/set.003` will be treated as the training set by the DeePMD-kit, and `data/set.004` will be treated as the test set.
By default, the DeePMD-kit treats the last set as the test set.
If there is only one set, it will be both the training set and the test set. (Of course, such a test set is meaningless.)
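
The splitting rule described above can be sketched in plain Python. This is only an illustration, not the actual code of dpdata or the DeePMD-kit: the helper names `split_into_sets` and `train_test_split` are hypothetical, and the sketch assumes that any leftover frames go into a final, smaller set.

```python
def split_into_sets(n_frames, set_size):
    """Return (set_name, frame_count) pairs in the order the sets are written."""
    sets = []
    start = 0
    while start < n_frames:
        # Assumption: a remainder smaller than set_size forms the last set.
        count = min(set_size, n_frames - start)
        sets.append((f"set.{len(sets):03d}", count))
        start += count
    return sets


def train_test_split(sets):
    """Treat the last set as the test set, following the rule described above."""
    names = [name for name, _ in sets]
    if len(names) == 1:
        # A single set serves as both the training set and the test set.
        return names, names
    return names[:-1], names[-1:]


sets = split_into_sets(1000, 200)
train, test = train_test_split(sets)
print("sets: ", sets)
print("train:", train)
print("test: ", test)
```

With 1000 frames and `set_size=200`, this gives the five sets from the text, with `set.004` ending up as the test set.
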
To start the DeePMD-kit training, we need to prepare an input script.
Are you still haunted by the fear of being dominated by the INCAR script?
Don't worry, configuring the DeePMD-kit is much easier than configuring VASP.
First, let's download an example and save it as `input.json`:

```sh
wget https://raw.githubusercontent.com/deepmodeling/deepmd-kit/v1.3.3/examples/water/train/water_se_a.json -O input.json
```

The strength of the DeePMD-kit is that the same training parameters are suitable for different systems, so we only need to slightly modify `input.json` to start training.
Here is the first parameter to modify:

```json
"type_map": ["O", "H"],
```

In the DeePMD-kit data, each atom type is numbered as an integer starting from 0.
This parameter assigns an element name to each atom type in the numbering system.
Here, we can copy the content of `data/type_map.raw`.
For example,

```json
"type_map": ["A", "B", "C"],
```

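If you'd rather not copy by hand, here is a small sketch that reads the element names and prints the line for `input.json`. It assumes `type_map.raw` is a plain text file with whitespace-separated element names; the helper `read_type_map` is hypothetical, and a temporary file stands in for `data/type_map.raw` so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def read_type_map(path):
    """Read element names from a whitespace-separated type_map.raw file."""
    return Path(path).read_text().split()


# Stand-in for data/type_map.raw so that the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "type_map.raw"
    raw.write_text("A\nB\nC\n")
    type_map = read_type_map(raw)

# Format the names the way they appear in input.json.
type_map_line = f'"type_map": {json.dumps(type_map)},'
print(type_map_line)
```

In practice you would call `read_type_map("data/type_map.raw")` directly.
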
Next, we are going to modify the neighbor-searching parameter:

```json
"sel": [46, 92],
```

Each number in this list gives the maximum number of atoms of the corresponding type among the neighbors of an atom.
For example, `46` means there are at most 46 `O` (type `0`) neighbors.
Here, our elements were changed to `A`, `B`, and `C`, so this parameter also needs to be modified.
What if you don't know the maximum number of neighbors?
You can roughly estimate it from the density of the system, or simply guess a number.
If it is not big enough, the DeePMD-kit will tell you.
Below we changed it to

```json
"sel": [64, 64, 64]
```

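For the density-based estimate, a rough sketch could look like this: count how many atoms of each type fit, on average, into a sphere whose radius is the cutoff, then add a margin because local density can exceed the average. Everything here is an assumption for illustration: the function `estimate_sel`, the `safety` factor, the densities, and the 6 Å cutoff are made up and do not come from the DeePMD-kit.

```python
import math


def estimate_sel(number_densities, rcut, safety=1.5):
    """Estimate the maximum neighbor count of each type within radius rcut.

    number_densities: atoms per cubic angstrom for each type, in type_map order.
    rcut: cutoff radius in angstrom.
    safety: hypothetical margin for local density fluctuations.
    """
    volume = 4.0 / 3.0 * math.pi * rcut ** 3
    return [math.ceil(rho * volume * safety) for rho in number_densities]


# Hypothetical number densities for the three types A, B, and C.
sel = estimate_sel([0.02, 0.02, 0.02], rcut=6.0)
print(sel)
```

Treat the result only as a starting point; as said above, the DeePMD-kit will tell you if the numbers are too small.
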
In addition, we need to modify

```json
"systems": ["../data/"],
```

to

```json
"systems": ["./data/"],
```

This is because the data was written to `./data/` in the current directory.
Here I'd like to introduce the definition of a data system.
The DeePMD-kit considers that data with the same number of atoms of each element can form a system.
Our data is generated from a molecular dynamics simulation and meets this condition, so we can put all of it into one system.
dpdata did so as well.
If the data cannot be put into one system, multiple systems need to be given as a list.

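To make the idea of a system more concrete, here is a toy sketch that groups frames by their atoms. It is not how dpdata or the DeePMD-kit is implemented; the function `group_into_systems` and the frames are hypothetical, and the sketch simply assumes that frames with an identical list of atoms belong to the same system.

```python
from collections import defaultdict


def group_into_systems(frames):
    """Group frame indices by the element list of their atoms."""
    systems = defaultdict(list)
    for index, elements in enumerate(frames):
        systems[tuple(elements)].append(index)
    return dict(systems)


# Hypothetical frames: each entry lists the element of every atom in that frame.
frames = [
    ["O", "H", "H"],
    ["O", "H", "H"],
    ["O", "O", "H", "H", "H", "H"],
]

systems = group_into_systems(frames)
for elements, frame_indices in systems.items():
    print(len(elements), "atoms:", frame_indices)
```

Here the first two frames form one system and the third frame needs a system of its own, so `systems` in `input.json` would list two directories.
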
Finally, we are likely to modify another two parameters:

```json
"stop_batch": 1000000,
"batch_size": 1,
```
`stop_batch` is the number of training steps using the SGD method of deep learning, and `batch_size` is the mini-batch size of data in each step.
If we want to reduce `stop_batch` and use the `batch_size` that the DeePMD-kit recommends, we can use

```json
"stop_batch": 500000,
"batch_size": "auto",
```

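All the edits so far can also be applied with a few lines of Python instead of a text editor. The sketch below works on a trimmed-down stand-in for `input.json`; the nesting under `model`, `descriptor`, and `training` is my assumption about the layout of the downloaded example, so check it against your own file. The helper `apply_edits` is hypothetical.

```python
import json

# Minimal stand-in for input.json; only the keys changed in this tutorial are shown.
# Assumption: the example file nests them under "model" and "training" like this.
config = {
    "model": {
        "type_map": ["O", "H"],
        "descriptor": {"type": "se_a", "sel": [46, 92]},
    },
    "training": {"systems": ["../data/"], "stop_batch": 1000000, "batch_size": 1},
}


def apply_edits(cfg):
    """Apply the edits from this tutorial to a config dict in place."""
    cfg["model"]["type_map"] = ["A", "B", "C"]
    cfg["model"]["descriptor"]["sel"] = [64, 64, 64]
    cfg["training"]["systems"] = ["./data/"]
    cfg["training"]["stop_batch"] = 500000
    cfg["training"]["batch_size"] = "auto"
    return cfg


apply_edits(config)

# sel needs one entry per element in type_map.
assert len(config["model"]["descriptor"]["sel"]) == len(config["model"]["type_map"])
print(json.dumps(config, indent=4))
```

For the real file, you would read it with `json.load`, call `apply_edits`, and write it back with `json.dump`.
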
Now we have successfully set up an input file! To start training, we execute

```sh
dp train input.json
```

and wait for the results. During the training process, we can check `lcurve.out` to observe how the errors decrease.
In this file, Columns 4 and 5 are the test and training errors of the energy (normalized by the number of atoms), and Columns 6 and 7 are the test and training errors of the force.

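If you want to pick these columns out with a script, a minimal sketch might look like this. It assumes `lcurve.out` is a whitespace-separated text file whose comment lines start with `#`; the function `read_lcurve` is hypothetical, and the sample numbers and header names are made up for illustration.

```python
def read_lcurve(text):
    """Parse lcurve.out-style text into rows of floats, skipping '#' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# Made-up sample in the assumed layout; a real file is written by `dp train`.
sample = """\
# batch  l2_tst  l2_trn  l2_e_tst  l2_e_trn  l2_f_tst  l2_f_trn  lr
0        2.0e+01 2.1e+01 1.0e+00   1.1e+00   6.0e-01   6.5e-01   1.0e-03
1000     5.0e+00 5.2e+00 2.0e-02   2.1e-02   2.0e-01   2.1e-01   9.0e-04
"""

rows = read_lcurve(sample)
last = rows[-1]

# Columns are counted from 1 in the text, so Column 4 is index 3.
errors = {
    "energy_test": last[3],
    "energy_train": last[4],
    "force_test": last[5],
    "force_train": last[6],
}
print(errors)
```

For a real run, pass the content of the file instead, for example `read_lcurve(open("lcurve.out").read())`.
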
After training, we can use the following command to freeze the model:

```sh
dp freeze
```

The default filename of the output model is `frozen_model.pb`. With that, we have obtained a DP model, good or bad.
As for the reliability of this model and how to use it, I will give you a detailed tutorial in the next post.
