Commit 066089d

Merge pull request #94 from deepmodeling/devel
merge the development on the devel branch into master
2 parents 3be3cf9 + 7d0d617 commit 066089d

28 files changed: +8315 -32 lines changed

README.md

Lines changed: 30 additions & 28 deletions
@@ -51,6 +51,34 @@ The labels provided in the `OUTCAR`, i.e. energies, forces and virials (if any),

 The `System` or `LabeledSystem` can be constructed from the following file formats with the `format key` in the table passed to argument `fmt`:

+| Software| format | multi frames | labeled | class | format key |
+| ------- | :--- | :---: | :---: | :--- | :--- |
+| vasp | poscar | False | False | System | 'vasp/poscar' |
+| vasp | outcar | True | True | LabeledSystem | 'vasp/outcar' |
+| vasp | xml | True | True | LabeledSystem | 'vasp/xml' |
+| lammps | lmp | False | False | System | 'lammps/lmp' |
+| lammps | dump | True | False | System | 'lammps/dump' |
+| deepmd | raw | True | False | System | 'deepmd/raw' |
+| deepmd | npy | True | False | System | 'deepmd/npy' |
+| deepmd | raw | True | True | LabeledSystem | 'deepmd/raw' |
+| deepmd | npy | True | True | LabeledSystem | 'deepmd/npy' |
+| gaussian| log | False | True | LabeledSystem | 'gaussian/log'|
+| gaussian| log | True | True | LabeledSystem | 'gaussian/md' |
+| siesta | output | False | True | LabeledSystem | 'siesta/output'|
+| siesta | aimd_output | True | True | LabeledSystem | 'siesta/aimd_output' |
+| cp2k | output | False | True | LabeledSystem | 'cp2k/output' |
+| cp2k | aimd_output | True | True | LabeledSystem | 'cp2k/aimd_output' |
+| QE | log | False | True | LabeledSystem | 'qe/pw/scf' |
+| QE | log | True | False | System | 'qe/cp/traj' |
+| QE | log | True | True | LabeledSystem | 'qe/cp/traj' |
+|quip/gap|xyz|True|True|MultiSystems|'quip/gap/xyz'|
+| PWmat | atom.config | False | False | System | 'pwmat/atom.config' |
+| PWmat | movement | True | True | LabeledSystem | 'pwmat/movement' |
+| PWmat | OUT.MLMD | True | True | LabeledSystem | 'pwmat/out.mlmd' |
+| Amber | multi | True | True | LabeledSystem | 'amber/md' |
+| Gromacs | gro | False | False | System | 'gromacs/gro' |
+
+
 The Class `dpdata.MultiSystems` can read data from a dir which may contains many files of different systems, or from single xyz file which contains different systems.

 Use `dpdata.MultiSystems.from_dir` to read from a directory, `dpdata.MultiSystems` will walk in the directory
@@ -82,34 +110,8 @@ xyz_multi_systems.systems['B1C9'].to_deepmd_raw('./my_work_dir/B1C9_raw')

 # dump all systems
 xyz_multi_systems.to_deepmd_raw('./my_deepmd_data/')
-
-
 ```

-| Software| format | multi frames | labeled | class | format key |
-| ------- | :--- | :---: | :---: | :--- | :--- |
-| vasp | poscar | False | False | System | 'vasp/poscar' |
-| vasp | outcar | True | True | LabeledSystem | 'vasp/outcar' |
-| vasp | xml | True | True | LabeledSystem | 'vasp/xml' |
-| lammps | lmp | False | False | System | 'lammps/lmp' |
-| lammps | dump | True | False | System | 'lammps/dump' |
-| deepmd | raw | True | False | System | 'deepmd/raw' |
-| deepmd | npy | True | False | System | 'deepmd/npy' |
-| deepmd | raw | True | True | LabeledSystem | 'deepmd/raw' |
-| deepmd | npy | True | True | LabeledSystem | 'deepmd/npy' |
-| gaussian| log | False | True | LabeledSystem | 'gaussian/log'|
-| gaussian| log | True | True | LabeledSystem | 'gaussian/md' |
-| siesta | output | False | True | LabeledSystem | 'siesta/output'|
-| siesta | aimd_output | True | True | LabeledSystem | 'siesta/aimd_output' |
-| cp2k | output | False | True | LabeledSystem | 'cp2k/output' |
-| cp2k | aimd_output | True | True | LabeledSystem | 'cp2k/aimd_output' |
-| QE | log | False | True | LabeledSystem | 'qe/pw/scf' |
-| QE | log | True | False | System | 'qe/cp/traj' |
-| QE | log | True | True | LabeledSystem | 'qe/cp/traj' |
-|quip/gap|xyz|True|True|MultiSystems|'quip/gap/xyz'|
-| PWmat | atom.config | False | False | System | 'pwmat/atom.config' |
-| PWmat | movement | True | True | LabeledSystem | 'pwmat/movement' |
-| PWmat | OUT.MLMD | True | True | LabeledSystem | 'pwmat/out.mlmd' |
 ## Access data
 These properties stored in `System` and `LabeledSystem` can be accessed by operator `[]` with the key of the property supplied, for example
 ```python
```python
@@ -129,7 +131,6 @@ Available properties are (nframe: number of frames in the system, natoms: total
 | 'virials' | np.ndarray | nframes x 3 x 3 | True | The virial tensor of each frame


-
 ## Dump data
 The data stored in `System` or `LabeledSystem` can be dumped in 'lammps/lmp' or 'vasp/poscar' format, for example:
 ```python
@@ -141,7 +142,6 @@ d_outcar.to('vasp/poscar', 'POSCAR', frame_idx=-1)
 ```
 The last frames of `d_outcar` will be dumped to 'POSCAR'.

-
 The data stored in `LabeledSystem` can be dumped to deepmd-kit raw format, for example
 ```python
 d_outcar.to('deepmd/raw', 'dpmd_raw')
@@ -156,13 +156,15 @@ dpdata.LabeledSystem('OUTCAR').sub_system([0,-1]).to('deepmd/raw', 'dpmd_raw')
 ```
 by which only the first and last frames are dumped to `dpmd_raw`.

+
 ## replicate
 dpdata will create a super cell of the current atom configuration.
 ```python
 dpdata.System('./POSCAR').replicate((1,2,3,) )
 ```
 tuple(1,2,3) means don't copy atom configuration in x direction, make 2 copys in y direction, make 3 copys in z direction.
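
As a rough illustration of what replication with `(1,2,3)` does to the cell and the atom positions, here is a minimal pure-Python sketch. It is not dpdata's implementation: the function name `replicate_coords` and the plain-list data layout are assumptions made only for this example.

```python
from itertools import product


def replicate_coords(cell, coords, ncopy):
    """Tile `coords` ncopy[0] x ncopy[1] x ncopy[2] times along the cell vectors.

    cell   : three lattice vectors, each a list of 3 floats
    coords : Cartesian positions, each a list of 3 floats
    ncopy  : number of images along each lattice vector (1 means no extra copy)
    """
    new_coords = []
    for i, j, k in product(range(ncopy[0]), range(ncopy[1]), range(ncopy[2])):
        # Translation of this image: i*a + j*b + k*c
        shift = [i * cell[0][d] + j * cell[1][d] + k * cell[2][d] for d in range(3)]
        for pos in coords:
            new_coords.append([pos[d] + shift[d] for d in range(3)])
    # The supercell vectors are the original vectors scaled by the copy counts.
    new_cell = [[ncopy[v] * cell[v][d] for d in range(3)] for v in range(3)]
    return new_cell, new_coords


# Hypothetical one-atom orthorhombic cell, replicated as in the README example.
cell = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
coords = [[0.5, 0.5, 0.5]]
new_cell, new_coords = replicate_coords(cell, coords, (1, 2, 3))
print(len(new_coords))  # 1 * 2 * 3 images of one atom
print(new_cell)
```

With one atom and `(1,2,3)`, the sketch produces six atoms and a cell that is unchanged along x, doubled along y and tripled along z.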

+
 ## perturb
 By the following example, each frame of the original system (`dpdata.System('./POSCAR')`) is perturbed to generate three new frames. For each frame, the cell is perturbed by 5% and the atom positions are perturbed by 0.6 Angstrom. `atom_pert_style` indicates that the perturbation to the atom positions is subject to normal distribution. Other available options to `atom_pert_style` are`uniform` (uniform in a ball), and `const` (uniform on a sphere).
 ```python

dpdata/amber/md.py

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+import re
+from scipy.io import netcdf
+import numpy as np
+
+kcalmol2eV= 0.04336410390059322
+
+energy_convert = kcalmol2eV
+force_convert = energy_convert
+
+
+def read_amber_traj(parm7_file, nc_file, mdfrc_file, mden_file):
+    """The amber trajectory includes:
+    * nc, NetCDF format, stores coordinates
+    * mdfrc, NetCDF format, stores forces
+    * mden, text format, stores energies
+    * parm7, text format, stores types
+    """
+
+    flag=False
+    amber_types = []
+    with open(parm7_file) as f:
+        for line in f:
+            if line.startswith("%FLAG"):
+                flag = line.startswith("%FLAG AMBER_ATOM_TYPE")
+            elif flag:
+                if line.startswith("%FORMAT"):
+                    fmt = re.findall(r'\d+', line)
+                    fmt0 = int(fmt[0])
+                    fmt1 = int(fmt[1])
+                else:
+                    for ii in range(fmt0):
+                        start_index = ii * fmt1
+                        end_index = (ii + 1) * fmt1
+                        if end_index >= len(line):
+                            continue
+                        amber_types.append(line[start_index:end_index].strip())
+
+    with netcdf.netcdf_file(nc_file, 'r') as f:
+        coords = np.array(f.variables["coordinates"][:])
+        cell_lengths = np.array(f.variables["cell_lengths"][:])
+        cell_angles = np.array(f.variables["cell_angles"][:])
+        if np.all(cell_angles > 89.99 ) and np.all(cell_angles < 90.01):
+            # only support 90
+            # TODO: support other angles
+            shape = cell_lengths.shape
+            cells = np.zeros((shape[0], 3, 3))
+            for ii in range(3):
+                cells[:, ii, ii] = cell_lengths[:, ii]
+        else:
+            raise RuntimeError("Unsupported cells")
+
+    with netcdf.netcdf_file(mdfrc_file, 'r') as f:
+        forces = np.array(f.variables["forces"][:])
+
+    # energy
+    energies = []
+    with open(mden_file) as f:
+        for line in f:
+            if line.startswith("L6"):
+                s = line.split()
+                if s[2] != "E_pot":
+                    energies.append(float(s[2]))
+
+    atom_names, atom_types, atom_numbs = np.unique(amber_types, return_inverse=True, return_counts=True)
+
+    data = {}
+    data['atom_names'] = list(atom_names)
+    data['atom_numbs'] = list(atom_numbs)
+    data['atom_types'] = atom_types
+    data['forces'] = forces * force_convert
+    data['energies'] = np.array(energies) * energy_convert
+    data['coords'] = coords
+    data['cells'] = cells
+    data['orig'] = np.array([0, 0, 0])
+    return data
+
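
The parm7 loop above reads fixed-width fields whose count and width come from the `%FORMAT` line. Because the committed check is `end_index >= len(line)`, a final line that lacks both padding and a trailing newline would lose its last field. The sketch below is a standalone variant of the same idea that slices first and skips only empty fields. It is not the code from this commit: the helper name `parse_fixed_width_section` and the sample lines are made up for illustration.

```python
import re


def parse_fixed_width_section(lines, flag_name):
    """Collect the fixed-width fields of one %FLAG section from parm7-style lines."""
    in_section = False
    count = width = None
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("%FLAG"):
            # A new section starts; only the requested one is collected.
            in_section = line.split()[1:2] == [flag_name]
        elif in_section and line.startswith("%FORMAT"):
            # e.g. "%FORMAT(20a4)" -> 20 fields per line, 4 characters each
            count, width = (int(n) for n in re.findall(r"\d+", line)[:2])
        elif in_section:
            for ii in range(count):
                field = line[ii * width:(ii + 1) * width].strip()
                if field:
                    values.append(field)
    return values


# Made-up sample; the last line has neither padding nor a trailing newline.
sample = [
    "%FLAG ATOM_NAME\n",
    "%FORMAT(20a4)\n",
    "C1  H1  H2  \n",
    "%FLAG AMBER_ATOM_TYPE\n",
    "%FORMAT(20a4)\n",
    "c3  hc  hc",
]
print(parse_fixed_width_section(sample, "AMBER_ATOM_TYPE"))
```

Slicing past the end of a Python string returns an empty string rather than raising, so no explicit bounds check is needed in this variant.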

dpdata/fhi_aims/__init__.py

Whitespace-only changes.

dpdata/fhi_aims/output.py

Lines changed: 172 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,172 @@
1+
import numpy as np
2+
import re
3+
4+
latt_patt="\|\s+([0-9]{1,}[.][0-9]*)\s+([0-9]{1,}[.][0-9]*)\s+([0-9]{1,}[.][0-9]*)"
5+
pos_patt_first="\|\s+[0-9]{1,}[:]\s\w+\s(\w+)(\s.*[-]?[0-9]{1,}[.][0-9]*)(\s+[-]?[0-9]{1,}[.][0-9]*)(\s+[-]?[0-9]{1,}[.][0-9]*)"
6+
pos_patt_other="\s+[a][t][o][m]\s+([-]?[0-9]{1,}[.][0-9]*)\s+([-]?[0-9]{1,}[.][0-9]*)\s+([-]?[0-9]{1,}[.][0-9]*)\s+(\w{1,2})"
7+
force_patt="\|\s+[0-9]{1,}\s+([-]?[0-9]{1,}[.][0-9]*[E][+-][0-9]{1,})\s+([-]?[0-9]{1,}[.][0-9]*[E][+-][0-9]{1,})\s+([-]?[0-9]{1,}[.][0-9]*[E][+-][0-9]{1,})"
8+
eng_patt="Total energy uncorrected.*([-]?[0-9]{1,}[.][0-9]*[E][+-][0-9]{1,})\s+eV"
9+
#atom_numb_patt="Number of atoms.*([0-9]{1,})"
10+
11+
def get_info (lines, type_idx_zero = False) :
12+
13+
atom_types = []
14+
atom_names = []
15+
cell = []
16+
atom_numbs = None
17+
_atom_names = []
18+
19+
contents="\n".join(lines)
20+
#cell
21+
#_tmp=re.findall(latt_patt,contents)
22+
#for ii in _tmp:
23+
# vect=[float(kk) for kk in ii]
24+
# cell.append(vect)
25+
#------------------
26+
for ln,l in enumerate(lines):
27+
if l.startswith(' | Unit cell'):
28+
break
29+
_tmp=lines[ln+1:ln+4]
30+
for ii in _tmp:
31+
v_str=ii.split('|')[1].split()
32+
vect=[float(kk) for kk in v_str]
33+
cell.append(vect)
34+
# print(cell)
35+
#atom name
36+
_tmp=re.findall(pos_patt_first,contents)
37+
for ii in _tmp:
38+
_atom_names.append(ii[0])
39+
atom_names=[]
40+
for ii in _atom_names:
41+
if not ii in atom_names:
42+
atom_names.append(ii)
43+
#atom number
44+
#_atom_numb_patt=re.compile(atom_numb_patt)
45+
atom_numbs =[_atom_names.count(ii) for ii in atom_names]
46+
assert(atom_numbs is not None), "cannot find ion type info in aims output"
47+
48+
for idx,ii in enumerate(atom_numbs) :
49+
for jj in range(ii) :
50+
if type_idx_zero :
51+
atom_types.append(idx)
52+
else :
53+
atom_types.append(idx+1)
54+
55+
return [cell, atom_numbs, atom_names, atom_types ]
+
+
+def get_fhi_aims_block(fp) :
+    blk = []
+    for ii in fp :
+        if not ii :
+            return blk
+        blk.append(ii.rstrip('\n'))
+        if 'Begin self-consistency loop: Re-initialization' in ii:
+            return blk
+    return blk
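
`get_fhi_aims_block` splits the output at lines containing the re-initialization marker. Note that iterating a text file never yields an empty string (a blank line arrives as `"\n"`), so the `if not ii` branch is not expected to trigger. The same splitting can be sketched as a generator. This is an illustrative alternative rather than the committed code: the name `iter_blocks` and the sample text are assumptions for the example.

```python
import io

MARKER = "Begin self-consistency loop: Re-initialization"


def iter_blocks(fp, marker=MARKER):
    """Yield lists of lines; each block ends at (and includes) a marker line."""
    blk = []
    for line in fp:
        blk.append(line.rstrip("\n"))
        if marker in line:
            yield blk
            blk = []
    if blk:
        # Lines after the last marker form the final block.
        yield blk


# Made-up text with one marker line and a blank line inside the second block.
text = (
    "header\n"
    "Begin self-consistency loop: Re-initialization\n"
    "step 1\n"
    "\n"
    "step 1 end\n"
)
blocks = list(iter_blocks(io.StringIO(text)))
print(len(blocks))
print(blocks[1])
```

A generator removes the need for the caller to test `len(blk) > 0` to detect the end of the file, although the loop in `get_frames` would have to be adapted to use it.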
+
+def get_frames (fname, md=True, begin = 0, step = 1) :
+    fp = open(fname)
+    blk = get_fhi_aims_block(fp)
+    ret = get_info(blk, type_idx_zero = True)
+
+    cell, atom_numbs, atom_names, atom_types =ret[0],ret[1],ret[2],ret[3]
+    ntot = sum(atom_numbs)
+
+    all_coords = []
+    all_cells = []
+    all_energies = []
+    all_forces = []
+    all_virials = []
+
+    cc = 0
+    while len(blk) > 0 :
+        # with open(str(cc),'w') as f:
+        # f.write('\n'.join(blk))
+        if cc >= begin and (cc - begin) % step == 0 :
+            if cc==0:
+                coord, _cell, energy, force, virial, is_converge = analyze_block(blk, first_blk=True, md=md)
+            else:
+                coord, _cell, energy, force, virial, is_converge = analyze_block(blk, first_blk=False)
+            if is_converge :
+                if len(coord) == 0:
+                    break
+                all_coords.append(coord)
+
+                if _cell:
+                    all_cells.append(_cell)
+                else:
+                    all_cells.append(cell)
+
+                all_energies.append(energy)
+                all_forces.append(force)
+                if virial is not None :
+                    all_virials.append(virial)
+        blk = get_fhi_aims_block(fp)
+        cc += 1
+
+    if len(all_virials) == 0 :
+        all_virials = None
+    else :
+        all_virials = np.array(all_virials)
+    fp.close()
+    return atom_names, atom_numbs, np.array(atom_types), np.array(all_cells), np.array(all_coords), np.array(all_energies), np.array(all_forces), all_virials
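
The `begin`/`step` test in `get_frames` picks every `step`-th block starting at block `begin`. A small sketch of just that selection rule, with the hypothetical helper `selected_indices` standing in for the loop counter `cc`:

```python
def selected_indices(n_blocks, begin=0, step=1):
    """Indices of the blocks that pass the `cc >= begin and (cc - begin) % step == 0` test."""
    return [cc for cc in range(n_blocks)
            if cc >= begin and (cc - begin) % step == 0]


picked = selected_indices(10, begin=2, step=3)
print(picked)
# The rule selects the same indices as a strided range.
print(picked == list(range(2, 10, 3)))
```

In the committed loop, blocks that are not selected are still read from the file; they are simply not analyzed.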
+
+
+def analyze_block(lines, first_blk=False, md=True) :
+    coord = []
+    cell = []
+    energy = None
+    force = []
+    virial = None
+    atom_names=[]
+    _atom_names=[]
+
+    contents="\n".join(lines)
+    try:
+        natom=int(re.findall("Number of atoms.*([0-9]{1,})",lines)[0])
+    except:
+        natom=0
+
+    if first_blk:
+
+        if md:
+            _tmp=re.findall(pos_patt_other,contents)[:]
+            for ii in _tmp[slice(int(len(_tmp)/2),len(_tmp))]:
+                coord.append([float(kk) for kk in ii[:-1]])
+        else:
+            _tmp=re.findall(pos_patt_first,contents)
+            for ii in _tmp:
+                coord.append([float(kk) for kk in ii[1:]])
+    else:
+        _tmp=re.findall(pos_patt_other,contents)
+        for ii in _tmp:
+            coord.append([float(kk) for kk in ii[:-1]])
+
+    _tmp=re.findall(force_patt,contents)
+    for ii in _tmp:
+        force.append([float(kk) for kk in ii])
+
+    if "Self-consistency cycle converged" in contents:
+        is_converge=True
+    else:
+        is_converge=False
+
+    try:
+        _eng_patt=re.compile(eng_patt)
+        energy=float(_eng_patt.search(contents).group().split()[-2])
+    except:
+        energy=None
+
+    if not energy:
+        is_converge = False
+
+    if energy:
+        assert((force is not None) and len(coord) > 0 )
+
+    return coord, cell, energy, force, virial, is_converge
+
+if __name__=='__main__':
+    import sys
+    ret=get_frames (sys.argv[1], begin = 0, step = 1)
+    print(ret)
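
One detail in `analyze_block` is worth flagging: `re.findall` is called with `lines` (a list) rather than the joined `contents`. `re.findall` expects a string, so the call raises `TypeError`, the bare `except` swallows it, and `natom` ends up as 0 every time. Even on the joined text, the greedy `.*` would leave only the last digit for the capture group. The sketch below shows one possible correction. It is not part of this commit, and both the helper name `count_atoms` and the sample line layout are assumptions about the FHI-aims output.

```python
import re

# Assumed layout of the FHI-aims line; adjust the pattern if the real output differs.
natom_patt = re.compile(r"Number of atoms\s*:\s*([0-9]+)")


def count_atoms(lines):
    """Return the atom count found in `lines`, or 0 if no such line exists."""
    contents = "\n".join(lines)  # search the joined text, not the list
    match = natom_patt.search(contents)
    return int(match.group(1)) if match else 0


# Made-up sample lines in the assumed layout.
lines = [
    "  | Number of species                 :        2",
    "  | Number of atoms                   :       12",
]
print(count_atoms(lines))

# The committed pattern, applied to the joined text, keeps only the last digit
# because the greedy ".*" consumes the leading ones.
greedy = re.findall("Number of atoms.*([0-9]{1,})", "\n".join(lines))
print(greedy)
```

Since `natom` is not used later in `analyze_block`, the issue does not change the returned values; the sketch only matters if the count is needed for validation.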

dpdata/gromacs/__init__.py

Whitespace-only changes.
