
Commit 6854119

[Add] Moflow code reproduced in PaddleScience (#962)
* Update mkdocs.yml
* DrugDiscovery Review changed
* Fix DrugDiscovery Review changed
* Add files via upload
* Update moflow_optimize.yaml: hidden: '16' -> 16
* Update optimize_moflow.py: hidden: '16' -> 16, assigned directly
* Update moflow_basic.py: repair create_parameter
* Update moflow_dataset.py: handle the special dependency rdkit by importing it inside a try block
* Update moflow_net.py: return the loss as a Dict
* Update moflow_net.py
* [Delete] ppsci/arch/fpscores.pkl.gz
* [Delete] ppsci/arch/moflow_utils.py
* [Add] moflow_utils.py
* [Fix] moflow_test.yaml: add batch_size
* [FIX] optimize_moflow.py: update the moflow_utils import
* [Fix] test_generate.py: update the moflow_utils import
* [Fix] train_moflow.py: update the moflow_utils import
* [FIX] Replace OrderedSet with set and remove the corresponding package
* [FIX] Remove the OrderedSet package
* [FIX] Fix documentation display issue
* [Fix] Re-upload DrugDiscovery Review changed
* [Fix] Repair code-style-check
* [Fix] Repair train_moflow & mkdocs code-style-check
* [Fix] docs/zh/examples/moflow.md
  Co-authored-by: HydrogenSulfate <[email protected]>
* [Fix] hidden_size error: vh = (self.latent_size,) + tuple(hidden_size) + (1,) raised "'int' object is not iterable"
* [fix] moflow.md view
* [FIX] Delete space in P moflow.md mat
* [fix] Repair view: update moflow.md
* [fix] Change result save path
* [fix] Change result save path
* [Fix] ConstantInitializer -> Constant
* [Fix] Add a description of the generation modes
* [Fix] Optimize model path EVAL->TRAIN
  Co-authored-by: HydrogenSulfate <[email protected]>
* [Fix] Pretrained model path
  Co-authored-by: HydrogenSulfate <[email protected]>
* [FIX] Add descriptions of the model generation modes and of optimization: document the four modes of the model inference/evaluation command, plus the model produced by optimization and how to use it
* [FIX] Fix run commands: update the run commands and descriptions for the added models
* [FIX] Repair code-style-check
* [Fix] Repair code-style-check

---------

Co-authored-by: HydrogenSulfate <[email protected]>
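The hidden_size fix in the message above stems from building the regression head's layer sizes with tuple(hidden_size), which raises when the config supplies a bare integer. The earlier quoted value '16' is worse: tuple('16') silently yields ('1', '6'). The Python sketch below assumes a hypothetical helper, layer_sizes, that normalizes the value before concatenation. It illustrates the failure mode only and is not the code in moflow_net.py.

```python
from typing import Iterable, Tuple, Union


def layer_sizes(latent_size: int, hidden_size: Union[int, Iterable[int]]) -> Tuple[int, ...]:
    """Hypothetical helper: build (latent, *hidden, 1) for a regression head.

    A bare int such as 16 is wrapped into a 1-tuple, so the config values
    `hidden: 16` and `hidden: [16]` produce the same layer sizes.
    """
    if isinstance(hidden_size, str):
        # A quoted YAML value like '16' would otherwise become ('1', '6').
        raise TypeError(f"hidden_size must be an int or a list of ints, got {hidden_size!r}")
    if isinstance(hidden_size, int):
        hidden_size = (hidden_size,)
    return (latent_size,) + tuple(int(h) for h in hidden_size) + (1,)


if __name__ == "__main__":
    try:
        (8,) + tuple(16) + (1,)  # the original expression with an int
    except TypeError as err:
        print("original failure:", err)  # 'int' object is not iterable
    print(layer_sizes(8, 16))    # (8, 16, 1)
    print(layer_sizes(8, [16]))  # (8, 16, 1)
```

Rejecting strings explicitly, rather than coercing them, keeps a mis-quoted YAML value from turning into wrong layer widths without any error.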
1 parent 6c720a4 commit 6854119

17 files changed: 4372 additions, 0 deletions

docs/zh/examples/moflow.md

Lines changed: 361 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
defaults:
  - ppsci_default
  - TRAIN: train_default
  - TRAIN/ema: ema_default
  - TRAIN/swa: swa_default
  - EVAL: eval_default
  - INFER: infer_default
  - hydra/job/config/override_dirname/exclude_keys: exclude_keys_default
  - _self_
hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: outputs_moflow_optimize/${data_name}
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: train # running mode: train/eval
data_name: zinc250k # data select: qm9/zinc250k
seed: 1
output_dir: ${hydra:run.dir}
log_freq: 20

# set training hyper-parameters
qm9:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [128,128]
  a_n_flow: 27
  a_n_block: 1
  a_hidden_gnn: [64]
  a_hidden_lin: [128,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 1
  atomic_num_list: [6, 7, 8, 9, 0]
  b_n_type: 4
  b_n_squeeze: 3
  a_n_node: 9
  valid_idx: valid_idx_qm9.json
  label_keys: ['A', 'B', 'C', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv']
  smiles_col: SMILES1

zinc250k:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [512,512]
  a_n_flow: 38
  a_n_block: 1
  a_hidden_gnn: [256]
  a_hidden_lin: [512,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 2
  atomic_num_list: [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
  b_n_type: 4
  b_n_squeeze: 19
  a_n_node: 38
  valid_idx: valid_idx_zinc.json
  label_keys: ['logP', 'qed', 'SAS']
  smiles_col: smiles

# set data path
FILE_PATH: ./datasets/moflow

# model settings
MODEL:
  input_keys: ["nodes", "edges"]
  output_keys: ["output", "sum_log_det"]
  hyper_params: null

MODEL_Prop:
  input_keys: ["nodes", "edges"]
  output_keys: ["output", "sum_log_det"]
  model: null
  hidden_size: null

# evaluation settings
EVAL:
  pretrained_model_path: null
  compute_metric_by_batch: false
  eval_with_no_grad: true
  batch_size: 100

# optimize settings
OPTIMIZE:
  property_name: plogp # qed/plogp
  batch_size: 256
  topk: 800
  debug: false
  topscore: false
  max_epochs: 3
  learning_rate: 0.001
  weight_decay: 1e-2
  hidden: [16] # Hidden dimension list for output regression
  temperature: 1.0
  consopt: true
  sim_cutoff: 0.0
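All three configs carry identical qm9 and zinc250k blocks and pick one of them through data_name. The Python sketch below mirrors a few of those values as plain dicts and checks two relationships that the MoFlow architecture is generally understood to rely on: the atom-type count equals len(atomic_num_list), with 0 as the padding ("no atom") type, and a_n_node must divide evenly by b_n_squeeze for the bond-flow squeeze. Both relationships are assumptions about the model code, not checks ppsci is known to perform; DATASETS and check_dataset_block are hypothetical names.

```python
# Values copied from the qm9 / zinc250k blocks of the configs above.
DATASETS = {
    "qm9": {"atomic_num_list": [6, 7, 8, 9, 0], "a_n_node": 9,
            "b_n_type": 4, "b_n_squeeze": 3},
    "zinc250k": {"atomic_num_list": [6, 7, 8, 9, 15, 16, 17, 35, 53, 0],
                 "a_n_node": 38, "b_n_type": 4, "b_n_squeeze": 19},
}


def check_dataset_block(name: str) -> dict:
    """Hypothetical sanity check for one dataset block (assumed constraints)."""
    block = DATASETS[name]
    atoms = block["atomic_num_list"]
    # Assumption: the trailing 0 is the padding ("no atom") type.
    if atoms[-1] != 0:
        raise ValueError(f"{name}: expected 0 as the last (padding) atom type")
    # Assumption: the bond flow squeezes the a_n_node x a_n_node adjacency
    # tensor by a factor of b_n_squeeze, so the node count must divide evenly.
    if block["a_n_node"] % block["b_n_squeeze"] != 0:
        raise ValueError(f"{name}: a_n_node is not divisible by b_n_squeeze")
    return {
        "a_n_type": len(atoms),  # assumed one-hot width of the atom matrix
        "adj_shape": (block["b_n_type"], block["a_n_node"], block["a_n_node"]),
        "squeezed_side": block["a_n_node"] // block["b_n_squeeze"],
    }


if __name__ == "__main__":
    data_name = "zinc250k"  # same role as `data_name` in the YAML
    print(check_dataset_block(data_name))
```

Under these assumptions, the two datasets squeeze to side lengths of 3 (9 / 3) and 2 (38 / 19), which may explain why b_n_squeeze differs so much between the blocks.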

examples/moflow/conf/moflow_test.yaml

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
defaults:
  - ppsci_default
  - TRAIN: train_default
  - TRAIN/ema: ema_default
  - TRAIN/swa: swa_default
  - EVAL: eval_default
  - INFER: infer_default
  - hydra/job/config/override_dirname/exclude_keys: exclude_keys_default
  - _self_
hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: outputs_moflow_test/${data_name}
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: eval # running mode: train/eval
data_name: zinc250k # data select: qm9/zinc250k
seed: 1
output_dir: ${hydra:run.dir}
save_score: true

# set testing hyper-parameters
qm9:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [128,128]
  a_n_flow: 27
  a_n_block: 1
  a_hidden_gnn: [64]
  a_hidden_lin: [128,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 1
  atomic_num_list: [6, 7, 8, 9, 0]
  b_n_type: 4
  b_n_squeeze: 3
  a_n_node: 9
  valid_idx: valid_idx_qm9.json
  label_keys: ['A', 'B', 'C', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv']
  smiles_col: SMILES1

zinc250k:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [512,512]
  a_n_flow: 38
  a_n_block: 1
  a_hidden_gnn: [256]
  a_hidden_lin: [512,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 2
  atomic_num_list: [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
  b_n_type: 4
  b_n_squeeze: 19
  a_n_node: 38
  valid_idx: valid_idx_zinc.json
  label_keys: ['logP', 'qed', 'SAS']
  smiles_col: smiles

# set data path
FILE_PATH: ./datasets/moflow

# model settings
MODEL:
  input_keys: ["nodes", "edges"]
  output_keys: ["output", "sum_log_det"]
  hyper_params: null

# evaluation settings
EVAL:
  pretrained_model_path: null
  batch_size: 256
  num_workers: 0
  reconstruct: false
  int2point: false
  intgrid: false
  inter_times: 5
  correct_validity: true
  temperature: 1.0
  delta: 0.1
  n_experiments:
  save_fig: true

EVAL_mode: Intergrid # select EVAL_mode: Reconstruct/Random/Inter2point/Intergrid

Reconstruct: # reconstruction: reconstruct the molecules of the selected dataset
  batch_size: 256
  reconstruct: true
  n_experiments: 0

Random: # random generation: sample from the latent space for the selected dataset, 10000 samples, repeated 5 times
  batch_size: 10000
  temperature: 0.85
  delta: 0.05
  n_experiments: 5
  save_fig: false
  correct_validity: true

Inter2point: # interpolate in the latent space between two molecules and visualize the generated molecule graphs
  batch_size: 1000
  int2point: true
  temperature: 0.65
  inter_times: 50
  correct_validity: true
  n_experiments: 0

Intergrid: # interpolate in the latent space over a grid of molecules and visualize the generated molecule graphs
  batch_size: 1000
  temperature: 0.65
  delta: 5
  intgrid: true
  inter_times: 40
  correct_validity: true
  n_experiments: 0
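moflow_test.yaml keeps generic defaults under EVAL and one override block per generation mode (Reconstruct, Random, Inter2point, Intergrid), selected by EVAL_mode. One plausible way to consume such a layout is to overlay the selected block on the EVAL defaults. The Python sketch below does that with plain dicts holding a subset of the values above. The merge logic and the name resolve_eval_settings are assumptions, not taken from test_generate.py.

```python
from typing import Any, Dict

# Subset of the EVAL defaults and mode blocks from moflow_test.yaml.
EVAL_DEFAULTS: Dict[str, Any] = {
    "batch_size": 256, "reconstruct": False, "int2point": False,
    "intgrid": False, "inter_times": 5, "correct_validity": True,
    "temperature": 1.0, "delta": 0.1, "save_fig": True,
}
MODE_BLOCKS: Dict[str, Dict[str, Any]] = {
    "Reconstruct": {"batch_size": 256, "reconstruct": True, "n_experiments": 0},
    "Random": {"batch_size": 10000, "temperature": 0.85, "delta": 0.05,
               "n_experiments": 5, "save_fig": False, "correct_validity": True},
    "Inter2point": {"batch_size": 1000, "int2point": True, "temperature": 0.65,
                    "inter_times": 50, "correct_validity": True, "n_experiments": 0},
    "Intergrid": {"batch_size": 1000, "temperature": 0.65, "delta": 5,
                  "intgrid": True, "inter_times": 40, "correct_validity": True,
                  "n_experiments": 0},
}


def resolve_eval_settings(eval_mode: str) -> Dict[str, Any]:
    """Hypothetical: overlay the selected mode block on the EVAL defaults."""
    if eval_mode not in MODE_BLOCKS:
        raise ValueError(f"unknown EVAL_mode {eval_mode!r}; expected one of {sorted(MODE_BLOCKS)}")
    settings = dict(EVAL_DEFAULTS)  # copy so the shared defaults stay untouched
    settings.update(MODE_BLOCKS[eval_mode])  # mode-specific values win
    return settings


if __name__ == "__main__":
    cfg = resolve_eval_settings("Intergrid")
    print(cfg["batch_size"], cfg["intgrid"], cfg["delta"])  # 1000 True 5
```

Keys a mode block omits fall back to EVAL. For example, Reconstruct inherits temperature 1.0.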
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
defaults:
  - ppsci_default
  - TRAIN: train_default
  - TRAIN/ema: ema_default
  - TRAIN/swa: swa_default
  - EVAL: eval_default
  - INFER: infer_default
  - hydra/job/config/override_dirname/exclude_keys: exclude_keys_default
  - _self_
hydra:
  run:
    # dynamic output directory according to running time and override name
    dir: outputs_moflow_train/${data_name}
  job:
    name: ${mode} # name of logfile
    chdir: false # keep current working directory unchanged
  sweep:
    # output directory for multirun
    dir: ${hydra.run.dir}
    subdir: ./

# general settings
mode: train # running mode: train/eval
data_name: qm9 # data select: qm9/zinc250k
seed: 1
output_dir: ${hydra:run.dir}
log_freq: 20

# set training hyper-parameters
qm9:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [128,128]
  a_n_flow: 27
  a_n_block: 1
  a_hidden_gnn: [64]
  a_hidden_lin: [128,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 1
  atomic_num_list: [6, 7, 8, 9, 0]
  b_n_type: 4
  b_n_squeeze: 3
  a_n_node: 9
  valid_idx: valid_idx_qm9.json
  label_keys: ['A', 'B', 'C', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv']
  smiles_col: SMILES1

zinc250k:
  b_n_flow: 10
  b_n_block: 1
  b_hidden_ch: [512,512]
  a_n_flow: 38
  a_n_block: 1
  a_hidden_gnn: [256]
  a_hidden_lin: [512,64]
  mask_row_size_list: [1]
  mask_row_stride_list: [1]
  learn_dist: True
  noise_scale: 0.6
  b_conv_lu: 2
  atomic_num_list: [6, 7, 8, 9, 15, 16, 17, 35, 53, 0]
  b_n_type: 4
  b_n_squeeze: 19
  a_n_node: 38
  valid_idx: valid_idx_zinc.json
  label_keys: ['logP', 'qed', 'SAS']
  smiles_col: smiles

# set data path
FILE_PATH: ./datasets/moflow

# model settings
MODEL:
  input_keys: ["nodes", "edges"]
  output_keys: ["output", "sum_log_det"]
  hyper_params: null

# training settings
TRAIN:
  epochs: 200
  save_freq: 50
  eval_during_train: true
  eval_freq: 20
  learning_rate: 0.001
  lr_decay: 0.999995
  batch_size: 256
  num_workers: 8
  pretrained_model_path: null
  checkpoint_path: null

# evaluation settings
EVAL:
  pretrained_model_path: null
  compute_metric_by_batch: false
  eval_with_no_grad: true
  batch_size: 100
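TRAIN pairs learning_rate: 0.001 with lr_decay: 0.999995. A factor that close to 1 suggests exponential decay applied per optimizer step rather than per epoch. Under that assumption, the learning rate after n steps is learning_rate * lr_decay ** n. The Python sketch below computes that schedule and the number of steps needed to reach a given fraction of the initial rate. The per-step interpretation and both function names are hypothetical; the scheduler train_moflow.py actually builds may differ.

```python
import math


def lr_at_step(step: int, base_lr: float = 0.001, gamma: float = 0.999995) -> float:
    """Learning rate after `step` exponential-decay updates (assumed per step)."""
    return base_lr * gamma ** step


def steps_to_fraction(fraction: float, gamma: float = 0.999995) -> int:
    """Smallest step count n with gamma**n <= fraction, for 0 < fraction < 1."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    # Both logs are negative, so the ratio is a positive step count.
    return math.ceil(math.log(fraction) / math.log(gamma))


if __name__ == "__main__":
    for step in (0, 10_000, 100_000):
        print(step, f"{lr_at_step(step):.6f}")
    print("steps to halve the LR:", steps_to_fraction(0.5))
```

With this factor, halving the rate takes on the order of 10^5 steps, so the decay is gentle unless a run performs many iterations per epoch.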
