Commit bb52504

creating new benchmark

1 parent 00482f4 commit bb52504
9 files changed: +922 -0 lines changed

Pilot1/ST1/README.md

Lines changed: 149 additions & 0 deletions
CANDLE benchmark versions are now available.

The original examples are retained and can be run as noted below.

The CANDLE versions make use of the common network design in `smiles_transformer.py` and implement the
models in `sct_baseline_keras.py` and `srt_baseline_keras.py`, for classification and regression, respectively.
All the relevant arguments are contained in the respective default model files,
`class_default_model.txt` and `regress_default_model.txt`.
They can be invoked with `python sct_baseline_keras.py`, and all variables can be overridden from the command line.
The datasets will be automatically downloaded and stored in the `../../Data/Pilot1` directory.
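For example (a hypothetical invocation; any key from `class_default_model.txt`, such as `epochs` or `batch_size`, can be overridden the same way through the standard CANDLE argument handling):

`python sct_baseline_keras.py --epochs 10 --batch_size 64`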

Running the original versions:

The datasets:

Classification problem.

CHEMBL -- 1.5M training examples for the Lipinski (1/0) label (Lipinski criteria for drug-likeness); the validation set is 100K non-overlapping samples.

Classification validation accuracy is about 91% after 10-20 epochs.

Regression problem.

CHEMBL -- 1.5M training examples (shuffled and resampled, so not the same 1.5M as classification), predicting molecular weight; the validation set
is also 100K non-overlapping samples.

Regression achieves an R^2 of about 0.95 after ~20 epochs.

See the log files for a trace.

We save the model with the best validation loss in the `*.h5` dumps.
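A minimal sketch of reloading those saved weights later for evaluation (assuming the network-building helpers in `smiles_transformer.py` and the autosave file name used by the training scripts; the `prog`/`desc` labels below are placeholders):

```
import candle
import smiles_transformer as st

# Rebuild the regression network from the default model file, then load the
# checkpoint written by ModelCheckpoint for the best validation loss.
params = candle.finalize_parameters(
    st.BenchmarkST(st.file_path, 'regress_default_model.txt', 'keras',
                   prog='srt_baseline', desc='SMILES transformer regression'))
model = st.transformer_model(params)
model.load_weights('smile_regress.autosave.model.h5')
```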

To run the models:

`CUDA_VISIBLE_DEVICES=1 python smiles_class_transformer.py --in_train chm.lipinski.trn.csv --in_vali chm.lipinski.val.csv --ep 25`

or

`CUDA_VISIBLE_DEVICES=0 python smiles_regress_transformer.py --in_train chm.weight.trn.csv --in_vali chm.weight.val.csv --ep 25`

Regression output should look something like:
```
Epoch 1/25
2022-03-21 12:53:11.402337: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
46875/46875 [==============================] - 4441s 95ms/step - loss: 11922.3690 - mae: 77.4828 - r2: 0.3369 - val_loss: 1460.3314 - val_mae: 21.4797 - val_r2: 0.9164

Epoch 00001: val_loss improved from inf to 1460.33142, saving model to smile_regress.autosave.model.h5
Epoch 2/25
46875/46875 [==============================] - 4431s 95ms/step - loss: 5460.8232 - mae: 55.1082 - r2: 0.6845 - val_loss: 1451.3109 - val_mae: 27.5647 - val_r2: 0.9163

Epoch 00002: val_loss improved from 1460.33142 to 1451.31091, saving model to smile_regress.autosave.model.h5
Epoch 3/25
46875/46875 [==============================] - 4428s 94ms/step - loss: 5007.1718 - mae: 52.5552 - r2: 0.7112 - val_loss: 1717.9198 - val_mae: 31.0688 - val_r2: 0.9004

Epoch 00003: val_loss did not improve from 1451.31091
Epoch 4/25
46875/46875 [==============================] - 4419s 94ms/step - loss: 4844.3624 - mae: 51.5771 - r2: 0.7206 - val_loss: 1854.3645 - val_mae: 35.4912 - val_r2: 0.8908

Epoch 00004: val_loss did not improve from 1451.31091
Epoch 5/25
46875/46875 [==============================] - 4715s 101ms/step - loss: 4754.6214 - mae: 51.0409 - r2: 0.7277 - val_loss: 1404.8254 - val_mae: 25.8863 - val_r2: 0.9200

Epoch 00005: val_loss improved from 1451.31091 to 1404.82544, saving model to smile_regress.autosave.model.h5
Epoch 6/25
46875/46875 [==============================] - 4474s 95ms/step - loss: 4679.2172 - mae: 50.7018 - r2: 0.7314 - val_loss: 1353.6165 - val_mae: 23.1900 - val_r2: 0.9252

Epoch 00006: val_loss improved from 1404.82544 to 1353.61646, saving model to smile_regress.autosave.model.h5
Epoch 7/25
46875/46875 [==============================] - 4421s 94ms/step - loss: 4585.5881 - mae: 50.3534 - r2: 0.7359 - val_loss: 1312.0532 - val_mae: 27.0301 - val_r2: 0.9234

Epoch 00007: val_loss improved from 1353.61646 to 1312.05322, saving model to smile_regress.autosave.model.h5
Epoch 8/25
46875/46875 [==============================] - 4420s 94ms/step - loss: 4587.0872 - mae: 50.2153 - r2: 0.7348 - val_loss: 1064.2247 - val_mae: 19.8623 - val_r2: 0.9399

Epoch 00008: val_loss improved from 1312.05322 to 1064.22473, saving model to smile_regress.autosave.model.h5
Epoch 9/25
46875/46875 [==============================] - 4423s 94ms/step - loss: 4587.0479 - mae: 50.0801 - r2: 0.7369 - val_loss: 1592.4563 - val_mae: 29.7814 - val_r2: 0.9085

Epoch 00009: val_loss did not improve from 1064.22473
Epoch 10/25
46875/46875 [==============================] - 4425s 94ms/step - loss: 4558.5997 - mae: 49.9495 - r2: 0.7383 - val_loss: 1683.6605 - val_mae: 33.6503 - val_r2: 0.9023

Epoch 00010: val_loss did not improve from 1064.22473
Epoch 11/25
46875/46875 [==============================] - 4422s 94ms/step - loss: 4546.4993 - mae: 49.8544 - r2: 0.7401 - val_loss: 950.7401 - val_mae: 17.9388 - val_r2: 0.9465

Epoch 00011: val_loss improved from 1064.22473 to 950.74005, saving model to smile_regress.autosave.model.h5
Epoch 12/25
46875/46875 [==============================] - 4425s 94ms/step - loss: 4442.8416 - mae: 49.5132 - r2: 0.7436 - val_loss: 1060.0773 - val_mae: 18.5870 - val_r2: 0.9397

Epoch 00012: val_loss did not improve from 950.74005
Epoch 13/25
46875/46875 [==============================] - 4429s 94ms/step - loss: 4481.1873 - mae: 49.5325 - r2: 0.7434 - val_loss: 878.7765 - val_mae: 14.4166 - val_r2: 0.9515

Epoch 00013: val_loss improved from 950.74005 to 878.77649, saving model to smile_regress.autosave.model.h5
Epoch 14/25
46875/46875 [==============================] - 4428s 94ms/step - loss: 4474.0927 - mae: 49.4286 - r2: 0.7445 - val_loss: 1116.5386 - val_mae: 23.1262 - val_r2: 0.9360

Epoch 00014: val_loss did not improve from 878.77649
Epoch 15/25
46875/46875 [==============================] - 4422s 94ms/step - loss: 4426.6833 - mae: 49.2094 - r2: 0.7446 - val_loss: 883.4428 - val_mae: 13.1046 - val_r2: 0.9507

Epoch 00015: val_loss did not improve from 878.77649
Epoch 16/25
46875/46875 [==============================] - 4416s 94ms/step - loss: 4386.3840 - mae: 49.0872 - r2: 0.7476 - val_loss: 981.3953 - val_mae: 14.7772 - val_r2: 0.9451

Epoch 00016: val_loss did not improve from 878.77649
Epoch 17/25
46875/46875 [==============================] - 4423s 94ms/step - loss: 4384.2891 - mae: 49.0281 - r2: 0.7464 - val_loss: 887.1695 - val_mae: 14.6686 - val_r2: 0.9506

Epoch 00017: val_loss did not improve from 878.77649
Epoch 18/25
46875/46875 [==============================] - 4417s 94ms/step - loss: 4398.2363 - mae: 48.9583 - r2: 0.7487 - val_loss: 849.0564 - val_mae: 12.2192 - val_r2: 0.9528

Epoch 00018: val_loss improved from 878.77649 to 849.05640, saving model to smile_regress.autosave.model.h5
Epoch 19/25
46875/46875 [==============================] - 4422s 94ms/step - loss: 4350.2122 - mae: 48.8221 - r2: 0.7505 - val_loss: 847.2310 - val_mae: 13.4015 - val_r2: 0.9533

Epoch 00019: val_loss improved from 849.05640 to 847.23096, saving model to smile_regress.autosave.model.h5
Epoch 20/25
46875/46875 [==============================] - 4428s 94ms/step - loss: 4346.0122 - mae: 48.7704 - r2: 0.7513 - val_loss: 964.1797 - val_mae: 17.7863 - val_r2: 0.9453

Epoch 00020: val_loss did not improve from 847.23096
Epoch 21/25
46875/46875 [==============================] - 4424s 94ms/step - loss: 4293.9141 - mae: 48.5882 - r2: 0.7521 - val_loss: 800.8525 - val_mae: 12.8857 - val_r2: 0.9556

Epoch 00021: val_loss improved from 847.23096 to 800.85254, saving model to smile_regress.autosave.model.h5
Epoch 22/25
46875/46875 [==============================] - 4427s 94ms/step - loss: 4323.8214 - mae: 48.5665 - r2: 0.7524 - val_loss: 835.3901 - val_mae: 14.5708 - val_r2: 0.9534

Epoch 00022: val_loss did not improve from 800.85254
Epoch 23/25
46875/46875 [==============================] - 4429s 94ms/step - loss: 4311.1271 - mae: 48.5622 - r2: 0.7528 - val_loss: 820.6389 - val_mae: 14.3753 - val_r2: 0.9547

Epoch 00023: val_loss did not improve from 800.85254
Epoch 24/25
46875/46875 [==============================] - 4427s 94ms/step - loss: 4286.2930 - mae: 48.3628 - r2: 0.7548 - val_loss: 815.2863 - val_mae: 12.7142 - val_r2: 0.9549

Epoch 00024: val_loss did not improve from 800.85254
Epoch 25/25
46875/46875 [==============================] - 4422s 94ms/step - loss: 4259.8291 - mae: 48.3112 - r2: 0.7556 - val_loss: 813.1475 - val_mae: 12.2975 - val_r2: 0.9564

Epoch 00025: val_loss did not improve from 800.85254
```
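The `r2` values in this trace are the coefficient of determination computed as a custom Keras metric. A minimal sketch of such a metric (an assumption about how `r2` is defined here; the actual definition lives in `smiles_transformer.py`, which this commit does not show):

```
from tensorflow.keras import backend as K

def r2(y_true, y_pred):
    # Coefficient of determination, 1 - SS_res / SS_tot, evaluated per batch.
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1.0 - ss_res / (ss_tot + K.epsilon())
```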

Pilot1/ST1/class_default_model.txt

Lines changed: 18 additions & 0 deletions
[Global_Params]
data_url = "https://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Examples/xform-smiles-data/"
train_data = 'chm.lipinski.trn.csv'
val_data = 'chm.lipinski.val.csv'
dropout = 0.1
dense = [1024, 256, 64, 16]
activation = 'relu'
out_layer = 2
out_activation = 'softmax'
transformer_depth = 4
embed_dim = 128
ff_dim = 128
maxlen = 250
num_heads = 16
vocab_size = 40000
epochs = 400
batch_size = 32
loss = 'sparse_categorical_crossentropy'
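For orientation, a hedged sketch of how parameters such as `dense`, `dropout`, `activation`, `out_layer`, and `out_activation` typically map onto the classification head (the real wiring is in `smiles_transformer.py`, which this commit does not show; treat this as an illustration only):

```
from tensorflow.keras import layers

def build_head(x, params):
    # Stack the widths listed in `dense`, e.g. [1024, 256, 64, 16],
    # each followed by dropout, then emit `out_layer` units
    # (here 2 softmax units, paired with sparse_categorical_crossentropy).
    for width in params['dense']:
        x = layers.Dense(width, activation=params['activation'])(x)
        x = layers.Dropout(params['dropout'])(x)
    return layers.Dense(params['out_layer'], activation=params['out_activation'])(x)
```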
Pilot1/ST1/regress_default_model.txt

Lines changed: 18 additions & 0 deletions
[Global_Params]
data_url = "https://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Examples/xform-smiles-data/"
train_data = 'chm.weight.trn.csv'
val_data = 'chm.weight.val.csv'
dropout = 0.1
dense = [1024, 256, 64, 16]
activation = 'relu'
out_layer = 1
out_activation = 'relu'
transformer_depth = 4
embed_dim = 128
ff_dim = 128
maxlen = 250
num_heads = 16
vocab_size = 40000
epochs = 400
batch_size = 32
loss = 'mean_squared_error'
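These default model files are plain INI-style text. A minimal sketch of reading one with the standard library (the real CANDLE loader also applies keyword validation and command-line overrides, so this is only illustrative):

```
from ast import literal_eval
from configparser import ConfigParser

config = ConfigParser()
config.read('regress_default_model.txt')
# Every value in [Global_Params] is a Python literal, so literal_eval recovers
# ints, floats, strings, and lists such as dense = [1024, 256, 64, 16].
params = {key: literal_eval(value) for key, value in config['Global_Params'].items()}
print(params['train_data'], params['loss'])
```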
Lines changed: 19 additions & 0 deletions
[Global_Params]
data_url = "https://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Examples/xform-smiles-data/"
train_data = 'Nsp13.helicase_m1_pocket2.train'
val_data = 'Nsp13.helicase_m1_pocket2.val'
dropout = 0.1
dense = [1024, 256, 64, 16]
activation = 'relu'
out_layer = 1
out_activation = 'relu'
transformer_depth = 4
embed_dim = 128
ff_dim = 128
maxlen = 250
num_heads = 16
vocab_size = 40000
epochs = 400
batch_size = 32
loss = 'mean_squared_error'
optimizer = 'Adam'

Pilot1/ST1/sct_baseline_keras2.py

Lines changed: 75 additions & 0 deletions
# Setup

import os
import sys
# import gzip

# import math
# import matplotlib
# matplotlib.use('Agg')

# import matplotlib.pyplot as plt

from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam  # RMSprop, SGD
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, ReduceLROnPlateau, EarlyStopping

file_path = os.path.dirname(os.path.realpath(__file__))
lib_path = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
sys.path.append(lib_path)

import candle
import smiles_transformer as st


def initialize_parameters(default_model='class_default_model.txt'):

    # Build benchmark object
    sctBmk = st.BenchmarkST(st.file_path, default_model, 'keras',
                            prog='sct_baseline',
                            desc='Transformer-based classification model for SMILES strings - Pilot 1 benchmark')

    # Initialize parameters
    gParameters = candle.finalize_parameters(sctBmk)

    return gParameters


# Train and Evaluate

def run(params):

    x_train, y_train, x_val, y_val = st.load_data(params)

    model = st.transformer_model(params)

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=Adam(lr=0.000001),
                  metrics=['accuracy'])

    # Set up callbacks to do work during model training: checkpoint the best
    # validation loss, log per-epoch metrics, reduce the learning rate on
    # plateaus, and stop early if validation stalls.
    checkpointer = ModelCheckpoint(filepath='smile_class.autosave.model.h5', verbose=1, save_weights_only=True, save_best_only=True)
    csv_logger = CSVLogger('smile_class.training.log')
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=20, verbose=1, mode='auto', min_delta=0.0001, cooldown=3, min_lr=0.000000001)
    early_stop = EarlyStopping(monitor='val_loss', patience=100, verbose=1, mode='auto')

    history = model.fit(x_train, y_train,
                        batch_size=params['batch_size'],
                        epochs=params['epochs'],
                        verbose=1,
                        validation_data=(x_val, y_val),
                        callbacks=[checkpointer, csv_logger, reduce_lr, early_stop])

    # Restore the weights with the best validation loss before returning.
    model.load_weights('smile_class.autosave.model.h5')

    return history


def main():
    params = initialize_parameters()
    run(params)


if __name__ == '__main__':
    main()
    if K.backend() == 'tensorflow':
        K.clear_session()
