Commit d518162

merge release_03 branch into master branch
2 parents 5d2c892 + 1031bcb commit d518162

190 files changed
+15746 -1484 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 *.pyc
 __pycache__/
+Data

Pilot1/Attn/README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
The Pilot1 Attn benchmark requires an HDF5 input file, specified by the hyperparameter "in". The default file is top_21_1fold_001.h5.

The benchmark downloads this file automatically from:
http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/top_21_1fold_001.h5 (~4GB)

Any file with a name of the form top_21_1fold_"ijk".h5 can be used as input.
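
As a rough illustration of how the file name and the download location fit together, the sketch below checks a name against the pattern above and joins it to the base URL. The helper names `is_attn_input_name` and `download_url` are hypothetical, and reading "ijk" as a three-digit index is an assumption, not something the benchmark documents.

```python
import re

# Base URL and default file name as given in this README.
DATA_URL = "http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/"
DEFAULT_FILE = "top_21_1fold_001.h5"

# Assumption: "ijk" stands for a three-digit index such as 001.
_NAME_RE = re.compile(r"^top_21_1fold_\d{3}\.h5$")


def is_attn_input_name(name):
    """Hypothetical helper: True if name looks like top_21_1fold_<ijk>.h5."""
    return bool(_NAME_RE.match(name))


def download_url(name, base=DATA_URL):
    """Hypothetical helper: join the base URL and the file name."""
    return base + name


print(is_attn_input_name(DEFAULT_FILE))
print(download_url(DEFAULT_FILE))
```

Whether files with other indices are actually published at that location is not stated here, so the pattern check only validates the name, not availability.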

## Sample run:

```
python attn_baseline_keras2.py
Params: {'model_name': 'attn', 'dense': [2000, 600], 'batch_size': 32, 'epochs': 1, 'activation': 'relu', 'loss': 'categorical_crossentropy', 'optimizer': 'sgd', 'drop': 0.2, 'learning_rate': 1e-05, 'momentum': 0.7, 'scaling': 'minmax', 'validation_split': 0.1, 'epsilon_std': 1.0, 'rng_seed': 2017, 'initialization': 'glorot_uniform', 'latent_dim': 2, 'batch_normalization': False, 'in': 'top_21_1fold_001.h5', 'save_path': 'candle_save', 'save_dir': './save/001/', 'use_cp': False, 'early_stop': True, 'reduce_lr': True, 'feature_subsample': 0, 'nb_classes': 2, 'timeout': 3600, 'verbose': None, 'logfile': None, 'train_bool': True, 'experiment_id': 'EXP000', 'run_id': 'RUN000', 'shuffle': False, 'gpus': [], 'profiling': False, 'residual': False, 'warmup_lr': False, 'use_tb': False, 'tsne': False, 'datatype': <class 'numpy.float32'>, 'output_dir': '/nfs2/jain/Benchmarks/Pilot1/Attn/Output/EXP000/RUN000'}
...
...
processing h5 in file top_21_1fold_001.h5

x_train shape: (271915, 6212)
x_test shape: (33989, 6212)
Examples:
Total: 339893
Positive: 12269 (3.61% of total)

X_train shape: (271915, 6212)
X_test shape: (33989, 6212)
Y_train shape: (271915, 2)
Y_test shape: (33989, 2)
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 6212)         0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1000)         6213000     input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 1000)         4000        dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1000)         1001000     batch_normalization_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 1000)         4000        dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1000)         1001000     batch_normalization_1[0][0]
__________________________________________________________________________________________________
multiply_1 (Multiply)           (None, 1000)         0           batch_normalization_2[0][0]
                                                                 dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 500)          500500      multiply_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 500)          2000        dense_4[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 500)          0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 250)          125250      dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 250)          1000        dense_5[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 250)          0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 125)          31375       dropout_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 125)          500         dense_6[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 125)          0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 60)           7560        dropout_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 60)           240         dense_7[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 60)           0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 30)           1830        dropout_4[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 30)           120         dense_8[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 30)           0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 2)            62          dropout_5[0][0]
==================================================================================================

Total params: 8,893,437
Trainable params: 8,887,507
Non-trainable params: 5,930
..
..
271915/271915 [==============================] - 631s 2ms/step - loss: 0.8681 - acc: 0.5548 - tf_auc: 0.5371 - val_loss: 0.6010 - val_acc: 0.8365 - val_tf_auc: 0.5743
Current time ....631.567

Epoch 00001: val_loss improved from inf to 0.60103, saving model to ./save/001/Agg_attn_bin.autosave.model.h5
creating table of predictions
creating figure 1 at ./save/001/Agg_attn_bin.auroc.pdf
creating figure 2 at ./save/001/Agg_attn_bin.auroc2.pdf
f1=0.234 auroc=0.841 aucpr=0.990
creating figure 3 at ./save/001/Agg_attn_bin.aurpr.pdf
creating figure 4 at ./save/001/Agg_attn_bin.confusion_without_norm.pdf
Confusion matrix, without normalization
[[27591 5190][ 360 848]]
Confusion matrix, without normalization
[[27591 5190][ 360 848]]
Normalized confusion matrix
[[0.84 0.16][0.3 0.7 ]]
Examples:
Total: 339893
Positive: 12269 (3.61% of total)

0.7718316679565835
0.7718316679565836
              precision    recall  f1-score   support

           0       0.99      0.84      0.91     32781
           1       0.14      0.70      0.23      1208

   micro avg       0.84      0.84      0.84     33989
   macro avg       0.56      0.77      0.57     33989
weighted avg       0.96      0.84      0.88     33989

[[27591 5190][ 360 848]]
score
[0.5760348070144456, 0.8367118835449219, 0.5936741828918457]
Test val_loss: 0.5760348070144456
Test accuracy: 0.8367118835449219
Saved model to disk
Loaded json model from disk
json Validation loss: 0.560062773128295
json Validation accuracy: 0.8367118835449219
json accuracy: 83.67%
Loaded yaml model from disk
yaml Validation loss: 0.560062773128295
yaml Validation accuracy: 0.8367118835449219
yaml accuracy: 83.67%
Yaml_train_shape: (271915, 2)
Yaml_test_shape: (33989, 2)
```
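
The numbers in the sample run can be cross-checked by hand. The sketch below recomputes the parameter totals from the layer shapes in the model summary and derives the headline metrics from the printed confusion matrix. It assumes the usual Keras counting (weights plus biases for a Dense layer, four vectors per BatchNormalization layer with half of them non-trainable) and that rows of the confusion matrix are true classes, which agrees with the support column of the classification report. Identifying the unlabeled value 0.7718... in the log as the balanced accuracy is an inference from the arithmetic, not something the log states.

```python
# Dense layers as (inputs, outputs), read off the model summary.
dense_layers = [
    (6212, 1000), (1000, 1000), (1000, 1000), (1000, 500), (500, 250),
    (250, 125), (125, 60), (60, 30), (30, 2),
]
# Units of the seven BatchNormalization layers in the summary.
bn_units = [1000, 1000, 500, 250, 125, 60, 30]

# Dense: weight matrix plus bias vector.
dense_total = sum(n_in * n_out + n_out for n_in, n_out in dense_layers)
# BatchNorm: gamma and beta (trainable), moving mean and variance (not trainable).
bn_total = sum(4 * units for units in bn_units)

total_params = dense_total + bn_total
non_trainable = bn_total // 2
trainable = total_params - non_trainable
print(total_params, trainable, non_trainable)

# Confusion matrix from the log: rows are true classes, columns are predictions.
tn, fp = 27591, 5190
fn, tp = 360, 848

accuracy = (tn + tp) / (tn + fp + fn + tp)
precision = tp / (tp + fp)           # positive class
recall = tp / (tp + fn)              # positive class
specificity = tn / (tn + fp)         # recall of the negative class
f1 = 2 * precision * recall / (precision + recall)
balanced_accuracy = (recall + specificity) / 2

print(round(accuracy, 4), round(precision, 2), round(recall, 2), round(f1, 3))
print(round(balanced_accuracy, 4))
```

With only 3.61% positive examples, accuracy alone is a weak signal, which is why the low precision (0.14) alongside the 0.70 recall for class 1 is the more informative part of the report.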

Pilot1/Attn/attn.py

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
from __future__ import print_function

import os
import sys
import logging

import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
# Import from the public namespace; scipy.stats.stats is a deprecated private path.
from scipy.stats import pearsonr

# Make the shared CANDLE "common" directory importable.
file_path = os.path.dirname(os.path.realpath(__file__))
#lib_path = os.path.abspath(os.path.join(file_path, '..'))
#sys.path.append(lib_path)
lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
sys.path.append(lib_path2)

import candle

logger = logging.getLogger(__name__)
candle.set_parallelism_threads()

additional_definitions = [
26+
{'name':'latent_dim',
27+
'action':'store',
28+
'type': int,
29+
'help':'latent dimensions'},
30+
{'name':'residual',
31+
'type': candle.str2bool,
32+
'default': False,
33+
'help':'add skip connections to the layers'},
34+
{'name':'reduce_lr',
35+
'type': candle.str2bool,
36+
'default': False,
37+
'help':'reduce learning rate on plateau'},
38+
{'name':'warmup_lr',
39+
'type': candle.str2bool,
40+
'default': False,
41+
'help':'gradually increase learning rate on start'},
42+
{'name':'base_lr',
43+
'type': float,
44+
'help':'base learning rate'},
45+
{'name':'epsilon_std',
46+
'type': float,
47+
'help':'epsilon std for sampling latent noise'},
48+
{'name':'use_cp',
49+
'type': candle.str2bool,
50+
'default': False,
51+
'help':'checkpoint models with best val_loss'},
52+
#{'name':'shuffle',
53+
#'type': candle.str2bool,
54+
#'default': False,
55+
#'help':'shuffle data'},
56+
{'name':'use_tb',
57+
'type': candle.str2bool,
58+
'default': False,
59+
'help':'use tensorboard'},
60+
{'name':'tsne',
61+
'type': candle.str2bool,
62+
'default': False,
63+
'help':'generate tsne plot of the latent representation'}
64+
]
65+
# Parameters that must be present for the benchmark to run.
required = [
    'activation',
    'batch_size',
    'dense',
    'dropout',
    'epochs',
    'initialization',
    'learning_rate',
    'loss',
    'optimizer',
    'rng_seed',
    'scaling',
    'val_split',
    'latent_dim',
    'batch_normalization',
    'epsilon_std',
    'timeout'
]

class BenchmarkAttn(candle.Benchmark):

    def set_locals(self):
        """Functionality to set variables specific for the benchmark
        - required: set of required parameters for the benchmark.
        - additional_definitions: list of dictionaries describing the additional parameters for the
        benchmark.
        """

        if required is not None:
            self.required = set(required)
        if additional_definitions is not None:
            self.additional_definitions = additional_definitions


def extension_from_parameters(params, framework=''):
    """Construct string for saving model with annotation of parameters"""
    ext = framework
    # One tag per non-empty dense layer size, e.g. '.D1=1000'.
    for i, n in enumerate(params['dense']):
        if n:
            ext += '.D{}={}'.format(i+1, n)
    ext += '.A={}'.format(params['activation'][0])
    ext += '.B={}'.format(params['batch_size'])
    ext += '.E={}'.format(params['epochs'])
    ext += '.L={}'.format(params['latent_dim'])
    ext += '.LR={}'.format(params['learning_rate'])
    ext += '.S={}'.format(params['scaling'])

    # Optional tags are appended only when the setting differs from "off".
    if params['epsilon_std'] != 1.0:
        ext += '.EPS={}'.format(params['epsilon_std'])
    if params['dropout']:
        ext += '.DR={}'.format(params['dropout'])
    if params['batch_normalization']:
        ext += '.BN'
    if params['warmup_lr']:
        ext += '.WU_LR'
    if params['reduce_lr']:
        ext += '.Re_LR'
    if params['residual']:
        ext += '.Res'

    return ext


def load_data(params, seed):
    """Download (if needed) and load the train/val/test splits from the HDF5 file.

    Each X split is stored as two column blocks (e.g. 'x_train_0', 'x_train_1')
    that are concatenated column-wise.
    """
    # start change #
    if params['train_data'].endswith('h5') or params['train_data'].endswith('hdf5'):
        print('processing h5 in file {}'.format(params['train_data']))

        url = params['data_url']
        file_train = params['train_data']
        train_file = candle.get_file(file_train, url+file_train, cache_subdir='Pilot1')

        df_x_train_0 = pd.read_hdf(train_file, 'x_train_0').astype(np.float32)
        df_x_train_1 = pd.read_hdf(train_file, 'x_train_1').astype(np.float32)
        X_train = pd.concat([df_x_train_0, df_x_train_1], axis=1, sort=False)
        del df_x_train_0, df_x_train_1

        df_x_test_0 = pd.read_hdf(train_file, 'x_test_0').astype(np.float32)
        df_x_test_1 = pd.read_hdf(train_file, 'x_test_1').astype(np.float32)
        X_test = pd.concat([df_x_test_0, df_x_test_1], axis=1, sort=False)
        del df_x_test_0, df_x_test_1

        df_x_val_0 = pd.read_hdf(train_file, 'x_val_0').astype(np.float32)
        df_x_val_1 = pd.read_hdf(train_file, 'x_val_1').astype(np.float32)
        X_val = pd.concat([df_x_val_0, df_x_val_1], axis=1, sort=False)
        del df_x_val_0, df_x_val_1

        Y_train = pd.read_hdf(train_file, 'y_train')
        Y_test = pd.read_hdf(train_file, 'y_test')
        Y_val = pd.read_hdf(train_file, 'y_val')

        # assumes AUC is in the third column at index 2
        # df_y = df['AUC'].astype('int')
        # df_x = df.iloc[:,3:].astype(np.float32)

        # assumes dataframe has already been scaled
        # scaler = StandardScaler()
        # df_x = scaler.fit_transform(df_x)
    else:
        print('expecting input file with suffix h5 or hdf5')
        # Exit with a non-zero status so callers can detect the failure.
        sys.exit(1)

    print('x_train shape:', X_train.shape)
    print('x_test shape:', X_test.shape)

    return X_train, Y_train, X_val, Y_val, X_test, Y_test
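
To make the naming scheme produced by `extension_from_parameters` concrete, the sketch below repeats the function and calls it once. The parameter values are hypothetical, loosely modeled on the default config, and the `'.keras'` framework tag is an assumption used only for illustration.

```python
def extension_from_parameters(params, framework=''):
    """Construct string for saving model with annotation of parameters"""
    ext = framework
    for i, n in enumerate(params['dense']):
        if n:
            ext += '.D{}={}'.format(i + 1, n)
    ext += '.A={}'.format(params['activation'][0])
    ext += '.B={}'.format(params['batch_size'])
    ext += '.E={}'.format(params['epochs'])
    ext += '.L={}'.format(params['latent_dim'])
    ext += '.LR={}'.format(params['learning_rate'])
    ext += '.S={}'.format(params['scaling'])
    if params['epsilon_std'] != 1.0:
        ext += '.EPS={}'.format(params['epsilon_std'])
    if params['dropout']:
        ext += '.DR={}'.format(params['dropout'])
    if params['batch_normalization']:
        ext += '.BN'
    if params['warmup_lr']:
        ext += '.WU_LR'
    if params['reduce_lr']:
        ext += '.Re_LR'
    if params['residual']:
        ext += '.Res'
    return ext


# Hypothetical parameter values for illustration only.
example_params = {
    'dense': [1000, 500],
    'activation': ['relu', 'softmax'],
    'batch_size': 32,
    'epochs': 2,
    'latent_dim': 2,
    'learning_rate': 0.00001,
    'scaling': 'minmax',
    'epsilon_std': 1.0,
    'dropout': 0.2,
    'batch_normalization': False,
    'warmup_lr': False,
    'reduce_lr': True,
    'residual': False,
}

ext = extension_from_parameters(example_params, framework='.keras')
print(ext)  # .keras.D1=1000.D2=500.A=relu.B=32.E=2.L=2.LR=1e-05.S=minmax.DR=0.2.Re_LR
```

Note that only the first entry of `activation` is recorded, and that settings left at their "off" values (here `epsilon_std`, `batch_normalization`, `warmup_lr`, `residual`) add nothing to the string.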
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
[Global_Params]
data_url = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/'
train_data='top_21_1fold_001.h5'
model_name='attn_abs'
dense=[1000, 1000, 1000, 500, 250, 125, 60, 30, 2]
batch_size=32
epochs=2
activation=['relu', 'relu', 'softmax', 'relu', 'relu', 'relu', 'relu', 'relu', 'softmax']
loss='categorical_crossentropy'
optimizer='sgd'
dropout=0.2
learning_rate=0.00001
momentum=0.9
val_split=0.1
rng_seed=2017
use_cp=False
early_stop=True
reduce_lr=True
feature_subsample=0
output_dir='save_abs/EXP01/'
experiment_id='01'
run_id='1'
save_path='save_abs/EXP01/'
target_abs_acc=0.85

[Monitor_Params]
timeout=3600
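
The values in this file are written as Python literals (quoted strings, lists, booleans), so one way to read them is with the standard library's `configparser` plus `ast.literal_eval`. The sketch below is only an illustration of that idea on an excerpt of the file; the `parse_params` helper is hypothetical and is not CANDLE's own loader, which may behave differently.

```python
import ast
import configparser

# Excerpt of the config above, embedded so the example is self-contained.
CONFIG_TEXT = """
[Global_Params]
train_data='top_21_1fold_001.h5'
dense=[1000, 1000, 1000, 500, 250, 125, 60, 30, 2]
activation=['relu', 'relu', 'softmax', 'relu', 'relu', 'relu', 'relu', 'relu', 'softmax']
learning_rate=0.00001
early_stop=True

[Monitor_Params]
timeout=3600
"""


def parse_params(text):
    """Hypothetical helper: flatten all sections into one dict of Python values."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    params = {}
    for section in parser.sections():
        for key, raw in parser.items(section):
            # Each value is a Python literal, so evaluate it safely.
            params[key] = ast.literal_eval(raw)
    return params


params = parse_params(CONFIG_TEXT)

# Every dense layer size has a matching activation entry.
assert len(params['dense']) == len(params['activation'])
print(params['dense'][-1], params['activation'][2], params['timeout'])
```

The `dense` and `activation` lists line up entry by entry: the third layer uses `softmax`, and the final two-unit layer uses `softmax` for the binary classification output.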
