Commit b9b3870

Fix ST1 Readme. Multiple fixes to examples.
1 parent 644c22c commit b9b3870

14 files changed: 45 additions and 46 deletions

Pilot1/ST1/README.md

Lines changed: 11 additions & 3 deletions
@@ -1,9 +1,9 @@
 # Simple transformers for classification and regression using SMILE string input
 
 ## Introduction
-The ST1 benchmark represent two versions of a simple transformer, one that can perform regression and the other classification. We chose the transformer architecture to see if we could train directly on SMILE strings. This benchmark brings novel capability to the suite of Pilot1 benchmarks in two ways. First, the featureization of a small molecule is simple its SMILE string. The secone novel aspect to the set of Pilot1 benchmarks is that the model is based on the Transformer architecture, albeit this benchmark is a simpler version of the large Transformer models that train on billions and greater parameters.
+The ST1 benchmark represent two versions of a simple transformer, one that can perform regression and the other classification. We chose the transformer architecture to see if we could train directly on SMILE strings. This benchmark brings novel capability to the suite of Pilot1 benchmarks in two ways. First, the featureization of a small molecule is simply its SMILE string. The second novel aspect to the set of Pilot1 benchmarks is that the model is based on the Transformer architecture, albeit this benchmark is a simpler version of the large Transformer models that train on billions and greater parameters.
 
-Both the original code and the CANDLE versions are available. The original examples are retained and can be run as noted below. The CANDLE versions make use of the common network design in smiles_transformer.py, and implement the models in `sct_baseline2_keras.py` and `srt_baseline_keras2.py`, for classification and regression, respectively.
+Both the original code and the CANDLE versions are available. The original examples are retained and can be run as noted below. The CANDLE versions make use of the common network design in `smiles_transformer.py`, and implement the models in `sct_baseline_keras2.py` and `srt_baseline_keras2.py`, for classification and regression, respectively.
 
 The example classification problem takes as input SMILE strings and trains a model to predict whether or not a compound is 'drug-like' based on Lipinski criteria. The example regression problem takes as input SMILE strings and trains a model to predict the molecular weight. Data are freely downloadable and automatically downloaded by the CANDLE versions.
 
@@ -12,8 +12,10 @@ For the CANDLE versions, all the relevant arguments are contained in the respect
 class_default_model.txt
 python sct_baseline_keras2.py
 
+```
 and
 
+```
 regress_default_model.txt
 python srt_baseline_keras2.py
 ```
@@ -23,12 +25,16 @@ The original code demonstrating a simple transformer regressor and a simple tran
 ```
 smiles_regress_transformer.py
 
+```
 and
 
+```
 smiles_class_transformer.py
 ```
 
-The example data sets are the same as for the CANDLE versions, and allow one to predict whether a small molecule is "drug-like" based on Lipinski criteria (classification problem), or predict the molecular weight (regression) from a SMILE string as input. The example data sets are downloadable using the information in the regress_default_model.txt or class_default_model.txt files. These data files must be downloaded manually and specified on the command line for execution.
+The example data sets are the same as for the CANDLE versions, and allow one to predict whether a small molecule is "drug-like" based on Lipinski criteria (classification problem), or predict the molecular weight (regression) from a SMILE string as input.
+The example data sets are downloadable using the information in the `regress_default_model.txt` or `class_default_model.txt` files.
+These data files must be downloaded manually and specified on the command line for execution.
 
 ```
 # for regression
@@ -45,8 +51,10 @@ To run the models
 ```
 CUDA_VISIBLE_DEVICES=1 python smiles_class_transformer.py --in_train chm.lipinski.trn.csv --in_vali chm.lipinski.val.csv --ep 25
 
+```
 or
 
+```
 CUDA_VISIBLE_DEVICES=0 python smiles_regress_transformer.py --in_train chm.weight.trn.csv --in_vali chm.weight.val.csv --ep 25
 ```
 The model with the best validation loss is saved in the .h5 dumps. Log files contain the trace. Regression output should look something like this.

examples/IGTD/Scripts/Examples_Of_Table_To_Image_Conversion.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 num_col = 30 # Number of pixel columns in image representation
 num = num_row * num_col # Number of features to be included for analysis, which is also the total number of pixels in image representation
 save_image_size = 3 # Size of pictures (in inches) saved during the execution of IGTD algorithm.
-max_step = 10000 # The maximum number of iterations to run the IGTD algorithm, if it does not converge.
+max_step = 1000 # The maximum number of iterations to run the IGTD algorithm, if it does not converge.
 val_step = 300 # The number of iterations for determining algorithm convergence. If the error reduction rate is smaller than a pre-set threshold for val_step itertions, the algorithm converges.
 
 # Import the example data and linearly scale each feature so that its minimum and maximum values are 0 and 1, respectively.

examples/IGTD/Scripts/Prediction_Modeling_Functions.py

Lines changed: 6 additions & 6 deletions
@@ -1,16 +1,16 @@
-from keras import backend
-from keras import optimizers
-from keras.models import Model, load_model
-from keras.layers import Input, Dense, Dropout, concatenate, Conv2D, BatchNormalization, ReLU, MaxPooling2D, \
+from tensorflow.keras import backend
+from tensorflow.keras import optimizers
+from tensorflow.keras.models import Model, load_model
+from tensorflow.keras.layers import Input, Dense, Dropout, concatenate, Conv2D, BatchNormalization, ReLU, MaxPooling2D, \
 Flatten, AlphaDropout
-from keras.callbacks import ModelCheckpoint, CSVLogger, ReduceLROnPlateau, EarlyStopping
+from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, ReduceLROnPlateau, EarlyStopping
 from scipy import stats
 from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, roc_auc_score, accuracy_score, \
 matthews_corrcoef
 
 import configparser
 import numpy as np
-import keras
+import tensorflow.keras as keras
 import os
 import pandas as pd
 import shutil

examples/chemrep/convert_smiles.py

Lines changed: 0 additions & 3 deletions
@@ -1,12 +1,9 @@
 from __future__ import print_function
 
 import os
-import sys
 import logging
 
 file_path = os.path.dirname(os.path.realpath(__file__))
-lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
-sys.path.append(lib_path2)
 
 import candle
 
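
The two removed lines extended `sys.path` with a `common` directory two levels above the script. A minimal sketch of what that path resolved to, and of one way to check whether a module is importable without such a path tweak, might look like the following. The checkout location `/repo` and both helper names are hypothetical, and the idea that `candle` is now expected to be installed as a regular package is an assumption the diff does not state.

```python
import importlib.util
import os


def legacy_common_path(script_dir, levels_up=2):
    """Rebuild the directory that the removed lines appended to sys.path."""
    parts = [script_dir] + [".."] * levels_up + ["common"]
    return os.path.abspath(os.path.join(*parts))


def is_importable(module_name):
    """Return True if the import system can locate the module as-is."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical checkout location for examples/chemrep.
script_dir = os.path.join(os.sep, "repo", "examples", "chemrep")
print(legacy_common_path(script_dir))

# If this prints False, `import candle` would fail without the old path tweak.
print(is_importable("candle"))
```

The darts examples in this commit used three `..` components instead of two, which corresponds to `levels_up=3` in the sketch.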

examples/darts/advanced/example_setup.py

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
 
 
 file_path = os.path.dirname(os.path.realpath(__file__))
-lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', '..', 'common'))
-sys.path.append(lib_path2)
+#lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', '..', 'common'))
+#sys.path.append(lib_path2)
 
 
 import candle

examples/darts/uno/example_setup.py

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@
 
 
 file_path = os.path.dirname(os.path.realpath(__file__))
-lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', '..', 'common'))
-sys.path.append(lib_path2)
+#lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', '..', 'common'))
+#sys.path.append(lib_path2)
 
 
 import candle

examples/rnagen/rnagen.py

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ def with_prefix(x):
 df1 = df_sample_source.merge(df_source, on='Source', how='left').drop('Source', axis=1)
 logger.info('Embedding RNAseq data source into features: %d additional columns', df1.shape[1] - 1)
 
-df2 = df.drop('Sample', 1)
+df2 = df.drop('Sample', axis=1)
 if add_prefix:
     df2 = df2.add_prefix('rnaseq.')
 

examples/rnagen/rnagen_baseline_keras2.py

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ def with_prefix(x):
 df1 = df_sample_source.merge(df_source, on='Source', how='left').drop('Source', axis=1)
 logger.info('Embedding RNAseq data source into features: %d additional columns', df1.shape[1] - 1)
 
-df2 = df.drop('Sample', 1)
+df2 = df.drop('Sample', axis=1)
 if add_prefix:
     df2 = df2.add_prefix('rnaseq.')
 
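
Both rnagen changes replace a positional axis argument with a keyword. Passing `axis` positionally to `DataFrame.drop` was deprecated in the pandas 1.x series and is understood to raise a `TypeError` in pandas 2.x, so the keyword form is the safer spelling. A small sketch with made-up column names and values (not the real RNAseq data):

```python
import pandas as pd

# Toy frame standing in for the RNAseq table; the values are illustrative only.
df = pd.DataFrame(
    {"Sample": ["s1", "s2"], "gene_a": [0.1, 0.2], "gene_b": [1.5, 2.5]}
)

# Keyword form used by the commit.
df2 = df.drop("Sample", axis=1)

# Equivalent spelling that names the columns directly.
df3 = df.drop(columns="Sample")

# Same prefixing step as in the surrounding code.
df2 = df2.add_prefix("rnaseq.")

print(list(df2.columns))
print(list(df3.columns))
```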

examples/xform-smiles/regress_default_model.txt

Lines changed: 2 additions & 0 deletions
@@ -16,3 +16,5 @@ vocab_size = 40000
 epochs = 400
 batch_size = 32
 loss = 'mean_squared_error'
+optimizer = 'adam'
+learning_rate = 0.00001
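
To illustrate how the two added settings could be read back as typed values, here is a minimal sketch that uses only the standard library. It is not CANDLE's own parser. The `[Global_Params]` section header and the `parse_params` helper are assumptions made for this example, since the hunk shows neither the top of the file nor how it is loaded.

```python
import ast
import configparser

# Settings copied from the hunk; the section header is an assumption.
CONFIG_TEXT = """
[Global_Params]
epochs = 400
batch_size = 32
loss = 'mean_squared_error'
optimizer = 'adam'
learning_rate = 0.00001
"""


def parse_params(text, section="Global_Params"):
    """Parse INI-style text and convert each value to a Python literal."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: ast.literal_eval(value) for key, value in parser[section].items()}


params = parse_params(CONFIG_TEXT)
print(params["optimizer"], params["learning_rate"])
```

Quoted values such as `'adam'` come back as strings, while `0.00001` comes back as a float, which is the form an optimizer constructor would typically expect.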

examples/xform-smiles/sct_baseline_keras.py

Lines changed: 2 additions & 3 deletions
@@ -1,7 +1,6 @@
 # Setup
 
 import os
-import sys
 # import gzip
 
 # import math
@@ -15,8 +14,6 @@
 from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, ReduceLROnPlateau, EarlyStopping
 
 file_path = os.path.dirname(os.path.realpath(__file__))
-lib_path = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
-sys.path.append(lib_path)
 
 import candle
 import smiles_transformer as st
@@ -63,6 +60,8 @@ def run(params):
 
     model.load_weights('smile_class.autosave.model.h5')
 
+    return history
+
 
 def main():
     params = initialize_parameters()
