Commit 78ad701

Author: Jamaludin Mohd Yusof

Merge branch 'release_01' of github.com:ECP-CANDLE/Benchmarks into release_01

2 parents 7e5dd75 + 6cb5524

38 files changed, +6703 -11 lines changed
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
[Global_Params]
cell_features=['expression']
drug_features=['descriptors']
dense=[1000, 1000, 1000]
dense_feature_layers=[1000, 1000, 1000]
activation='relu'
loss='mse'
optimizer='adam'
scaling='std'
drop=0
epochs=10
batch_size=32
validation_split=0.2
cv=1
cv_partition='overlapping'
max_val_loss=1.0
learning_rate=None
base_lr=None
residual=False
reduce_lr=False
warmup_lr=False
batch_normalization=False
feature_subsample=0
rng_seed=2017
save='save/combo'
gen=False
use_combo_score=False
verbose=False
use_landmark_genes=True

[Monitor_Params]
solr_root=''
timeout=3600
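The values in this file are written as Python literals (quoted strings, lists, `None`, `False`). One way such a file could be loaded is sketched below using only the standard library. This is an illustration, not necessarily how the benchmarks themselves read their configs: `load_params` and `SAMPLE` are hypothetical names, and the sample is a shortened excerpt of the block above.

```python
import ast
import configparser

# Hypothetical excerpt in the same style as the config above.
SAMPLE = """
[Global_Params]
dense=[1000, 1000, 1000]
activation='relu'
drop=0
validation_split=0.2
learning_rate=None
residual=False
save='save/combo'

[Monitor_Params]
solr_root=''
timeout=3600
"""


def load_params(text):
    """Parse INI text and convert every value with ast.literal_eval.

    All sections are flattened into one dict, so a key that appears in
    two sections would be overwritten by the later one.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    params = {}
    for section in parser.sections():
        for key, raw in parser.items(section):
            # literal_eval only accepts Python literals, never arbitrary code.
            params[key] = ast.literal_eval(raw)
    return params


params = load_params(SAMPLE)
print(params["dense"], params["learning_rate"], params["timeout"])
```

Using `ast.literal_eval` rather than `eval` keeps the loader from executing anything beyond plain literals, which matters when a config file comes from an untrusted source.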
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
[Global_Params]
data_url = 'ftp://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/normal-tumor/'
train_data = 'nt_train2.csv'
test_data = 'nt_test2.csv'
model_name = 'nt3'
conv = [128, 20, 1, 128, 10, 1]
dense = [200,20]
activation = 'relu'
out_act = 'softmax'
loss = 'categorical_crossentropy'
optimizer = 'sgd'
metrics = 'accuracy'
epochs = 50
batch_size = 20
learning_rate = 0.001
drop = 0.1
classes = 2
pool = [1, 10]
save = '.'
timeout = 3600
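The `conv` entry is a flat list of six numbers. A plausible reading, which this diff does not itself confirm, is that it encodes two convolutional layers as consecutive (filters, kernel size, stride) triples. The sketch below groups the list under that assumption; `group_conv_layers` is a hypothetical helper, not a function from the benchmark code.

```python
# Value copied from the config above.
conv = [128, 20, 1, 128, 10, 1]


def group_conv_layers(flat, width=3):
    """Split a flat list into (filters, kernel_size, stride) tuples.

    ASSUMPTION: the triple layout is a guess at the convention; the diff
    only shows the flat list.
    """
    if len(flat) % width != 0:
        raise ValueError("conv list length must be a multiple of %d" % width)
    return [tuple(flat[i:i + width]) for i in range(0, len(flat), width)]


layers = group_conv_layers(conv)
for index, (filters, kernel_size, stride) in enumerate(layers):
    print("conv layer %d: filters=%d kernel=%d stride=%d"
          % (index, filters, kernel_size, stride))
```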
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
[Global_Params]
dense=[1000, 500, 100, 50]
batch_size=100
epochs=3
activation='relu'
loss='mse'
optimizer='sgd'
learning_rate=0.001
scaling='std'
drop=0.1
feature_subsample=500
validation_split=0.1
rng_seed=2017
initialization='normal'
min_logconc=-5.
max_logconc=-4.
category_cutoffs=[0.]
test_cell_split=0.15
cell_features=['expression']
drug_features=['descriptors']
subsample='naive_balancing'
batch_normalization=False
cell_noise_sigma=0.
output_dir='save'

Pilot1/SCRATCH/UnoMT/README.md

Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
# UnoMT in PyTorch
Multi-task (drug response, cell line classification, etc.) Uno implemented in PyTorch.
https://github.com/xduan7/UnoPytorch


## Todos
* More labels for the network, such as drug labels;
* Dataloader hanging problem when `num_workers` is set to more than 0;
* Better pre-processing for drug descriptor integer features;
* Network regularization with weight decay and/or dropout;
* Hyper-parameter searching;

## Prerequisites
```
Python 3.6.4
PyTorch 0.4.1
SciPy 1.1.0
pandas 0.23.4
Scikit-Learn 0.19.1
urllib3 1.23
joblib 0.12.2
```


The default network structure is shown below:
<img src="https://github.com/xduan7/UnoPytorch/blob/master/images/default_network.jpg" width="100%">

An example of the program output for training on NCI60 and validating on all five data sources (NCI60, CTRP, GDSC, CCLE, gCSI) is shown below:
```
python3.6 ./launcher.py
Training Arguments:
{
    "trn_src": "NCI60",
    "val_srcs": [
        "NCI60",
        "CTRP",
        "GDSC",
        "CCLE",
        "gCSI"
    ],
    "grth_scaling": "none",
    "dscptr_scaling": "std",
    "rnaseq_scaling": "std",
    "dscptr_nan_threshold": 0.0,
    "qed_scaling": "none",
    "rnaseq_feature_usage": "source_scale",
    "drug_feature_usage": "both",
    "validation_ratio": 0.2,
    "disjoint_drugs": false,
    "disjoint_cells": true,
    "gene_layer_dim": 1024,
    "gene_latent_dim": 512,
    "gene_num_layers": 2,
    "drug_layer_dim": 4096,
    "drug_latent_dim": 2048,
    "drug_num_layers": 2,
    "autoencoder_init": true,
    "resp_layer_dim": 2048,
    "resp_num_layers_per_block": 2,
    "resp_num_blocks": 4,
    "resp_num_layers": 2,
    "resp_dropout": 0.0,
    "resp_activation": "none",
    "cl_clf_layer_dim": 256,
    "cl_clf_num_layers": 2,
    "drug_target_layer_dim": 512,
    "drug_target_num_layers": 2,
    "drug_qed_layer_dim": 512,
    "drug_qed_num_layers": 2,
    "drug_qed_activation": "sigmoid",
    "resp_loss_func": "mse",
    "resp_opt": "SGD",
    "resp_lr": 1e-05,
    "cl_clf_opt": "SGD",
    "cl_clf_lr": 0.01,
    "drug_target_opt": "SGD",
    "drug_target_lr": 0.01,
    "drug_qed_loss_func": "mse",
    "drug_qed_opt": "SGD",
    "drug_qed_lr": 0.01,
    "resp_val_start_epoch": 0,
    "early_stop_patience": 20,
    "lr_decay_factor": 0.98,
    "trn_batch_size": 32,
    "val_batch_size": 256,
    "max_num_batches": 1000,
    "max_num_epochs": 1000,
    "multi_gpu": false,
    "no_cuda": false,
    "rand_state": 0
}
RespNet(
  (_RespNet__gene_encoder): Sequential(
    (dense_0): Linear(in_features=942, out_features=1024, bias=True)
    (relu_0): ReLU()
    (dense_1): Linear(in_features=1024, out_features=1024, bias=True)
    (relu_1): ReLU()
    (dense_2): Linear(in_features=1024, out_features=512, bias=True)
  )
  (_RespNet__drug_encoder): Sequential(
    (dense_0): Linear(in_features=4688, out_features=4096, bias=True)
    (relu_0): ReLU()
    (dense_1): Linear(in_features=4096, out_features=4096, bias=True)
    (relu_1): ReLU()
    (dense_2): Linear(in_features=4096, out_features=2048, bias=True)
  )
  (_RespNet__resp_net): Sequential(
    (dense_0): Linear(in_features=2561, out_features=2048, bias=True)
    (activation_0): ReLU()
    (residual_block_0): ResBlock(
      (block): Sequential(
        (res_dense_0): Linear(in_features=2048, out_features=2048, bias=True)
        (res_relu_0): ReLU()
        (res_dense_1): Linear(in_features=2048, out_features=2048, bias=True)
      )
      (activation): ReLU()
    )
    (residual_block_1): ResBlock(
      (block): Sequential(
        (res_dense_0): Linear(in_features=2048, out_features=2048, bias=True)
        (res_relu_0): ReLU()
        (res_dense_1): Linear(in_features=2048, out_features=2048, bias=True)
      )
      (activation): ReLU()
    )
    (residual_block_2): ResBlock(
      (block): Sequential(
        (res_dense_0): Linear(in_features=2048, out_features=2048, bias=True)
        (res_relu_0): ReLU()
        (res_dense_1): Linear(in_features=2048, out_features=2048, bias=True)
      )
      (activation): ReLU()
    )
    (residual_block_3): ResBlock(
      (block): Sequential(
        (res_dense_0): Linear(in_features=2048, out_features=2048, bias=True)
        (res_relu_0): ReLU()
        (res_dense_1): Linear(in_features=2048, out_features=2048, bias=True)
      )
      (activation): ReLU()
    )
    (dense_1): Linear(in_features=2048, out_features=2048, bias=True)
    (res_relu_1): ReLU()
    (dense_2): Linear(in_features=2048, out_features=2048, bias=True)
    (res_relu_2): ReLU()
    (dense_out): Linear(in_features=2048, out_features=1, bias=True)
  )
)
================================================================================
Training Epoch 1:
    Drug Weighted QED Regression Loss: 0.055694
    Drug Response Regression Loss: 1871.18

Validation Results:
    Cell Line Classification:
        Category Accuracy: 98.98%;
        Site Accuracy: 80.95%;
        Type Accuracy: 82.76%
    Drug Target Family Classification Accuracy: 1.85%
    Drug Weighted QED Regression
        MSE: 0.028476 MAE: 0.137004 R2: +0.17
    Drug Response Regression:
        NCI60 MSE: 1482.07 MAE: 27.89 R2: +0.53
        CTRP MSE: 2554.45 MAE: 38.62 R2: +0.27
        GDSC MSE: 2955.78 MAE: 42.73 R2: +0.11
        CCLE MSE: 2799.06 MAE: 42.44 R2: +0.31
        gCSI MSE: 2601.50 MAE: 38.44 R2: +0.35
Epoch Running Time: 110.0 Seconds.
================================================================================
Training Epoch 2:
...
...

Program Running Time: 8349.6 Seconds.
================================================================================
Overall Validation Results:

Best Results from Different Models (Epochs):
    Cell Line Categories Best Accuracy: 99.474% (Epoch = 5)
    Cell Line Sites Best Accuracy: 97.401% (Epoch = 60)
    Cell Line Types Best Accuracy: 97.368% (Epoch = 40)
    Drug Target Family Best Accuracy: 66.667% (Epoch = 23)
    Drug Weighted QED Best R2 Score: +0.7422 (Epoch = 59, MSE = 0.008837, MAE = 0.069400)
    NCI60 Best R2 Score: +0.8107 (Epoch = 56, MSE = 601.18, MAE = 16.57)
    CTRP Best R2 Score: +0.3945 (Epoch = 37, MSE = 2127.28, MAE = 31.44)
    GDSC Best R2 Score: +0.2448 (Epoch = 22, MSE = 2506.03, MAE = 35.55)
    CCLE Best R2 Score: +0.4729 (Epoch = 4, MSE = 2153.30, MAE = 33.63)
    gCSI Best R2 Score: +0.4512 (Epoch = 31, MSE = 2203.04, MAE = 32.63)

Best Results from the Same Model (Epoch = 22):
    Cell Line Categories Accuracy: 99.408%
    Cell Line Sites Accuracy: 97.138%
    Cell Line Types Accuracy: 97.039%
    Drug Target Family Accuracy: 57.407%
    Drug Weighted QED R2 Score: +0.6033 (MSE = 0.013601, MAE = 0.093341)
    NCI60 R2 Score: +0.7885 (MSE = 672.00, MAE = 17.89)
    CTRP R2 Score: +0.3841 (MSE = 2163.66, MAE = 32.28)
    GDSC R2 Score: +0.2448 (MSE = 2506.03, MAE = 35.55)
    CCLE R2 Score: +0.4653 (MSE = 2184.62, MAE = 34.12)
    gCSI R2 Score: +0.4271 (MSE = 2299.59, MAE = 32.93)
```
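
The printed encoders line up with the `*_layer_dim`, `*_num_layers`, and `*_latent_dim` arguments in the training arguments: each appears to be `num_layers` hidden layers followed by one projection to the latent size. The sketch below reproduces the layer shapes from those arguments and counts parameters. It is plain Python, not the repository's code; `encoder_shapes` and `linear_params` are hypothetical helpers, and the input widths 942 and 4688 are read off the printout.

```python
def encoder_shapes(in_dim, layer_dim, num_layers, latent_dim):
    """Return (in_features, out_features) pairs for an encoder stack.

    ASSUMPTION: `num_layers` hidden layers of width `layer_dim`, followed
    by one projection to `latent_dim`, as the printed encoders suggest.
    """
    dims = [in_dim] + [layer_dim] * num_layers + [latent_dim]
    return list(zip(dims[:-1], dims[1:]))


def linear_params(shapes):
    """Count weights plus biases for a list of Linear layer shapes."""
    return sum(i * o + o for i, o in shapes)


gene = encoder_shapes(942, 1024, 2, 512)
drug = encoder_shapes(4688, 4096, 2, 2048)
print(gene)
print(drug)
print("gene encoder parameters:", linear_params(gene))

# The response network's first layer takes 2561 inputs, one more than the
# two latent sizes combined. The printout does not say what that input is.
extra_inputs = 2561 - (512 + 2048)
print("extra response-network inputs:", extra_inputs)
```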

For the default hyperparameters, the transfer learning matrix results are shown below:
<p align="center">
<img src="https://github.com/xduan7/UnoPytorch/blob/master/images/default_results.jpg" width="80%">
</p>

Note that green cells represent R2 scores higher than 0.1, red cells represent R2 scores lower than -0.1, and yellow cells cover all values in between.
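
The color bands can be written down directly. The following is a minimal sketch in plain Python, assuming the standard definition of R2 (one minus residual sum of squares over total sum of squares); `r2_score` and `r2_color` are hypothetical helpers, and the sample scores are copied from the "Best Results from the Same Model" part of the log above.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r2_color(r2, threshold=0.1):
    """Map an R2 score to the color bands described in the note."""
    if r2 > threshold:
        return "green"
    if r2 < -threshold:
        return "red"
    return "yellow"


# R2 values from the same-model (epoch 22) results in the log.
same_model_r2 = {
    "NCI60": 0.7885,
    "CTRP": 0.3841,
    "GDSC": 0.2448,
    "CCLE": 0.4653,
    "gCSI": 0.4271,
}
colors = {src: r2_color(value) for src, value in same_model_r2.items()}
print(colors)
```

Predicting the mean of the targets gives an R2 of exactly 0, so a red cell marks a model that does noticeably worse than that trivial baseline on the given source.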
