
Commit 39f5cec

add simple_mlp (#3)
* add simple_mlp

  Co-authored-by: Kai Waldrant <[email protected]>

* update gitignore and resources
* engines → runners
* fix config
* commit mlp changes
* create test resources
* update readme
* add changelog

---------

Co-authored-by: Kai Waldrant <[email protected]>
1 parent 339c965 commit 39f5cec

Showing 18 changed files with 600 additions and 29 deletions.

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+# task_predict_modality 0.1.1
+
+## NEW FUNCTIONALITY
+
+* Added Simple MLP method (PR #3).
+
 # task_predict_modality 0.1.0

 Initial release after migrating the codebase.

README.md

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ data shows that this is not trivial.
 | Kaiwen Deng | contributor |
 | Louise Deconinck | author |
 | Robrecht Cannoodt | author, maintainer |
+| Xueer Chen | contributor |
+| Jiwei Liu | contributor |

 ## API


_viash.yaml

Lines changed: 14 additions & 3 deletions
@@ -73,7 +73,18 @@ authors:
     info:
       github: rcannood
       orcid: "0000-0003-3641-729X"
-
+  - name: Xueer Chen
+    roles: [ contributor ]
+    info:
+      github: xuerchen
+
+  - name: Jiwei Liu
+    roles: [ contributor ]
+    info:
+      github: daxiongshu
+
+      orcid: "0000-0002-8799-9763"
+
 links:
   issue_tracker: https://github.com/openproblems-bio/task_predict_modality/issues
   repository: https://github.com/openproblems-bio/task_predict_modality
@@ -84,8 +95,8 @@ info:

   test_resources:
     - type: s3
-      path: s3://openproblems-data/resources_test/common/
-      dest: resources_test/common
+      path: s3://openproblems-data/resources_test/common/openproblems_neurips2021
+      dest: resources_test/common/openproblems_neurips2021
     - type: s3
       path: s3://openproblems-data/resources_test/task_predict_modality/
       dest: resources_test/task_predict_modality

common

scripts/create_datasets/test_resources.sh

Lines changed: 18 additions & 24 deletions
@@ -29,31 +29,25 @@ nextflow run . \

 echo "Run one method"

-viash run src/methods/knnr_py/config.vsh.yaml -- \
-  --input_train_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_cite/normal/train_mod1.h5ad \
-  --input_train_mod2 $OUTPUT_DIR/openproblems_neurips2021/bmmc_cite/normal/train_mod2.h5ad \
-  --input_test_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_cite/normal/test_mod1.h5ad \
-  --output $OUTPUT_DIR/openproblems_neurips2021/bmmc_cite/normal/prediction.h5ad
-
-viash run src/methods/knnr_py/config.vsh.yaml -- \
-  --input_train_mod1 $OUTPUT_DIR//openproblems_neurips2021/bmmc_cite/swap/train_mod1.h5ad \
-  --input_train_mod2 $OUTPUT_DIR//openproblems_neurips2021/bmmc_cite/swap/train_mod2.h5ad \
-  --input_test_mod1 $OUTPUT_DIR//openproblems_neurips2021/bmmc_cite/swap/test_mod1.h5ad \
-  --output $OUTPUT_DIR//openproblems_neurips2021/bmmc_cite/swap/prediction.h5ad
-
-viash run src/methods/knnr_py/config.vsh.yaml -- \
-  --input_train_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/normal/train_mod1.h5ad \
-  --input_train_mod2 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/normal/train_mod2.h5ad \
-  --input_test_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/normal/test_mod1.h5ad \
-  --output $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/normal/prediction.h5ad
-
-viash run src/methods/knnr_py/config.vsh.yaml -- \
-  --input_train_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/swap/train_mod1.h5ad \
-  --input_train_mod2 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/swap/train_mod2.h5ad \
-  --input_test_mod1 $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/swap/test_mod1.h5ad \
-  --output $OUTPUT_DIR/openproblems_neurips2021/bmmc_multiome/swap/prediction.h5ad
+for name in bmmc_cite/normal bmmc_cite/swap bmmc_multiome/normal bmmc_multiome/swap; do
+  viash run src/methods/knnr_py/config.vsh.yaml -- \
+    --input_train_mod1 $OUTPUT_DIR/openproblems_neurips2021/$name/train_mod1.h5ad \
+    --input_train_mod2 $OUTPUT_DIR/openproblems_neurips2021/$name/train_mod2.h5ad \
+    --input_test_mod1 $OUTPUT_DIR/openproblems_neurips2021/$name/test_mod1.h5ad \
+    --output $OUTPUT_DIR/openproblems_neurips2021/$name/prediction.h5ad
+
+  # pre-train simple_mlp
+  rm -r $OUTPUT_DIR/openproblems_neurips2021/$name/models/simple_mlp/
+  mkdir -p $OUTPUT_DIR/openproblems_neurips2021/$name/models/simple_mlp/
+  viash run src/methods/simple_mlp/train/config.vsh.yaml -- \
+    --input_train_mod1 $OUTPUT_DIR/openproblems_neurips2021/$name/train_mod1.h5ad \
+    --input_train_mod2 $OUTPUT_DIR/openproblems_neurips2021/$name/train_mod2.h5ad \
+    --input_test_mod1 $OUTPUT_DIR/openproblems_neurips2021/$name/test_mod1.h5ad \
+    --output $OUTPUT_DIR/openproblems_neurips2021/$name/models/simple_mlp/
+done

 # only run this if you have access to the openproblems-data bucket
 aws s3 sync --profile op \
-  "$DATASET_DIR" s3://openproblems-data/resources_test/task_predict_modality \
+  resources_test/task_predict_modality \
+  s3://openproblems-data/resources_test/task_predict_modality \
   --delete --dryrun
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
__merge__: /src/api/comp_method_predict.yaml
name: simplemlp_predict

info:
  test_setup:
    with_model:
      input_model: resources_test/task_predict_modality/openproblems_neurips2021/bmmc_cite/swap/models/simple_mlp

resources:
  - type: python_script
    path: script.py
  - path: ../resources/

engines:
  - type: docker
    image: openproblems/base_pytorch_nvidia:1.0.0
    # run_args: ["--gpus all --ipc=host"]
    setup:
      - type: python
        pypi:
          - scikit-learn
          - scanpy
          - pytorch-lightning
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [highmem, hightime, midcpu, gpu, highsharedmem]
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
from glob import glob
import sys
import numpy as np
from scipy.sparse import csc_matrix
import anndata as ad
import torch
from torch.utils.data import TensorDataset, DataLoader

## VIASH START
par = {
    'input_train_mod1': 'resources_test/task_predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod1.h5ad',
    'input_train_mod2': 'resources_test/task_predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod2.h5ad',
    'input_test_mod1': 'resources_test/task_predict_modality/openproblems_neurips2021/bmmc_multiome/swap/test_mod1.h5ad',
    'input_model': 'output/model',
    'output': 'output/prediction'
}
meta = {
    'config': 'target/executable/methods/simplemlp_predict/.config.vsh.yaml',
    'resources_dir': 'target/executable/methods/simplemlp_predict',
    'cpus': 10
}
## VIASH END

resources_dir = f"{meta['resources_dir']}/resources"
sys.path.append(resources_dir)
from models import MLP
import utils

def _predict(model, dl):
    if torch.cuda.is_available():
        model = model.cuda()
    else:
        model = model.cpu()
    model.eval()
    yps = []
    for x in dl:
        with torch.no_grad():
            if torch.cuda.is_available():
                x0 = x[0].cuda()
            else:
                x0 = x[0].cpu()
            yp = model(x0)
            yps.append(yp.detach().cpu().numpy())
    yp = np.vstack(yps)
    return yp


print('Load data', flush=True)
input_train_mod2 = ad.read_h5ad(par['input_train_mod2'])
input_test_mod1 = ad.read_h5ad(par['input_test_mod1'])

# determine variables
mod_1 = input_test_mod1.uns['modality']
mod_2 = input_train_mod2.uns['modality']

task = f'{mod_1}2{mod_2}'

print('Load ymean', flush=True)
ymean_path = f"{par['input_model']}/{task}_ymean.npy"
ymean = np.load(ymean_path)

print('Start predict', flush=True)
if task == 'GEX2ATAC':
    y_pred = ymean * np.ones([input_test_mod1.n_obs, input_test_mod1.n_vars])
else:
    folds = [0, 1, 2]

    ymean = torch.from_numpy(ymean).float()
    yaml_path = f"{resources_dir}/yaml/mlp_{task}.yaml"
    config = utils.load_yaml(yaml_path)
    X = input_test_mod1.layers["normalized"].toarray()
    X = torch.from_numpy(X).float()

    te_ds = TensorDataset(X)

    yp = 0
    for fold in folds:
        # load_path = f"{par['input_model']}/{task}_fold_{fold}/version_0/checkpoints/*"
        load_path = f"{par['input_model']}/{task}_fold_{fold}/**.ckpt"
        print(load_path)
        ckpt = glob(load_path)[0]
        model_inf = MLP.load_from_checkpoint(
            ckpt,
            in_dim=X.shape[1],
            out_dim=input_test_mod1.n_vars,
            ymean=ymean,
            config=config
        )
        te_loader = DataLoader(
            te_ds,
            batch_size=config.batch_size,
            num_workers=0,
            shuffle=False,
            drop_last=False
        )
        yp = yp + _predict(model_inf, te_loader)

    y_pred = yp / len(folds)

y_pred = csc_matrix(y_pred)

adata = ad.AnnData(
    layers={"normalized": y_pred},
    shape=y_pred.shape,
    uns={
        'dataset_id': input_test_mod1.uns['dataset_id'],
        'method_id': meta['functionality_name'],
    },
)

print('Write data', flush=True)
adata.write_h5ad(par['output'], compression="gzip")
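For a quick local sanity check of what this script writes, a minimal sketch; the path is the `par['output']` default from the VIASH block above and is an assumption for local runs, not part of the committed script:

import anndata as ad

# read back the prediction written by the predict script above
pred = ad.read_h5ad("output/prediction")            # par['output'] default from the VIASH block
print(pred.uns["dataset_id"], pred.uns["method_id"])
print(pred.layers["normalized"].shape)              # cells x features of the target modality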
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
import torch
import pytorch_lightning as pl
import torch.nn as nn
import torch.nn.functional as F

class MLP(pl.LightningModule):
    def __init__(self, in_dim, out_dim, ymean, config):
        super(MLP, self).__init__()
        if torch.cuda.is_available():
            self.ymean = ymean.cuda()
        else:
            self.ymean = ymean
        H1 = config.H1
        H2 = config.H2
        p = config.dropout
        self.config = config
        self.fc1 = nn.Linear(in_dim, H1)
        self.fc2 = nn.Linear(H1, H2)
        self.fc3 = nn.Linear(H1 + H2, out_dim)
        self.dp2 = nn.Dropout(p=p)

    def forward(self, x):
        x0 = x
        x1 = F.relu(self.fc1(x))
        x1 = self.dp2(x1)
        x = F.relu(self.fc2(x1))
        x = torch.cat([x, x1], dim=1)
        x = self.fc3(x)
        x = self.apply_mask(x)
        return x

    def apply_mask(self, yp):
        tmp = torch.ones_like(yp).float() * self.ymean
        mask = tmp < self.config.threshold
        mask = mask.float()
        return yp * (1 - mask) + tmp * mask

    def training_step(self, batch, batch_nb):
        x, y = batch
        yp = self(x)
        criterion = nn.MSELoss()
        loss = criterion(yp, y)
        self.log('train_loss', loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        yp = self(x)
        criterion = nn.MSELoss()
        loss = criterion(yp, y)
        self.log('valid_RMSE', loss**0.5, prog_bar=True)
        return loss

    def predict_step(self, batch, batch_idx):
        if len(batch) == 2:
            x, _ = batch
        else:
            x = batch
        return self(x)

    def configure_optimizers(self):
        lr = self.config.lr
        wd = float(self.config.wd)
        adam = torch.optim.Adam(self.parameters(), lr=lr, weight_decay=wd)
        if self.config.lr_schedule == 'adam':
            return adam
        elif self.config.lr_schedule == 'adam_cosin':
            slr = torch.optim.lr_scheduler.CosineAnnealingLR(adam, self.config.epochs)
            return [adam], [slr]
        else:
            assert 0
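The training component itself is not shown in this view. For orientation, a minimal sketch of how this MLP could be fit with PyTorch Lightning; it assumes the `split` and `load_yaml` helpers from `utils.py` below, the hyperparameter defaults file below, hypothetical local input paths, and that `ymean` is the per-feature mean of the training targets (the quantity the predict script loads from `{task}_ymean.npy`); the output layout only mirrors the glob pattern the predict script expects and is not the committed train script:

import anndata as ad
import numpy as np
import torch
import pytorch_lightning as pl
from torch.utils.data import TensorDataset, DataLoader

import utils            # resources/utils.py (shown below)
from models import MLP  # this file

# hypothetical local input paths, mirroring the par block of the predict script
tr1 = ad.read_h5ad("train_mod1.h5ad")  # e.g. GEX
tr2 = ad.read_h5ad("train_mod2.h5ad")  # e.g. ADT

config = utils.load_yaml("yaml/mlp_GEX2ADT.yaml")  # hypothetical defaults file, see below
X, y, Xt, yt = utils.split(tr1, tr2, fold=0)       # hold out site s1 for validation

# assumption: ymean is the per-feature mean of the training targets
ymean = torch.from_numpy(y.mean(axis=0)).float()
model = MLP(in_dim=X.shape[1], out_dim=y.shape[1], ymean=ymean, config=config)

tr_dl = DataLoader(
    TensorDataset(torch.from_numpy(X).float(), torch.from_numpy(y).float()),
    batch_size=config.batch_size, shuffle=True)
va_dl = DataLoader(
    TensorDataset(torch.from_numpy(Xt).float(), torch.from_numpy(yt).float()),
    batch_size=config.batch_size, shuffle=False)

trainer = pl.Trainer(max_epochs=config.epochs, accelerator="auto", devices=1)
trainer.fit(model, tr_dl, va_dl)

# hypothetical output layout, matching the {task}_fold_{fold}/**.ckpt glob and the
# {task}_ymean.npy file that the predict script loads
trainer.save_checkpoint("output/model/GEX2ADT_fold_0/model.ckpt")
np.save("output/model/GEX2ADT_ymean.npy", ymean.numpy())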
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
import yaml
from collections import namedtuple


def to_site_donor(data):
    df = data.obs['batch'].copy().to_frame().reset_index()
    df.columns = ['index', 'batch']
    df['site'] = df['batch'].apply(lambda x: x[:2])
    df['donor'] = df['batch'].apply(lambda x: x[2:])
    return df


def split(tr1, tr2, fold):
    df = to_site_donor(tr1)
    mask = df['site'] == f's{fold+1}'
    maskr = ~mask

    Xt = tr1[mask].layers["normalized"].toarray()
    X = tr1[maskr].layers["normalized"].toarray()

    yt = tr2[mask].layers["normalized"].toarray()
    y = tr2[maskr].layers["normalized"].toarray()

    print(f"{X.shape}, {y.shape}, {Xt.shape}, {yt.shape}")

    return X, y, Xt, yt


def load_yaml(path):
    with open(path) as f:
        x = yaml.safe_load(f)
    res = {}
    for i in x:
        res[i] = x[i]['value']
    config = namedtuple('Config', res.keys())(**res)
    print(config)
    return config
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@

# sample config defaults file
epochs:
  desc: Number of epochs to train over
  value: 10
batch_size:
  desc: Size of each mini-batch
  value: 512
H1:
  desc: Number of hidden neurons in 1st layer of MLP
  value: 256
H2:
  desc: Number of hidden neurons in 2nd layer of MLP
  value: 128
dropout:
  desc: probs of zeroing values
  value: 0
lr:
  desc: learning rate
  value: 0.001
wd:
  desc: weight decay
  value: 1e-5
threshold:
  desc: threshold to set values to zero
  value: 0
lr_schedule:
  desc: learning rate scheduler
  value: adam
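For reference, a sketch of what `utils.load_yaml` (above) turns a defaults file like this into; the filename is a hypothetical example (the predict script builds it as `yaml/mlp_{task}.yaml`), and the printed values are the ones listed above:

import utils  # resources/utils.py from this commit

# load_yaml keeps only each entry's `value` field and wraps the result in a Config namedtuple
config = utils.load_yaml("yaml/mlp_GEX2ADT.yaml")        # hypothetical filename
print(config.epochs, config.batch_size)                  # 10 512
print(config.H1, config.H2, config.dropout)              # 256 128 0
print(config.lr, float(config.wd), config.lr_schedule)   # 0.001 1e-05 adam

Note that PyYAML reads `1e-5` as a string (no decimal point), which is why `configure_optimizers` in `models.py` casts `config.wd` with `float()`.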
