
Commit 9815442

Merge branch 'release_04' of github.com:ECP-CANDLE/Benchmarks into release_04

2 parents: 2583a46 + f134c0f
File tree

9 files changed (+68, -683 lines)


Pilot1/Attn/attn_bin_working_jan7_h5.py

Lines changed: 0 additions & 550 deletions
This file was deleted.

Pilot1/Attn/attn_bin_working_jan7_h5.sh

Lines changed: 0 additions & 51 deletions
This file was deleted.

Pilot1/Attn/attn_bsub.sh

Lines changed: 0 additions & 57 deletions
This file was deleted.

Pilot1/Attn/cmd1.sh

Lines changed: 0 additions & 17 deletions
This file was deleted.

Pilot1/Attn/cmd2.sh

Lines changed: 0 additions & 5 deletions
This file was deleted.

examples/histogen/README.md

Lines changed: 21 additions & 0 deletions
@@ -2,6 +2,27 @@
 
 ## Usage
 
+The CANDLE-ized versions of the codes can simply be run without any command-line arguments; the default settings are read from the corresponding `default_model` file.
+When needed, the CANDLE versions also use the `fetch_file` methods, which store the data in the top-level `Data/Examples` directory.
+Any keyword in the `default_model` file can be overridden with the corresponding command-line argument.
+The original codes and workflow below are preserved for comparison.
+New package dependencies are now included in the top-level install instructions.
+
+# CANDLE workflow
+
+Sample images (the trained models will be downloaded automatically).
+```
+python sample_baseline_pytorch.py
+```
+Training pipeline
+```
+python train_vqvae_baseline_pytorch.py -e 1
+python extract_baseline_pytorch.py
+python train_pixelsnail_baseline_pytorch.py
+```
+
+# Original workflow
+
 Sample histology images from a trained histology image model.
 
 1. Download trained models into `checkpoint` folder.
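
For context on the usage notes above: a `default_model` file is a plain config file of keyword/value pairs, and any of those keywords can instead be supplied as a command-line flag. A hypothetical sketch (the keys and values here are assumptions for illustration, not taken from the repository):

```
[Global_Params]
model_name = 'histogen'
epochs = 2
batch_size = 16
```

With such a file, `python train_vqvae_baseline_pytorch.py -e 1` would override the `epochs` keyword while leaving the other defaults in place.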

examples/image-vae/README.md

Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,18 @@
+## Usage
+
+The CANDLE-ized versions of the codes can simply be run without any command-line arguments; the default settings are read from the corresponding `default_model` file.
+When needed, the CANDLE versions also use the `fetch_file` methods, which store the data in the top-level `Data/Examples` directory.
+Any keyword in the `default_model` file can be overridden with the corresponding command-line argument.
+The original codes and workflow below are preserved for comparison.
+New package dependencies are now included in the top-level install instructions.
+
+# CANDLE workflow
+
+```
+python image_vae_baseline_pytorch.py
+python sample_baseline_pytorch.py
+```
+
 # Image VAE
 
 2D images are a relatively unexplored representation for molecular learning tasks. We create a molecular generator and embedding based on 2D depictions of molecules. We use a variational autoencoder (VAE) to encode 2D images of molecules into a latent space of dimension 512, and with a Gaussian prior we sample the space and decode directly to images. A modified ResNet is used to encode molecular depictions to the latent space. A decoder is created by performing the inverse operations of ResNet (i.e., running the blocks in reverse order and replacing convolutional layers with deconvolution (transpose convolution) layers). One can embed molecules in this space using only the encoder, or, by generating random Gaussian noise, decode a latent vector into a molecular image. In the latent space, generation can also be steered through interpolation or epsilon-sampling. VAEs are prone to mode collapse and exploding gradients. Mode collapse occurs when enforcing a normal prior on the latent space causes learning to "collapse" and the model ceases to learn. Exploding gradients occur when the gradients become so large that the optimization routine becomes unstable and, again, learning ceases. To avoid mode collapse, we use KL-divergence loss annealing, which slowly ramps up the weight of the normal prior on the latent space as the model learns to reconstruct the encoded images better; this essentially enforces that the decoder and encoder learn at similar rates. To avoid exploding gradients, we use gradient clipping, which limits the magnitude of any particular optimization step and enforces a slow and gradual learning process.
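
A minimal sketch of the two stabilizers this paragraph describes, KL-divergence loss annealing and gradient clipping. This is illustrative code under stated assumptions (a model returning `(recon_x, mu, logvar)`, a loader yielding image batches, a linear 10-epoch ramp, a clipping norm of 1.0), not the benchmark's implementation:

```
# Sketch of KL-divergence loss annealing and gradient clipping for a VAE
# (illustrative only; not the benchmark's implementation).
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, kl_weight):
    # Reconstruction term plus the KL term against the unit-Gaussian prior,
    # with the KL contribution scaled by the annealing weight.
    recon = F.mse_loss(recon_x, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kld

def train_epoch(model, loader, optimizer, epoch, anneal_epochs=10, max_norm=1.0):
    # Ramp the KL weight linearly from 0 to 1 over `anneal_epochs` epochs so
    # reconstruction is learned before the prior is fully enforced.
    kl_weight = min(1.0, epoch / float(anneal_epochs))
    for x in loader:
        optimizer.zero_grad()
        recon_x, mu, logvar = model(x)  # assumed model interface
        loss = vae_loss(recon_x, x, mu, logvar, kl_weight)
        loss.backward()
        # Clip the gradient norm so no single step can destabilize training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
        optimizer.step()
```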

examples/rnagen/README.md

Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,18 @@
+## Usage
+
+The CANDLE-ized versions of the codes can simply be run without any command-line arguments; the default settings are read from the corresponding `default_model` file.
+When needed, the CANDLE versions also use the `fetch_file` methods, which store the data in the top-level `Data/Examples` directory.
+Any keyword in the `default_model` file can be overridden with the corresponding command-line argument.
+The original codes and workflow below are preserved for comparison.
+New package dependencies are now included in the top-level install instructions.
+
+# CANDLE workflow
+
+```
+python rnagen_baseline_keras2.py
+python rnagen_baseline_keras2.py --plot
+```
+
 # Improving cancer type classifier with synthetic data
 
 We demonstrate the value of generator models in boosting the performance of predictive models.

examples/rnngen/README.md

Lines changed: 17 additions & 3 deletions
@@ -1,8 +1,22 @@
 # RNN Generator
 Based 99.98% on the model from [1].
 
+## Usage
 
-# How to use Molecular Generator Code
+The CANDLE-ized versions of the codes can simply be run without any command-line arguments; the default settings are read from the corresponding `default_model` file.
+When needed, the CANDLE versions also use the `fetch_file` methods, which store the data in the top-level `Data/Examples` directory.
+Any keyword in the `default_model` file can be overridden with the corresponding command-line argument.
+The original codes and workflow below are preserved for comparison.
+
+# CANDLE workflow
+
+This will automatically download the models needed and run with the `autosave.model.pt` checkpoint set in the `default_model` file.
+
+```
+python infer_rnngen_baseline_pytorch.py
+```
+
+# Original workflow
 
 ## Python dependencies
 
@@ -38,6 +52,6 @@ python infer.py -i mosesrun/ --logdir pilot1/ -o p1_poor.txt -n 10000 -vr --mode
 
 
 # References:
-1. Gupta, A., Müller, A., Huisman, B., Fuchs, J., Schneider, P., Schneider, G. (2018). Generative Recurrent Networks for De Novo Drug Design Molecular Informatics 37(1-2)https://dx.doi.org/10.1002/minf.201700111
-2. Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., Kadurin, A., Nikolenko, S., Aspuru-Guzik, A., Zhavoronkov, A. (2018). Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Modelshttps://arxiv.org/abs/1811.12823
+1. Gupta, A., Müller, A., Huisman, B., Fuchs, J., Schneider, P., Schneider, G. (2018). Generative Recurrent Networks for De Novo Drug Design. Molecular Informatics 37(1-2). https://dx.doi.org/10.1002/minf.201700111
+2. Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., Veselov, M., Kadurin, A., Nikolenko, S., Aspuru-Guzik, A., Zhavoronkov, A. (2018). Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. https://arxiv.org/abs/1811.12823
 
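
As with the other examples, the checkpoint named in the rnngen `default_model` file can be overridden from the command line. A hypothetical invocation (both the keyword name `model` and the checkpoint file name are placeholders for illustration, not confirmed by this diff):

```
python infer_rnngen_baseline_pytorch.py --model pilot1.model.pt
```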
