MolPrice

A deep learning model for synthetic accessibility prediction based on molecular prices.

Installation

Clone the repository and create a virtual environment with conda:

# Get the code
git clone https://github.com/fredhastedt/MolPrice.git
cd MolPrice

# Create environment
conda env create -f molprice.yml
conda activate molprice

Model Usage

There are two ways to use the MolPrice model, described below:

Standalone NumPy Model

We now provide MolPrice as a lightweight, standalone NumPy implementation based on the Morgan fingerprint. The pickled model is included in the repository.

Note

This implementation does not require a GPU and predicts the price of a molecule in less than 0.1 ms.
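The README does not document the internals of the pickled model. As a rough sketch only, assuming it is a small feed-forward network over a 2048-bit Morgan fingerprint (the layer sizes, random placeholder weights and the function name predict_log_price are all hypothetical), the snippet below illustrates why such a model runs quickly on a CPU: with a binary fingerprint, the first layer reduces to summing the weight rows of the set bits.

```python
import numpy as np

# Hypothetical sizes: a 2048-bit fingerprint feeding one hidden layer.
N_BITS, N_HIDDEN = 2048, 256

rng = np.random.default_rng(0)
# Placeholder weights for illustration; NOT the repository's trained values.
W1 = rng.normal(scale=0.01, size=(N_BITS, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(scale=0.01, size=N_HIDDEN)
b2 = 0.0


def predict_log_price(on_bits):
    """Hypothetical forward pass for a binary fingerprint given as unique set-bit indices.

    Summing the rows of W1 selected by the set bits equals multiplying the
    full 0/1 vector by W1, but skips all the zero entries.
    """
    idx = np.asarray(on_bits, dtype=np.intp)
    hidden = np.maximum(W1[idx].sum(axis=0) + b1, 0.0)  # ReLU activation
    return float(hidden @ w2 + b2)
```

In practice the set-bit indices would come from a Morgan fingerprint of the input SMILES (for example computed with RDKit), and the weights would presumably be loaded from the pickled model shipped with the repository.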

PyTorch Models

We provide model checkpoints for MolPrice via Figshare. One can choose from the following models:

- SECFP fingerprint (with or without 2D features)
- Morgan fingerprint (with or without 2D features)

Once a model is downloaded, place it in the ./models directory.
These models take about 1.3 ms per molecule.
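To put the quoted latencies in perspective, a back-of-the-envelope estimate for screening a library serially (assuming the per-molecule times stated in this README hold, and ignoring I/O and start-up costs) is the number of molecules times the latency. For one million molecules this gives about 100 s with the NumPy model versus about 1300 s (roughly 22 minutes) with the PyTorch models. The helper name estimated_seconds is hypothetical.

```python
# Per-molecule latencies quoted in this README (0.1 ms is an upper bound
# for the NumPy model).
LATENCY_MS = {"numpy": 0.1, "pytorch": 1.3}


def estimated_seconds(n_molecules, model="pytorch"):
    """Serial wall-clock estimate in seconds: molecules x ms-per-molecule / 1000."""
    return n_molecules * LATENCY_MS[model] / 1000.0


for name in LATENCY_MS:
    secs = estimated_seconds(1_000_000, name)
    print(f"{name}: ~{secs:.0f} s ({secs / 60:.1f} min) for 1M molecules")
```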

Predicting Molecular Prices

Predictions can be made for a single molecule or in batch. For batch prediction, first save all molecules as SMILES strings in a .csv file and pass the name of the SMILES column via --smiles-col.

# Single molecule prediction
# Using NumPy model
python -m bin.numpy_predict --mol "CC(=O)OC1=CC=CC=C1C(=O)O"

# Using PyTorch model
python -m bin.predict --mol "CC(=O)OC1=CC=CC=C1C(=O)O" --cn MP_SECFP_hybrid

# Batch prediction
# Using NumPy model
python -m bin.numpy_predict --mol molecules.csv --smiles-col SMILES_COLUMN

# Using PyTorch model
python -m bin.predict --mol molecules.csv --cn MP_SECFP_hybrid --smiles-col SMILES_COLUMN
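For the batch commands above, the input file only needs a column of SMILES strings. A minimal sketch using Python's standard csv module, assuming the column is named SMILES (any name works as long as it matches --smiles-col); the extra molecules are arbitrary examples:

```python
import csv

# Example molecules; aspirin is the molecule used in the single-prediction commands.
smiles = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
    "CCO",                        # ethanol
    "c1ccccc1",                   # benzene
]

# Write a header row ("SMILES") followed by one molecule per row.
with open("molecules.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["SMILES"])
    writer.writerows([s] for s in smiles)
```

The file can then be passed as, for example, python -m bin.numpy_predict --mol molecules.csv --smiles-col SMILES.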

Reproducing SA Test Results

The test datasets for the synthetic accessibility (SA) comparison can be downloaded from Figshare (test files). Once downloaded, place the files in the ./testing directory.
The results for each test dataset can be obtained by running:

python -m bin.test main_ood --model Fingerprint --cn MODEL_CHECKPOINT --test_name TEST_FILE1,TEST_FILE2 --combined

For example, if one has downloaded the MP_SECFP_hybrid model and saved the files for test set 3 as TS3_hs.csv and TS3_es.csv, one can run:

python -m bin.test main_ood --model Fingerprint --cn MP_SECFP_hybrid/best.ckpt --test_name TS3_hs.csv,TS3_es.csv --combined
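When evaluating several test sets, it can help to check that the files are actually in ./testing before launching the run. The sketch below (the helper build_test_command is hypothetical) assembles exactly the command shown above, using only the documented flags, and joins the file names with commas for --test_name:

```python
import sys
from pathlib import Path


def build_test_command(checkpoint, test_files, testing_dir="testing"):
    """Return the argv for bin.test, after checking the files exist in testing_dir."""
    missing = [f for f in test_files if not (Path(testing_dir) / f).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {testing_dir}/: {missing}")
    return [
        sys.executable, "-m", "bin.test", "main_ood",  # current interpreter
        "--model", "Fingerprint",
        "--cn", checkpoint,
        "--test_name", ",".join(test_files),  # comma-separated, no spaces
        "--combined",
    ]
```

From the repository root, the returned list can be run with subprocess.run(cmd, check=True).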

Model Training

If one has access to a database of molecules with their prices, one can train a custom model with the following script (prices must be given in log(USD/mmol)):

python -m bin.train --model MODEL_TYPE --fp FINGERPRINT_TYPE

The following arguments can be adjusted:
- model: choose from [Fingerprint, RoBERTa, Transformer, LSTM_EFG]
- fp: choose from [atom, rdkit, morgan, mhfp] (mhfp is the SECFP fingerprint encoder)
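Training prices are expected in log(USD/mmol), while supplier catalogues usually quote a price for a given mass. A minimal conversion sketch, assuming a price in USD for amount_g grams and a known molecular weight in g/mol; the README does not state the logarithm base, so it is a parameter here (natural log by default), and the function name is hypothetical:

```python
import math


def price_to_log_usd_per_mmol(usd, amount_g, mol_weight_g_per_mol, log=math.log):
    """Convert a catalogue price (usd for amount_g grams) to log(USD/mmol).

    The log base is an assumption; pass log=math.log10 for base 10.
    """
    mmol = amount_g / mol_weight_g_per_mol * 1000.0  # grams -> mol -> mmol
    return log(usd / mmol)


# Hypothetical example: 5 g of aspirin (MW 180.16 g/mol) priced at 50 USD
# is 27.75 mmol, i.e. about 1.80 USD/mmol before taking the log.
value = price_to_log_usd_per_mmol(50.0, 5.0, 180.16)
```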

If one has a pre-trained Fingerprint model, one can fine-tune it with the contrastive loss by calling:

python -m bin.train --model Fingerprint --fp FINGERPRINT_TYPE --combined --cn MODEL_CHECKPOINT
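The README does not spell out the contrastive loss used with --combined. Purely as an illustration of the general idea, and not as the repository's implementation, one common pairwise form is a margin ranking (hinge) loss that pushes the predicted log price of a harder-to-synthesize molecule above that of an easier partner by at least a margin. The function name and the margin of 1.0 are hypothetical:

```python
def margin_ranking_loss(pred_hard, pred_easy, margin=1.0):
    """Mean hinge loss over molecule pairs.

    A pair contributes zero once the predicted log price of the harder
    molecule exceeds its easier partner's by at least `margin`.
    """
    if len(pred_hard) != len(pred_easy) or not pred_hard:
        raise ValueError("inputs must be non-empty and paired")
    losses = [max(0.0, margin - (h - e)) for h, e in zip(pred_hard, pred_easy)]
    return sum(losses) / len(losses)
```

In PyTorch the same quantity is given by torch.nn.MarginRankingLoss(margin) with target y = 1, where the first input holds the harder molecules' predictions.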
