Fast synthetic accessibility evaluation for fine chemicals, based on molecular price

OptiMaL-PSE-Lab/MolPrice

 
 

MolPrice

Python 3.10 License: MIT Code style: black

A deep learning model for synthetic accessibility prediction based on molecular prices.

Installation

Clone the repository and create a virtual environment with conda:

# Get the code
git clone https://github.com/fredhastedt/MolPrice.git
cd MolPrice

# Create environment
conda env create -f molprice.yml
conda activate molprice

Model Usage

There are two ways to use the MolPrice model, described below:

Standalone Numpy Model

We now provide MolPrice as a lightweight, standalone NumPy implementation for the Morgan fingerprint. The pickled model is included in the repo.

Note

This implementation does not need a GPU, and calculates the price in less than 0.1 ms given a molecule!
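As a rough illustration of why a NumPy-only model can be this cheap, the sketch below runs a small feed-forward network over a binary fingerprint vector. The layer sizes, the random weights, and the `predict_price` helper are all hypothetical, and the random bit vector only stands in for a real Morgan fingerprint. Nothing here reflects the actual architecture or weights of the pickled model in the repo.

```python
import numpy as np

def predict_price(fp, layers):
    """Forward pass of a plain MLP: ReLU on hidden layers, linear output.

    Hypothetical helper, not part of MolPrice.
    """
    x = fp.astype(np.float32)
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on every layer except the last
    return float(x[0])

rng = np.random.default_rng(0)
n_bits = 2048  # assumed fingerprint length

# Random weights as placeholders for a trained model (assumption).
layers = [
    (rng.normal(0, 0.01, (n_bits, 64)).astype(np.float32),
     np.zeros(64, dtype=np.float32)),
    (rng.normal(0, 0.01, (64, 1)).astype(np.float32),
     np.zeros(1, dtype=np.float32)),
]

# Sparse random bit vector standing in for a Morgan fingerprint.
fp = (rng.random(n_bits) < 0.02).astype(np.uint8)
price = predict_price(fp, layers)
```

The whole prediction is two small matrix-vector products, which is why no GPU is needed for a model of this shape.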

PyTorch Models

We provide model checkpoints for MolPrice via Figshare. One can choose from the following models:

  1. SECFP fingerprint (with or without 2D features)
  2. Morgan fingerprint (with or without 2D features)

Once the model is downloaded, place it in the ./models directory.
These models take about 1.3 ms per molecule.

Predicting Molecular Prices

Predictions can be run for a single molecule or in batch. For batch prediction, first save all molecules in a .csv file.

# Single molecule prediction
# Using NumPy model:
python -m bin.numpy_predict --mol "CC(=O)OC1=CC=CC=C1C(=O)O"

# Using PyTorch model
python -m bin.predict --mol "CC(=O)OC1=CC=CC=C1C(=O)O" --cn MP_SECFP_hybrid

# Batch prediction
# Using NumPy model
python -m bin.numpy_predict --mol molecules.csv --smiles-col SMILES_COLUMN

# Using PyTorch model
python -m bin.predict --mol molecules.csv --cn MP_SECFP_hybrid --smiles-col SMILES_COLUMN
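The batch commands above expect a .csv file with one column of SMILES strings. A minimal sketch of preparing such a file with the standard library follows. The `write_smiles_csv` helper and the column name `smiles` are illustrative choices, and the file is written to a temporary directory only to keep the example self-contained.

```python
import csv
import os
import tempfile

def write_smiles_csv(path, smiles_list, column="smiles"):
    """Write one SMILES string per row under a single header column."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([column])
        for smi in smiles_list:
            writer.writerow([smi])

molecules = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CCO", "c1ccccc1"]

# Temporary location for the example; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "molecules.csv")
write_smiles_csv(out_path, molecules)

# Read the file back to confirm the column round-trips unchanged.
with open(out_path, newline="") as fh:
    loaded = [row["smiles"] for row in csv.DictReader(fh)]
```

With this layout, the column name to pass as `--smiles-col` would be `smiles`.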

Reproducing SA Test Results

The test datasets for SA comparison can be obtained from Figshare via test files. Once the files are downloaded, place them in the ./testing directory.
The results for each test dataset can be obtained by running:

python -m bin.test main_ood --model Fingerprint --cn MODEL_CHECKPOINT --test_name TEST_FILE1,TEST_FILE2 --combined

For example, if one downloaded the MP_SECFP_hybrid model and saved the files for test set 3 as TS3_hs.csv and TS3_es.csv, one can run:

python -m bin.test main_ood --model Fingerprint --cn MP_SECFP_hybrid/best.ckpt --test_name TS3_hs.csv,TS3_es.csv --combined
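When several test sets need to be evaluated, the command above can be assembled programmatically. The sketch below only builds the argument list, using the flags shown in this README, and joins the test file names with commas. `build_test_command` is a hypothetical helper, not part of the repository.

```python
import shlex

def build_test_command(checkpoint, test_files):
    """Assemble the bin.test invocation as an argument list."""
    return [
        "python", "-m", "bin.test", "main_ood",
        "--model", "Fingerprint",
        "--cn", checkpoint,
        "--test_name", ",".join(test_files),  # comma-separated, no spaces
        "--combined",
    ]

cmd = build_test_command(
    "MP_SECFP_hybrid/best.ckpt",
    ["TS3_hs.csv", "TS3_es.csv"],
)
command_line = shlex.join(cmd)  # shell-safe string form of the same command
```

The list form can be handed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems with unusual file names.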

Model Training

If one has access to a database containing molecules along with their prices, one can run the following script to train their own model (given that prices are in log(USD)/mmol):

python -m bin.train --model MODEL_TYPE --fp FINGERPRINT_TYPE

Within the script, the following arguments can be adjusted:
- model: Choose between [Fingerprint, RoBERTa, Transformer, LSTM_EFG]
- fp: Choose between [atom, rdkit, morgan, mhfp] (mhfp is the SECFP fingerprint encoder)
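Catalogue prices are usually quoted per pack size in grams, so they need converting before training. The sketch below shows one way to derive a training label, reading "log(USD)/mmol" as the logarithm of the price in USD per mmol. The natural logarithm, the `log_price_per_mmol` helper, and the example price are assumptions; check the repository's data pipeline for the exact convention.

```python
import math

def log_price_per_mmol(price_usd, amount_g, mol_weight):
    """Log of the price in USD per mmol.

    price_usd  : price of the pack in USD
    amount_g   : pack size in grams
    mol_weight : molecular weight in g/mol
    Natural log is an assumption made for this sketch.
    """
    if price_usd <= 0 or amount_g <= 0 or mol_weight <= 0:
        raise ValueError("price, amount and molecular weight must be positive")
    mmol = amount_g / mol_weight * 1000.0  # grams -> mol -> mmol
    return math.log(price_usd / mmol)

# Hypothetical listing: 100 g of aspirin (about 180.16 g/mol) for 50 USD.
label = log_price_per_mmol(price_usd=50.0, amount_g=100.0, mol_weight=180.16)
```

Because the label is a logarithm, compounds cheaper than 1 USD per mmol get negative values, as in this example.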

If one has a pre-trained Fingerprint model, one can further train it with the contrastive loss by calling:

python -m bin.train --model Fingerprint --fp FINGERPRINT_TYPE --combined --cn MODEL_CHECKPOINT
