ART-AttentiveFP-CI

Project summary

This repository contains the code and data for the study "Machine Learning Model for Catalytic Asymmetric Reactions of Simple Alkenes: From Model to Chemical Insights." The project utilizes a manually curated dataset of asymmetric transformations of alkenes (ART: AlkeneReactionTriad) from literature to train a deep learning model focused on predicting reaction outcomes, particularly enantioselectivity. These reactions are crucial for catalytic enantioselective transformations of alkenes, yielding important building blocks such as cyclopropanes, aziridines, and arylated alkenes. Using this dataset various machine learning (ML) models are developed using different featurization techniques, including one-hot encoding, molecular fingerprints, SMILES, and molecular graphs. We used Optuna for hyper-parameter tuning for these ML models.

Dataset

ART Dataset (Alkene Reaction Triad)

The ART dataset comprises 376 catalytic asymmetric alkene reactions, covering three reaction classes:

Cyclopropanation
Aziridination
Arylation

Each reaction varies in the identity of the alkene, chiral ligand, and reaction substrate, forming a comprehensive benchmark for enantioselectivity prediction.

ART Core Data

Filename	Description
`ART_ind.csv`	Contains individual reaction entries with SMILES representations of all reaction components and the corresponding experimentally reported enantiomeric excess (%ee) values.
`ART_30_splits.xlsx`	Provides 30 predefined train–test splits of the ART dataset to enable statistically robust model evaluation and fair comparison across different machine learning methods.

Benchmark Reaction Datasets

In addition to the ART dataset, several widely used benchmark reaction datasets are included in the main branch of this repository. These datasets are used to evaluate the generalizability of the proposed model across diverse reaction classes and data regimes.

Asymmetric Hydrogenation

Asymmetric_hydrogenation_reaction.csv: Benchmark for asymmetric hydrogenation reactions.

N,S-Acetylation Reaction

N,S_acetylation_reaction.xlsx: Data covering N,S-acetylation reactions.

Buchwald–Hartwig Amination (BHA)

Datasets covering both High-Throughput (HTE) and Low-Throughput (LTE) experiment regimes:

BHA_HTE.csv: High-throughput experiments.
BHA_LTE.csv: Manually created LTE datapoints from HTE.

USPTO Gram-Scale Reaction Dataset

A large-scale dataset split for training and evaluation:

USPTO_gram_train.csv: Training set.
USPTO_gram_test.csv: Test set.

Predicted Ligand Dataset

Potential_Pubchem_Ligands_Predicted_ee.xlsx

Contains candidate ligands sourced from PubChem along with their model-predicted enantiomeric excess (%ee) values.

Figures

The codebase contains two primary visualization utilities: Attention_visualization_with_predictions.py, to visualize attention map with prediction overlays, and Heatmap.py, for heatmap-based performance analysis.

Environmental Setup

conda env create -f environment.yml
conda activate ART-AttentiveFP-CI
pip install dgl-cu110
pip install dgllife==0.2.8
pip install optuna
pip install rdkit

Demo & Instructions for use

Notebook1 showcases the training of a deep neural network (DNN) model using fingerprint techniques and Optuna for hyperparameter tuning.

Notebook2 illustrates the training of a DNN model using one-hot encoding and Optuna for hyperparameter tuning.

Notebook3 presents the training of Random Forest, SVM, Decision Tree, and Gradient Boosting models using one-hot encoding and Optuna for hyperparameter tuning.

Notebook4 details the training of the AttentiveFP model using the AttentiveFPAtomFeaturizer, which includes one-hot encodings for atom type, degree, hybridization, formal charge, and other relevant properties.

Notebook5 covers the training of the AttentiveFP-CI model, also utilizing the AttentiveFPAtomFeaturizer, but with a different approach to handling class imbalance in the loss function.

Notebook5 shows the creation of LTE dataset from HTE. It is important to highlight that Optuna is utilized for hyperparameter tuning to find the most promising hyperparameter sets for all these ML models.

Acknowledgement

We would like to acknowledge the following works:

https://github.com/OpenDrugAI/AttentiveFP.git

https://github.com/Sunojlab/Transfer_Learning_in_Catalysis.git

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
dataset		dataset
figures		figures
LICENSE		LICENSE
Notebook1_Regression_Optuna_DNN_Fingerprint.ipynb		Notebook1_Regression_Optuna_DNN_Fingerprint.ipynb
Notebook2_Regression_Optuna_DNN_OHE.ipynb		Notebook2_Regression_Optuna_DNN_OHE.ipynb
Notebook3_Regression_Optuna_ML_Fingerprint.ipynb		Notebook3_Regression_Optuna_ML_Fingerprint.ipynb
Notebook4_Regression_Optuna_AttentiveFP.ipynb		Notebook4_Regression_Optuna_AttentiveFP.ipynb
Notebook5_Regression_Optuna_AttentiveFP_CI.ipynb		Notebook5_Regression_Optuna_AttentiveFP_CI.ipynb
Notebook6_LTE_from_HTE.ipynb		Notebook6_LTE_from_HTE.ipynb
Readme.md		Readme.md
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ART-AttentiveFP-CI

Project summary

Dataset

ART Dataset (Alkene Reaction Triad)

ART Core Data

Benchmark Reaction Datasets

Asymmetric Hydrogenation

N,S-Acetylation Reaction

Buchwald–Hartwig Amination (BHA)

USPTO Gram-Scale Reaction Dataset

Predicted Ligand Dataset

Figures

Environmental Setup

Demo & Instructions for use

Acknowledgement

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

License

alhqlearn/ART-AttentiveFP-CI

Folders and files

Latest commit

History

Repository files navigation

ART-AttentiveFP-CI

Project summary

Dataset

ART Dataset (Alkene Reaction Triad)

ART Core Data

Benchmark Reaction Datasets

Asymmetric Hydrogenation

N,S-Acetylation Reaction

Buchwald–Hartwig Amination (BHA)

USPTO Gram-Scale Reaction Dataset

Predicted Ligand Dataset

Figures

Environmental Setup

Demo & Instructions for use

Acknowledgement

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

Packages