Prediction of selectivity and Fukui indices

This is the repository for the paper XXXXXXXXXXX.

The code predicts the site selectivity of organic molecules under anti-Friedel-Crafts alkylation.

Machine Learning

We trained machine learning classifiers to predict the most active site of each molecule. Rather than taking only the predicted class, we used the classifier's predict_proba() method to obtain a probability for every atomic site. Within each molecule, the site with the highest probability was then selected as the predicted active site. To compare models fairly, we also used leave-one-molecule-out cross-validation, in which all sites of one molecule are held out in each fold.
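The site selection and the fold construction described above can be sketched in plain Python. This is only an illustration and not the code in src/run_simple_ML.py: the function names, the molecule and site identifiers, and the probability values are all hypothetical, and the probabilities are assumed to have been produced beforehand by a classifier's predict_proba() method.

```python
from collections import defaultdict


def top_site_per_molecule(molecule_ids, site_ids, probabilities):
    """Return {molecule_id: site_id} for the highest-probability site of each molecule."""
    best = {}
    for mol, site, prob in zip(molecule_ids, site_ids, probabilities):
        # Keep the site only if it beats the best one seen so far for this molecule.
        if mol not in best or prob > best[mol][1]:
            best[mol] = (site, prob)
    return {mol: site for mol, (site, _) in best.items()}


def leave_one_molecule_out(molecule_ids):
    """Yield (held_out_molecule, train_rows, test_rows), holding out one molecule per fold."""
    rows_by_molecule = defaultdict(list)
    for row, mol in enumerate(molecule_ids):
        rows_by_molecule[mol].append(row)
    for held_out, test_rows in rows_by_molecule.items():
        train_rows = [
            row
            for mol, rows in rows_by_molecule.items()
            if mol != held_out
            for row in rows
        ]
        yield held_out, train_rows, test_rows


# Hypothetical toy data: one row per atomic site (values are made up).
molecule_ids = ["mol_a", "mol_a", "mol_a", "mol_b", "mol_b"]
site_ids = [0, 1, 2, 0, 1]
probabilities = [0.10, 0.70, 0.20, 0.55, 0.45]

predicted_sites = top_site_per_molecule(molecule_ids, site_ids, probabilities)
folds = list(leave_one_molecule_out(molecule_ids))
```

Grouping the rows by molecule keeps every site of the held-out molecule out of the training set, so a model is never evaluated on a molecule it has partly seen. With scikit-learn, the same splits could be obtained from LeaveOneGroupOut by passing the molecule identifiers as groups.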

Scripts

There are two main scripts in this repository:

src/get_features_from_smiles.py: Generates SOAP features from SMILES strings and, optionally, from DFT structures. The input is a csv file with SMILES strings. The output is a csv file that keeps all original columns and adds the SOAP features and the PCA-compressed SOAP features.
src/run_simple_ML.py: Runs machine learning models on the csv file with SOAP features. The options are explained in the script.

Data

The data directory contains three subdirectories:

generate_features: Contains the original csv file with SMILES strings and the generated csv file with SOAP features.
test_new_molecules: Contains the same csv files, extended with four additional molecules that were tested at a later stage of the project.
ml_results: Contains the results of the machine learning models. The file ml_results.xlsx in the main directory contains an overview of these machine learning experiments.

Conda environment

The exact conda environment used for the results in the paper is specified in the conda_env.yml file. It was used on a MacBook and should also work on other operating systems. To create the environment, run the following command:

conda env create -f conda_env.yml

License

This repository is licensed under the MIT License - see the LICENSE file for details.
