This repository provides an end-to-end framework for analyzing the synthetic accessibility of non-natural amino acids (NNAAs) and their protected forms. It leverages cheminformatics tools and deep learning models to:
- Generate protected forms of input amino acids
- Perform retrosynthetic analysis using AiZynthFinder
- Score synthetic routes using Chemformer and expert-augmented feasibility models
- Select the best synthetic route for each protected form
The workflow is designed to support peptide and amino acid design projects exploring beyond the standard 20 natural amino acids.
- Automated protection of amino acids using customizable protection group rules
- Retrosynthetic analysis via AiZynthFinder
- Feasibility scoring using Chemformer and expert-augmented models
- Extensible and modular end-to-end pipeline for synthesizability analysis
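The final step of the workflow, selecting the best route for each protected form, can be pictured with a small sketch. This is not the repository's implementation: the `ScoredRoute` class, its field names, and the convention that a higher feasibility value is better are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRoute:
    """Hypothetical container for one candidate route of one protected form."""
    protected_form: str  # label of the protected amino acid (placeholder)
    route_id: str        # identifier of the candidate route (placeholder)
    feasibility: float   # assumed convention: higher means more feasible


def select_best_routes(routes):
    """Return a dict mapping each protected form to its highest-scoring route."""
    best = {}
    for route in routes:
        current = best.get(route.protected_form)
        # Keep the first route seen, then replace it only on a strictly better score.
        if current is None or route.feasibility > current.feasibility:
            best[route.protected_form] = route
    return best


if __name__ == "__main__":
    # Made-up scores, for demonstration only.
    candidates = [
        ScoredRoute("form_A", "route_1", 0.4),
        ScoredRoute("form_A", "route_2", 0.9),
        ScoredRoute("form_B", "route_3", 0.2),
    ]
    for form, route in select_best_routes(candidates).items():
        print(form, route.route_id, route.feasibility)
```

Ties keep the route that was seen first, which is one possible choice; the actual pipeline may rank routes differently.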
Before using this repository, you must manually install the following external tools and libraries from their respective GitHub repositories:
- AiZynthFinder
  Retrosynthetic planning software; its environment is used to run this project.
  GitHub: https://github.com/MolecularAI/aizynthfinder
- AiZynthModels
  Provides models and the environment for AiZynthFinder and the Chemformer API.
  GitHub: https://github.com/MolecularAI/aizynthmodels
- Chemformer
  Transformer-based model for reaction and molecule prediction. Only the vocabulary file (bart_vocab_downstream.json) is needed to run the Chemformer API.
  GitHub: https://github.com/MolecularAI/Chemformer
- Expert-Augmented Scorer Model Files
  To use the expert-augmented scoring models, download the following files into a folder named expert_augmented_models:

      mkdir expert_augmented_models
      cd expert_augmented_models
      wget "https://zenodo.org/records/14533779/files/deepset_route_scoring_sdf.onnx?download=1" -O deepset_route_scoring_sdf.onnx
      wget "https://zenodo.org/records/14533779/files/reaction_class_ranks.csv?download=1" -O reaction_class_ranks.csv
      wget "https://zenodo.org/records/14533779/files/scscore_model_1024_bits.onnx?download=1" -O scscore_model_1024_bits.onnx
      wget https://raw.githubusercontent.com/MolecularAI/reaction_utils/refs/heads/route-scoring-example/examples/route-scoring/example-routes.json
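After downloading, it can help to confirm that all four files are present before running the pipeline. The following is a minimal sketch that only checks for the file names listed in the commands above; the helper name `missing_model_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names taken from the download commands above.
EXPECTED_FILES = (
    "deepset_route_scoring_sdf.onnx",
    "reaction_class_ranks.csv",
    "scscore_model_1024_bits.onnx",
    "example-routes.json",
)


def missing_model_files(folder="expert_augmented_models"):
    """Return the expected files that are absent or empty in `folder`."""
    base = Path(folder)
    missing = []
    for name in EXPECTED_FILES:
        path = base / name
        # An empty file usually indicates an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing or empty files:", ", ".join(missing))
    else:
        print("All expert-augmented model files are present.")
```

Run it from the directory that contains expert_augmented_models, or pass the folder path explicitly.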
- Clone this repository
- Create the environment from AiZynthFinder and AiZynthModels:
  - The aizynthfinder repository provides the environment to run this pipeline.
  - The aizynthmodels repository provides the environment to run the Chemformer API.
- Install reaction-utils dependencies to both environments:

      cd ../reaction_utils
      poetry install --all-extras

- Prepare the following resources (also described in the example.ipynb):
  - AiZynthFinder configuration file (example in nnaasynth/aizynthfinder_setup/aizynth_config.yml)
    - Download the public model and stock files as described in the AiZynthFinder repository.
  - Protection group and reaction rules CSVs (stored in reaction-utils/examples/nnaa/)
  - DeepSet and SCScore model files (described above)
- Run the Chemformer API
  The Chemformer API can be run locally with the bash script and the model checkpoint provided in the nnaasynth/chemformer_api folder. To run the API:
  - Set the required paths in chemformer_api.sh.
  - Run the API and note the port number it reports:

        conda activate aizynthmodels
        bash chemformer_api.sh

  - If you use a different model, ensure it is compatible with the Chemformer API.
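Once the script reports a port, a quick way to confirm that something is listening there is a plain TCP connection test. This sketch does not call any Chemformer API endpoint, since the endpoints are not described here; the host `127.0.0.1` and the port `8000` in the usage section are placeholders to be replaced with the values from your own run.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # assumption: the API runs on the local machine
    port = 8000         # placeholder: use the port reported by chemformer_api.sh
    status = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"{host}:{port} is {status}")
```

A successful connection only shows that a process accepts connections on that port; it does not verify that the loaded model is the expected one.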
- example.ipynb: Jupyter notebook demonstrating usage
- run_synthesizability_analysis.py: Main pipeline class
- nnaasynth/: Main package directory
  - aizynthfinder_setup/: AiZynthFinder configuration and stock files
  - chemformer_api/: Bash script and model checkpoint for running the Chemformer API
  - dtos/: Data transfer objects for amino acids and results
  - utils/: Utility functions
  - workflow_components/: Core modules for protection, scoring, and running retrosynthesis
If you use this codebase in your research, please cite the relevant tools (AiZynthFinder, Chemformer, reaction-utils) and our preprint:
@misc{geylan2025nnaasynth,
title={From Concept to Chemistry: Integrating protection group strategy and reaction feasibility into non-natural amino acid synthesis planning},
author={Gökçe Geylan and Mikhail Kabeshov and Samuel Genheden and Christos Kannas and Thierry Kogej and Leonardo De Maria and Florian David and Ola Engkvist},
year={2025},
eprint={10.26434/chemrxiv-2025-jtwn1},
archivePrefix={ChemRxiv},
url={https://chemrxiv.org/engage/chemrxiv/article-details/68592f151a8f9bdab5199312},
}

This project is licensed under the Apache 2.0 License. See the LICENSE file for more details.
For questions or contributions, please contact Gökçe Geylan at gokcegeylan96@gmail.com