Authors: Song Yin, Yunsie Chung, Alec Glisman, Jason Wang, and Alan Cheng.
This project extends the original PepMNet to handle non-canonical amino acids and more complex peptide topologies, such as cyclic and cross-linked peptides. The goal is to create a generalized model that can process an arbitrary peptide provided in SMILES format, while retaining the ability to use HELM notation for hierarchical graph construction. This repository is currently a work in progress.
The original PepMNet, published by Daniel Garzon-Otero et al., is designed to work with linear peptides composed of the 20 canonical amino acids. Its citation is given below:
@article{otero2025pepmnet,
title={PepMNet: a hybrid deep learning model for predicting peptide properties using hierarchical graph representations},
author={Otero, Daniel Garzon and Akbari, Omid and Bilodeau, Camille},
journal={Molecular Systems Design \& Engineering},
volume={10},
number={3},
pages={205--218},
year={2025},
publisher={Royal Society of Chemistry}
}
- Atomic level graph-processing implemented
- Amino acid level graph-processing implemented
- Hierarchical graph building in progress
- Complete SMILES to HELM monomer mapping for hierarchical graph construction
- Validate generalized model performance across diverse peptide datasets
Sequence → HELM → Mol (RDKit) → Nodes (Atoms) / Edges (Bonds) features
SMILES → Mol (RDKit) → Nodes (Atoms) / Edges (Bonds) features
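The SMILES route above can be sketched with RDKit as follows. This is a minimal illustration, not the repository's featurizer: the function name `mol_to_graph` and the particular atom and bond features chosen here are assumptions made for the example.

```python
from rdkit import Chem


def mol_to_graph(smiles):
    """Convert a SMILES string into simple node and edge feature lists."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")

    # One feature row per atom (illustrative feature set, assumed for this sketch).
    nodes = [
        [
            atom.GetAtomicNum(),
            atom.GetDegree(),
            atom.GetFormalCharge(),
            int(atom.GetIsAromatic()),
        ]
        for atom in mol.GetAtoms()
    ]

    # Store every bond in both directions so the atom graph is undirected.
    edge_index, edge_attr = [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        feat = [bond.GetBondTypeAsDouble(), int(bond.IsInRing())]
        edge_index += [(i, j), (j, i)]
        edge_attr += [feat, feat]

    return nodes, edge_index, edge_attr


# Glycine as a tiny example input.
nodes, edge_index, edge_attr = mol_to_graph("NCC(=O)O")
```

The same function would apply to the Sequence → HELM route once the HELM string has been converted to an RDKit `Mol`.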
- Nodes: Sequence → HELM → Mol (RDKit) → Amino Acids → Biopython → Amino Acid features
- Edges: Get edges from linear peptide sequence but in one direction
- Nodes: HELM → monomer → complete monomer SMILES → Mol (RDKit) → MOE+RDKit+Mordred → PCA → monomer features
- Edges: Get edges from HELM monomer sequence and connection
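A rough sketch of the edge construction at the amino acid and monomer level is shown below. It assumes monomers are indexed from 0 in sequence order, that chain edges point from monomer `i` to `i + 1` only, and that extra HELM connections (for example a head-to-tail cyclization) arrive as already-parsed index pairs. The function name `build_monomer_edges` is hypothetical, and parsing the HELM connection section is not shown.

```python
def build_monomer_edges(n_monomers, connections=()):
    """Return directed (source, target) edges between monomers.

    Chain edges follow the sequence in one direction only (i -> i + 1).
    `connections` holds extra 0-based monomer pairs, assumed to come from
    an already-parsed HELM connection section.
    """
    edges = [(i, i + 1) for i in range(n_monomers - 1)]
    for a, b in connections:
        if not (0 <= a < n_monomers and 0 <= b < n_monomers):
            raise ValueError(f"connection ({a}, {b}) is out of range")
        if (a, b) not in edges:
            edges.append((a, b))
    return edges


# Linear tetrapeptide: chain edges only.
linear_edges = build_monomer_edges(4)

# Head-to-tail cyclic tetrapeptide: one extra edge closes the ring.
cyclic_edges = build_monomer_edges(4, connections=[(3, 0)])
```

For a linear peptide of canonical amino acids the `connections` argument stays empty, which reduces to the one-direction sequence edges described above.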
Identify peptide bonds → split into fragments → assign a fragment index to each atom → sum each feature over all atoms of an amino acid → concatenate with the amino acid features → average pool → final embedding
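The aggregation steps in this pipeline (sum per fragment, concatenate, average pool) can be illustrated on plain Python lists. This sketch assumes the fragment index of each atom is already known and that every amino acid has at least one feature row; in the actual model these would presumably be tensors of learned embeddings. The name `pool_hierarchical` is hypothetical.

```python
def pool_hierarchical(atom_features, fragment_index, residue_features):
    """Combine atom-level and amino-acid-level features into one vector.

    atom_features:    one feature list per atom
    fragment_index:   amino acid index assigned to each atom
    residue_features: one feature list per amino acid
    """
    n_res = len(residue_features)
    n_feat = len(atom_features[0])

    # Sum each atom feature over all atoms belonging to the same amino acid.
    summed = [[0.0] * n_feat for _ in range(n_res)]
    for feats, frag in zip(atom_features, fragment_index):
        for k, value in enumerate(feats):
            summed[frag][k] += value

    # Concatenate the summed atom features with the amino acid features.
    combined = [s + list(r) for s, r in zip(summed, residue_features)]

    # Average pool over amino acids to get a single embedding.
    dim = len(combined[0])
    return [sum(row[k] for row in combined) / n_res for k in range(dim)]


# Three atoms split over two amino acids, with one amino acid feature each.
atom_features = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
fragment_index = [0, 0, 1]
residue_features = [[10.0], [20.0]]
embedding = pool_hierarchical(atom_features, fragment_index, residue_features)
```

Here both amino acids sum to `[3.0, 1.0]` at the atom level, so the pooled embedding is `[3.0, 1.0, 15.0]`.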
Mapping fragments of SMILES to HELM monomers (Ongoing)