MSDLLCpapers/General-PepMNet

General-PepMNet

Authors: Song Yin, Yunsie Chung, Alec Glisman, Jason Wang, and Alan Cheng.

This project extends the original PepMNet to handle non-canonical amino acids and more complex peptide topologies, such as cyclic and cross-linked peptides. The goal is a generalized model that can process an arbitrary peptide provided as a SMILES string while retaining the ability to use HELM notation for hierarchical graph construction. This repository is a work in progress.

The original PepMNet, published by Daniel Garzon-Otero et al., is designed to work with linear peptides composed of the 20 canonical amino acids. Its citation is given below:

@article{otero2025pepmnet,
  title={PepMNet: a hybrid deep learning model for predicting peptide properties using hierarchical graph representations},
  author={Otero, Daniel Garzon and Akbari, Omid and Bilodeau, Camille},
  journal={Molecular Systems Design \& Engineering},
  volume={10},
  number={3},
  pages={205--218},
  year={2025},
  publisher={Royal Society of Chemistry}
}

Progress

Status

  • Atomic-level graph processing: implemented
  • Amino-acid-level graph processing: implemented
  • Hierarchical graph building: in progress

Next Steps

  • Complete the SMILES-to-HELM monomer mapping needed for hierarchical graph construction
  • Validate generalized model performance across diverse peptide datasets

Architecture Comparison

Atomic Level Nodes/Edges

Original PepMNet

Sequence → HELM → Mol (RDKit) → Nodes (Atoms) / Edges (Bonds) features

Generalized Model

SMILES → Mol (RDKit) → Nodes (Atoms) / Edges (Bonds) features

Amino Acid Level Nodes/Edges

Original PepMNet

  • Nodes: Sequence → HELM → Mol (RDKit) → Amino Acids → Biopython → Amino Acid features
  • Edges: Derived from the linear peptide sequence, in one direction only

Generalized Model

  • Nodes: HELM → monomer → complete monomer SMILES → Mol (RDKit) → MOE+RDKit+Mordred → PCA → monomer features
  • Edges: Derived from the HELM monomer sequence and its connection section
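
To illustrate how monomer-level edges can be read off a HELM string, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name `helm_monomer_graph` is hypothetical, and the parser assumes a single simple polymer, no inline SMILES monomers, and no `$` or `.` inside brackets. Backbone edges come from the monomer order, and extra edges (cyclization, cross-links) come from the connection section.

```python
import re


def helm_monomer_graph(helm):
    """Return (monomers, edges) for a single-polymer HELM string.

    Simplifying assumptions: one simple polymer, no inline SMILES,
    connections only within that polymer.
    """
    sections = helm.split("$")
    polymer, connections = sections[0], sections[1]

    # "PEPTIDE1{A.C.[meA]}" -> polymer id and dot-separated monomer list
    match = re.fullmatch(r"(\w+)\{(.+)\}", polymer)
    if match is None:
        raise ValueError(f"unsupported polymer section: {polymer!r}")
    polymer_id, body = match.groups()

    # Multi-character monomer IDs are wrapped in brackets, e.g. [meA]
    tokens = re.findall(r"\[[^\]]+\]|[^.\[\]]+", body)
    monomers = [token.strip("[]") for token in tokens]

    # Backbone edges follow the monomer order (0-based indices)
    edges = [(i, i + 1, "backbone") for i in range(len(monomers) - 1)]

    # Connections look like "PEPTIDE1,PEPTIDE1,2:R3-5:R3", separated by "|"
    for conn in filter(None, connections.split("|")):
        src, dst, bond = conn.split(",")
        if src != polymer_id or dst != polymer_id:
            raise ValueError("this sketch handles a single polymer only")
        left, right = bond.split("-")
        i_pos, i_group = left.split(":")
        j_pos, j_group = right.split(":")
        # HELM positions are 1-based; convert to 0-based node indices
        edges.append((int(i_pos) - 1, int(j_pos) - 1, f"{i_group}-{j_group}"))

    return monomers, edges


# A disulfide-style side-chain link between monomers 2 and 5
monomers, edges = helm_monomer_graph(
    "PEPTIDE1{A.C.[meA].K.C}$PEPTIDE1,PEPTIDE1,2:R3-5:R3$$$V2.0"
)
```

Each edge is listed once as `(i, j, label)`; a graph library that expects both directions would need the reverse pairs added as well.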

Building Hierarchical Graphs

Original PepMNet

Identify peptide bonds → split into fragments → assign a fragment index to each atom → sum each feature over all atoms of an amino acid → concatenate with amino acid features → average pool → final embedding
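
The pipeline above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the original PepMNet code: the function names `assign_fragments` and `hierarchical_embedding` are hypothetical, the peptide bonds are passed in already identified, features are plain lists of floats, and any learned layers between the steps are omitted.

```python
from collections import defaultdict


def assign_fragments(n_atoms, bonds, peptide_bonds):
    """Cut the peptide bonds and label each atom with its fragment index."""
    cut = {frozenset(bond) for bond in peptide_bonds}
    neighbors = defaultdict(list)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            neighbors[a].append(b)
            neighbors[b].append(a)

    fragment = [-1] * n_atoms
    next_id = 0
    for start in range(n_atoms):
        if fragment[start] != -1:
            continue
        # Depth-first walk over one connected component
        fragment[start] = next_id
        stack = [start]
        while stack:
            atom = stack.pop()
            for nb in neighbors[atom]:
                if fragment[nb] == -1:
                    fragment[nb] = next_id
                    stack.append(nb)
        next_id += 1
    return fragment


def hierarchical_embedding(atom_features, fragment_index, residue_features):
    """Sum atom features per residue, concatenate, then average pool."""
    n_res = len(residue_features)
    n_feat = len(atom_features[0])

    # Sum each atom-level feature over the atoms of every residue
    summed = [[0.0] * n_feat for _ in range(n_res)]
    for feats, frag in zip(atom_features, fragment_index):
        for k, value in enumerate(feats):
            summed[frag][k] += value

    # Concatenate the pooled atom features with the residue-level features
    combined = [s + list(r) for s, r in zip(summed, residue_features)]

    # Average over residues to get one fixed-size embedding
    return [sum(column) / n_res for column in zip(*combined)]


# Toy chain of four atoms; the bond between atoms 1 and 2 plays the peptide bond
fragment_index = assign_fragments(4, [(0, 1), (1, 2), (2, 3)], [(1, 2)])
embedding = hierarchical_embedding(
    atom_features=[[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [0.0, 1.0]],
    fragment_index=fragment_index,
    residue_features=[[10.0], [20.0]],
)
```

Fragments are numbered in order of their lowest atom index, which matches the residue order only when atoms are listed N-terminus first; that ordering is an assumption of this toy example.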

Generalized Model

Map SMILES fragments to HELM monomers (ongoing)
