LINKER: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability

HySonLab/LINKER

LINKER is a framework for modeling and explaining protein–ligand interactions by explicitly learning interactions between ligand functional groups and protein residues. The method integrates chemical knowledge, structural information, and deep learning to improve interpretability in structure-based drug discovery.

Preliminary versions of this work were presented at NeurIPS 2025 workshops.


LINKER Architecture

Codeflow


Environment Setup

First, create the Conda environment required to run LINKER.
This will install all Python libraries and core dependencies needed for the pipeline.

conda env create -f environment.yml
conda activate linker

If you prefer using pip instead of Conda, you can install the required packages with:

pip install -r requirements.txt

External Dependencies

In addition to the Python environment above, LINKER relies on several external tools that must be installed separately. Since each dependency has its own installation procedure, please install them individually by following the instructions provided in the README.md file inside each corresponding folder.

Required Tools

  • PLIP – Protein–Ligand Interaction Profiler
  • pyCheckmol – Functional group detection

Installation Instructions

  1. Navigate to each dependency’s folder.
  2. Open the README.md file inside that folder.
  3. Follow the installation steps provided there.
  4. Verify that each tool is correctly installed and accessible in your environment.
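
The verification in step 4 can be sketched as a small PATH check. This is a minimal example, not part of the repository: the helper name check_tool is illustrative, and the executable names plip and checkmol are assumptions that may differ depending on how each tool was installed.

```shell
# Report whether a command-line tool is reachable on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Assumed executable names; adjust them to match your installation.
for tool in plip checkmol; do
  check_tool "$tool" || true
done
```

A "missing" line usually means the tool is not installed in the active environment or its install location is not on PATH.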

Datasets

We use publicly available protein–ligand complex datasets:

  • Leak-Proof PDBBind (LP-PDBBind)

    Repository: https://github.com/THGLab/LP-PDBBind

    First, clone the LP-PDBBind repository into the data/ directory:

    Next, download the processed data files from Zenodo: https://zenodo.org/records/18323765

    Place them into the data/LP-PDBBind directory and extract the downloaded files.

    After completing the above steps, the directory structure should look like this:

    LINKER/
      ├─ data/
      ├─── LP_PDBBind/
      ├────── complexes/
      ├────── ligands/
      ├────── proteins/
      ├────── ....
      ├────── LP_PDBBind.csv
      ├─ dataloader/
      ├─ ...
    
  • BindingDB 3D Complexes

    Download the dataset from: https://www.bindingdb.org/rwd/data/surflex/surflex.tar

    Then extract it into the data/BindingDB directory.

    After completing the above steps, the directory structure should look like this:

    LINKER/
      ├─ data/
      ├─── BindingDB/
      ├────── 1A4H_GDM/
      ├────── 1A9U_SB2/
      ├────── ....
      ├─ dataloader/
      ├─ ...
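
The dataset steps above can be scripted and sanity-checked with a few helpers. This is a hedged sketch, assuming it is run from the LINKER repository root with git installed; the function names are illustrative and do not exist in the repository. Note that the text refers to data/LP-PDBBind while the tree lists LP_PDBBind, so every path is a parameter and should be set to whichever name your scripts expect.

```shell
# Clone LP-PDBBind unless the target directory already holds a git checkout.
clone_lp_pdbbind() {
  local dest="${1:-data/LP-PDBBind}"
  if [ -d "$dest/.git" ]; then
    echo "already cloned: $dest"
  else
    mkdir -p "$(dirname "$dest")"
    git clone https://github.com/THGLab/LP-PDBBind "$dest"
  fi
}

# Report which entries from the LP_PDBBind tree are present under a directory.
check_lp_pdbbind_layout() {
  local root="${1:-data/LP_PDBBind}"
  local rc=0
  local entry
  for entry in complexes ligands proteins LP_PDBBind.csv; do
    if [ -e "$root/$entry" ]; then
      echo "found: $root/$entry"
    else
      echo "missing: $root/$entry"
      rc=1
    fi
  done
  return "$rc"
}

# Count the per-complex folders (such as 1A4H_GDM) directly under a directory.
count_bindingdb_complexes() {
  local root="${1:-data/BindingDB}"
  find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}
```

For example, `check_lp_pdbbind_layout data/LP_PDBBind` prints one line per expected entry and returns a non-zero status if any is missing, and `count_bindingdb_complexes data/BindingDB` prints the number of extracted complex folders.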
    

Pipeline

1. Preprocessing

Preprocess the PDBBind dataset and split it according to LP_PDBBind.

bash script/PDBBindPreprocessing.sh

Preprocess raw BindingDB 3D complexes, including structure cleaning and filtering.

bash script/BindingDBPreprocessing.sh

2. Featurizer

Extract chemical and structural features from processed protein–ligand complexes, including functional group annotations and residue-level representations.

bash script/PDBBindFeaturizer.sh
bash script/BindingDBFeaturizer.sh

3. Dataloader

Construct datasets and dataloaders with batching, masking, and padding strategies for efficient model training.

bash script/Dataloader.sh

4. Run

Train the LINKER model on the prepared dataset and save checkpoints:

bash script/Run_LINKER.sh

Train the Binding Affinity model on the pretrained features and save checkpoints:

bash script/Run_Predictor.sh
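
The four stages can also be chained so that a failure in one stage stops the rest. The sketch below is illustrative only: the function name run_linker_pipeline is hypothetical, and it assumes each script listed above takes no arguments and can be run back-to-back from the repository root, which the repository may not guarantee.

```shell
# Run the pipeline scripts in the documented order, stopping at the first failure.
run_linker_pipeline() {
  local script_dir="${1:-script}"
  local step
  for step in \
    PDBBindPreprocessing.sh BindingDBPreprocessing.sh \
    PDBBindFeaturizer.sh BindingDBFeaturizer.sh \
    Dataloader.sh Run_LINKER.sh Run_Predictor.sh; do
    echo "==> $step"
    # Abort the remaining stages if this one exits with a non-zero status.
    bash "$script_dir/$step" || { echo "failed: $step" >&2; return 1; }
  done
}
```

Calling `run_linker_pipeline script` from the repository root runs every stage in sequence; stopping early avoids training on features from a preprocessing run that did not finish.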

Acknowledgement

  • PLIP: Protein–Ligand Interaction Profiler

@article{salentin2015plip,
  title={PLIP: fully automated protein--ligand interaction profiler},
  author={Salentin, Sebastian and Schreiber, Sven and Haupt, V Joachim and Adasme, Melissa F and Schroeder, Michael},
  journal={Nucleic acids research},
  volume={43},
  number={W1},
  pages={W443--W447},
  year={2015},
  publisher={Oxford University Press}
}
  • pyCheckmol: Application for detecting the functional groups of a molecule

  • ESMC: ESM Cambrian creates representations of the underlying biology of proteins

@misc{esm2024cambrian,
  author = {{ESM Team}},
  title = {ESM Cambrian: Revealing the mysteries of proteins with unsupervised learning},
  year = {2024},
  publisher = {EvolutionaryScale Website},
  url = {https://evolutionaryscale.ai/blog/esm-cambrian},
  urldate = {2024-12-04}
}

If our work is useful, please cite it!

@inproceedings{
pham2025linker,
title={{LINKER}: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability},
author={Phuc Pham and Viet Thanh Duy Nguyen and Truong-Son Hy},
booktitle={NeurIPS 2025 AI for Science Workshop},
year={2025},
url={https://openreview.net/forum?id=LsDdZUSVso}
}
