LINKER: Learning Interactions Between Functional Groups and Residues with Chemical Knowledge-Enhanced Reasoning and Explainability

LINKER is a framework for modeling and explaining protein–ligand interactions by explicitly learning interactions between ligand functional groups and protein residues. The method integrates chemical knowledge, structural information, and deep learning to improve interpretability in structure-based drug discovery.

Preliminary versions of this work were presented at NeurIPS 2025 workshops:

AI for Science: https://openreview.net/pdf?id=LsDdZUSVso
Multi-modal Foundation Models and Large Language Models for Life Sciences: https://openreview.net/pdf?id=En4Q41ZA3T
Machine Learning and the Physical Sciences: https://ml4physicalsciences.github.io/2025/files/NeurIPS_ML4PS_2025_102.pdf

LINKER Architecture

Codeflow

Environment Setup

First, create the Conda environment required to run LINKER.
This will install all Python libraries and core dependencies needed for the pipeline.

conda env create -f environment.yml
conda activate linker

If you prefer using pip instead of Conda, you can install the required packages with:

pip install -r requirements.txt

External Dependencies

In addition to the Python environment above, LINKER relies on several external tools that must be installed separately. Since each dependency has its own installation procedure, please install them individually by following the instructions provided in the README.md file inside each corresponding folder.

Required Tools

PLIP – Protein–Ligand Interaction Profiler
pyCheckmol – Functional group detection

Installation Instructions

Navigate to each dependency’s folder.
Open the README.md file inside that folder.
Follow the installation steps provided there.
Verify that the tool is correctly installed and accessible in your environment.
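The verification step above can be sketched in Python. This is a minimal check, not part of LINKER itself: the executable name `plip` and the module name `pycheckmol` are assumptions about how the tools are exposed after installation, so adjust them to match what each folder's README.md actually installs.

```python
import importlib.util
import shutil


def check_tools(executables, modules):
    """Return a dict mapping each tool name to True if it was found."""
    status = {}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    # "plip" and "pycheckmol" are assumed names; adjust to your install.
    for name, ok in check_tools(["plip"], ["pycheckmol"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run the check inside the activated `linker` environment so that it sees the same PATH and site-packages as the pipeline scripts.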

Datasets

We use publicly available protein–ligand complex datasets:

Leak-Proof PDBBind (LP-PDBBind)

Repository: https://github.com/THGLab/LP-PDBBind

First, clone the LP-PDBBind repository into the data/ directory:

git clone https://github.com/THGLab/LP-PDBBind.git data/LP-PDBBind

Next, download the processed data files from Zenodo: https://zenodo.org/records/18323765

Place them into the data/LP-PDBBind directory and extract the downloaded files.

After completing the above steps, the directory structure should look like this:
```
LINKER/
  ├─ data/
  ├─── LP_PDBBind/
  ├────── complexes/
  ├────── ligands/
  ├────── proteins/
  ├────── ....
  ├────── LP_PDBBind.csv
  ├─ dataloader/
  ├─ ...
```
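Before running the pipeline, it can help to confirm that the layout matches the tree above. The following is a small sketch under the assumption that only the entries named explicitly in the tree need to exist; the helper `missing_entries` is hypothetical and the elided entries ("....") are not checked.

```python
from pathlib import Path

# Entries taken from the directory tree above; elided entries are not checked.
EXPECTED = ["complexes", "ligands", "proteins", "LP_PDBBind.csv"]


def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]
```

For example, `missing_entries("data/LP_PDBBind")` returns an empty list when every listed folder and file is present, and otherwise the names that still need to be downloaded or extracted.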

BindingDB 3D Complexes

Please download the dataset from: https://www.bindingdb.org/rwd/data/surflex/surflex.tar

Then extract it into your data/BindingDB directory.

After completing the above steps, the directory structure should look like this:
```
LINKER/
  ├─ data/
  ├─── BindingDB/
  ├────── 1A4H_GDM/
  ├────── 1A9U_SB2/
  ├────── ....
  ├─ dataloader/
  ├─ ...
```
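As a quick sanity check after extraction, the complex folders can be enumerated. This sketch assumes that each folder name has the form `<PDB ID>_<ligand code>`, which is inferred only from the two examples in the tree (`1A4H_GDM`, `1A9U_SB2`); the helper `list_complexes` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_complexes(root):
    """Return sorted (pdb_id, ligand_id) pairs parsed from sub-directory names."""
    pairs = []
    for entry in Path(root).iterdir():
        # Skip plain files and folders that do not follow the assumed pattern.
        if not entry.is_dir() or "_" not in entry.name:
            continue
        pdb_id, ligand_id = entry.name.split("_", 1)
        pairs.append((pdb_id, ligand_id))
    return sorted(pairs)
```

Calling `len(list_complexes("data/BindingDB"))` gives the number of extracted complexes, which is a cheap way to spot an incomplete download.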

Pipeline

1. Preprocessing

Preprocess the PDBBind dataset and split it according to LP_PDBBind.

bash script/PDBBindPreprocessing.sh

Preprocess raw BindingDB 3D complexes, including structure cleaning and filtering.

bash script/BindingDBPreprocessing.sh

2. Featurizer

Extract chemical and structural features from processed protein–ligand complexes, including functional group annotations and residue-level representations.

bash script/PDBBindFeaturizer.sh
bash script/BindingDBFeaturizer.sh

3. Dataloader

Construct datasets and dataloaders with batching, masking, and padding strategies for efficient model training.

bash script/Dataloader.sh
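To illustrate what padding and masking mean for variable-length inputs such as residue or functional-group sequences, here is a generic pure-Python sketch. It is not the code in dataloader/; the function name `pad_and_mask` and the 0/1 mask convention are assumptions chosen for illustration.

```python
def pad_and_mask(sequences, pad_value=0):
    """Pad variable-length lists to the batch maximum and build a 0/1 mask."""
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real positions keep their values; padded positions get pad_value.
        padded.append(list(seq) + [pad_value] * n_pad)
        # Mask is 1 for real positions and 0 for padding.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask
```

The mask lets a model ignore padded positions, for example when computing attention or averaging a loss over real tokens only.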

4. Run

Train the LINKER model on the prepared dataset and save checkpoints:

bash script/Run_LINKER.sh

Train the binding affinity model on the pretrained features and save checkpoints:

bash script/Run_Predictor.sh
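The four stages can also be chained from a single Python entry point. The sketch below assumes that the scripts are run from the repository root, that they need no extra arguments, and that each stage should stop the pipeline when it fails; the `run_pipeline` helper is hypothetical and not shipped with LINKER.

```python
import subprocess

# Script paths exactly as listed in this README, in pipeline order.
PIPELINE = [
    "script/PDBBindPreprocessing.sh",
    "script/BindingDBPreprocessing.sh",
    "script/PDBBindFeaturizer.sh",
    "script/BindingDBFeaturizer.sh",
    "script/Dataloader.sh",
    "script/Run_LINKER.sh",
    "script/Run_Predictor.sh",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with bash, stopping at the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(["bash", script], check=True)
```

Passing a subset, such as `run_pipeline(PIPELINE[4:])`, resumes from the dataloader stage once preprocessing and featurization have already completed.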

Acknowledgement

PLIP: Protein–Ligand Interaction Profiler

@article{salentin2015plip,
  title={PLIP: fully automated protein--ligand interaction profiler},
  author={Salentin, Sebastian and Schreiber, Sven and Haupt, V Joachim and Adasme, Melissa F and Schroeder, Michael},
  journal={Nucleic acids research},
  volume={43},
  number={W1},
  pages={W443--W447},
  year={2015},
  publisher={Oxford University Press}
}

pyCheckmol: Application for detecting functional groups in molecules

ESMC: ESM Cambrian creates representations of the underlying biology of proteins

@misc{esm2024cambrian,
  author = {{ESM Team}},
  title = {ESM Cambrian: Revealing the mysteries of proteins with unsupervised learning},
  year = {2024},
  publisher = {EvolutionaryScale Website},
  url = {https://evolutionaryscale.ai/blog/esm-cambrian},
  urldate = {2024-12-04}
}

If our work is useful, please cite it!

@inproceedings{
pham2025linker,
title={{LINKER}: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability},
author={Phuc Pham and Viet Thanh Duy Nguyen and Truong-Son Hy},
booktitle={NeurIPS 2025 AI for Science Workshop},
year={2025},
url={https://openreview.net/forum?id=LsDdZUSVso}
}
