LINKER: Learning Interactions Between Functional Groups and Residues with Chemical Knowledge-Enhanced Reasoning and Explainability
LINKER is a framework for modeling and explaining protein–ligand interactions by explicitly learning interactions between ligand functional groups and protein residues. The method integrates chemical knowledge, structural information, and deep learning to improve interpretability in structure-based drug discovery.
Preliminary versions of this work were presented at NeurIPS 2025 workshops:
- AI for Science: https://openreview.net/pdf?id=LsDdZUSVso
- Multi-modal Foundation Models and Large Language Models for Life Sciences: https://openreview.net/pdf?id=En4Q41ZA3T
- Machine Learning and the Physical Sciences: https://ml4physicalsciences.github.io/2025/files/NeurIPS_ML4PS_2025_102.pdf
First, create the Conda environment required to run LINKER.
This will install all Python libraries and core dependencies needed for the pipeline.
conda env create -f environment.yml
conda activate linker
If you prefer using pip instead of Conda, you can install the required packages with:
pip install -r requirements.txt
In addition to the Python environment above, LINKER relies on several external tools that must be installed separately. Because each dependency has its own installation procedure, install each one by following the README.md file inside its corresponding folder:
- PLIP – Protein–Ligand Interaction Profiler
- pyCheckmol – Functional group detection
- Navigate to each dependency’s folder.
- Open the README.md file inside that folder.
- Follow the installation steps provided there.
- Verify that the tool is correctly installed and accessible in your environment.
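For the final verification step, a minimal Python sketch can report whether the external tools are reachable from the active environment. It assumes the command-line executables are named `plip` and `checkmol` and that they are on `PATH`; these names are assumptions, not something this README specifies, so adjust `REQUIRED_TOOLS` to whatever each dependency's installation actually provides.

```python
import shutil

# Assumed executable names for the external tools; adjust if your installs differ.
REQUIRED_TOOLS = ("plip", "checkmol")


def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


def report(status):
    """Print one line per tool and return the names of the missing ones."""
    missing = []
    for tool, path in status.items():
        print(f"{tool:10s} {path or 'NOT FOUND'}")
        if path is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    missing = report(check_tools())
    if missing:
        print("Missing tools:", ", ".join(missing))
```

Run it inside the activated `linker` environment; any tool reported as `NOT FOUND` still needs to be installed or added to `PATH`.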
We use publicly available protein–ligand complex datasets:
- Leak-Proof PDBBind (LP-PDBBind)
Repository: https://github.com/THGLab/LP-PDBBind
First, clone the LP-PDBBind repository into the data/ directory:
Next, download the processed data files from Zenodo: https://zenodo.org/records/18323765
Place them into the data/LP-PDBBind directory and extract the downloaded files.
After completing the above steps, the directory structure should look like this:
LINKER/
├─ data/
│  └─ LP_PDBBind/
│     ├─ complexes/
│     ├─ ligands/
│     ├─ proteins/
│     ├─ ...
│     └─ LP_PDBBind.csv
├─ dataloader/
└─ ...
- BindingDB 3D Complexes
Download the dataset from https://www.bindingdb.org/rwd/data/surflex/surflex.tar, then extract it into the data/BindingDB directory. After completing these steps, the directory structure should look like this:
LINKER/
├─ data/
│  └─ BindingDB/
│     ├─ 1A4H_GDM/
│     ├─ 1A9U_SB2/
│     └─ ...
├─ dataloader/
└─ ...
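Before preprocessing, the layouts shown above can be checked with a short Python sketch. The expected entries mirror the LP_PDBBind tree, and the default paths (`data/LP_PDBBind`, `data/BindingDB`) are assumptions taken from the trees. The setup text mentions `data/LP-PDBBind` while the tree shows `LP_PDBBind`, so point `missing_entries` at whichever directory your scripts actually read.

```python
from pathlib import Path

# Expected entries under the LP_PDBBind root, taken from the directory tree in this README.
LP_PDBBIND_ENTRIES = ("complexes", "ligands", "proteins", "LP_PDBBind.csv")


def missing_entries(root, expected=LP_PDBBIND_ENTRIES):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


def list_bindingdb_complexes(root):
    """Return sorted complex folder names (e.g. '1A4H_GDM') directly under `root`."""
    root = Path(root)
    if not root.is_dir():
        return []
    # Only directories count as complexes; stray files are ignored.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    print("LP_PDBBind missing:", missing_entries("data/LP_PDBBind") or "none")
    print("BindingDB complexes found:", len(list_bindingdb_complexes("data/BindingDB")))
```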
Preprocess the PDBBind dataset and split it according to LP-PDBBind:
bash script/PDBBindPreprocessing.sh
Preprocess the raw BindingDB 3D complexes, including structure cleaning and filtering:
bash script/BindingDBPreprocessing.sh
Extract chemical and structural features from the processed protein–ligand complexes, including functional group annotations and residue-level representations:
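To make the pairing of functional groups and residues concrete, here is an illustrative Python sketch; it is not LINKER's actual feature format. It assumes hypothetical interaction records, each a (functional group, residue, interaction type) triple such as could be assembled from checkmol functional group annotations and PLIP interaction types, and aggregates them into counts and a binary group-by-residue matrix. The `Contact` type, its field names, and the example labels are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Sequence


class Contact(NamedTuple):
    """Hypothetical record: one ligand functional group interacting with one residue."""
    functional_group: str  # illustrative group label, e.g. "carboxylic acid"
    residue: str           # residue name plus number, e.g. "ARG57"
    interaction: str       # illustrative interaction type, e.g. "hbond"


def pair_counts(contacts: Iterable[Contact]) -> Counter:
    """Count how often each (functional group, residue, interaction) triple occurs."""
    return Counter((c.functional_group, c.residue, c.interaction) for c in contacts)


def interaction_matrix(
    contacts: Iterable[Contact], groups: Sequence[str], residues: Sequence[str]
) -> List[List[int]]:
    """Binary matrix M[i][j] = 1 if group i has any recorded contact with residue j."""
    seen = {(c.functional_group, c.residue) for c in contacts}
    return [[int((g, r) in seen) for r in residues] for g in groups]
```

For example, with contacts `Contact("carboxylic acid", "ARG57", "hbond")` and `Contact("aromatic ring", "PHE82", "pi_stacking")`, calling `interaction_matrix(contacts, ["carboxylic acid", "aromatic ring"], ["ARG57", "PHE82"])` returns `[[1, 0], [0, 1]]`.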
bash script/PDBBindFeaturizer.sh
bash script/BindingDBFeaturizer.sh
Construct datasets and dataloaders with batching, masking, and padding strategies for efficient model training:
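Complexes differ in their number of residues and functional groups, so batching requires padding each sequence to a common length plus a mask marking the real positions. The following framework-free Python sketch assumes each complex is a list of integer token IDs; that representation and the padding value of 0 are assumptions, and the actual dataloader built by script/Dataloader.sh may use tensors and different conventions.

```python
from typing import List, Sequence, Tuple


def pad_and_mask(
    sequences: Sequence[Sequence[int]], pad_value: int = 0
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Right-pad variable-length sequences to the batch maximum and build a mask.

    mask[i][j] is True for real elements and False for padding.
    """
    max_len = max((len(s) for s in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


def batches(items, batch_size):
    """Yield consecutive batches of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, `pad_and_mask([[5, 3, 8], [2]])` returns `([[5, 3, 8], [2, 0, 0]], [[True, True, True], [True, False, False]])`. Padding per batch rather than to a global maximum keeps batches compact; in PyTorch, `torch.nn.utils.rnn.pad_sequence` provides the padding half of this for tensors.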
bash script/Dataloader.sh
Train the LINKER model on the prepared dataset and save checkpoints:
bash script/Run_LINKER.sh
Train the Binding Affinity model on the pretrained features and save checkpoints:
bash script/Run_Predictor.sh
References for the tools and models used by LINKER:
- PLIP: Protein–Ligand Interaction Profiler
@article{salentin2015plip,
title={PLIP: fully automated protein--ligand interaction profiler},
author={Salentin, Sebastian and Schreiber, Sven and Haupt, V Joachim and Adasme, Melissa F and Schroeder, Michael},
journal={Nucleic acids research},
volume={43},
number={W1},
pages={W443--W447},
year={2015},
publisher={Oxford University Press}
}
- pyCheckmol: Application for detecting functional groups in molecules
- ESMC: ESM Cambrian creates representations of the underlying biology of proteins
@misc{esm2024cambrian,
author={{ESM Team}},
title={ESM Cambrian: Revealing the mysteries of proteins with unsupervised learning},
year={2024},
publisher={EvolutionaryScale Website},
url={https://evolutionaryscale.ai/blog/esm-cambrian},
urldate={2024-12-04}
}
If you find LINKER useful in your work, please cite:
@inproceedings{pham2025linker,
title={{LINKER}: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability},
author={Phuc Pham and Viet Thanh Duy Nguyen and Truong-Son Hy},
booktitle={NeurIPS 2025 AI for Science Workshop},
year={2025},
url={https://openreview.net/forum?id=LsDdZUSVso}
}
