A fast and accurate machine learning model built on PyTorch for predicting protein chemical shifts from PDB structures and molecular dynamics trajectories. This model is designed to be run on either CPUs or GPUs.
- Clone the repository using `--recursive` to include the required submodule:

      git clone --recursive https://github.com/roitberg-group/legolas.git
      cd legolas

- Set up a conda environment: Use the provided `legolas_env.yaml` file to create the conda environment:

      conda env create -n legolas -f legolas_env.yaml
      conda activate legolas

  Versions when installing using conda:
  - python 3.10
  - cuda 11.8
  - pytorch 2.5.1
- Install TorchANI:

  Within `external/internal-legolas-aev`:

  LEGOLAS is most efficient when run using the torchani compiled cuAEV extension, but it is not required.
  You have two options, depending on whether you want to install the torchani compiled extensions. To install torchani with no compiled extensions run:

      pip install --no-deps -v .

  To install torchani with the cuAEV compiled extension run instead:

      # Use 'ext-all-sms' instead of 'ext' if you want to build for all possible GPUs
      pip install --config-settings=--global-option=ext --no-build-isolation --no-deps -v .

  In both cases you can add the editable, `-e`, flag after the verbose, `-v`, flag if you want an editable install (for developers). The `-v` flag can of course be omitted, but it is sometimes handy to have some extra information about the installation process.
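
After installation, it can be useful to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of LEGOLAS: the helper names `collect_versions` and `mismatches` are hypothetical, and the check only compares version prefixes. It assumes the packages import as `torch` and `torchani`, and it does not verify whether the cuAEV extension was built.

```python
import importlib.util
import platform

# Versions listed in this README for the conda install.
EXPECTED = {"python": "3.10", "torch": "2.5.1", "cuda": "11.8"}


def collect_versions():
    """Return installed Python/PyTorch/CUDA versions (None when unavailable)."""
    found = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        # Only checks that the package can be located, not that cuAEV was compiled.
        "torchani_installed": importlib.util.find_spec("torchani") is not None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        found["torch"] = str(torch.__version__)
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
        found["gpu_available"] = torch.cuda.is_available()
    return found


def mismatches(found, expected=EXPECTED):
    """List components whose installed version does not start with the expected one."""
    return [
        name
        for name, want in expected.items()
        if found.get(name) is None or not str(found[name]).startswith(want)
    ]


if __name__ == "__main__":
    found = collect_versions()
    print(found)
    print("differs from README versions:", mismatches(found))
```

A reported mismatch is not necessarily a problem. For example, a CPU-only PyTorch build reports no CUDA version, and LEGOLAS is designed to run on CPUs as well as GPUs.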

    # atypes = HA, H, CA, CB, C, N
    python legolas.py {coordinates_file(s)} [-b {BATCH_SIZE}] [-atype {INTERESTED_ATYPES}] [-t {TOPOLOGY}] [-o {OUTPUT_FILETYPE}]

    # All atom types:
    python legolas.py data/A001_1KF3A.pdbH
    # Specify atom types:
    python legolas.py data/A001_1KF3A.pdbH -atype H,C,N
    # Run on molecular dynamics trajectory:
    python legolas.py data/{trajectory_file}.nc -t data/{topology_file}.parm7
    # Specify output file type ("csv", "parquet", "pdbcs", "all", default=all)
    # pdbcs output file type is only available for PDB inputs (not trajectories)
    python legolas.py data/A001_1KF3A.pdbH -o csv,pdbcs

| Column Name | Description |
|---|---|
| ATOM_TYPE | Atom type: N, CA, CB, C, HA, H |
| SEQ_ID | Residue sequence number |
| RES_TYPE | Three-letter code for the 20 standard amino acids |
| CHEMICAL_SHIFT | Predicted chemical shift (average over 5 models) |
| CHEMICAL_SHIFT_STD | Standard deviation across 5 models (lower std = higher confidence) |

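
Because `CHEMICAL_SHIFT_STD` indicates how much the five models agree, one way to use the CSV output is to keep only predictions below a chosen standard deviation. The sketch below assumes the CSV has a header row with exactly the column names in the table above and that `SEQ_ID` is a plain integer. The function name `confident_shifts`, the threshold of 1.0, and the sample rows are illustrative placeholders, not real LEGOLAS output.

```python
import csv
import io

# Placeholder rows using the column names from the table above.
# The numbers are made up for illustration; they are NOT real predictions.
SAMPLE_CSV = """ATOM_TYPE,SEQ_ID,RES_TYPE,CHEMICAL_SHIFT,CHEMICAL_SHIFT_STD
N,2,GLU,120.0,0.5
CA,2,GLU,56.0,0.2
H,2,GLU,8.0,1.5
"""


def confident_shifts(csv_file, max_std):
    """Return rows whose CHEMICAL_SHIFT_STD is at most max_std, with numbers parsed."""
    kept = []
    for row in csv.DictReader(csv_file):
        std = float(row["CHEMICAL_SHIFT_STD"])
        if std <= max_std:
            kept.append(
                {
                    "ATOM_TYPE": row["ATOM_TYPE"],
                    "SEQ_ID": int(row["SEQ_ID"]),
                    "RES_TYPE": row["RES_TYPE"],
                    "CHEMICAL_SHIFT": float(row["CHEMICAL_SHIFT"]),
                    "CHEMICAL_SHIFT_STD": std,
                }
            )
    return kept


# The threshold of 1.0 is arbitrary and chosen only for this example.
rows = confident_shifts(io.StringIO(SAMPLE_CSV), max_std=1.0)
for row in rows:
    print(row["ATOM_TYPE"], row["SEQ_ID"], row["CHEMICAL_SHIFT"])
```

To apply the same filter to a real output file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`, where `path` is wherever your run wrote its CSV.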
If you find a bug or have a feature request, please open an issue on GitHub or send us a pull request.
This project is licensed under the MIT License.
Please cite the following paper if you use LEGOLAS:
Mikayla Y. Darrows, Dimuthu Kodituwakku, Jinze Xue, Ignacio Pickering, Nicholas S. Terrel, Adrian E. Roitberg. LEGOLAS: a Machine Learning method for rapid and accurate predictions of protein NMR chemical shifts.
J. Chem. Theory Comput. 2025, 21, 8, 4266–4275
