📚 Tools and databases for analyzing HLA and VDJ genes.

slowkow/awesome-vdj

awesome-vdj

Antigen presentation and recognition are central to immunology. HLA genes encode the proteins that present antigens. VDJ genes encode the receptors that recognize them: T cell receptors (TCRs) in T cells and the repertoires of antibodies (immunoglobulins) in B cells.

Here, researchers can find links to tools and resources for computational analysis of HLA and VDJ data.

Contributions are welcome!

Table of Contents

Related Work


📚 Literature


🗄️ VDJ Databases

Structure Databases

Specificity Databases

Sequence Repositories

  • iReceptor — iReceptor facilitates the curation, analysis and sharing of antibody/B-cell and T-cell receptor repertoires (Adaptive Immune Receptor Repertoire or AIRR-seq data) from multiple labs and institution...
    PubMed · 🪝 133 · Homepage

  • A Public Database of Memory and Naive B-Cell Receptor Sequences — We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools de...
    PubMed · 🪝 104 · Homepage

  • immuneACCESS — Dive into the world’s largest collection of TCR and BCR sequences. Easily incorporate millions of sequences worth of public data into your next papers and projects using immunoSEQ Analyzer. Constru...
    Docs · Homepage

  • PIRD: Pan immune repertoire database — Pan immune repertoire database (PIRD) collects raw and processed sequences of immunoglobulins (IGs) and T cell receptors (TCRs) of human and other vertebrate species with different phenotypes. You ...
    PubMed · Homepage

Standards & Resources


🔬 VDJ Analysis

Single-Cell

  • TRUST4: TCR and BCR assembly from RNA-seq data — Tcr Receptor Utilities for Solid Tissue (TRUST) is a computational tool to analyze TCR and BCR sequences using unselected RNA sequencing data, profiled from solid tissues, including tumors. TRUST4 ...
    PubMed · 🪝 227 · ⭐ 337 · C C++ Perl

  • Scirpy: a Scanpy extension for analyzing single-cell T-cell receptor-sequencing data — A scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data.
    PubMed · 🪝 212 · ⭐ 243 · Homepage · Python

  • scRepertoire: A toolkit for single-cell immune profiling — R package for analyzing and visualizing single-cell immune receptor data. This new version introduces an array of features designed to enhance both the depth and breadth of immune receptor analysis...
    PubMed · 🪝 9 · ⭐ 358 · R

  • DeepTCR: Deep Learning Methods for Parsing T-Cell Receptor Sequencing (TCRSeq) Data — DeepTCR is a python package that has a collection of unsupervised and supervised deep learning methods to parse TCRSeq data. It has the added functionality of being able to analyze paired alpha/bet...
    PubMed · 🪝 217 · ⭐ 123 · Python

  • dandelion — A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data
    PubMed · 🪝 44 · ⭐ 122 · Homepage · Python

  • STARTRAC — Single T-cell Analysis by Rna-seq and Tcr TRACking
    PubMed · 🪝 27 · ⭐ 114 · HTML

  • TCRGP — TCRGP is a novel Gaussian process method that can predict if TCRs recognize certain epitopes. This method can utilize different CDR sequences from both TCRα and TCRβ chains from single-cell data an...
    PubMed · 🪝 109 · ⭐ 30 · Python

  • CONGA: Clonotype Neighbor Graph Analysis — CONGA was developed to detect correlation between T cell gene expression profile and TCR sequence in single-cell datasets.
    Paper · 🪝 9 · ⭐ 93 · Python

  • airrflow — B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework
    PubMed · 🪝 10 · ⭐ 73 · Homepage · Nextflow

  • Platypus — R package for the analysis of single-cell immune repertoires
    PubMed · 🪝 40 · ⭐ 43 · R

  • mvTCR — A multi-view Variational Autoencoder (mvTCR) to jointly embed transcriptomic and TCR sequence information at a single-cell level to better capture the phenotypic behavior of T cells.
    PubMed · 🪝 17 · ⭐ 56 · Homepage · Python

  • enclone — enclone is standalone software (primarily written in Rust) developed by 10x Genomics for analysis of single cell TCR and BCR sequences. enclone performs SHM-aware clonotyping, phylogenetic/lineage ...
    ⭐ 50 · Homepage · Rust

  • covid19 — Regularly updated list of publicly available datasets with single-cell (scRNAseq) and T-cell/antibody immune repertoire (AIRR / RepSeq / immunosequencing) data of COVID-19 patients with SARS-CoV-2.
    ⭐ 46

  • TCRconvert — TCRconvert converts T cell receptor (TCR) gene names between the 10X, Adaptive, and IMGT naming conventions. It supports alpha-beta and gamma-delta TCRs for human, mouse, and rhesus macaque.
    ⭐ 15 · Python

  • TCRconvertR — TCRconvertR converts T cell receptor (TCR) gene names between the 10X, Adaptive, and IMGT naming conventions. It supports alpha-beta and gamma-delta TCRs for human, mouse, and rhesus macaque.
    ⭐ 6 · R

Repertoire Analysis

  • VDJtools — A comprehensive analysis framework for T-cell and B-cell repertoire sequencing data
    PubMed · 🪝 529 · ⭐ 142 · Java Groovy

  • immunarch: An R Package for Painless Bioinformatics Analysis of T-cell and B-cell Immune Repertoire Data — immunarch is an R package designed to analyse T-cell receptor (TCR) and B-cell receptor (BCR) repertoires, aimed at medical scientists and bioinformaticians. The mission of immunarch is to make imm...
    ⭐ 334 · R

  • msm: Max Snippet Model — Improved statistical classifier for immune repertoires
    PubMed · 🪝 8 · ⭐ 177 · Python

  • DeepRC — Immune repertoire classification with attention-based deep massive multiple instance learning
    ⭐ 124 · Python

  • Recon: Reconstruction of Estimated Communities from Observed Numbers — Recon uses the distribution of species counts in a sample to estimate the distribution of species counts in the population from which the sample was drawn.
    PubMed · 🪝 91 · ⭐ 14 · Python R

  • dkm: Dynamic Kernel Matching — DKM is analogous to a convolutional network, but for sequences. Consider the problem of classifying a sequence. Because some sequences are longer than others, the number of features is irregular. G...
    ⭐ 94 · Python

  • immuneML — immuneML is a platform for machine learning analysis of adaptive immune receptor repertoire data.
    ⭐ 73 · Homepage · Python

  • abstar — VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.
    ⭐ 44 · Pkl

  • vdjer — V'DJer: B Cell Receptor Repertoire Reconstruction from short read mRNA-Seq data
    ⭐ 29 · C

  • CATT — An ultra-sensitive and precise tool for characterizing T cell CDR3 sequences in TCR-seq and RNA-seq data.
    ⭐ 21 · Julia

  • epitopefindr — R package to BLAST peptide sequences against each other and identify the minimal overlap of aligning regions.
    ⭐ 16 · Homepage · R

Sequence Processing

Clustering & Similarity

Epitope Prediction

  • neoantigens — Exploring novel tumor epitope identification
    PubMed · 🪝 624 · ⭐ 37 · Python

  • epitopepredict — Python package and command line tool for epitope prediction
    PubMed · 🪝 6 · ⭐ 52 · Jupyter Notebook

  • MuPeXI — The mutant peptide extractor and informer, a tool for predicting neo-epitopes from tumor sequencing data.
    ⭐ 52 · Python

  • epitopeprediction — A bioinformatics best-practice analysis pipeline for epitope prediction and annotation
    ⭐ 49 · Homepage · Nextflow

  • pyrepseq — Python library for immune repertoire analysis
    PubMed · 🪝 29 · ⭐ 17 · Python

  • MixTCRpred — Predictor of TCR-epitope interactions
    ⭐ 34 · Python

  • topiary — Predict mutated T-cell epitopes from sequencing data
    ⭐ 30 · Python

  • AsEP-dataset — NeurIPS 2024 Dataset and Benchmark Submission "AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction"
    ⭐ 30 · Jupyter Notebook

  • EpiDope — Prediction of B-cell epitopes from amino acid sequences using deep neural networks.
    PubMed · 🪝 11 · ⭐ 18 · Python

  • Repitope — Epitope immunogenicity prediction through in silico TCR-peptide contact potential profiling.
    ⭐ 25 · R

  • ImRex — Generic TCR-epitope recognition prediction using CNN approach on both known and novel epitopes
    ⭐ 17 · Jupyter Notebook

Structure & Modeling

  • TITAN - Tcr epITope bimodal Attention Networks — a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes.
    PubMed · 🪝 150 · ⭐ 30 · Python

  • TCRdock — Python tools for TCR:peptide-MHC modeling and analysis: (1) set up and run TCR-specialized AlphaFold simulations starting from a TSV file with TCR, peptide, and MHC information; (2) parse a TCR:peptide...
    PubMed · 🪝 93 · ⭐ 86 · Python

  • Absolut: Unconstrained lattice antibody-antigen bindings generator - One tool to simulate them all! — Absolut! is a database and C++ user interface that allows the high-throughput computation for the 3D-lattice binding of any CDRH3 sequence to any antigen, enabling the custom generation of new anti...
    Paper · 🪝 20 · ⭐ 111 · C++

  • tcr-bert — TCR-BERT is a large language model trained on T-cell receptor sequences, built using a lightly modified BERT architecture with tweaked pre-training objectives.
    Paper · 🪝 74 · ⭐ 57 · Python

  • TCRmodel2: high-resolution modeling of T cell receptor recognition using deep learning — This method, named TCRmodel2, allows users to submit sequences through an easy-to-use interface and shows similar or greater accuracy than AlphaFold and other methods to model TCR–peptide–MHC compl...
    PubMed · 🪝 70 · ⭐ 45 · Python R

  • vampire: Deep generative models for TCR sequences — Fit and test variational autoencoder (VAE) models for T cell receptor sequences.
    PubMed · 🪝 66 · ⭐ 17 · Python

  • TEINet — A deep learning framework for prediction of TCR-epitope binding specificity
    PubMed · 🪝 55 · ⭐ 16 · Python

  • TEIM — TCR-Epitope Interaction Modeling
    ⭐ 55 · Python

  • TCRconv — TCRconv is a deep learning model for predicting recognition between T cell receptors and epitopes. It uses protBERT embeddings for the TCRs and convolutional neural networks for the prediction.
    PubMed · 🪝 20 · ⭐ 26 · Python R

  • compairr — Comparison of Adaptive Immune Receptor Repertoires
    PubMed · 🪝 15 · ⭐ 28 · C++


🗃️ HLA Databases


🧬 HLA Analysis

Association Studies

  • BIGDAWG: Case-Control Analysis of Multi-Allelic Loci — Data sets and functions for chi-squared Hardy-Weinberg and case-control association tests of highly polymorphic genetic data [e.g., human leukocyte antigen (HLA) data]. Performs association tests a...
    PubMed · 🪝 82 · ⭐ 3 · R

  • HLA_analyses_tutorial — A thorough tutorial on HLA imputation and association, accompanying our manuscript "Tutorial: A statistical genetics guide to identifying HLA alleles driving complex disease"
    ⭐ 70 · Jupyter Notebook

  • HLA-TAPAS: HLA-Typing At Protein for Association Studies — An HLA-focused pipeline that can handle HLA reference panel construction (MakeReference), HLA imputation (SNP2HLA), and HLA association (HLAassoc). It is an updated version of SNP2HLA.
    ⭐ 54 · Python R

  • HLA Electrostatic Potential — A method for predicting humoral alloimmunity from differences in donor and recipient HLA surface electrostatic potential, enabling assessment of immunological compatibility in transplantation.
    PubMed · 🪝 53

  • HATK: HLA Analysis Toolkit — HATK (HLA Analysis Tool-Kit) is a collection of tools and modules to perform HLA fine-mapping analysis, which is to identify which HLA allele or amino acid position of the HLA gene is driving the di...
    PubMed · 🪝 15 · ⭐ 28 · Python

  • MATER: Minimizer RNAseq HLA typer — MATER is a minimizer-based HLA typer for RNAseq read datasets. In a typical RNAseq dataset, the reads sampled from HLA genes are less uniform and may miss regions, which makes assembly or variant call...
    PubMed · 🪝 24 · ⭐ 14 · Python R C

  • PyHLA — Python for HLA analysis: summary, association analysis, zygosity test and interaction test
    PubMed · ⭐ 38 · Python

  • cdr3-QTL — Trans-association between HLA and TCR-CDR3
    ⭐ 19 · HTML

  • hlabud: HLA genotype analysis in R — hlabud provides methods to retrieve sequence alignment data from IMGTHLA and convert the data into convenient R matrices ready for downstream analysis. See the usage examples to learn how to use th...
    ⭐ 17 · R

HLA Typing

Peptide Prediction

  • HLAMatchmaker — A molecularly based algorithm for histocompatibility determination that identifies acceptable HLA antigens for highly alloimmunized patients based on amino acid triplets (eplets) on exposed parts o...
    PubMed · 🪝 267

  • High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets — MHC Class I and Class II neoantigen binding prediction
    PubMed · 🪝 131 · ⭐ 33 · Python

  • NeoBert — NeoBERT is an advanced model designed specifically for predicting the binding affinity between neoantigens and HLA. It is a variant of the original BERT model, enhanced to integrate biological feat...
    PubMed · ⭐ 155 · Python

  • bigmhc — BigMHC predicts MHC-I (neo)epitope presentation and immunogenicity
    PubMed · 🪝 60 · ⭐ 59 · Jupyter Notebook

  • HLA-EMMA — A user-friendly tool to analyze HLA class I and class II compatibility on the amino acid level, facilitating the assessment of donor-recipient compatibility in transplantation.
    PubMed · 🪝 82

  • PIRCHE-II — An algorithm to predict indirectly recognizable HLA epitopes in solid organ transplantation, helping to evaluate immunological compatibility between donors and recipients.
    PubMed · 🪝 81 · Homepage

  • MHCAttnNet — Allele-Peptide predictions for class I & class II MHC alleles
    PubMed · 🪝 39 · ⭐ 30 · Python

  • MixMHC2pred — HLA-II ligand predictor.
    PubMed · 🪝 4 · ⭐ 46 · C++

  • MixMHCpred — HLA-I ligand predictor
    ⭐ 43 · Python

  • EpViX: epitope reactivity analysis and epitope virtual crossmatching — Performs automated epitope virtual crossmatching at the initiation of the organ donation process. EpViX is a free, web-based application developed for use over the internet on a tablet, smartphone ...
    PubMed · 🪝 12 · Ruby

  • immunogenetr — immunogenetr is a comprehensive toolkit for clinical HLA informatics. It is built on tidyverse principles and makes use of genotype list string (GL string, https://glstring.org/) for storing and us...
    PubMed · ⭐ 6 · Homepage · R

Data & Nomenclature

  • MHC-PRG — Population Reference Graphs for the HLA and MHC.
    ⭐ 35 · C++

  • py-ard — HLA ARD Reduction in Python. Although HLA nomenclature has not always conformed to the same standard, it is now defined by The WHO Nomenclature Committee for Factors of the HLA System. py-ard is aw...
    ⭐ 19 · Python

  • HLAtools: Functions and Datasets for HLA Informatics — We have developed HLAtools, an R package that automates the consumption of IPD-IMGT/HLA resources, renders them computable, and makes them available alongside tools for data analysis, visualization...
    PubMed · 🪝 1 · ⭐ 4 · R


Contributors