A computational toolkit for systematic antigen-specificity inference from single-cell TCR and transcriptomic data.
cell2specificity integrates T cell state annotation, TCR clonotype analysis, HLA genotype inference, pathogen exposure inference, and structure-informed modeling of TCR-peptide-HLA binding. It is the companion software to:
Dratva et al. (2026) Single-cell analysis of human T cells across infections unlocks systematic antigen-specificity inference.
Authors: Lisa M. Dratva, Yizhou Yu, Elizaveta K. Vlasova, Min Gyu Im, Krzysztof Polanski, Maximilian Alexandrov, Lisa M. Milchsack, Rakeshlal Kapuge, Alexander V. Predeus, Mikhail Shugay, Lorenz Kretschmer, and Sarah A. Teichmann.
Starting from scRNA+TCR-seq data, the toolkit enables:
- T cell state annotation: predict T cell states using bundled CellTypist models trained on the atlas
- TCR motif discovery: group clonotypes into shared-specificity clusters
- TCR motif annotation: query the VDJdb database for antigen specificity and classify MAIT and iNKT cells from V/J gene usage
- Fast TCR matching: map new repertoires onto atlas motifs
- HLA genotype inference: impute HLA alleles from MHC-restricted public TCR motifs
- Pathogen exposure inference: predict donor infection history from TCR motif composition
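As a rough illustration of the invariant T cell classification mentioned in the list above, the sketch below labels cells from TCR alpha V/J gene usage using the canonical pairings (TRAV1-2 with TRAJ33, TRAJ12, or TRAJ20 for MAIT; TRAV10 with TRAJ18 for iNKT). The column names `v_gene_alpha` and `j_gene_alpha` and the function `classify_invariant` are hypothetical and are not the package's API; `annotate_invariant` may apply different or stricter criteria.

```python
import pandas as pd

# Canonical human invariant TCR alpha-chain pairings (illustrative rule set only).
MAIT_V = "TRAV1-2"
MAIT_J = {"TRAJ33", "TRAJ12", "TRAJ20"}
INKT_V = "TRAV10"
INKT_J = "TRAJ18"


def strip_allele(gene: str) -> str:
    """Drop an IMGT allele suffix, e.g. 'TRAV1-2*01' -> 'TRAV1-2'."""
    return gene.split("*")[0]


def classify_invariant(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: label each row as MAIT, iNKT, or conventional."""
    v = df["v_gene_alpha"].fillna("").map(strip_allele)
    j = df["j_gene_alpha"].fillna("").map(strip_allele)
    label = pd.Series("conventional", index=df.index)
    label.loc[(v == MAIT_V) & j.isin(MAIT_J)] = "MAIT"
    label.loc[(v == INKT_V) & (j == INKT_J)] = "iNKT"
    return label


# Toy input; a missing V gene falls through to "conventional".
toy = pd.DataFrame({
    "v_gene_alpha": ["TRAV1-2*01", "TRAV10", "TRAV1-2", None],
    "j_gene_alpha": ["TRAJ33*01", "TRAJ18", "TRAJ18", "TRAJ33"],
})
toy["invariant_type"] = classify_invariant(toy)
```

Only the alpha chain is inspected here; a stricter rule set could also constrain CDR3 length or beta-chain usage.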
git clone https://github.com/lisadratva/cell2specificity.git
cd cell2specificity
pip install -e ".[dev]"
For TCR motif inference (requires tcrdist3):
pip install -e ".[motifs]"
Python ≥ 3.10 required.
import pandas as pd
from cell2specificity.tcr_motifs import preprocess_tcr_table, annotate_invariant
from cell2specificity.motif_based_inference import build_donor_motif_matrix, predict_pathogen_exposure, predict_hla_type
from cell2specificity.annotation import annotate
# 1. Annotate cell states with bundled CellTypist models
# (`adata` is an AnnData object you have already loaded)
predictions = annotate(adata, model="paninfection_level2")
adata = predictions.to_adata()
# 2. Preprocess VDJ table and annotate invariant T cells
df = preprocess_tcr_table(pd.read_csv("my_tcr_data.csv"))
df = annotate_invariant(df)
# 3. Build donor × motif matrix and run inference
dmm = build_donor_motif_matrix(df)
exposure = predict_pathogen_exposure(dmm, threshold=2) # double-hit rule
hla = predict_hla_type(dmm)
→ See the full step-by-step walkthrough in docs/tutorial.md
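To make step 3 more concrete, here is a minimal pandas sketch of a donor × motif matrix and one plausible reading of the "double-hit rule": a pathogen is called for a donor only when at least `threshold` distinct motifs associated with it are present. The column names, the toy motif-to-pathogen lookup, and this interpretation of `threshold=2` are all assumptions; the package's `build_donor_motif_matrix` and `predict_pathogen_exposure` may work differently.

```python
import pandas as pd

# Toy per-TCR table: one row per TCR assigned to a motif (hypothetical columns).
cells = pd.DataFrame({
    "donor": ["D1", "D1", "D1", "D2", "D2", "D3"],
    "motif": ["m1", "m2", "m3", "m1", "m1", "m3"],
})

# Toy motif -> pathogen lookup (hypothetical stand-in for a reference table).
motif_pathogen = pd.Series({"m1": "CMV", "m2": "CMV", "m3": "EBV"}, name="pathogen")

# Donor x motif count matrix.
dmm = pd.crosstab(cells["donor"], cells["motif"])

# Per donor, count the distinct motifs present for each pathogen.
present = (dmm > 0).astype(int)
hits = present.T.groupby(motif_pathogen).sum().T

# Call exposure only when at least `threshold` distinct motifs support it.
threshold = 2
exposure = hits >= threshold
```

In this toy data only donor D1 is called for CMV: D2 carries a single CMV-associated motif (seen twice), which counts as one hit under this reading.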
Three CellTypist models trained on the pan-infection T cell atlas are shipped with the package:
| Alias | Description |
|---|---|
| paninfection_level2 | Broad T cell lineages (CD4, CD8, MAIT, iNKT, γδ, NKT) |
| paninfection_CD4_level3 | Fine-grained CD4 T cell subtypes (29 states) |
| paninfection_CD8_level3 | Fine-grained CD8 T cell subtypes (12 states) |
Two reference tables for clinical inference are bundled under src/cell2specificity/motif_based_inference/data/:
- disease_associated_motifs_hla.csv: motif → predicted pathogen
- df_motifs_with_hla.csv: motif → MHC-I restricted HLA allele + metadata
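The sketch below shows how a motif → HLA allele table of this kind could be joined to the motifs observed in each donor to count supporting motifs per allele. The tables are built in memory with hypothetical column names (`motif`, `hla_allele`, `donor`); the bundled CSVs may use different columns, and `predict_hla_type` may score alleles differently.

```python
import pandas as pd

# Hypothetical stand-in for a motif -> HLA allele reference table.
motif_hla = pd.DataFrame({
    "motif": ["m1", "m2", "m3"],
    "hla_allele": ["HLA-A*02:01", "HLA-A*02:01", "HLA-B*07:02"],
})

# Toy long-format table of motifs observed in each donor.
observed = pd.DataFrame({
    "donor": ["D1", "D1", "D2"],
    "motif": ["m1", "m2", "m3"],
})

# Attach the restricting allele to each observed motif.
annotated = observed.merge(motif_hla, on="motif", how="left")

# Count distinct supporting motifs per donor and allele.
support = (
    annotated.groupby(["donor", "hla_allele"])["motif"]
    .nunique()
    .rename("n_motifs")
    .reset_index()
)
```

A left merge keeps observed motifs that have no allele annotation; they drop out at the `groupby` step because missing group keys are excluded by default.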
src/cell2specificity/
├── tcr_motifs/ # Preprocessing, motif inference and annotation, seed-and-extend matching, VDJdb queries
│ ├── _preprocess.py
│ ├── _invariant.py
│ └── _matching.py
├── motif_based_inference/ # Pathogen exposure + HLA inference from motif composition
│ ├── _predict.py
│ └── data/ # Bundled reference CSVs
├── annotation/ # CellTypist wrapper + bundled pan-infection models
│ └── models/ # .pkl model files
└── utils/ # Shared helpers
tests/
├── data/ # Test data subset for self-contained testing
└── test_*.py
docs/
└── tutorial.md # Full worked tutorial
pytest tests/ -v
All tests run against the toy dataset in tests/data/; no external data or compute is required.
See CONTRIBUTING.md. New modules follow the src/ layout
and must include tests and docstrings.
Apache 2.0 — see LICENSE. Copyright 2026 Lisa M. Dratva