cell2specificity

A computational toolkit for systematic antigen-specificity inference from single-cell TCR and transcriptomic data.

cell2specificity integrates T cell state annotation, TCR clonotype analysis, HLA genotype inference, pathogen exposure inference, and structure-informed modeling of TCR-peptide-HLA binding. It is the companion software to:

Dratva et al. (2026) Single-cell analysis of human T cells across infections unlocks systematic antigen-specificity inference.

Authors: Lisa M. Dratva, Yizhou Yu, Elizaveta K. Vlasova, Min Gyu Im, Krzysztof Polanski, Maximilian Alexandrov, Lisa M. Milchsack, Rakeshlal Kapuge, Alexander V. Predeus, Mikhail Shugay, Lorenz Kretschmer, and Sarah A. Teichmann.

Overview

Starting from scRNA+TCR-seq data, the toolkit enables:

  • T cell state annotation: predict T cell states using bundled CellTypist models trained on the atlas
  • TCR motif discovery: group clonotypes into shared-specificity clusters
  • TCR motif annotation: query the VDJdb database for antigen specificity and classify MAIT and iNKT cells from V/J gene usage
  • Fast TCR matching: map new repertoires onto atlas motifs
  • HLA genotype inference: impute HLA alleles from MHC-restricted public TCR motifs
  • Pathogen exposure inference: predict donor infection history from TCR motif composition
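To illustrate the invariant T cell step in the list above, here is a minimal sketch of classifying MAIT and iNKT cells from alpha-chain V/J gene usage. It uses the canonical human gene combinations reported in the literature (TRAV1-2 with TRAJ33/TRAJ12/TRAJ20 for MAIT cells, TRAV10 with TRAJ18 for iNKT cells). It is not the package's `annotate_invariant` implementation: the function `classify_invariant` and the column names `v_gene_alpha` and `j_gene_alpha` are hypothetical, and the real function may apply additional criteria.

```python
import pandas as pd

# Canonical human invariant alpha-chain gene usage (literature values).
MAIT_TRAV = "TRAV1-2"
MAIT_TRAJ = {"TRAJ33", "TRAJ12", "TRAJ20"}
INKT_TRAV = "TRAV10"
INKT_TRAJ = "TRAJ18"


def strip_allele(gene):
    """Drop an IMGT allele suffix, e.g. 'TRAV1-2*01' -> 'TRAV1-2'."""
    return str(gene).split("*")[0]


def classify_invariant(df, v_col="v_gene_alpha", j_col="j_gene_alpha"):
    """Add an 'invariant_label' column: 'MAIT', 'iNKT' or 'conventional'.

    Hypothetical helper; column names are assumptions for this sketch.
    """
    v = df[v_col].map(strip_allele)
    j = df[j_col].map(strip_allele)

    # Start with every cell labelled conventional, then overwrite matches.
    label = pd.Series("conventional", index=df.index)
    label[(v == MAIT_TRAV) & j.isin(MAIT_TRAJ)] = "MAIT"
    label[(v == INKT_TRAV) & (j == INKT_TRAJ)] = "iNKT"
    return df.assign(invariant_label=label)
```

Stripping the allele suffix first keeps the comparison robust to tables that report genes as `TRAV1-2*01` rather than `TRAV1-2`.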

Installation

git clone https://github.com/lisadratva/cell2specificity.git
cd cell2specificity
pip install -e ".[dev]"

For TCR motif inference (requires tcrdist3):

pip install -e ".[motifs]"

Python ≥ 3.10 required.

Quickstart

import anndata as ad
import pandas as pd
from cell2specificity.tcr_motifs import preprocess_tcr_table, annotate_invariant
from cell2specificity.motif_based_inference import build_donor_motif_matrix, predict_pathogen_exposure, predict_hla_type
from cell2specificity.annotation import annotate

# 1. Annotate cell states with bundled CellTypist models
adata = ad.read_h5ad("my_scrna_data.h5ad")  # placeholder path to your AnnData file
predictions = annotate(adata, model="paninfection_level2")
adata = predictions.to_adata()

# 2. Preprocess VDJ table and annotate invariant T cells
df = preprocess_tcr_table(pd.read_csv("my_tcr_data.csv"))
df = annotate_invariant(df)

# 3. Build donor × motif matrix and run inference
dmm      = build_donor_motif_matrix(df)
exposure = predict_pathogen_exposure(dmm, threshold=2)  # double-hit rule
hla      = predict_hla_type(dmm)

→ See the full step-by-step walkthrough in docs/tutorial.md
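The `threshold=2` argument in step 3 is commented as a double-hit rule. One plausible reading, which is an assumption and not taken from the package source, is that a donor is called exposed to a pathogen only when at least two distinct pathogen-associated motifs are detected. The sketch below shows that logic on a toy donor × motif matrix. The function `call_exposure`, the matrix layout (donors as rows, motifs as columns) and the motif-to-pathogen mapping are all hypothetical.

```python
import pandas as pd


def call_exposure(donor_motif, motif_to_pathogen, threshold=2):
    """Return a boolean donor x pathogen table (hypothetical helper).

    donor_motif: DataFrame, rows = donors, columns = motif IDs,
                 values = number of clonotypes matching the motif.
    motif_to_pathogen: dict mapping motif ID -> pathogen label.
    """
    # A motif counts as one hit however many clonotypes carry it.
    present = (donor_motif > 0).astype(int)

    # Keep only motifs that have a pathogen annotation.
    known = [m for m in present.columns if m in motif_to_pathogen]
    pathogens = pd.Series({m: motif_to_pathogen[m] for m in known})

    # Sum distinct motif hits per pathogen for each donor.
    hits = present[known].T.groupby(pathogens).sum().T
    return hits >= threshold


# Toy example: donorA carries two CMV-associated motifs, donorB one EBV motif.
toy_matrix = pd.DataFrame(
    {"m1": [3, 0], "m2": [1, 0], "m3": [0, 5]},
    index=["donorA", "donorB"],
)
toy_mapping = {"m1": "CMV", "m2": "CMV", "m3": "EBV"}
toy_calls = call_exposure(toy_matrix, toy_mapping)
```

Under this reading, a single expanded motif (donorB, five clonotypes of `m3`) does not trigger a call, whereas two independent motifs for the same pathogen do.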

Bundled models and reference data

Three CellTypist models trained on the pan-infection T cell atlas are shipped with the package:

| Alias | Description |
|---|---|
| paninfection_level2 | Broad T cell lineages (CD4, CD8, MAIT, iNKT, γδ, NKT) |
| paninfection_CD4_level3 | Fine-grained CD4 T cell subtypes (29 states) |
| paninfection_CD8_level3 | Fine-grained CD8 T cell subtypes (12 states) |

Two reference tables for clinical inference are bundled under src/cell2specificity/motif_based_inference/data/:

  • disease_associated_motifs_hla.csv — motif → predicted pathogen
  • df_motifs_with_hla.csv — motif → MHC-I restricted HLA allele + metadata
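As a rough illustration of how a motif → HLA reference table like the one above can support genotype inference, the sketch below joins a donor's detected motifs to such a table and counts the distinct motifs supporting each allele. The schema of the bundled CSVs is not shown in this README, so the column names `donor`, `motif_id` and `hla_allele`, the function `alleles_supported`, and the toy rows are all assumptions. `predict_hla_type` may work differently.

```python
import pandas as pd


def alleles_supported(donor_motifs, motif_hla, min_motifs=1):
    """Count distinct detected motifs supporting each HLA allele per donor.

    donor_motifs: DataFrame with columns 'donor' and 'motif_id'.
    motif_hla:    DataFrame with columns 'motif_id' and 'hla_allele'.
    Hypothetical helper with assumed column names.
    """
    # Keep only detected motifs that have an HLA restriction annotated.
    merged = donor_motifs.merge(motif_hla, on="motif_id", how="inner")

    support = (
        merged.drop_duplicates(["donor", "motif_id", "hla_allele"])
        .groupby(["donor", "hla_allele"])
        .size()
        .rename("n_motifs")
        .reset_index()
    )
    return support[support["n_motifs"] >= min_motifs].reset_index(drop=True)


# Toy inputs for illustration only.
toy_donor_motifs = pd.DataFrame(
    {"donor": ["d1", "d1", "d1", "d2"], "motif_id": ["m1", "m2", "m1", "m3"]}
)
toy_motif_hla = pd.DataFrame(
    {
        "motif_id": ["m1", "m2", "m3"],
        "hla_allele": ["HLA-A*02:01", "HLA-A*02:01", "HLA-B*07:02"],
    }
)
toy_support = alleles_supported(toy_donor_motifs, toy_motif_hla)
```

Raising `min_motifs` trades sensitivity for specificity, in the same spirit as the `threshold` argument used for pathogen exposure in the Quickstart.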

Package structure

src/cell2specificity/
├── tcr_motifs/              # Preprocessing, motif inference and annotation, seed-and-extend matching, VDJdb queries
│   ├── _preprocess.py
│   ├── _invariant.py
│   └── _matching.py
├── motif_based_inference/   # Pathogen exposure + HLA inference from motif composition
│   ├── _predict.py
│   └── data/                # Bundled reference CSVs
├── annotation/              # CellTypist wrapper + bundled pan-infection models
│   └── models/              # .pkl model files
└── utils/                   # Shared helpers
tests/
├── data/           # Test data subset for self-contained testing
└── test_*.py
docs/
└── tutorial.md     # Full worked tutorial

Running tests

pytest tests/ -v

All tests run against the toy dataset in tests/data/ — no external data or compute required.

Contributing

See CONTRIBUTING.md. New modules follow the src/ layout and must include tests and docstrings.

License

Apache 2.0 — see LICENSE. Copyright 2026 Lisa M. Dratva