This project investigates transcriptional heterogeneity of peripheral blood T cells in Sjögren’s disease (SjD) using single-cell RNA sequencing (scRNA-seq) data from PBMC samples.
Rather than focusing on exhaustive cell-type annotation, the analysis is designed to identify disease-associated transcriptional programs while explicitly controlling for cellular state. The central question is how SjD reshapes T cell states at the transcriptional level, through gains and losses of specific functional programs.
The analysis follows a progressive, state-controlled strategy:
- establish robust quality control and global structure,
- focus specifically on T cells,
- identify disease-enriched and disease-depleted T cell subclusters,
- perform differential expression within comparable T cell states,
- interpret results through functional transcriptional axes rather than rigid cell identities.
This structure is intended to ensure that observed differences reflect disease-associated transcriptional changes within a given T cell state rather than shifts in cell composition alone.
NB01 – Exploratory single-sample analysis
Initial exploration of one PBMC sample to inspect data quality, QC metrics, and clustering behavior.
This notebook defines biologically reasonable QC thresholds reused consistently throughout the project.
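As a rough illustration of what "shared QC thresholds" can mean in practice, the sketch below keeps the cut-offs in one immutable object and applies it to every cell. The threshold values, the field names (`n_genes`, `pct_mito`), and the helper `passes_qc` are hypothetical placeholders, not the values or code chosen in NB01.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCThresholds:
    """Hypothetical per-cell QC cut-offs (placeholder values, not NB01's)."""
    min_genes: int = 200        # drop near-empty droplets
    max_genes: int = 5000       # drop likely doublets
    max_pct_mito: float = 10.0  # drop stressed or dying cells


def passes_qc(cell: dict, thresholds: QCThresholds) -> bool:
    """Return True if a cell satisfies every threshold."""
    return (
        thresholds.min_genes <= cell["n_genes"] <= thresholds.max_genes
        and cell["pct_mito"] <= thresholds.max_pct_mito
    )


# One shared object, reused for every sample so all samples are filtered alike.
SHARED_QC = QCThresholds()

# Toy cells (invented numbers, for illustration only).
cells = [
    {"barcode": "cell_a", "n_genes": 1500, "pct_mito": 3.2},
    {"barcode": "cell_b", "n_genes": 120, "pct_mito": 2.0},    # too few genes
    {"barcode": "cell_c", "n_genes": 2100, "pct_mito": 25.0},  # high mito fraction
]

kept = [c["barcode"] for c in cells if passes_qc(c, SHARED_QC)]
print(kept)
```

Freezing the dataclass makes it harder to change a threshold by accident between notebooks, which is the point of defining the thresholds once.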
NB02 – Multi-sample global PBMC analysis
Integration of all PBMC samples from healthy donors (HD) and SjD patients using shared QC thresholds.
Global PCA, neighborhood graph, UMAP, and Leiden clustering establish a reference PBMC atlas.
NB03 – T cell–focused analysis
Identification of T cell–enriched clusters using canonical markers.
T cells are subsetted from the global PBMC dataset and re-embedded.
This step reveals multiple T cell transcriptional states with uneven representation between HD and SjD, including both disease-enriched and disease-depleted subclusters.
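One simple way to quantify "uneven representation" is to compare, for each subcluster, the fraction of SjD cells that fall in it with the corresponding fraction of HD cells. The sketch below does this with the standard library only. The function name `cluster_enrichment`, the pseudocount, and the toy counts are assumptions for illustration and are not taken from NB03.

```python
from collections import Counter
from math import log2


def cluster_enrichment(labels, pseudocount=0.5):
    """Return {cluster: log2(fraction of SjD cells / fraction of HD cells)}.

    labels: iterable of (cluster, condition) pairs, one per cell, where
    condition is "SjD" or "HD". Positive values mean SjD-enriched,
    negative values mean SjD-depleted.
    """
    labels = list(labels)
    per_condition = Counter(cond for _, cond in labels)
    per_pair = Counter(labels)
    result = {}
    for cluster in sorted({cl for cl, _ in labels}):
        # The pseudocount avoids division by zero for clusters absent in one condition.
        frac_sjd = (per_pair[(cluster, "SjD")] + pseudocount) / (per_condition["SjD"] + pseudocount)
        frac_hd = (per_pair[(cluster, "HD")] + pseudocount) / (per_condition["HD"] + pseudocount)
        result[cluster] = log2(frac_sjd / frac_hd)
    return result


# Toy cell labels (invented counts, not the project's data).
toy_labels = (
    [("0", "HD")] * 60 + [("0", "SjD")] * 20
    + [("1", "HD")] * 30 + [("1", "SjD")] * 30
    + [("3", "HD")] * 10 + [("3", "SjD")] * 50
)

enrichment = cluster_enrichment(toy_labels)
for cluster, value in enrichment.items():
    print(cluster, round(value, 2))
```

This pools cells across donors, so it describes composition only. It does not account for donor-to-donor variability, which would need per-sample proportions.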
NB04 – Differential expression within T cell subclusters
Differential gene expression analysis (Wilcoxon rank-sum test) comparing SjD vs HD within individual T cell subclusters.
Two subclusters are analyzed in detail:
- cluster 3: enriched in SjD,
- cluster 0: depleted in SjD.
Ranked gene tables and visualizations (dot plots, heatmaps) capture transcriptional differences while controlling for T cell state.
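To make the state-controlled comparison concrete, the sketch below restricts cells to one subcluster and then applies a two-sided Wilcoxon rank-sum test to a single gene. It is a dependency-free illustration using the normal approximation with tie correction. The notebooks presumably rely on a library implementation, and the function names and toy expression values here are assumptions.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no continuity
    correction. Returns (z, p); z > 0 means x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return z, erfc(abs(z) / sqrt(2))


def de_within_cluster(cells, cluster):
    """Compare SjD vs HD expression of one gene inside a single subcluster."""
    sjd = [e for cl, cond, e in cells if cl == cluster and cond == "SjD"]
    hd = [e for cl, cond, e in cells if cl == cluster and cond == "HD"]
    return rank_sum_test(sjd, hd)


# Toy data: (cluster, condition, expression of one hypothetical gene).
toy_cells = (
    [("3", "SjD", v) for v in (6, 7, 8, 9, 10)]
    + [("3", "HD", v) for v in (1, 2, 3, 4, 5)]
    + [("0", "SjD", 2), ("0", "HD", 2)]  # ignored when testing cluster 3
)

z, p = de_within_cluster(toy_cells, "3")
print(round(z, 2), round(p, 3))
```

Filtering to one subcluster before testing is what keeps the comparison within a comparable T cell state. Testing all T cells together would mix expression changes with composition shifts.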
NB05 – Functional interpretation of DE genes
Non-computational synthesis of differential expression results.
Genes are grouped into broad functional axes such as:
- TCR signaling and activation,
- interferon response,
- transcriptional regulation,
- cytotoxicity and late activation.
This step contrasts gain- and loss-associated transcriptional programs between disease-enriched and disease-depleted T cell states.
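Although NB05 is a non-computational synthesis, the grouping itself can be pictured as a lookup from gene symbols to axes. The sketch below shows one such lookup. The gene sets are small illustrative examples of well-known markers, not the lists used in NB05, and `group_by_axis` is a hypothetical helper.

```python
# Illustrative gene sets only; these are NOT the lists curated in NB05.
FUNCTIONAL_AXES = {
    "TCR signaling and activation": {"CD3E", "LCK", "ZAP70", "CD69"},
    "interferon response": {"ISG15", "IFI44L", "IFIT1", "MX1"},
    "transcriptional regulation": {"JUN", "FOS", "KLF2", "TCF7"},
    "cytotoxicity and late activation": {"GZMB", "PRF1", "NKG7", "HLA-DRA"},
}


def group_by_axis(genes, axes=FUNCTIONAL_AXES):
    """Assign each gene to every axis that lists it, preserving input order.

    Genes found in no axis are collected under "unassigned".
    """
    grouped = {axis: [] for axis in axes}
    grouped["unassigned"] = []
    for gene in genes:
        hits = [axis for axis, members in axes.items() if gene in members]
        for axis in hits:
            grouped[axis].append(gene)
        if not hits:
            grouped["unassigned"].append(gene)
    return grouped


# Toy ranked gene list (invented, for illustration).
toy_de_genes = ["ISG15", "GZMB", "ACTB", "JUN", "MX1"]

grouped = group_by_axis(toy_de_genes)
for axis, members in grouped.items():
    print(f"{axis}: {members}")
```

Keeping input order means the ranking from the differential expression tables is retained within each axis.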
scRNA_PBMC_SjD/
│
├── data/
│   └── raw/
│
├── figures/
│   ├── umap_NB03_Tcells_condition.png
│   ├── umap_NB03_Tcells_leiden.png
│   ├── dotplot_NB03_markers_dotplot.png
│   ├── dotplot_NB04_cluster0_dotplot.png
│   ├── dotplot_NB04_cluster3_dotplot.png
│   ├── heatmap_NB04_cluster0_heatmap.png
│   └── heatmap_NB04_cluster3_heatmap.png
│
├── notebooks/
│   ├── NB01_Explorative_GSM8023480_HD_1.ipynb
│   ├── NB02_multisample_Global_PBMC_Analysis.ipynb
│   ├── NB03_multisample_Tcells_Focused_Analysis.ipynb
│   ├── NB04_DE_HD_vs_SjD_Tcells.ipynb
│   └── NB05_Functional_Interpretation_of_DE_genes_Tcells.ipynb
│
├── results/
│   ├── adata_pbmc_global_qc.h5ad
│   ├── adata_tcells_subclustered.h5ad
│   ├── NB04_cluster0_SjD_vs_HD_rank_genes.csv
│   └── NB04_cluster3_SjD_vs_HD_rank_genes.csv
│
└── README.md
Sjögren’s disease is associated with a redistribution of T cell transcriptional states rather than a uniform activation signature. By comparing healthy donors and SjD patients within matched T cell subclusters, this analysis reveals complementary gain- and loss-associated transcriptional programs affecting T cell signaling, activation, and immune regulation.
All figures are pre-generated and stored in the figures/ directory so that results can be inspected without executing the notebooks. Intermediate and final AnnData objects are saved in results/ to support reproducibility and transparency.
This analysis is based on publicly available scRNA-seq data from PBMCs of patients with Sjögren’s disease and healthy controls, deposited in GEO under accession GSE253568. Raw sequencing data are not included in this repository.
Associated publication:
Regulatory T cells and IFN-γ-producing Th1 cells play a critical role in the pathogenesis of Sjögren’s Syndrome
McDermott M, Wenyi L, Wang Y, Feske S et al. 2024
Data repository:
NCBI Gene Expression Omnibus (GEO)
Accession number: GSE253568
The dataset was reanalyzed here to study disease-associated transcriptional programs in T cells, with a focus on gain- and loss-associated cellular states rather than exhaustive cell-type annotation.