This project presents an exploratory single-cell RNA-seq analysis of human lung adenocarcinoma (LUAD) with a specific focus on T cell distribution and states across tumor-related tissue contexts.
The analysis is based on a publicly available dataset and is designed to illustrate a careful, auditable workflow rather than to make mechanistic or causal claims.
The project follows a stepwise logic: from global tumor microenvironment (TME) composition to a focused exploration of T cell states, with explicit attention to how analytical choices (filtering, annotation, baseline definition) influence interpretation.
- GEO accession: GSE131907
- Organism: Homo sapiens
- Description: Single-cell RNA-seq profiles from lung adenocarcinoma patients, including primary tumors, metastatic sites (lymph node, brain), pleural effusions, and matched non-tumor tissues.
- Total cells (original study): >200,000
- Subset used here: Post-QC subsample (~40,000 cells)
Tumor versus Non-tumor status is defined by tissue of origin, as described in the original study, and not as an experimental condition.
The dataset was subsampled post-QC to ensure computational tractability while preserving cellular diversity.
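One way to subsample while preserving cellular diversity is to draw proportionally within each group of cells instead of uniformly over all cells. The sketch below is an illustration only, not the procedure used in the notebooks: the function name `stratified_subsample`, the grouping labels, and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_target, seed=0):
    """Return sorted indices of a subsample that keeps every label represented.

    labels   : one group label per cell (e.g. cell type or sample of origin)
    n_target : approximate number of cells to keep
    seed     : fixed seed so the subsample can be regenerated
    """
    if not labels:
        return []

    rng = random.Random(seed)

    # Collect the cell indices belonging to each group.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    frac = min(1.0, n_target / len(labels))
    keep = []
    for label in sorted(groups):
        members = groups[label]
        # Proportional allocation, but never drop a group entirely.
        k = max(1, round(frac * len(members)))
        keep.extend(rng.sample(members, min(k, len(members))))
    return sorted(keep)


# Toy labels standing in for per-cell annotations (illustrative values only).
toy_labels = ["T"] * 900 + ["B"] * 90 + ["Mast"] * 10
keep = stratified_subsample(toy_labels, n_target=100)
```

With an AnnData object, the returned indices could then be applied as `adata[keep].copy()`. Because rare groups are rounded up to at least one cell, the result can slightly exceed `n_target`.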
scRNA_LUAD_Tcells/
│
├── Data/
│   └── processed/
│       └── subsample_40k/
│           ├── adata_qc.h5ad       # Global post-QC TME object
│           └── adata_Tcells.h5ad   # T cell subset with annotations
│
├── Notebooks/
│ ├── NB01_data_acquisition_and_validation.ipynb
│ ├── NB02_TME_Global.ipynb
│ ├── NB03_TME_WIDE.ipynb
│ ├── NB04_T_Cells_Analysis.ipynb
│ └── NB05_Tcells_Tumor_vs_NonTumor_Analysis.ipynb
│
├── Results/
│ ├── figures/
│ └── tables/
│
└── README.md
- NB01 – Data acquisition & validation
  Dataset loading, metadata integration, and initial quality control.
- NB02 – Global TME analysis
  Broad characterization of the tumor microenvironment, including major cell types.
- NB03 – TME WIDE
  Post-QC global TME object generation (adata_qc) and reference visualizations.
- NB04 – T cell analysis
  Subsetting of T lymphocytes, Leiden clustering, and derivation of major T cell states (Naive/Memory, Cytotoxic, Exhaustion-like).
- NB05 – Tumor vs Non-tumor (T cells)
  Exploratory comparison of T cell distributions and states across Tumor, Non-tumor, and Tumor_PE contexts, using a clearly defined global baseline and sensitivity analyses.
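The global baseline used in NB05 can be read as a denominator: T cell counts in each context are divided by the number of all post-QC cells in that same context. The sketch below illustrates that arithmetic on plain lists of context labels. The function name and the toy counts are assumptions, and the notebooks may compute this differently.

```python
from collections import Counter


def t_cell_fraction_by_context(baseline_contexts, t_cell_contexts):
    """Fraction of all post-QC cells that are T cells, per tissue context.

    baseline_contexts : context label of every cell in the global object
    t_cell_contexts   : context label of every cell in the T cell subset
    """
    total = Counter(baseline_contexts)
    t_cells = Counter(t_cell_contexts)

    # The T cell subset must be contained in the baseline, context by context.
    for ctx, n_t in t_cells.items():
        if n_t > total.get(ctx, 0):
            raise ValueError(f"more T cells than baseline cells in {ctx!r}")

    return {ctx: t_cells[ctx] / n for ctx, n in total.items()}


# Toy context labels (illustrative values only, not results from the dataset).
baseline = ["Tumor"] * 50 + ["Non-tumor"] * 40 + ["Tumor_PE"] * 10
t_subset = ["Tumor"] * 20 + ["Non-tumor"] * 10 + ["Tumor_PE"] * 5
fractions = t_cell_fraction_by_context(baseline, t_subset)
```

Keeping the baseline explicit makes it visible that a change in T cell fraction can come from the numerator or from the denominator.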
- Analyses are descriptive and exploratory.
- Tumor context is treated as a tissue-based biological context, not as a modeled condition.
- All comparisons rely on explicitly defined objects (`adata_qc` for baseline, `adata_Tcells` for T cells).
- T cell states are conservatively annotated; clusters without prior state definition are labeled as Unassigned rather than forced.
- Sensitivity analyses are used to assess robustness to filtering choices.
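The conservative annotation and the sensitivity check described above can be sketched as follows. The cluster IDs in `STATE_BY_CLUSTER` are hypothetical placeholders, not the clusters obtained in NB04, and filtering by minimum cluster size is only one assumed example of a filtering choice.

```python
from collections import Counter

# Hypothetical cluster-to-state mapping; the cluster IDs are placeholders.
STATE_BY_CLUSTER = {
    "0": "Naive/Memory",
    "1": "Cytotoxic",
    "2": "Exhaustion-like",
}


def annotate_states(cluster_ids, state_by_cluster=STATE_BY_CLUSTER):
    """Map each cell's cluster to a state; unmapped clusters stay 'Unassigned'."""
    return [state_by_cluster.get(c, "Unassigned") for c in cluster_ids]


def state_proportions(cluster_ids, min_cluster_size=0,
                      state_by_cluster=STATE_BY_CLUSTER):
    """State proportions after dropping clusters smaller than min_cluster_size."""
    sizes = Counter(cluster_ids)
    kept = [c for c in cluster_ids if sizes[c] >= min_cluster_size]
    if not kept:
        return {}
    counts = Counter(annotate_states(kept, state_by_cluster))
    return {state: k / len(kept) for state, k in counts.items()}


# Toy cluster labels (illustrative only); cluster "7" has no state definition.
clusters = ["0"] * 50 + ["1"] * 30 + ["2"] * 15 + ["7"] * 5

# Recompute the proportions under several filtering thresholds.
sensitivity = {m: state_proportions(clusters, min_cluster_size=m)
               for m in (0, 10, 20)}
```

Comparing the entries of `sensitivity` side by side shows how much the state proportions depend on the threshold, which is the question a sensitivity analysis is meant to answer.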
This project is intended as a learning-oriented, exploratory analysis and does not aim to:
- infer functional mechanisms,
- establish causal relationships,
- or identify biomarkers.
Instead, it demonstrates a rigorous and cautious approach to single-cell data exploration, suitable for PhD-level methodological discussion and project development.
- Intermediate AnnData objects are generated and saved during the analysis to support a reproducible workflow, but are not included in this repository due to file size limitations.
- Figures and tables are generated within notebooks and stored in the `Results/` directory.
- Raw data are not redistributed due to controlled-access constraints.
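Because the intermediate objects are not shipped with the repository, a notebook may want to check that they have been regenerated before loading them. A minimal sketch, assuming the directory layout shown in the repository tree; the helper name `missing_objects` is hypothetical.

```python
from pathlib import Path

# File names taken from the repository tree in this README.
EXPECTED_OBJECTS = ("adata_qc.h5ad", "adata_Tcells.h5ad")


def missing_objects(repo_root, expected=EXPECTED_OBJECTS):
    """Return the expected AnnData files that are absent under repo_root."""
    data_dir = Path(repo_root) / "Data" / "processed" / "subsample_40k"
    return [name for name in expected if not (data_dir / name).is_file()]


# Example: list what still needs to be generated by the earlier notebooks.
todo = missing_objects(".")
```

An empty list means both objects are in place. Otherwise the listed files need to be produced by rerunning the earlier notebooks.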
Kim N., Kim H.K., Lee K., Hong Y., et al. (2020).
Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma.
Nature Communications, 11, 2285.
GEO accession: GSE131907.