This project evaluates transformation strategies for using single-nucleus RNA-seq (snRNA-seq) as a reference for bulk RNA-seq deconvolution, and compares the results with those obtained from single-cell RNA-seq (scRNA-seq) references.
Because snRNA-seq captures primarily nuclear RNA, it differs systematically from scRNA-seq, which contains both nuclear and cytoplasmic RNA. We test whether these differences degrade deconvolution performance and whether specific transformations can mitigate them. Comparisons are performed using both simulated and real bulk datasets.
git clone https://github.com/greenelab/deconvolution_sc_sn_comparison.git
cd deconvolution_sc_sn_comparison
Create the required Python and R environments:
bash environments/create_envs.sh
This creates:
- env_deconv (Python)
- env_deconv_R (R / Bioconductor)
Download each dataset into its own subdirectory of data/, named by its dataset identifier:
data/
└── DATASET_ID/
All datasets are publicly available. Links, preprocessing steps, and metadata are provided in:
data/details/Data_Details.xlsx
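Before launching the pipeline, it can help to confirm that every expected dataset directory exists and is not empty. The following is a minimal sketch, not part of the repository: the function name `missing_datasets` is hypothetical, and the dataset IDs are the ones listed in the example `datasets=(...)` array in the shell scripts.

```python
from pathlib import Path

# Dataset IDs as they appear in the example shell array; adjust to your setup.
DATASET_IDS = ["ADP", "PBMC", "MBC", "MSB"]


def missing_datasets(data_root, dataset_ids):
    """Return the dataset IDs with no non-empty directory under data_root."""
    root = Path(data_root)
    missing = []
    for dataset_id in dataset_ids:
        dataset_dir = root / dataset_id
        # A dataset counts as present only if its directory exists
        # and contains at least one entry.
        if not dataset_dir.is_dir() or not any(dataset_dir.iterdir()):
            missing.append(dataset_id)
    return missing


if __name__ == "__main__":
    missing = missing_datasets("data", DATASET_IDS)
    if missing:
        print("Missing or empty dataset directories:", ", ".join(missing))
    else:
        print("All dataset directories are present.")
```

This only checks the directory layout; it does not verify that the files inside match what the preprocessing notebooks expect.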
Run the following scripts in order (typically using sbatch on an HPC system).
scripts/0_preprocess_data.sh
Runs preprocessing and QC notebooks for all datasets.
scripts/1_train_scvi_models_sim.sh
Trains scVI models (conditional and non-conditional; with and without DE genes).
scripts/2_prepare_deconvolution_sim.sh
Prepares transformed references and pseudobulks for simulations.
scripts/3_run_bayesprism_sim.sh
Runs BayesPrism / InstaPrism deconvolution.
scripts/4_process_results_sim.sh
Processes simulation results and computes RMSE and Pearson correlation.
scripts/5_results_notebook_sim.sh
Generates notebooks and figures for simulation results.
scripts/6_comparison_with_sc_and_bulks.sh
Trains models on real bulks and generates comparison notebooks.
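The two metrics computed in the results-processing step can be illustrated with a small example. This is a sketch under assumptions, not the repository's implementation: the function names are hypothetical, the proportions are toy values, and all entries are pooled across samples and cell types, whereas the actual scripts may aggregate differently (for example per cell type or per sample).

```python
import math


def _flatten(matrix):
    """Flatten a samples-by-cell-types list of lists into one list."""
    return [value for row in matrix for value in row]


def rmse(true_props, est_props):
    """Root-mean-square error between true and estimated proportions."""
    t, e = _flatten(true_props), _flatten(est_props)
    if len(t) != len(e):
        raise ValueError("inputs must have the same number of entries")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, e)) / len(t))


def pearson(true_props, est_props):
    """Pearson correlation between true and estimated proportions.

    Raises ZeroDivisionError if either input has zero variance.
    """
    t, e = _flatten(true_props), _flatten(est_props)
    if len(t) != len(e):
        raise ValueError("inputs must have the same number of entries")
    mean_t = sum(t) / len(t)
    mean_e = sum(e) / len(e)
    cov = sum((a - mean_t) * (b - mean_e) for a, b in zip(t, e))
    var_t = sum((a - mean_t) ** 2 for a in t)
    var_e = sum((b - mean_e) ** 2 for b in e)
    return cov / math.sqrt(var_t * var_e)


# Toy data: rows are pseudobulk samples, columns are cell types.
true_props = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
est_props = [[0.4, 0.4, 0.2], [0.2, 0.5, 0.3]]

print("RMSE:", rmse(true_props, est_props))
print("Pearson r:", pearson(true_props, est_props))
```

Lower RMSE and higher Pearson correlation both indicate estimated proportions closer to the ground truth; RMSE is sensitive to absolute error, while Pearson correlation captures whether the ranking and linear trend are preserved.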
All figures used in the manuscript are generated in the result notebooks located in:
notebooks/
- Preprocess data using:
scripts/0_preprocess_data.sh
- If training is required, add your code to:
scripts/train_scvi_models_allgenes.py
- Add your transformation to:
scripts/prepare_deconvolution_sim.py
(see the section marked “Add your transformation here”).
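As an illustration of what a simple transformation might look like, the sketch below drops a chosen set of genes (for example, genes differentially expressed between snRNA-seq and scRNA-seq) from a reference matrix. Everything here is an assumption for illustration: the function name `remove_genes`, its signature, the list-of-lists data layout, and the toy gene names are hypothetical, and the interface actually expected by scripts/prepare_deconvolution_sim.py may differ.

```python
def remove_genes(reference, gene_names, genes_to_drop):
    """Drop the listed genes (columns) from a cells-by-genes matrix.

    reference      -- list of rows, one per cell, each a list of expression values
    gene_names     -- list of gene names, one per column of `reference`
    genes_to_drop  -- iterable of gene names to remove (hypothetical DE gene set)

    Returns the filtered matrix and the list of retained gene names.
    """
    drop = set(genes_to_drop)
    # Column indices of the genes that are kept, in their original order.
    keep_idx = [i for i, gene in enumerate(gene_names) if gene not in drop]
    kept_genes = [gene_names[i] for i in keep_idx]
    filtered = [[row[i] for i in keep_idx] for row in reference]
    return filtered, kept_genes


# Toy example with made-up gene names and counts.
reference = [[1, 2, 3], [4, 5, 6]]
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
filtered, kept = remove_genes(reference, gene_names, ["GENE_B"])
print(kept)
print(filtered)
```

Genes named in `genes_to_drop` that are absent from `gene_names` are silently ignored, which keeps the function usable when the gene set was derived from a different dataset.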
- Add a preprocessing notebook to:
notebooks/
- Add the dataset under:
data/YOUR_DATASET_ID/
- Register the new dataset ID in the relevant shell scripts, for example:
datasets=("ADP" "PBMC" "MBC" "MSB")
Change to:
datasets=("ADP" "PBMC" "MBC" "MSB" "YOUR_DATASET_ID")
- For real bulk analyses, add the dataset ID to scripts that include Real_ADP.
Detailed information on all datasets used in this study — including download links, filtering steps, and preprocessing details — is available in:
data/details/Data_Details.xlsx