Preproc scripts and analysis documents for deconvolution profiling of EV endometrial samples

allumik/endo-ev

Non-invasive Transcriptomic Cell Profiling of the Human Endometrium with Generative Deep Learning

Abstract

This repository contains scripts and Quarto documents for the analysis of uterine fluid extracellular vesicles (UF-EV) using deep generative models, specifically the BulkTrajBlend architecture from the omicverse package.

Repository Structure

The repository is organised as follows:

  • ./preproc_scripts/ - Scripts to run the nf-core/rnaseq preprocessing pipeline on the raw samples on a SLURM HPC cluster.

  • ./analysis/ - RMarkdown and Quarto documents that generate interactive analysis reports. The main analysis workflow used in the manuscript is described in ev_article.qmd; the other documents in that folder were used for exploratory data analysis and statistical testing.

  • ./raw_data/ - Extra raw data files used as input during the analysis; the contents of this folder should be copied into $RAW_DATA_FOLDER.

  • ./scripts/ - Scripts for processing raw read counts emitted by the nf-core/rnaseq pipeline.

    • ev_raw.r & ev_comb.r preprocess the read tables for our dataset and for the combined datasets, respectively.

    • de_runner.r & de_comb_runner.r run the differential expression analysis (not included in the manuscript).

    • preproc_sc.qmd & preproc_st.qmd are document versions of the scripts used to preprocess the single-cell atlas, load the UF-EV datasets, train the models, run inference for deconvolution, and project the results onto spatial transcriptomic datasets.

  • R and Python scripts that take the read count matrices emitted by the nf-core/rnaseq pipeline in $RAW_DATA_FOLDER (not included in this repository), format them, and write the resulting phenotype files to $DATA_FOLDER.

Setup for Reproducing the Analysis

  1. This project uses pixi, with the following environments defined:

         # install as necessary
         pixi install -e proc        # train the models and perform spatial transcriptomics (ST) mapping
         pixi install -e analysis    # main analysis in Python

         # some extras
         pixi install -e pydeconv    # for deconvolution tasks
         pixi install -e r-analysis  # some R tools for DE and DWLS, not used
  2. Copy .env_template to .env and set the environment variables to match your setup.

  3. Download the processed data from E-MTAB-15505 and unzip it into the $RAW_DATA_FOLDER folder defined in your .env file. Also copy the contents of ./raw_data/ and download endometriumAtlasV2_cells_with_counts.h5ad into that folder.

  4. Run the preprocessing scripts in ./preproc_scripts/ to quantify reads.

  5. Follow the ./run_wf.sh script to recreate the results.

Alternatively, you can use the scripts in the ./preproc_scripts/ folder to run the nf-core/rnaseq pipeline on the raw data yourself.
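
Before launching the workflow, it can help to confirm that .env and the input folder are set up as the steps above describe. The following is a minimal sketch, not part of this repository: the helper names parse_env_file and check_setup are hypothetical, and it assumes .env holds plain KEY=VALUE lines (the actual layout of .env_template may differ). Only the variable names RAW_DATA_FOLDER and DATA_FOLDER and the atlas file name come from this README.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Assumption: the .env file uses plain KEY=VALUE syntax.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_setup(env, required_files=("endometriumAtlasV2_cells_with_counts.h5ad",)):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # Both folder variables are referenced in this README.
    for var in ("RAW_DATA_FOLDER", "DATA_FOLDER"):
        if not env.get(var):
            problems.append(f"{var} is not set in .env")

    raw_dir = env.get("RAW_DATA_FOLDER")
    if raw_dir:
        raw_path = Path(raw_dir).expanduser()
        if not raw_path.is_dir():
            problems.append(f"RAW_DATA_FOLDER does not exist: {raw_path}")
        else:
            # Check that each expected input file has been downloaded.
            for name in required_files:
                if not (raw_path / name).is_file():
                    problems.append(f"missing {name} in {raw_path}")
    return problems
```

Calling check_setup(parse_env_file(".env")) from the repository root returns the list of problems, so an empty list suggests the inputs are in place before ./run_wf.sh is started.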
