This repository contains scripts and Quarto documents for the analysis of uterine fluid extracellular vesicles (UF-EV) using deep generative models, specifically the BulkTrajBlend architecture from the `omicverse` package.
The repository is organised as follows:

- `./preproc_scripts/` - Scripts to run the nf-core/rnaseq preprocessing pipeline on the raw samples on a SLURM HPC cluster.
- `./analysis/` - RMarkdown and Quarto documents that generate interactive analysis reports. The main analysis workflow used in the manuscript is described in `ev_article.qmd`; the other documents in that folder were used for exploratory data analysis and statistical testing.
- `./raw_data/` - Some extra raw data files used as input during the analysis; the contents of this folder should be in `$RAW_DATA_FOLDER`.
- `./scripts/` - Scripts for processing the raw read counts emitted by the nf-core/rnaseq pipeline.
  - `ev_raw.r` & `ev_comb.r` preprocess the read tables for our dataset and for the combined data.
  - `de_runner.r` & `de_comb_runner.r` run the differential analysis (not included in the manuscript).
  - `preproc_sc.qmd` & `preproc_st.qmd` are document versions of the scripts used to preprocess the single-cell atlas, load the UF-EV datasets, train the models, and run inference for deconvolution and projection onto spatial transcriptomic datasets.
- R and Python scripts take the read count matrices emitted by the nf-core/rnaseq pipeline in `$RAW_DATA_FOLDER` (not included in this repository), format them, and write phenotype files to `$DATA_FOLDER`.
- We use `pixi` for this project, with the following environments defined:

  ```shell
  # install as necessary
  pixi install -e proc        # train the models and perform ST mapping
  pixi install -e analysis    # main analysis in Python
  # some extras
  pixi install -e pydeconv    # for deconvolution tasks
  pixi install -e r-analysis  # some R tools for DE and DWLS, not used
  ```
- Then, copy `.env_template` to `.env` and populate the environment variables to suit your setup.
- Download the processed data from E-MTAB-15505 and unzip them into the `$RAW_DATA_FOLDER` folder defined in your `.env` file. Don't forget to copy the contents of `./raw_data/` and download `endometriumAtlasV2_cells_with_counts.h5ad` to that folder too.
- Next, run the preprocessing scripts found in `./preproc_scripts/` to quantify reads.
- Follow the `./run_wf.sh` script to recreate the results.
Alternatively, you can use the scripts in the ./preproc_scripts/ folder to run the nf-core/rnaseq pipeline on the raw data.
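
Before launching the workflow, it can help to sanity-check the setup steps above: that `.env` was created from `.env_template`, and that `$RAW_DATA_FOLDER` contains the atlas file. The sketch below is one possible way to do that. The helper names `init_env` and `check_raw_data` are hypothetical and not part of this repository, and the check covers only the atlas file named above, not the E-MTAB-15505 archive contents.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of this repository.

# Create .env from .env_template unless .env already exists.
# $1: repository root (defaults to the current directory)
init_env() {
    local root="${1:-.}"
    if [ -f "$root/.env" ]; then
        echo ".env already exists, leaving it untouched"
        return 0
    fi
    if [ ! -f "$root/.env_template" ]; then
        echo "missing template: $root/.env_template" >&2
        return 1
    fi
    cp "$root/.env_template" "$root/.env"
    echo "created $root/.env, now edit it to set your paths"
}

# Check that the raw data folder exists and holds the atlas file.
# $1: value of RAW_DATA_FOLDER
check_raw_data() {
    local folder="$1"
    local atlas="endometriumAtlasV2_cells_with_counts.h5ad"
    if [ -z "$folder" ]; then
        echo "RAW_DATA_FOLDER is not set" >&2
        return 1
    fi
    if [ ! -d "$folder" ]; then
        echo "missing folder: $folder" >&2
        return 1
    fi
    if [ ! -f "$folder/$atlas" ]; then
        echo "missing atlas file: $folder/$atlas" >&2
        return 1
    fi
    echo "raw data folder looks ready: $folder"
}

# Example usage from the repository root, assuming .env holds plain
# KEY=value lines that a POSIX shell can source:
#   init_env .
#   set -a; . ./.env; set +a
#   check_raw_data "$RAW_DATA_FOLDER"
```

Both helpers only read or copy files, so they are safe to re-run; `init_env` never overwrites an existing `.env`.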