Non-invasive Transcriptomic Cell Profiling of the Human Endometrium with Generative Deep Learning

Abstract

This repository contains scripts and Quarto documents for the analysis of uterine fluid extracellular vesicles (UF-EV) using generative deep learning models, specifically the BulkTrajBlend architecture from the omicverse package.

Repository Structure

The repository is organised as follows:

./preproc_scripts/ - Scripts to run the nf-core/rnaseq preprocessing pipeline on the raw samples on a SLURM HPC cluster.
./analysis/ - RMarkdown and Quarto documents that generate interactive analysis reports. The main analysis workflow used in the manuscript is described in ev_article.qmd; the other documents in that folder were used for exploratory data analysis and statistical testing.
./raw_data/ - Extra raw data files used as input during the analysis; the contents of this folder should be copied to $RAW_DATA_FOLDER.
./scripts/ - Scripts for processing raw read counts emitted by the nf-core/rnaseq pipeline.
- ev_raw.r & ev_comb.r preprocess the read tables for our dataset and for the combined dataset.
- de_runner.r & de_comb_runner.r run the differential analysis (not included in the manuscript).
- preproc_sc.qmd & preproc_st.qmd are document versions of the scripts used for preprocessing the single-cell atlas, loading the UF-EV datasets, training the models, and running inference for deconvolution and projection onto spatial transcriptomic datasets.
Together, these R and Python scripts take the read count matrices emitted by the nf-core/rnaseq pipeline in $RAW_DATA_FOLDER (not included in this repository) and, after formatting, output phenotype files to $DATA_FOLDER.

Setup for Reproducing the Analysis

This project uses pixi, with the following environments defined:

# install as necessary
pixi install -e proc # train the models and perform st mapping
pixi install -e analysis # main analysis in Python

# some extras
pixi install -e pydeconv # for deconvolution tasks
pixi install -e r-analysis # some R tools for DE and DWLS, not used

Then, copy .env_template to .env and set the environment variables to match your system.
Download the processed data from E-MTAB-15505 and unzip it into $RAW_DATA_FOLDER, as defined in your .env file. Also copy the contents of ./raw_data/ into that folder and download endometriumAtlasV2_cells_with_counts.h5ad there.
Next, run the preprocessing scripts in ./preproc_scripts/ to quantify reads.
Finally, follow the ./run_wf.sh script to recreate the results.
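
Before running the workflow, the setup steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of this repository: it assumes that .env holds plain KEY=VALUE lines and that RAW_DATA_FOLDER and DATA_FOLDER are the variables to verify (the real .env_template may define more). The helper names read_env_file and check_setup are hypothetical.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Assumption: the .env file uses plain KEY=VALUE syntax without 'export'.
    """
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_setup(env_path=".env",
                required_vars=("RAW_DATA_FOLDER", "DATA_FOLDER"),
                atlas_name="endometriumAtlasV2_cells_with_counts.h5ad"):
    """Return a list of human-readable problems; an empty list means OK."""
    if not Path(env_path).is_file():
        return [f"{env_path} not found; copy .env_template to .env first"]

    env = read_env_file(env_path)
    problems = [f"{name} is not set in {env_path}"
                for name in required_vars if not env.get(name)]

    raw = env.get("RAW_DATA_FOLDER")
    if raw:
        raw_dir = Path(raw).expanduser()
        if not raw_dir.is_dir():
            problems.append(f"RAW_DATA_FOLDER does not exist: {raw_dir}")
        elif not (raw_dir / atlas_name).is_file():
            problems.append(f"{atlas_name} is missing from {raw_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

Run from the repository root, the script prints one warning per missing piece and nothing when the checked variables and the atlas file are in place. It does not verify the contents of the downloaded data.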

Alternatively, you can use the scripts in the ./preproc_scripts/ folder to run the nf-core/rnaseq pipeline on the raw data.

allumik/endo-ev
