Multiome (scRNA + scATAC) Analysis Pipeline

This repository contains scripts for analyzing single-cell RNA sequencing (scRNA-seq) and single-cell ATAC sequencing (scATAC-seq) multiome data on HPC (High-Performance Computing) systems running Linux/Unix. The workflow ensures high-quality cell filtering, annotation, differential analysis, and advanced computational techniques for exploring regulatory mechanisms. It is designed to be scalable across multiple samples by submitting parallel jobs, optimizing efficiency and performance.

Options

Option	Description
`--all_samples`	Process all available samples.
`--include_samples`	Process specific samples (comma-separated list).
`--skip_samples`	Exclude specific samples from processing.
`--rna`	Process RNA data.
`--atac`	Process ATAC data.
`--souporcell`	Run Souporcell for RNA doublet detection.
`--assign_geno`	Correlate clusters to donor SNP genotypes.
`--overwrite_rna`	Overwrite existing RNA preprocessing results.
`-i, --input`	(Required) Input directory path.
`-o, --output`	(Required) Output directory path.
`-d, --dd_file`	(Required for Souporcell) CSV file containing `sample_name,n_genotypes,vcf_file`.
`-r, --ref_genome`	Reference genome directory for doublet detection.
`-h, --help`	Help commands

Run multiome_qc_workflow.sh --help for guidlines

3.2. scRNA-seq Quality Control (QC) using Scanpy

Filter out low-quality cells based on:

Number of detected genes per cell.
Total number of UMI counts.
Percentage of mitochondrial gene expression.

Remove doublets using tools like Scrublet or DoubletFinder.

3.3. scATAC-seq Quality Control using ArchR

Assess TSS Enrichment to evaluate chromatin accessibility signal.

Filter low-quality cells based on:

Total number of fragments.
Percentage of reads in peaks.
Doublet identification using computational methods.

4. Cell Annotation

4.1. Annotation of scRNA-seq Cells

Cluster cells using graph-based clustering methods (e.g., Leiden, Louvain, Phenograph).
Identify marker genes for each cluster and manually annotate based on known cell types.
Use reference datasets (such as PBMC Atlas) to guide annotations.

4.2. Annotation of scATAC-seq Cells

Compute gene activity scores from chromatin accessibility data.
Transfer labels from scRNA-seq clusters using integration techniques like Harmony or Seurat.
Identify cell-type-specific open chromatin regions using peak calling.

5. Exploratory Data Analysis & Visualization

5.1. Visualizing Quality Control Metrics

Generate violin plots and histograms for cell counts, gene counts, and mitochondrial content.
Identify potential batch effects and outliers.

5.2. Clustering and Dimensionality Reduction

Perform PCA, UMAP, and t-SNE to visualize cell heterogeneity.
Assess clustering consistency across different resolutions.

5.3. Cell Type Proportions and Batch Effects

Compare cell-type distribution across different conditions or sample batches.
Apply batch correction techniques if needed (e.g., Harmony, Combat).

5.4. Peak Accessibility Visualization in ATAC Data

Generate genome browser tracks to inspect chromatin accessibility at specific loci.
Plot peak intensities for marker genes associated with different cell types.

6. Differential Analysis

6.1. Differential Gene Expression (DGE) Analysis

Compare gene expression between different conditions using Wilcoxon rank-sum test or MAST.
Identify upregulated and downregulated genes across groups.
Perform pseudo-bulk analysis by aggregating cells into sample-level pseudo-replicates.

6.2. Differential Peak Accessibility in ATAC Data

Identify differentially accessible regions (DARs) between conditions.
Perform peak-based differential analysis using statistical tests.

6.3. ChromVAR for Differential Motif Accessibility

Compute TF motif deviations using chromVAR to infer regulatory activity.
Identify transcription factors with altered activity across conditions.

7. Advanced Analysis

7.1. Differential Abundance Testing with Milo

Uses MiloR to detect significant changes in the proportion of cell states.
Identifies cell populations expanding or contracting across conditions.

7.2. SPECTRA for Latent Cell State Identification

Uncovers latent regulatory programs using non-negative matrix factorization (NMF).
Identifies transcriptional programs driving cell state changes.

7.3. SCENIC+ for Gene Regulatory Network Inference

Infers gene regulatory networks (GRNs) from multiome data.
Identifies key transcription factors (TFs) and their downstream targets.

8. Results Interpretation

8.1. Integrating scRNA-seq and scATAC-seq Insights

Compare gene expression changes with chromatin accessibility patterns.
Identify genes showing coordinated regulation at both the transcript and chromatin level.

8.2. Functional Enrichment Analysis

Perform Gene Ontology (GO) and pathway enrichment analysis for significant genes.
Identify biological processes and pathways associated with differential expression.

8.3. Visualization of Key Findings

Generate summary figures such as heatmaps, volcano plots, and motif enrichment plots.
Create interactive visualizations using tools like cellxgene or UCSC Genome Browser.

9. Variant Calling in Single-Cell Multiome Data

Variant calling in single-cell RNA and ATAC-seq data helps identify genetic mutations, single nucleotide variants (SNVs), and structural variations at the single-cell level. This process can reveal clonal heterogeneity, somatic mutations, and their effects on gene expression and chromatin accessibility.

Before calling variants, the sequencing data must be preprocessed:

Read Alignment: The raw FASTQ files are aligned to the reference genome using a high-accuracy aligner like CellRanger ARC, STAR, or BWA.
BAM File Processing: The output BAM files undergo sorting, duplicate removal, and quality filtering using tools like Samtools or Picard.
Base Quality Score Recalibration (BQSR): Improves accuracy by correcting systematic sequencing errors using GATK.

10. Troubleshooting

Common Issues and Solutions

Issue	Possible Cause	Solution
Low cell yield	Stringent filtering	Adjust QC thresholds carefully
High doublet rate	Poor library quality	Use stricter doublet detection
Batch effects	Technical variability	Apply batch correction techniques
Poor clustering	High noise in data	Optimize dimensionality reduction

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
01_run_cellranger		01_run_cellranger
02_create_persample_objects		02_create_persample_objects
03_Manually_filter_persample		03_Manually_filter_persample
03_Variant_Calling		03_Variant_Calling
04_create_merged_objects		04_create_merged_objects
05_QCmetrics_CellAnnotation		05_QCmetrics_CellAnnotation
06_Differential_Expression_Accesibility		06_Differential_Expression_Accesibility
07_advanced_analysis		07_advanced_analysis
Installation_and_setup.sh		Installation_and_setup.sh
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multiome (scRNA + scATAC) Analysis Pipeline

Table of Contents

1. Installation & Dependencies

2. Data Processing

3. Quality Control & Filtering

Options

4. Cell Annotation

5. Exploratory Data Analysis & Visualization

6. Differential Analysis

7. Advanced Analysis

8. Results Interpretation

9. Variant Calling in Single-Cell Multiome Data

10. Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

ReshmaRamaiah10/Single-Cell-Multiome-Scripts

Folders and files

Latest commit

History

Repository files navigation

Multiome (scRNA + scATAC) Analysis Pipeline

Table of Contents

1. Installation & Dependencies

2. Data Processing

3. Quality Control & Filtering

Options

4. Cell Annotation

5. Exploratory Data Analysis & Visualization

6. Differential Analysis

7. Advanced Analysis

8. Results Interpretation

9. Variant Calling in Single-Cell Multiome Data

10. Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages