RNA-seq Analysis Pipeline

A Snakemake-based pipeline for analyzing bulk RNA-seq data.

Overview

This pipeline processes paired-end RNA-seq data from raw FASTQ files through quality control, trimming, alignment, and quantification. It is designed to handle multiple technical replicates (runs) per biological sample and supports biological replicates for reproducibility analysis.

Features

Quality Control: FastQC for read-level QC reports; fastp for adapter trimming and quality filtering
Alignment: STAR aligner with gene counting
Quantification: Gene-level expression quantification from STAR ReadsPerGene output
Technical Replicate Handling: Automatic merging of multiple runs per sample
Biological Replicate Support: Replicate groups defined via the replicate_name column
Comprehensive Reporting: MultiQC report aggregating all QC metrics
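
To illustrate the quantification step: STAR's ReadsPerGene.out.tab is a tab-separated file with four columns (gene ID, unstranded counts, counts for reads on the forward strand, counts for reads on the reverse strand), preceded by summary rows whose names start with N_. The sketch below shows one way to pull gene-level counts from such a file. It is not the pipeline's own code: the function name read_gene_counts, the mapping of none/forward/reverse to columns, and the example counts are assumptions made for illustration.

```python
import csv
import io

# Column index in ReadsPerGene.out.tab for each strandedness value.
# Column 0 is the gene ID. Mapping the sample sheet's none/forward/reverse
# values onto these columns is an assumption for illustration.
STRAND_COLUMN = {"none": 1, "forward": 2, "reverse": 3}


def read_gene_counts(handle, strandedness="none"):
    """Return {gene_id: count} from a STAR ReadsPerGene.out.tab stream."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip blank lines and the N_* summary rows
        counts[row[0]] = int(row[column])
    return counts


# Tiny made-up example in the ReadsPerGene.out.tab layout.
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "ENSG00000000003\t12\t1\t11\n"
    "ENSG00000000005\t0\t0\t0\n"
)
reverse_counts = read_gene_counts(example, strandedness="reverse")
print(reverse_counts)
```

Choosing the wrong column for a stranded library typically leaves most reads in N_noFeature, so comparing that row across columns is a quick sanity check.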

Requirements

Snakemake >= 8.0.0
Conda/Mamba for environment management
Python >= 3.8

Sample Information File

The pipeline uses a CSV file (config/test_info.csv) with the following columns:

Column	Required	Description
sample_name	Yes	Unique sample identifier
run	Yes	Run number (technical replicate), integer
R1	Yes	Path to forward read FASTQ file
R2	Yes	Path to reverse read FASTQ file
replicate_name	No	Biological replicate group name
strandedness	No	Library strandedness: none/forward/reverse
passqc	No	QC pass flag: 1=pass, 0=fail
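
The column rules above can be checked before launching a run. The following is a minimal sketch using only the Python standard library; it is not the pipeline's own validation (which may rely on the files under schemas). The helper name validate_rows and the rule that each sample_name/run pair appears only once are assumptions.

```python
import csv
import io

REQUIRED = ("sample_name", "run", "R1", "R2")
ALLOWED = {
    "strandedness": {"none", "forward", "reverse"},
    "passqc": {"0", "1"},
}


def validate_rows(rows):
    """Return a list of human-readable problems found in sample sheet rows."""
    problems = []
    seen = set()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: missing {col}")
        run = (row.get("run") or "").strip()
        if run and not run.isdigit():
            problems.append(f"line {line_no}: run must be an integer, got {run!r}")
        # Optional columns are only checked when a value is present.
        for col, allowed in ALLOWED.items():
            value = (row.get(col) or "").strip()
            if value and value not in allowed:
                problems.append(f"line {line_no}: invalid {col} {value!r}")
        # Assumption: each (sample_name, run) pair should appear once.
        key = (row.get("sample_name"), run)
        if key in seen:
            problems.append(f"line {line_no}: duplicate sample_name/run {key}")
        seen.add(key)
    return problems


# Made-up sheet: the second data row has a bad run, no R2, bad strandedness.
sheet = io.StringIO(
    "sample_name,run,R1,R2,strandedness\n"
    "test1,1,/path/to/test1_1_1.fq,/path/to/test1_1_2.fq,reverse\n"
    "test1,x,/path/to/test1_2_1.fq,,sideways\n"
)
problems = validate_rows(csv.DictReader(sheet))
for problem in problems:
    print(problem)
```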

Example Sample File

sample_name,run,R1,R2,replicate_name
test1,1,/path/to/test1_1_1.fq,/path/to/test1_1_2.fq,test1
test1,2,/path/to/test1_2_1.fq,/path/to/test1_2_2.fq,test1
test2,1,/path/to/test2_1_1.fq,/path/to/test2_1_2.fq,test2

Key Features of Sample Handling

Technical Replicates: Multiple runs per sample (same sample_name, different run numbers) are processed independently and then merged
Biological Replicates: Samples sharing the same replicate_name are grouped for reproducibility analysis
QC Filtering: If passqc column exists, only samples with at least one run passing QC (passqc=1) are included in downstream analysis
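
The three rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's implementation: the function names group_runs and replicate_groups, the group names groupA and groupB, and the choice to keep every run of a sample that passes the QC filter are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_runs(rows):
    """Group runs by sample; keep samples with at least one QC-passing run."""
    runs_by_sample = defaultdict(list)
    for row in rows:
        runs_by_sample[row["sample_name"]].append(row)

    kept = {}
    for sample, runs in runs_by_sample.items():
        # The filter only applies when the sheet has a passqc column.
        if "passqc" in runs[0] and not any(r["passqc"] == "1" for r in runs):
            continue
        # Assumption: all runs of a kept sample are retained, ordered by run.
        kept[sample] = sorted(runs, key=lambda r: int(r["run"]))
    return kept


def replicate_groups(samples):
    """Map each replicate_name to the sorted sample names that share it."""
    groups = defaultdict(set)
    for sample, runs in samples.items():
        # Assumption: a sample without replicate_name forms its own group.
        groups[runs[0].get("replicate_name") or sample].add(sample)
    return {name: sorted(members) for name, members in groups.items()}


# Made-up sheet: test1 has two runs, test3 has no run passing QC.
sheet = io.StringIO(
    "sample_name,run,R1,R2,replicate_name,passqc\n"
    "test1,2,/path/to/test1_2_1.fq,/path/to/test1_2_2.fq,groupA,0\n"
    "test1,1,/path/to/test1_1_1.fq,/path/to/test1_1_2.fq,groupA,1\n"
    "test2,1,/path/to/test2_1_1.fq,/path/to/test2_1_2.fq,groupA,1\n"
    "test3,1,/path/to/test3_1_1.fq,/path/to/test3_1_2.fq,groupB,0\n"
)
samples = group_runs(csv.DictReader(sheet))
groups = replicate_groups(samples)
print(sorted(samples), groups)
```

Here test1 and test2 survive the filter and land in the same replicate group, while test3 is dropped because none of its runs has passqc=1.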

Configuration

Edit config/config.yaml to customize:

# Path to sample information CSV file
samples: config/test_info.csv

# Output directory path
result_path: results

# Reference genome settings
ref:
  species: homo_sapiens
  release: 100
  build: GRCh38

# Resource paths
resources:
  star_index: /path/to/star/index
  gtf: /path/to/annotation.gtf
  fasta: /path/to/genome.fa

# STAR alignment parameters
params:
  star:
    align:
      extra: "--outSAMtype BAM SortedByCoordinate --quantMode GeneCounts"
    index:
      extra: ""
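
To show how the extra string might reach the aligner, the sketch below mirrors part of the configuration as a plain Python dict and builds a STAR argument list from it. The helper star_align_command, the thread count, and the output prefix are hypothetical; the pipeline's actual rule may assemble the command differently.

```python
import shlex

# Mirrors the relevant parts of config/config.yaml as a plain dict,
# so the example runs without a YAML parser.
config = {
    "resources": {"star_index": "/path/to/star/index"},
    "params": {
        "star": {
            "align": {
                "extra": "--outSAMtype BAM SortedByCoordinate --quantMode GeneCounts"
            }
        }
    },
}


def star_align_command(config, r1, r2, out_prefix, threads=8):
    """Build a STAR argument list (hypothetical helper, not the pipeline's rule)."""
    extra = config["params"]["star"]["align"]["extra"]
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", config["resources"]["star_index"],
        "--readFilesIn", r1, r2,
        "--outFileNamePrefix", out_prefix,
        *shlex.split(extra),  # user-supplied options are appended last
    ]


cmd = star_align_command(
    config,
    "/path/to/test1_1_1.fq",
    "/path/to/test1_1_2.fq",
    "results/star/test1/",  # hypothetical output prefix
)
print(shlex.join(cmd))
```

Note that --quantMode GeneCounts is what makes STAR write the ReadsPerGene output used for quantification, so removing it from extra would leave that step without input.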

Running the Pipeline

Dry Run (Check Execution Plan)

cd workflow
snakemake --use-conda -n

Execute Pipeline

cd workflow
snakemake --use-conda --cores 8

Execute with Cluster Support

cd workflow
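# Snakemake >= 8 replaced --cluster with executor plugins; this form needs snakemake-executor-plugin-cluster-generic installed
snakemake --use-conda --executor cluster-generic --cluster-generic-submit-cmd "sbatch -p partition -c {threads}" --jobs 10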

License

This pipeline is provided as-is for research purposes.

Contact

For questions or issues, please open an issue on the repository or contact the pipeline maintainer.
