Author: Dr. Eric Bareke (Majewski Lab, McGill University)
Repository: https://github.com/ebareke/rnaseqDegy
rnaseqDegy is an R package providing a single, fully automated pipeline for bulk RNA-seq analysis. It covers QC, differential expression, GO ORA, and GSEA using user-supplied GMT files, for both human and mouse datasets.
The package is designed for research labs, HPC clusters, and automated workflows, offering clean, reproducible, publication-ready outputs.
- Count filtering (≥10 counts in ≥50% of samples)
- PCA (with optional batch correction via `limma::removeBatchEffect`)
- Sample correlation heatmap (Pearson/Spearman/Kendall)
- Top 500 most variable gene heatmap
- Auto-dimensioned plots with publication-ready styling
- Choose DESeq2 or edgeR
- Batch included in the design (`~ Batch + Condition`)
- Export full results + normalized counts
- Highlights top 10 up/down DEGs
- Shows counts of Up/Down genes in subtitle
- Custom colour scheme and bold theme
- Uses clusterProfiler
- Simplification by semantic similarity
- Dotplot, barplot, cnetplot, emapplot
- Accepts any custom GMT file
- Dotplot, barplot, cnetplot, emapplot
- Auto-filtered term subset via `gmt_terms`
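
As an illustration of the GMT input and the `gmt_terms` subset, here is a minimal base-R sketch. It assumes the standard GMT layout (one gene set per line: tab-separated set name, description, then gene identifiers). `read_gmt_sets()` is a hypothetical helper written for this example, not a function of rnaseqDegy, and the gene symbols are purely illustrative.

```r
# Hypothetical helper (not part of rnaseqDegy): parse a GMT file into a
# named list of gene vectors, one element per gene set.
read_gmt_sets <- function(path) {
  lines  <- readLines(path, warn = FALSE)
  lines  <- lines[nzchar(lines)]                 # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) {
    genes <- f[-(1:2)]                           # skip set name and description
    unique(genes[nzchar(genes)])
  })
  names(sets) <- vapply(fields, function(f) f[1], character(1))
  sets
}

# Write a tiny illustrative GMT file to a temporary location.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "GOBP_APOPTOSIS\tna\tTP53\tBAX\tCASP3",
  "MYC_TARGETS_V2\tna\tMYC\tNPM1",
  "OTHER_SET\tna\tGAPDH\tACTB"
), gmt_path)

all_sets <- read_gmt_sets(gmt_path)

# Keep only the requested terms, mimicking what a gmt_terms filter might do.
gmt_terms <- c("GOBP_APOPTOSIS", "MYC_TARGETS_V2")
selected  <- all_sets[intersect(gmt_terms, names(all_sets))]
```

Requested terms that are absent from the file are silently dropped here. Whether the package warns in that case is not stated in this README.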
This package requires Bioconductor dependencies. Install everything cleanly with:
# Install devtools if not present
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install rnaseqDegy from GitHub
devtools::install_github("ebareke/rnaseqDegy")

library(rnaseqDegy)
run_rnaseqDegy(
counts = "counts.tsv",
design = "design.tsv",
species = "human",
de_method = "DESeq2",
condition_reference = "Control",
condition_test = "Mutant",
batch_correction = TRUE,
run_ora = TRUE,
run_gsea = TRUE,
gmt_dir = "gmt/",
gmt_files = c("HALLMARK.gmt", "GO_BP.gmt"),
output_dir = "results/Control_vs_Mutant",
prefix = "C_vs_M"
)

This will generate:
- QC plots
- DE tables + volcano
- ORA (BP/MF/CC)
- GSEA (for each GMT)
Outputs are auto-named and saved as PDF + PNG.
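
The volcano plot highlights the top 10 up- and down-regulated genes and reports the Up/Down counts in its subtitle. A rough base-R sketch of that selection follows. The column names `log2FoldChange` and `padj` follow DESeq2's result tables. The cut-offs (`padj < 0.05`, absolute log2 fold change of at least 1), the ranking by adjusted p-value, and the toy data are assumptions made for illustration, since the package's actual thresholds are not stated here.

```r
# Toy DE table; values are synthetic and only illustrate the logic.
res <- data.frame(
  gene           = sprintf("gene%02d", 1:40),
  log2FoldChange = seq(-4, 4, length.out = 40),
  padj           = rep(c(0.001, 0.01, 0.2, 0.04), times = 10)
)

# Assumed thresholds (not taken from rnaseqDegy).
padj_cut <- 0.05
lfc_cut  <- 1

# Classify every gene as Up, Down, or not significant (NS).
res$status <- "NS"
res$status[res$padj < padj_cut & res$log2FoldChange >=  lfc_cut] <- "Up"
res$status[res$padj < padj_cut & res$log2FoldChange <= -lfc_cut] <- "Down"

# Rank significant genes by adjusted p-value, breaking ties by effect size.
top_genes <- function(res, direction, n = 10) {
  hits <- res[res$status == direction, ]
  hits <- hits[order(hits$padj, -abs(hits$log2FoldChange)), ]
  head(hits, n)
}

top_up   <- top_genes(res, "Up")
top_down <- top_genes(res, "Down")

# Subtitle text carrying the Up/Down counts.
subtitle <- sprintf("Up: %d | Down: %d",
                    sum(res$status == "Up"), sum(res$status == "Down"))
```

The `gene` column of `top_up` and `top_down` could then be passed to a plotting layer as labels.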
GeneID S1 S2 S3 S4
ENSG0001 50 23 11 9
ENSG0002 140 180 90 76
...
- First column: GeneID (Ensembl or Symbol)
- Following columns: raw integer counts
Sample Batch Condition
S1 B1 Control
S2 B1 Control
S3 B2 Mutant
S4 B2 Mutant
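
Before running the pipeline it can help to check that the two inputs agree. The sketch below is one way to do this in base R. `check_inputs()` is a hypothetical helper, not part of rnaseqDegy. It verifies that sample names match, that counts are non-negative integers, and that a `~ Batch + Condition` design is full rank. The last check matters when batch is included in the model: if every batch contains a single condition, as in the four-sample layout above, the batch and condition columns are collinear and the two effects cannot be separated.

```r
# Hypothetical helper: returns a character vector of problems (empty if none).
check_inputs <- function(counts, design) {
  problems <- character(0)
  if (!setequal(colnames(counts), design$Sample)) {
    problems <- c(problems, "sample names differ between counts and design")
  }
  if (any(counts < 0) || any(counts != round(counts))) {
    problems <- c(problems, "counts are not non-negative integers")
  }
  # Rank of the model matrix reveals collinearity between Batch and Condition.
  mm <- model.matrix(~ Batch + Condition, data = design)
  if (qr(mm)$rank < ncol(mm)) {
    problems <- c(problems, "Batch and Condition are confounded (design not full rank)")
  }
  problems
}

# Count matrix in the documented layout: GeneID first, then one column per sample.
counts <- as.matrix(read.delim(
  text = "GeneID\tS1\tS2\tS3\tS4\nENSG0001\t50\t23\t11\t9\nENSG0002\t140\t180\t90\t76",
  row.names = 1, check.names = FALSE
))

# Each batch holds one condition: batch and condition cannot be separated.
confounded <- data.frame(
  Sample    = c("S1", "S2", "S3", "S4"),
  Batch     = c("B1", "B1", "B2", "B2"),
  Condition = c("Control", "Control", "Mutant", "Mutant")
)

# Illustrative alternative: each batch contains both conditions.
balanced <- transform(confounded, Batch = c("B1", "B2", "B1", "B2"))

check_inputs(counts, confounded)
check_inputs(counts, balanced)
```

With a confounded layout, one option is to run without the batch term. That is a modelling decision for the analyst, not something this sketch can settle.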
results/
├── QC_PCA_*.pdf/png
├── QC_sampleCorrelation.pdf/png
├── QC_topVar500_heatmap.pdf/png
├── DE_results_all.tsv
├── normalized_counts.tsv
├── DE_volcano.pdf/png
├── ORA_GO_BP_*.tsv + plots
├── ORA_GO_MF_*.tsv + plots
├── ORA_GO_CC_*.tsv + plots
└── GSEA_<gmt>_*.tsv + plots
project/
├── counts.tsv
├── design.tsv
├── gmt/
│   ├── HALLMARK.gmt
│   └── GO_BP.gmt
└── rnaseqDegy_results/
- `filter_min_count`, `filter_min_prop`
- `batch_correction = TRUE/FALSE`
- `gmt_terms = c("GOBP_APOPTOSIS", "MYC_TARGETS_V2")`
- `corr_method = "spearman"`
- `output_dir`, `prefix`
All parameters mirror the original CLI pipeline.
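
To make the filtering and correlation options concrete, here is a small base-R sketch of what `filter_min_count`, `filter_min_prop` and `corr_method` plausibly control, based on the feature list (at least 10 counts in at least 50% of samples; Pearson, Spearman or Kendall correlation). `filter_counts()` is a hypothetical helper and the count values are toy data. The log2(x + 1) transform before correlation is an assumption, and the package's internal implementation may differ.

```r
# Toy count matrix: genes in rows, samples in columns (illustrative values).
counts <- matrix(
  c( 50,  23, 11,  9,
    140, 180, 90, 76,
      0,   2, 15,  1,
     12,   8, 30, 25,
     10,  10,  3,  4),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("ENSG000", 1:5), paste0("S", 1:4))
)

# Hypothetical helper: keep genes reaching min_count in at least
# min_prop of the samples.
filter_counts <- function(counts, min_count = 10, min_prop = 0.5) {
  keep <- rowMeans(counts >= min_count) >= min_prop
  counts[keep, , drop = FALSE]
}

filtered <- filter_counts(counts, min_count = 10, min_prop = 0.5)

# Sample-to-sample correlation on log-transformed counts (assumed transform).
# method may be "pearson", "spearman" or "kendall".
sample_cor <- cor(log2(filtered + 1), method = "spearman")
```

Raising `min_prop` makes the filter stricter. In this toy matrix, `ENSG0003` is removed because only one of its four samples reaches 10 counts.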
If you use rnaseqDegy in your research, please cite:
Dr. Eric Bareke, Majewski Lab, McGill University.
(A full CITATION.cff file is included in this repository.)
See CONTRIBUTING.md for guidelines. Pull requests are welcome.
MIT License. See the LICENSE file.