GSEA support by zeehio · Pull Request #84 · pinin4fjords/shinyngs

zeehio · 2025-10-29T16:17:55Z

This is a draft pull request that will add support for GSEA output files to shinyngs.

Still work in progress.

Status/Roadmap

Able to read enrichment results when two csv files are given separately (up/down)
In the ExploratorySummarizedExperiment there is a new slot for the gene_set_analyses_tool. This slot allows the user to specify the tool used for gene enrichment. Depending on the tool, the package processes the enrichment files, applying the pvalue/FDR filters from the user to the corresponding columns. By default a simple auto detect heuristic is implemented.
The gene set enrichment analysis module and the gene set barcode module are updated so they filter the table based on the user given pvalue/FDR values, applying filters to the correct columns.
make app from files script
documentation and tests

Closes #83

zeehio · 2025-11-08T17:14:20Z

I've pushed significant progress and the code is starting to grow. I believe commits can be reviewed incrementally for your convenience.

I would like to find a way to pass enrichment result files to make_app_from_files, but CLI arguments are not that convenient: The make_app_from_files tool accepts one assay with a list of differential expression tables, one per contrast.

For each contrast, there is a list of gene set types. For each gene set type there are up to two files associated (up and down are reported separately in gsea). For 4 contrasts and 3 gene set types I would need to report 24 tsv files. This is not convenient in the command line.

My suggestion is to add functional enrichment support to that script by passing a list of gene set types separated by commas in an argument, and passing a file name template. The tool will assume that file names from the enrichment tool follow a predictable naming based on the template, and will substitute contrast information and the gene set to find all the tsv files.

This solution avoids bloating the command line arguments and it is flexible enough to work well with the differential abundance nfcore pipeline.

I'm happy to get feedback about this idea

zeehio · 2025-11-09T07:34:53Z

Still needs more testing, tests and documentation, but the core work is already here.

Copilot

Pull Request Overview

This PR adds support for GSEA (Gene Set Enrichment Analysis) output files to shinyngs. The implementation includes a new gene_set_analyses_tool slot that allows users to specify the enrichment tool used (GSEA or ROAST), with automatic detection as the default. The PR updates the gene set enrichment analysis and barcode plot modules to correctly filter tables based on user-specified p-value and FDR thresholds, applying filters to the appropriate columns depending on the enrichment tool used.

Key changes:

New gene_set_analyses_tool slot in ExploratorySummarizedExperiment class for tool-specific column handling
Enhanced support for reading GSEA results split into separate up/down regulation files
Auto-detection heuristic to identify enrichment tool format from column names
Updated filtering logic in gene set analysis modules to use tool-specific column names

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
R/ExploratorySummarizedExperiment-class.R	Adds `gene_set_analyses_tool` slot and validation function, modernizes class definition syntax
R/ExploratorySummarizedExperimentList-class.R	Modernizes class definition from `representation` to `slots`, adds warning suppression for numeric type checking
R/accessory.R	Implements GSEA file reading logic, adds `get_gst_columns` for tool-specific column detection and `remove_nulls` helper
R/genesetanalysistable.R	Updates filtering to use tool-specific column names for p-values, FDR, and direction
R/genesetbarcodeplot.R	Updates to retrieve FDR and direction using tool-specific column names
exec/make_app_from_files.R	Adds command-line options for enrichment analysis and implements filename template parsing for GSEA results
man/*.Rd	Updates documentation for new parameters and slots
vignettes/shinyngs.Rmd	Adds explicit library load for better code clarity
DESCRIPTION	Updates R version requirement and RoxygenNote version
R/shinyngs.R	Modernizes package documentation syntax

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

exec/make_app_from_files.R

R/accessory.R

Copilot · 2025-11-11T13:48:40Z

R/ExploratorySummarizedExperiment-class.R

+  # structure. gene_set_analyses_tool should have a single string
+
+  # Create default structure if necessary:
+  out <- list()


Unused variable out is created but never used. This line can be removed.

Suggested change

out <- list()

R/accessory.R

R/ExploratorySummarizedExperiment-class.R

exec/make_app_from_files.R

pinin4fjords · 2025-11-11T14:01:03Z

R/ExploratorySummarizedExperiment-class.R

+check_gene_set_analyses_tool_consistency <- function(gene_set_analyses, gene_set_analyses_tool) {
+  # gene_set_analyses and gene_set_analyses_tool should have the same list of lists
+  # structure. gene_set_analyses_tool should have a single string
+
+  # Create default structure if necessary:
+  out <- list()
+
+  if (is.null(gene_set_analyses_tool)) {
+    gene_set_analyses_tool <- list()
+  }
+  for (assay_name in names(gene_set_analyses)) {
+    if (!assay_name %in% names(gene_set_analyses_tool)) {
+      gene_set_analyses_tool[[assay_name]] <- list()
+    }
+    for (gs_type in names(gene_set_analyses[[assay_name]])) {
+      if (! gs_type %in% names(gene_set_analyses_tool[[assay_name]])) {
+        gene_set_analyses_tool[[assay_name]][[gs_type]] <- list()
+      }
+      for (contrast_name in names(gene_set_analyses[[assay_name]][[gs_type]])) {
+        if (! contrast_name %in% names(gene_set_analyses_tool[[assay_name]][[gs_type]])) {
+          gene_set_analyses_tool[[assay_name]][[gs_type]][[contrast_name]] <- "auto"
+        } else {
+          tool_name <- gene_set_analyses_tool[[assay_name]][[gs_type]][[contrast_name]]
+          if (!is.character(tool_name) || length(tool_name) > 1) {
+            stop(paste0("Invalid gene_set_analyses_tool. gene_set_analyses_tool for ",
+                        gs_type, " and ", contrast_name, " should be one of 'auto', 'gsea' or 'roast'. Found ",
+                        paste0(tool_name, collapse=",")))
+          }
+          if (! tool_name %in% c("auto", "gsea", "roast")) {
+            stop(paste0("Invalid gene_set_analyses_tool. gene_set_analyses_tool for ",
+                        gs_type, " and ", contrast_name, " should be one of 'auto', 'gsea' or 'roast'. Found ",
+                        tool_name))
+          }
+        }
+      }
+      # In case gene_set_analyses_tool has other entries, just keep the ones matching gene_set_analyses
+      contrasts_ordered <- names(gene_set_analyses[[assay_name]][[gs_type]])
+      gene_set_analyses_tool[[assay_name]][[gs_type]] <- gene_set_analyses_tool[[assay_name]][[gs_type]][contrasts_ordered]
+    }
+    gene_set_type_names_ordered <- names(gene_set_analyses[[assay_name]])
+    gene_set_analyses_tool[[assay_name]] <- gene_set_analyses_tool[[assay_name]][gene_set_type_names_ordered]
+  }
+  analysis_names_ordered <- names(gene_set_analyses_tool)
+  gene_set_analyses_tool <- gene_set_analyses_tool[analysis_names_ordered]
+  gene_set_analyses_tool
+}


Suggested change

check_gene_set_analyses_tool_consistency <- function(gene_set_analyses, gene_set_analyses_tool) {

# gene_set_analyses and gene_set_analyses_tool should have the same list of lists

# structure. gene_set_analyses_tool should have a single string

# Create default structure if necessary:

out <- list()

if (is.null(gene_set_analyses_tool)) {

gene_set_analyses_tool <- list()

}

for (assay_name in names(gene_set_analyses)) {

if (!assay_name %in% names(gene_set_analyses_tool)) {

gene_set_analyses_tool[[assay_name]] <- list()

}

for (gs_type in names(gene_set_analyses[[assay_name]])) {

if (! gs_type %in% names(gene_set_analyses_tool[[assay_name]])) {

gene_set_analyses_tool[[assay_name]][[gs_type]] <- list()

}

for (contrast_name in names(gene_set_analyses[[assay_name]][[gs_type]])) {

if (! contrast_name %in% names(gene_set_analyses_tool[[assay_name]][[gs_type]])) {

gene_set_analyses_tool[[assay_name]][[gs_type]][[contrast_name]] <- "auto"

} else {

tool_name <- gene_set_analyses_tool[[assay_name]][[gs_type]][[contrast_name]]

if (!is.character(tool_name) || length(tool_name) > 1) {

stop(paste0("Invalid gene_set_analyses_tool. gene_set_analyses_tool for ",

gs_type, " and ", contrast_name, " should be one of 'auto', 'gsea' or 'roast'. Found ",

paste0(tool_name, collapse=",")))

}

if (! tool_name %in% c("auto", "gsea", "roast")) {

stop(paste0("Invalid gene_set_analyses_tool. gene_set_analyses_tool for ",

gs_type, " and ", contrast_name, " should be one of 'auto', 'gsea' or 'roast'. Found ",

tool_name))

}

}

}

# In case gene_set_analyses_tool has other entries, just keep the ones matching gene_set_analyses

contrasts_ordered <- names(gene_set_analyses[[assay_name]][[gs_type]])

gene_set_analyses_tool[[assay_name]][[gs_type]] <- gene_set_analyses_tool[[assay_name]][[gs_type]][contrasts_ordered]

}

gene_set_type_names_ordered <- names(gene_set_analyses[[assay_name]])

gene_set_analyses_tool[[assay_name]] <- gene_set_analyses_tool[[assay_name]][gene_set_type_names_ordered]

}

analysis_names_ordered <- names(gene_set_analyses_tool)

gene_set_analyses_tool <- gene_set_analyses_tool[analysis_names_ordered]

gene_set_analyses_tool

}

check_gene_set_analyses_tool_consistency <- function(gene_set_analyses, gene_set_analyses_tool) {

if (is.null(gene_set_analyses_tool)) {

gene_set_analyses_tool <- list()

}

valid_tools <- c("auto", "gsea", "roast")

# Recursive function to mirror structure and validate

mirror_and_validate <- function(gsa_node, gst_node, path = "") {

if (is.data.frame(gsa_node)) {

# Leaf node - should have a tool string

tool <- if (is.null(gst_node)) "auto" else gst_node

if (!is.character(tool) || length(tool) != 1 || !tool %in% valid_tools) {

stop(sprintf("Invalid gene_set_analyses_tool at %s: expected one of %s, got %s",

path, paste(valid_tools, collapse=", "), paste(tool, collapse=",")))

}

return(tool)

}

# Recurse through structure

lapply(setNames(names(gsa_node), names(gsa_node)), function(name) {

mirror_and_validate(gsa_node[[name]], gst_node[[name]], paste0(path, "/", name))

})

}

mirror_and_validate(gene_set_analyses, gene_set_analyses_tool)

}

I found this function quite hard to understand- my AI friend suggested this, could you try it?

Your AI friend gave me a good starting point, although its suggestion was a bit flawed. I believe I have now implemented something easier to read

pinin4fjords · 2025-11-11T14:03:37Z

R/accessory.R

+        # Basic list to pass to object creation
+        exp$gene_set_analyses_tool <- check_gene_set_analyses_tool_consistency(exp$gene_set_analyses, exp$gene_set_analyses_tool)
+
        ese_list$gene_set_analyses <- lapply(exp$gene_set_analyses, function(assay) {
          lapply(assay, function(gene_set_type) {
            lapply(gene_set_type, function(contrast) {
-              read.csv(contrast, check.names = FALSE, stringsAsFactors = FALSE, row.names = 1)
+              # contrast may be one file name or two file names (up and down), or NULL
+              if (is.null(contrast) || length(contrast) == 0) {
+                NULL
+              } else if (length(contrast) == 1) {
+                read.csv(contrast, sep=getSeparator(contrast),
+                         check.names = FALSE, stringsAsFactors = FALSE, row.names = 1)
+              } else if (length(contrast) == 2) {
+                # This is useful for GSEA output, that splits up and down in two tsv files.
+                # We read both files and set the direction
+                up <- read.csv(contrast[["up"]], sep=getSeparator(contrast[["up"]]),
+                               check.names = FALSE, stringsAsFactors = FALSE, row.names = 1)
+                up$Direction <- rep("Up", nrow(up))
+                down <- read.csv(contrast[["down"]], sep=getSeparator(contrast[["down"]]),
+                               check.names = FALSE, stringsAsFactors = FALSE, row.names = 1)
+                down$Direction <- rep("Down", nrow(down))
+                rbind(up, down)
+              } else {
+                stop("gene_set_analyses should have zero, one or two contrast files per gene_set_type")
+              }
            })
          })
        })
+
+        ese_list$gene_set_analyses <- remove_nulls(ese_list$gene_set_analyses)


Maybe a helper function here?

read_enrichment_file <- function(contrast_spec) { if (is.null(contrast_spec) || length(contrast_spec) == 0) { return(NULL) } read_one <- function(path) { read.csv(path, sep = getSeparator(path), check.names = FALSE, stringsAsFactors = FALSE, row.names = 1) } if (length(contrast_spec) == 1) { return(read_one(contrast_spec)) } if (length(contrast_spec) == 2) { up <- read_one(contrast_spec[["up"]]) up$Direction <- "Up" down <- read_one(contrast_spec[["down"]]) down$Direction <- "Down" return(rbind(up, down)) } stop("gene_set_analyses should have 0, 1, or 2 files per contrast") } # Then simplify the nested lapply: ese_list$gene_set_analyses <- lapply(exp$gene_set_analyses, function(assay) { lapply(assay, function(gene_set_type) { lapply(gene_set_type, read_enrichment_file) }) })

pinin4fjords · 2025-11-11T14:06:36Z

R/accessory.R

+# Gene set enrichment format and filtering columns
+# gst: A data frame coming from the gene enrichment tool (roast, gsea)
+# gs_tool: A string, the tool used to get gst ("auto", "gsea", "roast")
+get_gst_columns <- function(gst, gs_tool) {


This would probably be easier to maintain with some more separation of concerns. Rather than having the function do lots of things and unpack:

detect_enrichment_tool <- function(gst) { if ("NOM p-val" %in% colnames(gst)) return("gsea") if (any(c("p value", "PValue") %in% colnames(gst))) return("roast") stop("Could not detect enrichment tool from column names") } get_column_mapping <- function(gs_tool) { mappings <- list( roast = list(pvalue = "PValue", fdr = "FDR", direction = "Direction"), gsea = list(pvalue = "NOM p-val", fdr = "FDR q-val", direction = "Direction") ) # Handle roast variant with lowercase "p value" if (gs_tool == "roast") { mappings$roast$pvalue <- "PValue" # Check and override below } mappings[[gs_tool]] } clean_gsea_columns <- function(gst) { cols_to_remove <- c("GS<br> follow link to MSigDB", "GS DETAILS") gst[, !(names(gst) %in% cols_to_remove), drop = FALSE] }

pinin4fjords · 2025-11-11T14:10:17Z

exec/make_app_from_files.R

+        # "gsea_for_{contrast_name}_using_{geneset_type}_results.tsv"
+        # becomes:
+        # "gsea_for_disease_vs_healthy_using_geo_bp_results.tsv"
+        if (grepl("target\\|reference", opt$enrichment_filename_template)) {


We should centralise the path building logic so we only have to maintain it in one place:

build_enrichment_path <- function(template, contrast_info, geneset_type, direction = NULL) { path <- template path <- gsub("{contrast_name}", contrast_info$id, path, fixed = TRUE) path <- gsub("{geneset_type}", geneset_type, path, fixed = TRUE) if (!is.null(direction)) { target_val <- if (direction == "up") contrast_info$target else contrast_info$reference path <- gsub("{target|reference}", target_val, path, fixed = TRUE) } path } # Then use: if (grepl("\\{target\\|reference\\}", template)) { up_file <- build_enrichment_path(template, ctrst_info, geneset_type, "up") down_file <- build_enrichment_path(template, ctrst_info, geneset_type, "down") # ... rest of logic } else { enrichment_file <- build_enrichment_path(template, ctrst_info, geneset_type) # ... rest of logic }

Function will probably need to go in accessory.R

pinin4fjords

Thanks again for the work! I know you're still working so I won't comment too much on lack of tests / docs (though the AI has done a bit of that).

In general it would be good if as much of the new logic as possible is in functions where it can have tests, rather than e.g. in large tranches of logic in the exec script.

zeehio · 2025-11-12T08:42:14Z

I agree with your suggestions and I'll follow up implementing them on the following days. Thank you very much for your review!

We know, and we don't care in this case

The SummarizedExperiment class was removed from GenomicRanges (after having moved it to the SummarizedExperiment package) back in Bioconductor 3.3 [1], released in May 2016 and targetting R 3.3. This commit removes code targeting support for Bioconductor versions older than Bioconductor 3.3. It slightly bumps the R version dependency to explicitly break with that old Bioconductor version. [1] https://bioconductor.org/news/bioc_3_3_release/

Back in Sept 2016, the representation= argument of setClass was documented as old, and the slots= argument was suggested instead [1]. [1] wch/r-source@edadf29 In 2016 it was already old, so now it is considered even older. This commit just updates the syntax to ease maintenance.

Look at gene_set_analyses being used here: https://github.com/pinin4fjords/shinyngs/blob/d2ca950934070c699c839aea45e77053bc461722/R/genesetanalysistable.R#L239 gst <- ese@gene_set_analyses[[assay]][[gene_set_types]][[as.numeric(selected_contrasts)]] It's indexed by assay, geneset type and contrast. Fix the docs so the explanation matches reality.

The gene_set_analyses_tool has a list-based structure, just like gene_set_analyses. gene_set_analyses_tool[[assay]][[gene_set_type]][[contrast]] contains a string, either "auto", "gsea", or "roast". This string can be used by the geneset-related modules to parse and interact with the gene set enrichment tables.

Ensure each given gene_set_analysis has a tool set, and set it to "auto" otherwise.

Just a helper function

Previous library(shinyngs) statements are on eval=FALSE chunks.

When reading csv files from each assay, gene_set type and contrast, allow to not have enrichment results for all combinations.

These are clear documentations and fixes from code review by Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

So it is easier to read

The names are used in the enrichment file name template subsitution, so it is better to avoid character mangling.

zeehio · 2026-01-25T18:55:29Z

It took me a long while to find some time to finish this (definitely longer than expected).

I implemented all the suggested changes, added unit tests to the accessory functions, improved docstrings and rebased.

Happy to get your feedback and hopefully be able to merge this eventually :-)

pinin4fjords · 2026-01-26T10:05:40Z

Looking pretty good now! Do you have some example outputs to show?

zeehio · 2026-01-26T17:07:48Z

I'll find a public dataset and upload some plots as soon as I find some more time.

pinin4fjords self-requested a review October 31, 2025 12:37

pinin4fjords requested a review from Copilot November 11, 2025 13:42

Copilot started reviewing on behalf of pinin4fjords November 11, 2025 13:42 View session

Copilot finished reviewing on behalf of pinin4fjords November 11, 2025 13:47

Copilot AI reviewed Nov 11, 2025

View reviewed changes

pinin4fjords reviewed Nov 11, 2025

View reviewed changes

zeehio marked this pull request as ready for review January 25, 2026 18:42

zeehio added 16 commits January 25, 2026 19:44

Support reading split up/down enrichment result files (GSEA)

4ff5fca

Ignore NA coercion errors

c1d223e

We know, and we don't care in this case

Make sure we have the geneset analysis tool

2d90f90

Ensure each given gene_set_analysis has a tool set, and set it to "auto" otherwise.

Get column names for each enrichment method.

985414d

Just a helper function

Spelling fixes in comments

e94629e

Barcodeplot module allows multiple enrichment methods

59ce319

Geneset analysis module allows multiple enrichment formats

75bd504

Maintenance: Fix deprecation of doctype package

a9e82da

roxygen2 requires UTF-8 encoding, set it

48d23df

devtools::document()

3b8c9a4

Add missing library(shinyngs) on vignette

5bd9b71

Previous library(shinyngs) statements are on eval=FALSE chunks.

Flexible gene_set_analyses spec

5f0a314

When reading csv files from each assay, gene_set type and contrast, allow to not have enrichment results for all combinations.

zeehio and others added 14 commits January 25, 2026 19:44

enrichment results support in make_app_from_files

3a73ac7

Some suggestions from code review

501c97f

These are clear documentations and fixes from code review by Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Add build_enrichment_path function

a12ac20

Remove unused variable

93a55e3

Rewrite check_gene_set_analyses_tool_consistency

7598c9b

So it is easier to read

spelling typo

63a6337

Helper function to read enrichment files

de409a6

Separate concerns for cleaning the enrichment table

bacaeef

Cleanup documentation

1206c4e

Fix names of genesets_files

5078db4

The names are used in the enrichment file name template subsitution, so it is better to avoid character mangling.

roxygenize

25acc4f

test read_enrichment_file

8a88661

test remove_nulls

794eae0

Test accessories

a49dabc

zeehio force-pushed the gsea-support branch from 150e2f4 to a49dabc Compare January 25, 2026 18:50

zeehio changed the title ~~GSEA support (WIP)~~ GSEA support Jan 25, 2026

Output directory should be created recursively

be045c9

Conversation

zeehio commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zeehio commented Nov 8, 2025

Uh oh!

zeehio commented Nov 9, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pinin4fjords left a comment

Choose a reason for hiding this comment

Uh oh!

zeehio commented Nov 12, 2025

Uh oh!

zeehio commented Jan 25, 2026

Uh oh!

pinin4fjords commented Jan 26, 2026

Uh oh!

zeehio commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zeehio commented Oct 29, 2025 •

edited

Loading