
Commit a9e97d0

Merge pull request #443 from nf-core/add_toolsheet
Add toolsheet-related implementations
2 parents 6d3e353 + 641ff2d


57 files changed: +4016 / -1806 lines changed

.github/workflows/ci.yml

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ jobs:
  - "maxquant"
  - "soft"
  - "rnaseq_limma"
+ - "rnaseq_dream"
+ - "custom_paramsheet"
  compute_profile:
  - "docker"
  - "singularity"

.nf-core.yml

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
  lint:
  files_exist:
  - assets/multiqc_config.yml
+ files_unchanged:
+ - .github/CONTRIBUTING.md
  multiqc_config: false
  nextflow_config:
  - config_defaults:

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ### Added

+ - [[#443](https://github.com/nf-core/differentialabundance/pull/443)] - Add toolsheet-related implementations. ([@suzannejin](https://github.com/suzannejin), review by [@pinin4fjords](https://github.com/pinin4fjords), [@mirpedrol](https://github.com/mirpedrol) and [@JoseEspinosa](https://github.com/JoseEspinosa))
  - [[#450](https://github.com/nf-core/differentialabundance/pull/441)] - Allow usage of strings for makeContrasts in DREAM. ([@atrigila](https://github.com/atrigila), review by [@pinin4fjords](https://github.com/pinin4fjords), [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst)).
  - [[#441](https://github.com/nf-core/differentialabundance/pull/441)] - Add dream differential module. ([@nschcolnicov](https://github.com/nschcolnicov) and [@alanmmobbs93](https://github.com/alanmobbs93), review by [@pinin4fjords](https://github.com/pinin4fjords), [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst)).
  - [[#440](https://github.com/nf-core/differentialabundance/pull/440)] - Add handling for formula based contrasts. ([@nschcolnicov](https://github.com/nschcolnicov), review by [@pinin4fjords](https://github.com/pinin4fjords))
@@ -38,6 +39,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ### Changed

+ - [[#448](https://github.com/nf-core/differentialabundance/pull/448)] - Simplify toolsheet handling and restructure workflow to use paramset in meta. ([@pinin4fjords](https://github.com/pinin4fjords), review by [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst))
  - [[#431](https://github.com/nf-core/differentialabundance/pull/431)] - Replace the calls to differential and functional analysis modules by subworkflows. ([@suzannejin](https://github.com/suzannejin), review by [@pinin4fjords](https://github.com/pinin4fjords))
  - [[#410](https://github.com/nf-core/differentialabundance/pull/410)] - Update contrasts file format to allow yaml ([@nschcolnicov](https://github.com/nschcolnicov), review by [@pinin4fjords](https://github.com/pinin4fjords)).
  - [[#374](https://github.com/nf-core/differentialabundance/pull/374)] - Update all modules and subworkflows ([@nschcolnicov](https://github.com/nschcolnicov), review by [@pinin4fjords](https://github.com/pinin4fjords)).

README.md

Lines changed: 15 additions & 0 deletions
@@ -95,6 +95,21 @@ Affymetrix microarray:
  > [!WARNING]
  > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

+ The paramsheet file (ie. `paramsheet.csv`) stored in the `assets` directory defines the combination of tools and parameters that make sense to run for a given study type. You can use the flag `--paramset_name` to specify which set of tools to run. For example:
+
+ ```bash
+ nextflow run nf-core/differentialabundance \
+ --input samplesheet.csv \
+ --contrasts contrasts.yaml \
+ --matrix assay_matrix.tsv \
+ --gtf mouse.gtf \
+ --outdir <OUTDIR> \
+ -profile rnaseq,<docker/singularity/podman/shifter/charliecloud/conda/institute> \
+ --paramset_name deseq2_rnaseq_gprofiler2
+ ```
+
+ You could also provide your own paramsheet through the `--paramsheet` parameter.
+
  For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/differentialabundance/usage) and the [parameter documentation](https://nf-co.re/differentialabundance/parameters).

  ### Reporting
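The README addition describes a paramsheet as a table in which each row, keyed by `paramset_name`, fixes a combination of study type, differential method and functional method. A minimal Python sketch of that lookup follows. It is an illustration only: `load_paramset` is a hypothetical helper, the sample rows are an excerpt of `assets/paramsheet.csv` from this commit, and treating empty cells as "not set" is an assumption; the pipeline's own paramsheet handling (which, per #448, carries the paramset in `meta`) is not shown in this diff.

```python
import csv
import io

# Excerpt of assets/paramsheet.csv from this commit (not the full file).
PARAMSHEET = """\
paramset_name,study_type,differential_method,limma_use_voom,functional_method
deseq2_rnaseq,rnaseq,deseq2,,
deseq2_rnaseq_gprofiler2,rnaseq,deseq2,,gprofiler2
limma_rnaseq_gsea,rnaseq,limma,true,gsea
"""


def load_paramset(paramsheet_text: str, paramset_name: str) -> dict:
    """Hypothetical helper: return the non-empty settings of one paramset row."""
    for row in csv.DictReader(io.StringIO(paramsheet_text)):
        if row["paramset_name"] == paramset_name:
            # Assumption: an empty cell (e.g. limma_use_voom on DESeq2 rows) means "not set".
            return {key: value for key, value in row.items()
                    if key != "paramset_name" and value != ""}
    raise KeyError(f"paramset '{paramset_name}' not found in paramsheet")
```

For `--paramset_name deseq2_rnaseq_gprofiler2`, `load_paramset(PARAMSHEET, "deseq2_rnaseq_gprofiler2")` returns `{'study_type': 'rnaseq', 'differential_method': 'deseq2', 'functional_method': 'gprofiler2'}`. A custom file passed with `--paramsheet` would be read the same way, as long as it keeps the `paramset_name` column.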

assets/differentialabundance_report.Rmd

Lines changed: 54 additions & 69 deletions
@@ -26,7 +26,7 @@ params:
  report_author: NULL,
  report_description: NULL,
  report_scree: NULL
- report_round_digits: NULL
+ round_digits: NULL
  observations_type: NULL
  observations: NULL # GSE156533.samplesheet.csv
  observations_id_col: NULL
@@ -90,6 +90,7 @@ params:
  filtering_min_abundance: 1
  filtering_min_proportion: NULL
  filtering_grouping_var: NULL
+ differential_method: NULL
  differential_file_suffix: NULL
  differential_feature_id_column: NULL
  differential_feature_name_column: NULL
@@ -118,7 +119,7 @@ params:
  deseq2_cores: NULL
  deseq2_vs_blind: NULL
  deseq2_vst_nsub: NULL
- gsea_run: false
+ functional_method: NULL
  gsea_nperm: NULL
  gsea_permute: NULL
  gsea_scoring_scheme: NULL
@@ -137,7 +138,6 @@ params:
  gsea_save_rnd_lists: NULL
  gsea_zip_report: NULL
  gsea_chip_file: NULL
- gprofiler2_run: false
  gprofiler2_organism: NULL
  gprofiler2_significant: NULL
  gprofiler2_measure_underrepresentation: NULL
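The two report-parameter hunks above replace the boolean `gsea_run` and `gprofiler2_run` flags with a single `functional_method` value (default `NULL`). The sketch below is a hypothetical Python helper, not part of the pipeline, that maps an old pair of flags onto the new parameter. It assumes that, as the later hunks suggest, the report now renders at most one functional method, so a pair with both flags set has no single-valued equivalent.

```python
from typing import Optional


def functional_method_from_flags(gsea_run: bool, gprofiler2_run: bool) -> Optional[str]:
    """Hypothetical mapping from the removed *_run flags to functional_method."""
    if gsea_run and gprofiler2_run:
        # The old loop could report both methods; one functional_method value cannot.
        raise ValueError("functional_method takes one method: 'gsea' or 'gprofiler2'")
    if gsea_run:
        return "gsea"
    if gprofiler2_run:
        return "gprofiler2"
    # Neither flag set corresponds to functional_method: NULL.
    return None
```

Returning `None` mirrors the Rmd default of `NULL`, which the report checks with `is.null(params$functional_method)` to skip the gene set section.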
@@ -176,7 +176,7 @@ round_dataframe_columns <- function(df, columns = NULL, digits = -1) {
  if (is.null(columns)) {
  columns <- colnames(df)[(unlist(lapply(df, is.numeric), use.names=F))] # extract only numeric columns for rounding
  }
- df[,columns] <- format(data.frame(df[, columns], check.names = FALSE), scientific=T, digits=params$report_round_digits)
+ df[,columns] <- format(data.frame(df[, columns], check.names = FALSE), scientific=T, digits=params$round_digits)
  # Convert columns back to numeric

  for (c in columns) {
@@ -364,14 +364,12 @@ treatment-mCherry-hND6-batcheffect.deseq2.results.tsv
  ```{r, echo=FALSE}
  differential_file_suffix <- params$differential_file_suffix
  if (is.null(differential_file_suffix)) {
- differential_file_suffix <- ifelse(params$study_type %in% c('rnaseq'), ".deseq2.results.tsv", ".limma.results.tsv")
+ differential_file_suffix <- paste0(".", params$differential_method, ".results.tsv")
  }
  differential_files <- lapply(contrasts$id, function(d){
  file.path(params$input_dir, paste0(gsub(' |;', '_', d), '_', params$study_name, differential_file_suffix))
  })
- differential_names <- lapply(contrasts$id, function(d){
- paste0(gsub(' |;', '_', d), '_', params$study_name)
- })
+ differential_names <- paste0(contrasts$id, '_', params$study_name)

  # Initialize vector to store warning messages before merging tables
  warnings_list <- c()
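The hunk above changes how the report derives its default results-file suffix (from `differential_method` rather than from `study_type`, so a `dream` run now gets `.dream.results.tsv`) and simplifies `differential_names`. A Python sketch of the same naming logic, written only to make the R expressions easier to compare; the function names and example contrast IDs are hypothetical, and the R code additionally prefixes `params$input_dir` to each results file.

```python
import re
from typing import Optional


def differential_suffix(differential_method: str,
                        differential_file_suffix: Optional[str] = None) -> str:
    """Default suffix is derived from differential_method, as in the new paste0() call."""
    if differential_file_suffix is None:
        return "." + differential_method + ".results.tsv"
    return differential_file_suffix


def differential_file(contrast_id: str, study_name: str, suffix: str) -> str:
    """Results file name: spaces and ';' become '_', like gsub(' |;', '_', d) in R."""
    return re.sub(r" |;", "_", contrast_id) + "_" + study_name + suffix


def differential_name(contrast_id: str, study_name: str) -> str:
    """Name prefix for functional-analysis tables; no longer sanitised after this change."""
    return contrast_id + "_" + study_name
```

For a contrast ID without spaces or semicolons, `differential_file` and `differential_name` share the same prefix. For a hypothetical ID such as `ko vs wt`, they would diverge (`ko_vs_wt_mystudy.deseq2.results.tsv` versus `ko vs wt_mystudy`), so the report would look up GSEA and gprofiler2 tables under the unsanitised name.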
@@ -535,7 +533,7 @@ contrasts_to_print <- contrasts
  colnames(contrasts_to_print) <- prettifyVariablename(colnames(contrasts_to_print))

  # Add design/model formulae to report
- de_tool <- ifelse(params$study_type %in% c('rnaseq'), "deseq2", "limma")
+ de_tool <- params$differential_method
  contrasts_to_print$model <- sapply(contrasts_to_print$Id, function(id) {
  model_file <- paste0(id, ".", de_tool, ".model.txt")
  if (file.exists(model_file)) {
@@ -934,7 +932,7 @@ for (i in 1:nrow(contrasts)){
  colnames(contrast_de) <- prettifyVariablename(colnames(contrast_de))

  if (nrow(contrast_de) > 0){
- contrast_de <- round_dataframe_columns(contrast_de, digits=params$report_round_digits)
+ contrast_de <- round_dataframe_columns(contrast_de, digits=params$round_digits)
  print( htmltools::tagList(datatable(contrast_de, caption = paste('Differential genes', dir, 'in', contrast_descriptions[i], " (check", differential_files[[i]], "for more detail)"), rownames = FALSE) ))

  if ("Gene biotype" %in% colnames(contrast_de)) {
@@ -969,56 +967,51 @@ for (i in 1:nrow(contrasts)){
  <!-- Gene set analysis results -->

  ```{r, echo=FALSE, results='asis'}
- possible_gene_set_methods <- c('gsea', 'gprofiler2')
- if (any(unlist(params[paste0(possible_gene_set_methods, '_run')]))){
+ if (!is.null(params$functional_method)){
  cat("\n### Gene set analysis\n")
+ cat("\n#### ", toupper(params$functional_method) ," {.tabset}\n")
+
+ if (params$functional_method == 'gsea') {
+ for (gmt_file in simpleSplit(params$gene_sets_files)) {
+ gmt_name <- basename(tools::file_path_sans_ext(gmt_file))
+ cat("\n##### ", gmt_name ," {.tabset}\n")
+
+ reference_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', contrasts$reference, '.tsv')
+ target_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', contrasts$target, '.tsv')
+ for (i in 1:nrow(contrasts)){
+ cat("\n###### ", contrast_descriptions[i], "\n")
+ target_gsea_results <- read_metadata(target_gsea_tables[i])[,c(-2,-3)]
+ target_gsea_results <- round_dataframe_columns(target_gsea_results, digits=params$round_digits)
+ print( htmltools::tagList(datatable(target_gsea_results, caption = paste0("\nTarget (", contrasts$target[i], ")\n"), rownames = FALSE) ))
+ ref_gsea_results <- read_metadata(reference_gsea_tables[i])[,c(-2,-3)]
+ ref_gsea_results <- round_dataframe_columns(ref_gsea_results, digits=params$round_digits)
+ print( htmltools::tagList(datatable(ref_gsea_results, caption = paste0("\nReference (", contrasts$reference[i], ")\n"), rownames = FALSE) ))
+ }
+ }

- for (gene_set_method in possible_gene_set_methods){
- if (unlist(params[paste0(gene_set_method, '_run')])){
- cat("\n#### ", toupper(gene_set_method) ," {.tabset}\n")
- if (gene_set_method == 'gsea') {
- for (gmt_file in simpleSplit(params$gene_sets_files)) {
- gmt_name <- basename(tools::file_path_sans_ext(gmt_file))
- cat("\n##### ", gmt_name ," {.tabset}\n")
-
- reference_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', contrasts$reference, '.tsv')
- target_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', contrasts$target, '.tsv')
- for (i in 1:nrow(contrasts)){
- cat("\n###### ", contrast_descriptions[i], "\n")
- target_gsea_results <- read_metadata(target_gsea_tables[i])[,c(-2,-3)]
- target_gsea_results <- round_dataframe_columns(target_gsea_results, digits=params$report_round_digits)
- print( htmltools::tagList(datatable(target_gsea_results, caption = paste0("\nTarget (", contrasts$target[i], ")\n"), rownames = FALSE) ))
- ref_gsea_results <- read_metadata(reference_gsea_tables[i])[,c(-2,-3)]
- ref_gsea_results <- round_dataframe_columns(ref_gsea_results, digits=params$report_round_digits)
- print( htmltools::tagList(datatable(ref_gsea_results, caption = paste0("\nReference (", contrasts$reference[i], ")\n"), rownames = FALSE) ))
- }
- }
-
- } else if (gene_set_method == 'gprofiler2') {
-
- cat(paste0("\nThis section contains the results tables of the pathway analysis which was done with the R package gprofiler2. The differential fraction is the number of differential genes in a pathway divided by that pathway's size, i.e. the number of genes annotated for the pathway.",
- ifelse(params$gprofiler2_significant, paste0(" Enrichment was only considered if significant, i.e. adjusted p-value <= ", params$gprofiler2_max_qval, "."), "Enrichment was also considered if not significant."), "\n"))
-
- # Make sure to grab only non-empty files
- for (name in differential_names) {
- cat(paste0("\n##### ", name, "\n"))
-
- table <- paste0(name, ".gprofiler2.all_enriched_pathways.tsv")
- table_path <- file.path(params$input_dir, table)
- if (!file.exists(table_path) || file.size(table_path) == 0){
- cat(paste0("No ", ifelse(params$gprofiler2_significant, "significantly", ""), " enriched pathways were found for this contrast."))
- } else {
- all_enriched <- read.table(table_path, header=T, sep="\t", quote="\"")
- all_enriched <- data.frame("Pathway name" = all_enriched$term_name, "Pathway code" = all_enriched$term_id,
- "Differential features" = all_enriched$intersection_size, "Pathway size" = all_enriched$term_size,
- "Differential fraction" = (all_enriched$intersection_size/all_enriched$term_size),
- "Adjusted p value" = all_enriched$p_value, check.names = FALSE)
- all_enriched <- round_dataframe_columns(all_enriched, digits=params$report_round_digits)
- print(htmltools::tagList(datatable(all_enriched, caption = paste('Enriched pathways in', name, " (check", table, "for more detail)"), rownames = FALSE)))
- }
- cat("\n")
+ } else if (params$functional_method == 'gprofiler2') {
+
+ cat(paste0("\nThis section contains the results tables of the pathway analysis which was done with the R package gprofiler2. The differential fraction is the number of differential genes in a pathway divided by that pathway's size, i.e. the number of genes annotated for the pathway.",
+ ifelse(params$gprofiler2_significant, paste0(" Enrichment was only considered if significant, i.e. adjusted p-value <= ", params$gprofiler2_max_qval, "."), "Enrichment was also considered if not significant."), "\n"))
+
+ # Make sure to grab only non-empty files
+ for (name in differential_names) {
+ cat(paste0("\n##### ", name, "\n"))
+
+ table <- paste0(name, ".gprofiler2.all_enriched_pathways.tsv")
+ table_path <- file.path(params$input_dir, table)
+ if (!file.exists(table_path) || file.size(table_path) == 0){
+ cat(paste0("No ", ifelse(params$gprofiler2_significant, "significantly", ""), " enriched pathways were found for this contrast."))
+ } else {
+ all_enriched <- read.table(table_path, header=T, sep="\t", quote="\"")
+ all_enriched <- data.frame("Pathway name" = all_enriched$term_name, "Pathway code" = all_enriched$term_id,
+ "Differential features" = all_enriched$intersection_size, "Pathway size" = all_enriched$term_size,
+ "Differential fraction" = (all_enriched$intersection_size/all_enriched$term_size),
+ "Adjusted p value" = all_enriched$p_value, check.names = FALSE)
+ all_enriched <- round_dataframe_columns(all_enriched, digits=params$round_digits)
+ print(htmltools::tagList(datatable(all_enriched, caption = paste('Enriched pathways in', name, " (check", table, "for more detail)"), rownames = FALSE)))
  }
- }
+ cat("\n")
  }
  }
  }
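The rewritten gene set chunk above dispatches on the single `functional_method` value instead of looping over every `*_run` flag, then builds the names of the tables it reads. A Python sketch of which file names the chunk would look for, offered as an illustration under stated assumptions: `expected_functional_tables` and `differential_fraction` are hypothetical names, the GMT paths are passed as an already-split list because the report's `simpleSplit` helper is not shown here, and `os.path` stands in for R's `basename(tools::file_path_sans_ext(...))`, which gives the same result for typical paths such as `sets/h.all.gmt`.

```python
import os
from typing import Dict, List, Optional


def expected_functional_tables(
    functional_method: Optional[str],
    differential_names: List[str],
    contrasts: List[Dict[str, str]],
    gmt_files: List[str],
) -> List[str]:
    """File names the gene set section would read for the chosen functional method."""
    if functional_method is None:
        # Mirrors `if (!is.null(params$functional_method))`: the section is skipped.
        return []
    tables: List[str] = []
    if functional_method == "gsea":
        for gmt_file in gmt_files:
            # GMT basename without its extension, e.g. "sets/h.all.gmt" -> "h.all".
            gmt_name = os.path.splitext(os.path.basename(gmt_file))[0]
            for name, contrast in zip(differential_names, contrasts):
                # The report prints the target table first, then the reference table.
                for level in (contrast["target"], contrast["reference"]):
                    tables.append(f"{name}.{gmt_name}.gsea_report_for_{level}.tsv")
    elif functional_method == "gprofiler2":
        tables = [f"{name}.gprofiler2.all_enriched_pathways.tsv" for name in differential_names]
    return tables


def differential_fraction(intersection_size: int, term_size: int) -> float:
    """Differential genes in a pathway divided by the pathway's size."""
    return intersection_size / term_size
```

With a hypothetical differential name `ko_vs_wt_mystudy` and a contrast whose target is `ko` and reference is `wt`, `"gsea"` yields `ko_vs_wt_mystudy.h.all.gsea_report_for_ko.tsv` followed by the `..._for_wt.tsv` table, while `"gprofiler2"` yields one `.gprofiler2.all_enriched_pathways.tsv` file per contrast.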
@@ -1066,7 +1059,7 @@ make_params_table('exploratory analysis', 'exploratory_', remove_pattern = TRUE)


  ```{r, echo=FALSE, results='asis'}
- if (params$study_type == 'rnaseq'){
+ if (params$differential_method == 'deseq2'){
  make_params_table('DESeq2', 'deseq2_', remove_pattern = TRUE)
  }
  make_params_table('downstream differential analysis', 'differential_', remove_pattern = TRUE)
@@ -1075,18 +1068,10 @@ make_params_table('downstream differential analysis', 'differential_', remove_pa
  <!-- If any gene set methods have been activated show their params -->

  ```{r, echo=FALSE, results='asis'}
- possible_gene_set_methods <- c('gsea', 'gprofiler2')
-
- if (any(unlist(params[paste0(possible_gene_set_methods, '_run')]))){
+ if (!is.null(params$functional_method)){
  cat("\n### Gene set analysis\n")
-
- for (gene_set_method in possible_gene_set_methods){
- if (unlist(params[paste0(gene_set_method, '_run')])){
- cat("\n#### ", toupper(gene_set_method) ," {.tabset}\n")
- make_params_table(toupper(gene_set_method), paste0(gene_set_method, '_'), remove_pattern = TRUE)
- }
- }
-
+ cat("\n#### ", toupper(params$functional_method) ," {.tabset}\n")
+ make_params_table(toupper(params$functional_method), paste0(params$functional_method, '_'), remove_pattern = TRUE)
  }
  ```

assets/paramsheet.csv

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+ paramset_name,study_type,differential_method,limma_use_voom,functional_method
+ deseq2_rnaseq,rnaseq,deseq2,,
+ deseq2_rnaseq_gsea,rnaseq,deseq2,,gsea
+ deseq2_rnaseq_gprofiler2,rnaseq,deseq2,,gprofiler2
+ limma_rnaseq,rnaseq,limma,true,
+ limma_rnaseq_gsea,rnaseq,limma,true,gsea
+ limma_rnaseq_gprofiler2,rnaseq,limma,true,gprofiler2
+ dream_rnaseq,rnaseq,dream,,
+ dream_rnaseq_gsea,rnaseq,dream,,gsea
+ dream_rnaseq_gprofiler2,rnaseq,dream,,gprofiler2
+ limma_affy,affy_array,limma,,
+ limma_affy_gsea,affy_array,limma,,gsea
+ limma_affy_gprofiler2,affy_array,limma,,gprofiler2
+ limma_soft,geo_soft_file,limma,,
+ limma_maxquant,maxquant,limma,,
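The new `assets/paramsheet.csv` is a flat table, so one way to see which combinations it offers for each study type is to group its rows by `study_type`. The Python sketch below does that with the standard `csv` module; `paramsets_by_study_type` is a hypothetical helper for inspecting the file, not something the pipeline provides, and `PARAMSHEET` holds the 15 lines added in this commit.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Contents of assets/paramsheet.csv as added in this commit.
PARAMSHEET = """\
paramset_name,study_type,differential_method,limma_use_voom,functional_method
deseq2_rnaseq,rnaseq,deseq2,,
deseq2_rnaseq_gsea,rnaseq,deseq2,,gsea
deseq2_rnaseq_gprofiler2,rnaseq,deseq2,,gprofiler2
limma_rnaseq,rnaseq,limma,true,
limma_rnaseq_gsea,rnaseq,limma,true,gsea
limma_rnaseq_gprofiler2,rnaseq,limma,true,gprofiler2
dream_rnaseq,rnaseq,dream,,
dream_rnaseq_gsea,rnaseq,dream,,gsea
dream_rnaseq_gprofiler2,rnaseq,dream,,gprofiler2
limma_affy,affy_array,limma,,
limma_affy_gsea,affy_array,limma,,gsea
limma_affy_gprofiler2,affy_array,limma,,gprofiler2
limma_soft,geo_soft_file,limma,,
limma_maxquant,maxquant,limma,,
"""


def paramsets_by_study_type(paramsheet_text: str) -> Dict[str, List[str]]:
    """Group paramset names by study_type, keeping the order they appear in the file."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(paramsheet_text)):
        groups[row["study_type"]].append(row["paramset_name"])
    return dict(groups)
```

Applied to this file, the grouping gives nine `rnaseq` paramsets (DESeq2, limma and dream, each with no functional analysis, GSEA or gprofiler2), three `affy_array` paramsets, and a single limma paramset each for `geo_soft_file` and `maxquant`.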

conf/affy.config

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ params {
  exploratory_log2_assays = null

  // Differential options
+ differential_method = "limma"
  differential_file_suffix = ".limma.results.tsv"
  differential_fc_column = "logFC"
  differential_pval_column = "P.Value"

conf/maxquant.config

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ params {
  exploratory_log2_assays = null

  // Differential options
+ differential_method = "limma"
  differential_file_suffix = ".limma.results.tsv"
  differential_fc_column = "logFC"
  differential_pval_column = "P.Value"
