Commit ecfdc89

Merge pull request #477 from SusiJo/update_docu_report
Update documentation & minor report fixes
2 parents 13a6a44 + 5d94228 commit ecfdc89

18 files changed, +164 -114 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Fixed

+- [[#476](https://github.com/nf-core/differentialabundance/pull/476)] - Fixed null.csv and warning at top of report ([@SusiJo](https://github.com/SusiJo), reviewed by [@pinin4fjords](https://github.com/pinin4fjords), [@atrigila](https://github.com/atrigila), [@maxulysse](https://github.com/maxulysse))
 - [[#358](https://github.com/nf-core/differentialabundance/pull/358)] - Fixed nf-tests not running due to `--changed-since HEAD^`([@atrigila](https://github.com/atrigila), review by [@pinin4fjords](https://github.com/pinin4fjords))
 - [[#344](https://github.com/nf-core/differentialabundance/pull/344)] - Fixed replacement of NA sub-strings
 ([@atrigila](https://github.com/atrigila), suggested by [@BEFH](https://github.com/BEFH), review by [@apeltzer](https://github.com/apeltzer) and [@nschcolnicov](https://github.com/nschcolnicov))
@@ -40,6 +41,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Changed

+- [[#476](https://github.com/nf-core/differentialabundance/pull/476)] - Update documentation & report fixes ([@SusiJo](https://github.com/SusiJo), reviewed by [@pinin4fjords](https://github.com/pinin4fjords), [@atrigila](https://github.com/atrigila), [@maxulysse](https://github.com/maxulysse))
 - [[#468](https://github.com/nf-core/differentialabundance/pull/468)] - Template update for nf-core/tools v3.3.1 ([@SusiJo](https://github.com/SusiJo), reviewed by [@famosab](https://github.com/famosab), [@mashehu](https://github.com/mashehu))
 - [[#448](https://github.com/nf-core/differentialabundance/pull/448)] - Simplify toolsheet handling and restructure workflow to use paramset in meta. ([@pinin4fjords](https://github.com/pinin4fjords), review by [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst))
 - [[#431](https://github.com/nf-core/differentialabundance/pull/431)] - Replace the calls to differential and functional analysis modules by subworkflows. ([@suzannejin](https://github.com/suzannejin), review by [@pinin4fjords](https://github.com/pinin4fjords))

assets/differentialabundance_report.Rmd

Lines changed: 6 additions & 5 deletions
@@ -220,7 +220,7 @@ report_subtitle <- paste0(ifelse(is.null(params$report_author), '', paste0('By '
 ```

 ---
-title: "<img src=\"`r file.path(params$input_dir, params$logo)`\" style=\"float: left;\"/>`r report_title`"
+title: "<img id=\"logo\" src=\"`r file.path(params$input_dir, params$logo)`\" style=\"float: left;\"/>`r report_title`"
 subtitle: `r report_subtitle`
 ---
 \
@@ -987,8 +987,8 @@ if (!is.null(params$functional_method)){
 gmt_name <- basename(tools::file_path_sans_ext(gmt_file))
 cat("\n##### ", gmt_name ," {.tabset}\n")

-reference_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', gsea_contrasts$reference, '.tsv')
-target_gsea_tables <- paste0(differential_names, ".", gmt_name, '.gsea_report_for_', gsea_contrasts$target, '.tsv')
+reference_gsea_tables <- paste0(gsea_contrasts$id, ".", gmt_name, '.gsea_report_for_', gsea_contrasts$reference, '.tsv')
+target_gsea_tables <- paste0(gsea_contrasts$id, ".", gmt_name, '.gsea_report_for_', gsea_contrasts$target, '.tsv')

 for (i in seq_len(nrow(gsea_contrasts))) {
 cat("\n###### ", contrast_descriptions[i], "\n")
@@ -1025,10 +1025,11 @@ if (!is.null(params$functional_method)){
 ifelse(params$gprofiler2_significant, paste0(" Enrichment was only considered if significant, i.e. adjusted p-value <= ", params$gprofiler2_max_qval, "."), "Enrichment was also considered if not significant."), "\n"))

 # Make sure to grab only non-empty files
-for (name in differential_names) {
+for (i in seq_along(differential_names)) {
+name <- differential_names[i]
 cat(paste0("\n##### ", name, "\n"))

-table <- paste0(name, ".gprofiler2.all_enriched_pathways.tsv")
+table <- paste0(contrasts$id[i], ".gprofiler2.all_enriched_pathways.tsv")
 table_path <- file.path(params$input_dir, table)
 if (!file.exists(table_path) || file.size(table_path) == 0){
 cat(paste0("No ", ifelse(params$gprofiler2_significant, "significantly", ""), " enriched pathways were found for this contrast."))
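
With this change the report locates GSEA and gprofiler2 tables by the contrast `id` instead of the display names in `differential_names`. As a rough illustration, the Python sketch below mirrors the two `paste0()` naming patterns so you can predict which files the report will look for in `params$input_dir`. The helper names, contrast ID and GMT path are hypothetical, and `os.path.splitext()` is assumed to behave like `tools::file_path_sans_ext()` for an ordinary `.gmt` file name.

```python
from __future__ import annotations

import os


def gmt_name(gmt_file: str) -> str:
    """Gene set collection name: the file name without its last extension."""
    # Intended to match basename(tools::file_path_sans_ext(gmt_file)) for "*.gmt" files
    return os.path.splitext(os.path.basename(gmt_file))[0]


def gsea_tables(contrast_id: str, gmt_file: str, reference: str, target: str) -> tuple[str, str]:
    """(reference, target) GSEA report tables the report reads for one contrast."""
    stem = f"{contrast_id}.{gmt_name(gmt_file)}.gsea_report_for_"
    return f"{stem}{reference}.tsv", f"{stem}{target}.tsv"


def gprofiler2_table(contrast_id: str) -> str:
    """gprofiler2 enrichment table the report reads for one contrast."""
    return f"{contrast_id}.gprofiler2.all_enriched_pathways.tsv"
```

The `<id>.<gmt_name>.` stem lines up with the `ext.prefix` set for `GSEA_GSEA` in `conf/modules.config`, assuming `meta.id` holds the contrast ID there.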

conf/modules.config

Lines changed: 1 addition & 0 deletions
@@ -375,6 +375,7 @@ process {

 withName: GSEA_GSEA {
 ext.prefix = { "${meta.id}.${gene_sets.baseName}." }
+
 publishDir = [
 [
 path: { "${params.outdir}/report/gsea/${meta.id}/${gene_sets.baseName}" },

docs/output.md

Lines changed: 3 additions & 3 deletions
@@ -14,6 +14,8 @@ This directory contains the main reporting output of the workflow.
 - `report/`
 - `*.html`: an HTML report file named according to the value of `params.study_name`, containing graphical and tabular summary results for the workflow run.
 - `*.zip`: a zip file containing an R markdown file with parameters set and all necessary input files to open and customise the reporting.
+- `gsea/`: Directory containing graphical outputs from GSEA (where enabled). Plots are stored in directories named for the associated contrast.
+- `[contrast]/png/[gsea_plot_type].png`

 </details>

@@ -38,8 +40,6 @@ Stand-alone graphical outputs are placed in this directory. They may be useful i
 - `[contrast]/png/volcano.png`: Volcano plots of -log(10) p value agains log(2) fold changes
 - `immunedeconv/`: Directory containing graphical outputs of immunedeconv results
 - `${prefix}.plot1_stacked_bar_chart.png`
-- `gsea/`: Directory containing graphical outputs from GSEA (where enabled). Plots are stored in directories named for the associated contrast.
-- `[contrast]/png/[gsea_plot_type].png`
 - `gprofiler2/`: Directory containing graphical outputs from gprofiler2 (where enabled). Plots are stored in directories named for the associated contrast.
 - `[contrast]/[contrast].gprofiler2.[source].gostplot.html`: An interactive gprofiler2 Manhattan plot of enriched pathways from one specific source/database, e.g. REAC
 - `[contrast]/[contrast].gprofiler2.[source].gostplot.png`: A static gprofiler2 Manhattan plot of enriched pathways from one specific source/database, e.g. REAC
@@ -92,7 +92,7 @@ The `differential` folder is likely to be the core result set for most users, co
 <summary>Output files</summary>

 - `shinyngs_app/`
-- `[study name]`:
+- `[study_name]`:
 - `data.rds`: serialized R object which can be used to generate a Shiny application
 - `app.R`: minimal R script that will source the data object and generate the app

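The GSEA plots are now documented under `report/`, in per-contrast directories. A minimal Python sketch for collecting them from a results directory follows; it assumes only that each contrast has its own directory below `<outdir>/report/gsea/`. It searches recursively because the `GSEA_GSEA` `publishDir` path in `conf/modules.config` adds a gene-set level (`report/gsea/${meta.id}/${gene_sets.baseName}`), and the exact nesting of the `png/` folder is not shown in this diff. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def gsea_plots_by_contrast(outdir: str) -> dict[str, list[Path]]:
    """Map each contrast directory under <outdir>/report/gsea to its PNG files."""
    gsea_root = Path(outdir) / "report" / "gsea"
    plots: dict[str, list[Path]] = {}
    if not gsea_root.is_dir():
        return plots  # GSEA not enabled, or results not published yet
    for contrast_dir in sorted(p for p in gsea_root.iterdir() if p.is_dir()):
        # rglob also finds plots nested below a gene-set or png/ sub-directory
        plots[contrast_dir.name] = sorted(contrast_dir.rglob("*.png"))
    return plots
```

Returning an empty mapping when the directory is missing keeps the helper usable on runs where GSEA was not part of the selected paramset.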
docs/usage.md

Lines changed: 28 additions & 24 deletions
@@ -6,7 +6,7 @@

 ## Introduction

-Differential analysis is a common task in a variety of use cases. In essence, all these use cases entail taking an input matrix containing features (e.g. genes) and observations (e.g. samples), and comparing groups of observations in all or a subset of the features. The feature/ observation language here reflects our hope that this workflow will extend in future to encompass a variety of applications where an assumption of gene vs sample may not be a valid one- though that is the application to which the first release will apply.
+Differential analysis is a common task in a variety of use cases. In essence, all these use cases entail taking an input matrix containing features (e.g. genes) and observations (e.g. samples), and comparing groups of observations in all or a subset of the features. The feature/ observation language here reflects our hope that this workflow will extend in future to encompass a variety of applications where an assumption of gene vs sample may not be a valid one - though that is the application to which the first release will apply.

 With the above in mind, running this workflow requires:

@@ -20,14 +20,14 @@ With the above in mind, running this workflow requires:
 ## Observations (samplesheet) input

 ```bash
---input '[path to samplesheet file]'
+--input '[path to samplesheet file].(csv|tsv)'
 ```

-This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare. The columns that the pipeline requires are:
+The samplesheet file can be tab or comma separated. This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare. The columns that the pipeline requires are:

-- a column listing the sample IDs (must be the same IDs as in the abundance matrix), in the example below it is called 'sample'. For some study_types, this column might need to be filled in with file names, e.g. when doing an affymetrix analysis.
-- one or more columns describing conditions for the differential analysis. In the example below it is called 'condition'
-- optionally one or more columns describing sample batches or similar which you want to be considered in the analysis. In the example below it is called 'batch'
+- a column listing the sample IDs (must be the same IDs as in the abundance matrix), in the example below it is called `sample`. For some study_types, this column might need to be filled in with file names, e.g. when doing an affymetrix analysis.
+- one or more columns describing conditions for the differential analysis. In the example below it is called `condition`
+- optionally one or more columns describing sample batches or similar which you want to be considered in the analysis. In the example below it is called `batch`

 For example:

@@ -41,8 +41,6 @@ TREATED_REP2,AEG588A2_S1_L003_R1_001.fastq.gz,AEG588A2_S1_L003_R2_001.fastq.gz,t
 TREATED_REP3,AEG588A2_S1_L004_R1_001.fastq.gz,AEG588A2_S1_L004_R2_001.fastq.gz,treated,3,B
 ```

-The file can be tab or comma separated.
-
 ### Affymetrix arrays

 Abundances for Affy arrays are provided in CEL files within an archive. When creating sample sheets for Affy arrays, it's crucial to include a column that specifies which file corresponds to each sample. This file column is essential for linking each sample to its corresponding data file, as shown in the example below:
@@ -59,30 +57,30 @@ Abundances for Affy arrays are provided in CEL files within an archive. When cre
 "GSM1229348_Gudjohnsson_008_8470_PN.CEL.gz","GSM1229348","p8470_PN","6690","uninvolved"
 ```

-The "file" column in this example is used to specify the data file associated with each sample, which is essential for data analysis and interpretation.
+The `file` column in this example is used to specify the data file associated with each sample, which is essential for data analysis and interpretation.

 ## Abundance values

 ### RNA-seq and similar

 ```bash
---matrix '[path to matrix file]'
+--matrix '[path to matrix file].(csv|tsv)'
 ```

-This is a numeric square matrix file, comma or tab-separated, with a column for every observation, and features corresponding to the supplied feature set. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs.
+This is a numeric matrix file, comma or tab-separated, with features as rows and observations in columns. The features correspond to the supplied feature set. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs.

 #### Outputs from nf-core/rnaseq and other tximport-processed results

-The nf-core RNAseq workflow incorporates [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) for producing quantification matrices. From [version 3.12.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2), it additionally provides transcript length matrices which can be directly consumed by DESeq2 to model length bias across samples.
+The nf-core RNAseq workflow incorporates [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) for producing quantification matrices. From [version 3.12.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2), it additionally provides transcript/gene length matrices which can be directly consumed by DESeq2 to model length bias across samples.

-To use this approach, include the transcript lengths file with the **raw counts**:
+To use this approach, include the corresponding lengths file with the **raw counts**:

 ```bash
 --matrix 'salmon.merged.gene_counts.tsv' \
 --transcript_length_matrix 'salmon.merged.gene_lengths.tsv'
 ```

-Without the transcript lengths, for instance in earlier rnaseq workflow versions, follow the second recommendation in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):
+Without the transcript/gene lengths, for instance in earlier rnaseq workflow versions, follow the second recommendation in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):

 > "Use the tximport argument `countsFromAbundance='lengthScaledTPM'` or `'scaledTPM'`, then employ the gene-level count matrix `txi$counts` directly in downstream software, a method we call 'bias corrected counts without an offset'"

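The samplesheet and the abundance matrix may both be comma- or tab-separated, and the matrix holds features as rows and observations as columns. Under those assumptions, the Python sketch below is a hypothetical pre-flight check (not part of the pipeline) that reports samplesheet IDs without a matching matrix column. It picks the delimiter from the `.csv`/`.tsv` extension and uses `sample` as the ID column, as in the example samplesheet; in the pipeline itself the matching is controlled by `--observations_id_col` and `--features_id_col`.

```python
from __future__ import annotations

import csv
from pathlib import Path


def delimiter_for(path: str) -> str:
    """Tab for .tsv files, comma for .csv files (extension-based assumption)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".tsv":
        return "\t"
    if suffix == ".csv":
        return ","
    raise ValueError(f"expected a .csv or .tsv file, got {path!r}")


def missing_observations(samplesheet: str, matrix: str, sample_col: str = "sample") -> list[str]:
    """Samplesheet IDs that do not appear as a column header in the matrix."""
    with open(samplesheet, newline="") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter_for(samplesheet))
        sample_ids = [row[sample_col] for row in reader]
    with open(matrix, newline="") as fh:
        header = next(csv.reader(fh, delimiter=delimiter_for(matrix)))
    # Extra matrix columns (feature IDs, gene names) are ignored on purpose
    present = set(header)
    return [sample_id for sample_id in sample_ids if sample_id not in present]
```

An empty result means every observation in the samplesheet has a column in the matrix; extra matrix columns are not reported.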
@@ -92,7 +90,7 @@ It is important to note that the documentation advises:

 > "Do not manually pass the original gene-level counts to downstream methods without an offset."

-So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.tsv` as input for this workflow **except** where the transcript lengths are also provided.
+So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.tsv` as input for this workflow **except** where the transcript/gene lengths are also provided.

 ### MaxQuant intensities

@@ -130,13 +128,13 @@ Full list of features metadata are available on GEO platform pages.

 The contrasts file references the observations file to define groups of samples to compare. It can be provided in **either** CSV/TSV or YAML format using the parameters `--contrasts` or `--contrasts_yml`, respectively.

-### CSV contrasts file
+### CSV/TSV contrasts file

 ```bash
---contrasts '[path to CSV contrasts file]'
+--contrasts '[path to contrasts file].(csv|tsv)'
 ```

-The contrasts file references the observations file to define groups of samples to compare. For example, based on the sample sheet above we could define contrasts like:
+Based on the sample sheet above we could define contrasts as indicated below:

 ```csv
 id,variable,reference,target,blocking
@@ -154,9 +152,7 @@ The necessary fields in order are:
 You can optionally supply:

 - `blocking` - semicolon-delimited, any additional variables (also observation columns) that should be modelled alongside the contrast variable
-- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a semicolon-delimited list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be exluded prior to analysis of a given contrast.
-
-The file can be tab or comma separated.
+- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a semicolon-delimited list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be excluded prior to analysis of a given contrast.

 ### YAML contrasts file format

@@ -264,7 +260,7 @@ To run the pipeline with a specific config row, you can use the `--paramset_name

 We provide a `paramsheet.csv` file in the `assets` directory that defines the parameter sets and tool parameters that make sense to run together, for specific study types.

-Each row defines a combination of differential analysis tool and functional analysis tool (optional), with the respective arguments.
+Each row defines a combination of a differential analysis tool and a functional analysis tool (optional), with the respective arguments.

 To run a given combination of tools, you can use the `--paramset_name` parameter.

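The optional `exclude_samples_col` and `exclude_samples_values` contrast columns can be illustrated with a small filter. This Python sketch assumes the reading suggested by the column names: samples whose value in `exclude_samples_col` appears in the semicolon-delimited `exclude_samples_values` list are dropped before modelling. The function, the `sample` ID column and the example rows are hypothetical, not the pipeline's implementation.

```python
from __future__ import annotations


def kept_samples(samples: list[dict[str, str]], contrast: dict[str, str], id_col: str = "sample") -> list[str]:
    """Sample IDs left after applying a contrast's optional exclusion columns."""
    column = (contrast.get("exclude_samples_col") or "").strip()
    raw_values = contrast.get("exclude_samples_values") or ""
    # Semicolon-delimited list of values whose samples are excluded
    excluded = {value.strip() for value in raw_values.split(";") if value.strip()}
    if not column or not excluded:
        return [sample[id_col] for sample in samples]
    return [sample[id_col] for sample in samples if sample.get(column) not in excluded]
```

Contrasts that leave both columns empty keep every sample, which matches the columns being optional.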
@@ -427,7 +423,8 @@ nextflow run nf-core/differentialabundance \
 [--gtf mouse.gtf OR --features features.tsv] \
 --outdir <OUTDIR> \
 -profile docker \
-[--paramset_name <paramset_name>]
+[--paramset_name <paramset_name>] \
+--report_contributors $'Jane Doe\nDirector of Institute of Microbiology\nUniversity of Smallville;John Smith\nPhD student\nInstitute of Microbiology\nUniversity of Smallville'
 ```

 This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
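
The `--report_contributors` example packs several contributors into one string: a semicolon separates contributors, and newlines (produced by Bash `$'...'` quoting, which turns `\n` into a real line break) separate the lines of each entry. The Python sketch below splits such a string the way the example suggests. Treating the first line of each block as the name is an assumption about the layout, and the function is hypothetical, not the pipeline's own parsing code.

```python
from __future__ import annotations


def parse_contributors(value: str) -> list[dict[str, object]]:
    """Split a report_contributors string into a name plus detail lines."""
    contributors = []
    for block in value.split(";"):
        # Drop blank lines, e.g. the trailing newline a YAML '|' block adds
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            contributors.append({"name": lines[0], "details": lines[1:]})
    return contributors
```

The same function handles the `report_contributors: |` form in a YAML params file, since a literal block keeps the line breaks and only adds a single trailing newline.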
@@ -460,7 +457,7 @@ process {
 }
 ```

-You will not get the final reporting outcomes of the workflow, but you will get the differential tables produced by DESeq2 or Limma, and the results of any gene seta analysis you have enabled.
+You will not get the final reporting outcomes of the workflow, but you will get the differential tables produced by DESeq2 or Limma, and the results of any gene sets analysis you have enabled.

 We have also added a dedicated pipeline parameter, `--skip_reports` that allows you to skip only the RMarkdown notebook and bundled report while leaving other reporting processes active. The `RMARKDOWNNOTEBOOK` process assumes that every grouping variable you pass to it (from the contrasts file’s variable column or PCA-derived informative_variables) exists as a valid, named column in your sample metadata. If you know your metadata or contrasts might be incomplete or non-standard (such as using formula-based yaml files), the you can use this flag to skip these steps.

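Because `RMARKDOWNNOTEBOOK` expects every grouping variable to be a named column of the sample metadata, a quick check before launching can hint at whether `--skip_reports` is needed. This Python sketch is a hypothetical helper that only covers CSV/TSV contrasts files and their `variable` and semicolon-delimited `blocking` columns; PCA-derived informative variables and YAML contrasts are not handled.

```python
from __future__ import annotations

import csv


def _delimiter(path: str) -> str:
    # Assumption: files are named .tsv (tab-separated) or .csv (comma-separated)
    return "\t" if path.lower().endswith(".tsv") else ","


def missing_grouping_columns(samplesheet: str, contrasts: str) -> set[str]:
    """Contrast variable/blocking names that are not samplesheet columns."""
    with open(samplesheet, newline="") as fh:
        columns = set(next(csv.reader(fh, delimiter=_delimiter(samplesheet))))
    wanted: set[str] = set()
    with open(contrasts, newline="") as fh:
        for row in csv.DictReader(fh, delimiter=_delimiter(contrasts)):
            wanted.add(row["variable"].strip())
            # blocking is optional and may name several columns separated by ';'
            blocking = row.get("blocking") or ""
            wanted.update(name.strip() for name in blocking.split(";") if name.strip())
    return wanted - columns
```

A non-empty result suggests the report step would fail on those contrasts, so either fix the metadata or run with `--skip_reports`.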
@@ -489,6 +486,13 @@ with:
 input: './samplesheet.csv'
 outdir: './results/'
 genome: 'GRCh37'
+report_contributors: |
+Jane Doe
+Director of Institute of Microbiology
+University of Smallville;John Smith
+PhD student
+Institute of Microbiology
+University of Smallville
 <...>
 ```
