CHANGELOG.md (2 additions & 0 deletions)
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Fixed

+- [[#476](https://github.com/nf-core/differentialabundance/pull/476)] - Fixed null.csv and warning at top of report ([@SusiJo](https://github.com/SusiJo), reviewed by [@pinin4fjords](https://github.com/pinin4fjords), [@atrigila](https://github.com/atrigila), [@maxulysse](https://github.com/maxulysse))
 - [[#358](https://github.com/nf-core/differentialabundance/pull/358)] - Fixed nf-tests not running due to `--changed-since HEAD^`([@atrigila](https://github.com/atrigila), review by [@pinin4fjords](https://github.com/pinin4fjords))
 - [[#344](https://github.com/nf-core/differentialabundance/pull/344)] - Fixed replacement of NA sub-strings
 ([@atrigila](https://github.com/atrigila), suggested by [@BEFH](https://github.com/BEFH), review by [@apeltzer](https://github.com/apeltzer) and [@nschcolnicov](https://github.com/nschcolnicov))
@@ -40,6 +41,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-[[#468](https://github.com/nf-core/differentialabundance/pull/468)] - Template update for nf-core/tools v3.3.1 ([@SusiJo](https://github.com/SusiJo), reviewed by [@famosab](https://github.com/famosab), [@mashehu](https://github.com/mashehu))
 - [[#448](https://github.com/nf-core/differentialabundance/pull/448)] - Simplify toolsheet handling and restructure workflow to use paramset in meta. ([@pinin4fjords](https://github.com/pinin4fjords), review by [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst))
 - [[#431](https://github.com/nf-core/differentialabundance/pull/431)] - Replace the calls to differential and functional analysis modules by subworkflows. ([@suzannejin](https://github.com/suzannejin), review by [@pinin4fjords](https://github.com/pinin4fjords))
@@ -1025,10 +1025,11 @@ if (!is.null(params$functional_method)){
 ifelse(params$gprofiler2_significant, paste0(" Enrichment was only considered if significant, i.e. adjusted p-value <= ", params$gprofiler2_max_qval, "."), "Enrichment was also considered if not significant."), "\n"))
docs/output.md (3 additions & 3 deletions)
@@ -14,6 +14,8 @@ This directory contains the main reporting output of the workflow.
 - `report/`
 - `*.html`: an HTML report file named according to the value of `params.study_name`, containing graphical and tabular summary results for the workflow run.
 - `*.zip`: a zip file containing an R markdown file with parameters set and all necessary input files to open and customise the reporting.
+- `gsea/`: Directory containing graphical outputs from GSEA (where enabled). Plots are stored in directories named for the associated contrast.
+- `[contrast]/png/[gsea_plot_type].png`

 </details>

@@ -38,8 +40,6 @@ Stand-alone graphical outputs are placed in this directory. They may be useful i
 - `[contrast]/png/volcano.png`: Volcano plots of -log(10) p value against log(2) fold changes
 - `immunedeconv/`: Directory containing graphical outputs of immunedeconv results
 - `${prefix}.plot1_stacked_bar_chart.png`
-- `gsea/`: Directory containing graphical outputs from GSEA (where enabled). Plots are stored in directories named for the associated contrast.
-- `[contrast]/png/[gsea_plot_type].png`
 - `gprofiler2/`: Directory containing graphical outputs from gprofiler2 (where enabled). Plots are stored in directories named for the associated contrast.
 - `[contrast]/[contrast].gprofiler2.[source].gostplot.html`: An interactive gprofiler2 Manhattan plot of enriched pathways from one specific source/database, e.g. REAC
 - `[contrast]/[contrast].gprofiler2.[source].gostplot.png`: A static gprofiler2 Manhattan plot of enriched pathways from one specific source/database, e.g. REAC
@@ -92,7 +92,7 @@ The `differential` folder is likely to be the core result set for most users, co
 <summary>Output files</summary>

 - `shinyngs_app/`
-- `[study name]`:
+- `[study_name]`:
 - `data.rds`: serialized R object which can be used to generate a Shiny application
 - `app.R`: minimal R script that will source the data object and generate the app
docs/usage.md (28 additions & 24 deletions)
@@ -6,7 +6,7 @@

 ## Introduction

-Differential analysis is a common task in a variety of use cases. In essence, all these use cases entail taking an input matrix containing features (e.g. genes) and observations (e.g. samples), and comparing groups of observations in all or a subset of the features. The feature/ observation language here reflects our hope that this workflow will extend in future to encompass a variety of applications where an assumption of gene vs sample may not be a valid one- though that is the application to which the first release will apply.
+Differential analysis is a common task in a variety of use cases. In essence, all these use cases entail taking an input matrix containing features (e.g. genes) and observations (e.g. samples), and comparing groups of observations in all or a subset of the features. The feature/ observation language here reflects our hope that this workflow will extend in future to encompass a variety of applications where an assumption of gene vs sample may not be a valid one- though that is the application to which the first release will apply.

 With the above in mind, running this workflow requires:

@@ -20,14 +20,14 @@ With the above in mind, running this workflow requires:
 ## Observations (samplesheet) input

 ```bash
---input '[path to samplesheet file]'
+--input '[path to samplesheet file].(csv|tsv)'
 ```

-This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare. The columns that the pipeline requires are:
+The samplesheet file can be tab or comma separated. This may well be the same sample sheet used to generate the input matrix. For example, in RNA-seq this might be the same sample sheet, perhaps derived from [fetchngs](https://github.com/nf-core/fetchngs), that was input to the [RNA-seq workflow](https://github.com/nf-core/rnaseq). It may be necessary to add columns that describe the groups you want to compare. The columns that the pipeline requires are:

-- a column listing the sample IDs (must be the same IDs as in the abundance matrix), in the example below it is called 'sample'. For some study_types, this column might need to be filled in with file names, e.g. when doing an affymetrix analysis.
-- one or more columns describing conditions for the differential analysis. In the example below it is called 'condition'
-- optionally one or more columns describing sample batches or similar which you want to be considered in the analysis. In the example below it is called 'batch'
+- a column listing the sample IDs (must be the same IDs as in the abundance matrix), in the example below it is called `sample`. For some study_types, this column might need to be filled in with file names, e.g. when doing an affymetrix analysis.
+- one or more columns describing conditions for the differential analysis. In the example below it is called `condition`
+- optionally one or more columns describing sample batches or similar which you want to be considered in the analysis. In the example below it is called `batch`
Abundances for Affy arrays are provided in CEL files within an archive. When creating sample sheets for Affy arrays, it's crucial to include a column that specifies which file corresponds to each sample. This file column is essential for linking each sample to its corresponding data file, as shown in the example below:
@@ -59,30 +57,30 @@ Abundances for Affy arrays are provided in CEL files within an archive. When cre
The "file" column in this example is used to specify the data file associated with each sample, which is essential for data analysis and interpretation.
+The `file` column in this example is used to specify the data file associated with each sample, which is essential for data analysis and interpretation.

 ## Abundance values

 ### RNA-seq and similar

 ```bash
---matrix '[path to matrix file]'
+--matrix '[path to matrix file].(csv|tsv)'
 ```

-This is a numeric square matrix file, comma or tab-separated, with a column for every observation, and features corresponding to the supplied feature set. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs.
+This is a numeric matrix file, comma or tab-separated, with features as rows and observations in columns. The features correspond to the supplied feature set. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs.
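
As an illustration of the layout described above (features as rows, observations as columns), the following minimal Python sketch checks that every sample ID in a samplesheet has a matching column in the matrix. It is not part of the pipeline: the helper names, the example data and the column names `sample` and `gene_id` are assumptions made for this illustration, and the delimiter detection is deliberately simplistic.

```python
import csv
import io


def sniff_delimiter(header_line):
    """Guess tab vs comma from a header line (assumes no quoted delimiters)."""
    return "\t" if "\t" in header_line else ","


def missing_observations(samplesheet_text, matrix_text, observations_id_col="sample"):
    """Return sample IDs from the samplesheet that have no column in the matrix."""
    sheet_delim = sniff_delimiter(samplesheet_text.splitlines()[0])
    sheet = csv.DictReader(io.StringIO(samplesheet_text), delimiter=sheet_delim)
    sample_ids = [row[observations_id_col] for row in sheet]

    # Observations are columns, so only the matrix header line is needed.
    matrix_header = matrix_text.splitlines()[0]
    matrix_columns = set(matrix_header.split(sniff_delimiter(matrix_header)))
    return [s for s in sample_ids if s not in matrix_columns]


# Hypothetical inputs: a comma-separated samplesheet and a tab-separated matrix.
samplesheet = "sample,condition,batch\ns1,control,b1\ns2,treated,b1\ns3,treated,b2\n"
matrix = "gene_id\ts1\ts2\ngeneA\t10\t12\ngeneB\t0\t3\n"

print(missing_observations(samplesheet, matrix))
```

Here `s3` is listed in the samplesheet but has no column in the matrix, so it is the only ID reported.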

 #### Outputs from nf-core/rnaseq and other tximport-processed results

-The nf-core RNAseq workflow incorporates [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) for producing quantification matrices. From [version 3.12.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2), it additionally provides transcript length matrices which can be directly consumed by DESeq2 to model length bias across samples.
+The nf-core RNAseq workflow incorporates [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) for producing quantification matrices. From [version 3.12.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2), it additionally provides transcript/gene length matrices which can be directly consumed by DESeq2 to model length bias across samples.

-To use this approach, include the transcript lengths file with the **raw counts**:
+To use this approach, include the corresponding lengths file with the **raw counts**:
Without the transcript lengths, for instance in earlier rnaseq workflow versions, follow the second recommendation in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):
+Without the transcript/gene lengths, for instance in earlier rnaseq workflow versions, follow the second recommendation in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):

 > "Use the tximport argument `countsFromAbundance='lengthScaledTPM'` or `'scaledTPM'`, then employ the gene-level count matrix `txi$counts` directly in downstream software, a method we call 'bias corrected counts without an offset'"

@@ -92,7 +90,7 @@ It is important to note that the documentation advises:

 > "Do not manually pass the original gene-level counts to downstream methods without an offset."

-So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.tsv` as input for this workflow **except** where the transcript lengths are also provided.
+So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.tsv` as input for this workflow **except** where the transcript/gene lengths are also provided.

 ### MaxQuant intensities

@@ -130,13 +128,13 @@ Full list of features metadata are available on GEO platform pages.

 The contrasts file references the observations file to define groups of samples to compare. It can be provided in **either** CSV/TSV or YAML format using the parameters `--contrasts` or `--contrasts_yml`, respectively.

-### CSV contrasts file
+### CSV/TSV contrasts file

 ```bash
---contrasts '[path to CSV contrasts file]'
+--contrasts '[path to contrasts file].(csv|tsv)'
 ```

-The contrasts file references the observations file to define groups of samples to compare. For example, based on the sample sheet above we could define contrasts like:
+Based on the sample sheet above we could define contrasts as indicated below:

 ```csv
 id,variable,reference,target,blocking
@@ -154,9 +152,7 @@ The necessary fields in order are:
 You can optionally supply:

 - `blocking` - semicolon-delimited, any additional variables (also observation columns) that should be modelled alongside the contrast variable
-- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a semicolon-delimited list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be exluded prior to analysis of a given contrast.
-
-The file can be tab or comma separated.
+- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a semicolon-delimited list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be excluded prior to analysis of a given contrast.

 ### YAML contrasts file format

@@ -264,7 +260,7 @@ To run the pipeline with a specific config row, you can use the `--paramset_name

 We provide a `paramsheet.csv` file in the `assets` directory that defines the parameter sets and tool parameters that make sense to run together, for specific study types.

-Each row defines a combination of differential analysis tool and functional analysis tool (optional), with the respective arguments.
+Each row defines a combination of a differential analysis tool and a functional analysis tool (optional), with the respective arguments.

 To run a given combination of tools, you can use the `--paramset_name` parameter.

@@ -427,7 +423,8 @@ nextflow run nf-core/differentialabundance \
 [--gtf mouse.gtf OR --features features.tsv] \
 --outdir <OUTDIR> \
 -profile docker \
-[--paramset_name <paramset_name>]
+[--paramset_name <paramset_name>] \
+--report_contributors $'Jane Doe\nDirector of Institute of Microbiology\nUniversity of Smallville;John Smith\nPhD student\nInstitute of Microbiology\nUniversity of Smallville'
 ```

 This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
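
The `--report_contributors` value in the command above appears to separate contributors with semicolons and the lines describing each contributor with newlines, which is why the example uses bash's `$'...'` quoting to expand `\n`. A minimal Python sketch of that reading follows. The splitting rule is inferred from the example rather than taken from the pipeline code, and `parse_contributors` is a hypothetical helper.

```python
def parse_contributors(value):
    """Split on ';' into contributors, then on newlines into lines per contributor.

    Assumption: this mirrors the structure suggested by the example value only.
    """
    return [
        [line.strip() for line in entry.split("\n") if line.strip()]
        for entry in value.split(";")
        if entry.strip()
    ]


# The same value as in the command above, after the shell has expanded "\n".
example = (
    "Jane Doe\nDirector of Institute of Microbiology\nUniversity of Smallville;"
    "John Smith\nPhD student\nInstitute of Microbiology\nUniversity of Smallville"
)

for name, *details in parse_contributors(example):
    print(name, "|", ", ".join(details))
```

Under this reading the first line of each entry is the contributor's name and the remaining lines are free-form details such as role and affiliation.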
@@ -460,7 +457,7 @@ process {
 }
 ```

-You will not get the final reporting outcomes of the workflow, but you will get the differential tables produced by DESeq2 or Limma, and the results of any gene seta analysis you have enabled.
+You will not get the final reporting outcomes of the workflow, but you will get the differential tables produced by DESeq2 or Limma, and the results of any gene sets analysis you have enabled.

 We have also added a dedicated pipeline parameter, `--skip_reports` that allows you to skip only the RMarkdown notebook and bundled report while leaving other reporting processes active. The `RMARKDOWNNOTEBOOK` process assumes that every grouping variable you pass to it (from the contrasts file’s variable column or PCA-derived informative_variables) exists as a valid, named column in your sample metadata. If you know your metadata or contrasts might be incomplete or non-standard (such as using formula-based yaml files), then you can use this flag to skip these steps.