Commit 1190e6c

Merge pull request #1367 from pmoris/clarify-deseq2-qc
Clarify design formula and blind dispersion estimation
2 parents b59f27f + 760e83f commit 1190e6c

3 files changed, with 5 additions and 1 deletion

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 - [PR #1362](https://github.com/nf-core/rnaseq/pull/1362) - Move multiqc module prefix for nf-test to module
 - [PR #1363](https://github.com/nf-core/rnaseq/pull/1363) - Minor updates of nf-core modules and subworkflows
 - [PR #1363](https://github.com/nf-core/rnaseq/pull/1363) - Update dupradar script
+- [PR #1367](https://github.com/nf-core/rnaseq/pull/1367) - Clarify design formula and blind dispersion estimation
 
 ### Parameters
 

bin/deseq2_qc.r

Lines changed: 2 additions & 1 deletion
@@ -92,7 +92,8 @@ if (decompose) {
 DDSFile <- paste(opt$outprefix,".dds.RData",sep="")
 
 counts <- count.table[,samples.vec,drop=FALSE]
-dds <- DESeqDataSetFromMatrix(countData=round(counts), colData=coldata, design=~ 1)
+# `design=~1` creates intercept-only model, equivalent to setting `blind=TRUE` for transformation.
+dds <- DESeqDataSetFromMatrix(countData=round(counts), colData=coldata, design=~1)
 dds <- estimateSizeFactors(dds)
 if (min(dim(count.table))<=1) { # No point if only one sample, or one gene
 save(dds,file=DDSFile)
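
As a rough illustration of the comment added in this hunk (and not part of the commit itself), the sketch below checks on simulated counts that, once the design is already `~1`, the `blind` argument makes no difference to the variance-stabilizing transformation. It assumes the DESeq2 package is installed. The simulated data from `makeExampleDESeqDataSet` and the names `sim`, `vsd_blind`, `vsd_design` and `same` are illustrative choices, not names taken from `deseq2_qc.r`.

```r
# Illustrative sketch only; not part of bin/deseq2_qc.r.
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# Simulated data: 1000 genes x 6 samples (the real script reads a count table).
sim <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Intercept-only design, as in the script.
dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                              colData   = colData(sim),
                              design    = ~1)
dds <- estimateSizeFactors(dds)

# blind = TRUE resets the design to ~1 before estimating dispersions.
# Here the design already is ~1, so both calls fit the same model.
vsd_blind  <- varianceStabilizingTransformation(dds, blind = TRUE)
vsd_design <- varianceStabilizingTransformation(dds, blind = FALSE)

# Compare the two transformed matrices.
same <- isTRUE(all.equal(assay(vsd_blind), assay(vsd_design)))
print(same)
```

With a design that includes experimental groups, `blind = FALSE` would instead use the group information when estimating dispersions, which is the case the intercept-only design avoids.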

docs/output.md

Lines changed: 2 additions & 0 deletions
@@ -642,6 +642,8 @@ The script included in the pipeline uses DESeq2 to normalise read counts across
 
 By default, the pipeline uses the `vst` transformation which is more suited to larger experiments. You can set the parameter `--deseq2_vst false` if you wish to use the DESeq2 native `rlog` option. See [DESeq2 docs](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#data-transformations-and-visualization) for a more detailed explanation.
 
+Both types of transformation are performed blind, i.e. using across-all-samples variability, without using any prior information on experimental groups (equivalent to using an intercept-only design), as recommended by the [DESeq2 docs](https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#blind-dispersion-estimation).
+
 The PCA plots are generated based alternately on the top five hundred most variable genes, or all genes. The former is the conventional approach that is more likely to pick up strong effects (i.e. the biological signal) and the latter, when different, is picking up a weaker but consistent effect that is synchronised across many transcripts. We project both of these onto the first two PCs (shown in the top row of the figure below), which is the best two-dimensional representation of the variation between samples.
 
 We also explore higher components in terms of experimental factors inferred from sample names. If your sample naming convention follows a strict policy of using underscores to delimit values of experimental factors (for example `WT_UNTREATED_REP1`) and all names have the same number of underscores (so `WT_TREATED_10ml_REP1` would not be compatible with the previous label), then for each of these factors that is informative (i.e. labels some but not all samples the same), we individually plot up to the first five PCs, per experimental level.
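
The two documentation paragraphs closing this hunk can be sketched in base R as follows. This is a minimal illustration on simulated data, not the code in `deseq2_qc.r`: the matrix `mat` stands in for a `vst`/`rlog`-transformed matrix, all variable names are hypothetical, and the "informative" rule used here (more than one level, but fewer levels than samples) is an assumed reading of "label some but not all samples the same".

```r
# Illustrative sketch only; base R, simulated data.
set.seed(42)
samples <- c("WT_UNTREATED_REP1", "WT_UNTREATED_REP2",
             "WT_TREATED_REP1",   "WT_TREATED_REP2",
             "KO_UNTREATED_REP1", "KO_UNTREATED_REP2")

# Stand-in for a transformed expression matrix: 2000 genes x 6 samples.
mat <- matrix(rnorm(2000 * length(samples)), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), samples))

# PCA on the 500 most variable genes, and on all genes.
gene_var <- apply(mat, 1, var)
top500   <- order(gene_var, decreasing = TRUE)[1:500]
pca_top  <- prcomp(t(mat[top500, ]))
pca_all  <- prcomp(t(mat))

# Infer factors from sample names, but only if every name
# splits into the same number of underscore-delimited fields.
fields   <- strsplit(samples, "_", fixed = TRUE)
n_fields <- lengths(fields)
if (length(unique(n_fields)) == 1) {
  factors <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  rownames(factors) <- samples
  # Assumed rule: informative = more than one level, fewer levels than samples.
  n_levels    <- vapply(factors, function(x) length(unique(x)), integer(1))
  informative <- factors[, n_levels > 1 & n_levels < nrow(factors), drop = FALSE]
} else {
  informative <- NULL
}
print(informative)
```

The first two components used for the main plots would then be `pca_top$x[, 1:2]` and `pca_all$x[, 1:2]`, and each column of `informative` could be used to colour or facet plots of the higher components.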
