-By default, library size estimation is by TMM, implemented in edgeR from BioConductor. You will need to install this manually if you haven't already:
+By default, library size estimation is by TMM (Robinson and Oshlack, 2010), implemented in edgeR from BioConductor. You will need to install this manually if you haven't already:
```
source("http://bioconductor.org/biocLite.R")
@@ -59,7 +59,7 @@ Say you have a count matrix `counts` and a design matrix `design`. To perform a
y <- varistran::vst(counts, design=design)
```

-By default, Anscombe's variance stabilizing transformation for the negative binomial distribution is used. This behaves like log2 for large counts (log2 Counts-Per-Million if `cpm=T` is given).
+By default, Anscombe's (1948) variance stabilizing transformation for the negative binomial distribution is used. This behaves like log2 for large counts (log2 Counts-Per-Million if `cpm=T` is given).

An appropriate dispersion is estimated with the aid of the design matrix. If the design matrix is omitted, it defaults to a column of ones, for blind estimation of the dispersion. This might slightly over-estimate the dispersion. A third possibility is to estimate the dispersion with edgeR.
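
To make the "behaves like log2 for large counts" statement concrete, here is a minimal base R sketch of an Anscombe-style transformation for negative binomial counts, together with the blind design matrix (a single column of ones). The function name `anscombe_nb_log2`, the toy `counts` matrix, the fixed dispersion of 0.1, and the rescaling that makes the result approach log2 are all assumptions made for illustration; this is not varistran's own implementation.

```r
# Illustrative sketch only (hypothetical helper, not part of varistran).
# Anscombe (1948) form for negative binomial counts with size k = 1/dispersion:
#   asinh(sqrt((x + 3/8) / (k - 3/4)))
# The result is rescaled and shifted here so that it approaches log2(x).
anscombe_nb_log2 <- function(x, dispersion) {
    k <- 1 / dispersion                 # negative binomial "size" parameter
    stopifnot(k > 0.75)                 # the formula needs k - 3/4 > 0
    z <- sqrt((x + 0.375) / (k - 0.75))
    # asinh(z) is close to log(2 * z) for large z
    2 * asinh(z) / log(2) - 2 + log2(k - 0.75)
}

# Toy count matrix: 3 genes (rows) by 2 samples (columns)
counts <- matrix(c(5, 50, 500, 7, 70, 700), nrow = 3)

# Blind design: one row per sample, a single column of ones
design <- matrix(1, nrow = ncol(counts), ncol = 1)

# Transform with an assumed, fixed dispersion
y <- anscombe_nb_log2(counts, dispersion = 0.1)
```

For small counts the transformed values depart from log2, which is where the variance stabilization does its work. The sketch takes the dispersion as given rather than estimating it from the design.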
@@ -115,6 +115,25 @@ make test
Outputs are placed in a directory called `test_output`.

+Sources of data used in these tests are:
+
+* The [Bottomly dataset](http://bowtie-bio.sourceforge.net/recount/ExpressionSets/bottomly_eset.RData) from [ReCount](http://bowtie-bio.sourceforge.net/recount/).
+
+* The "arab" dataset provided in the [NBPSeq package](https://cran.rstudio.com/web/packages/NBPSeq/index.html).
+
+* Simulated data following negative binomial distributions.
+
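
As a rough illustration of the simulated test data listed above, negative binomial counts with a known dispersion can be drawn with base R's `rnbinom`. The sizes, the shared mean `mu`, the value of `true_dispersion`, and the method-of-moments check are assumptions chosen for this sketch; the repository's actual test scripts may simulate and compare data differently.

```r
# Illustrative sketch only: simulate negative binomial counts in base R.
set.seed(1)

n_genes <- 1000
n_samples <- 6
true_dispersion <- 0.1   # assumed value for this sketch
mu <- 100                # assumed mean, shared by all genes for simplicity

# rnbinom's `size` is the reciprocal of the dispersion,
# so that variance = mu + dispersion * mu^2
counts <- matrix(
    rnbinom(n_genes * n_samples, size = 1 / true_dispersion, mu = mu),
    nrow = n_genes, ncol = n_samples)

# Crude method-of-moments estimate, pooling all values
# (valid here only because every gene shares the same mean)
m <- mean(counts)
v <- var(as.vector(counts))
moment_dispersion <- (v - m) / m^2
```

Because the true dispersion is known, an estimate from any method can be checked against it directly.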
+Dispersion estimates are compared to those calculated by the [edgeR Bioconductor package's](https://bioconductor.org/packages/release/bioc/html/edgeR.html) `estimateGLMCommonDisp` function (McCarthy, Chen and Smyth, 2012) and by the [DESeq2 Bioconductor package's](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) `DESeq` function (Love, Huber and Anders, 2014).
+
+Please report bugs and request features by [filing an issue](https://github.com/MonashBioinformaticsPlatform/varistran/issues), or by [contacting the author](email:[email protected]).
+
+Pull requests gratefully considered.
+
## Links
@@ -123,3 +142,16 @@ Outputs are placed in a directory called `test_output`.
* [RNA Systems Laboratory, Monash University](http://rnasystems.erc.monash.edu)

+## References
+
+Anscombe, Francis J. 1948. "The Transformation of Poisson, Binomial and Negative-Binomial Data." *Biometrika* 35 (3/4): 246–54.
+
+Love, Michael I., Wolfgang Huber and Simon Anders. 2014. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." *Genome Biology* 15 (12): 550. doi:10.1186/s13059-014-0550-8
+
+McCarthy, Davis J., Yunshun Chen and Gordon K. Smyth. 2012. "Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation." *Nucleic Acids Research* 40 (10): 4288–4297. doi:10.1093/nar/gks042
+
+Robinson, Mark D. and Alicia Oshlack. 2010. "A scaling normalization method for differential expression analysis of RNA-seq data." *Genome Biology* 11 (3): R25. doi:10.1186/gb-2010-11-3-r25