@@ -241,7 +241,96 @@ Files in PICKLE Format
241241^^^^^^^^^^^^^^^^^^^^^^
242242
243243These files are for internal usage only and can be ignored.
244-
244+
245+ Output Files -- ``test`` mode
246+ --------------------------------
247+ In the testing mode, SplAdder generates both tabulated output and images for diagnosing
248+ properties of the data. The latter is still in beta mode. Please report an issue on the `tracker
249+ <https://github.com/ratschlab/spladder/issues>`_ if you encounter any problems.
250+
251+ Files in TXT Format
252+ ^^^^^^^^^^^^^^^^^^^
253+ The results of the ``test`` mode are generally found in the ``testing`` subdirectory of the
254+ SplAdder output folder. For each event type {ET} and confidence level {C}, several different
255+ output files in text format are generated:
256+
257+ - ``test_results_C{C}_{ET}.tsv``
258+ - ``test_results_C{C}_{ET}.gene_unique.tsv``
259+ - ``test_results_extended_C{C}_{ET}.tsv``
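As a minimal sketch of how these names are put together, the following Python snippet builds the three expected paths for one event type and one confidence level. The helper name ``expected_result_paths``, the output directory ``spladder_out`` and the event type label ``exon_skip`` are assumptions made for illustration, and the snippet assumes the results sit in the default ``testing`` subdirectory; substitute the values that match your run.

```python
from pathlib import Path


def expected_result_paths(outdir, event_type, confidence):
    """Return the three text files expected for one event type and confidence level."""
    # Assumption: results are located in <outdir>/testing, as described above.
    testing_dir = Path(outdir) / "testing"
    stem = f"C{confidence}_{event_type}"
    return [
        testing_dir / f"test_results_{stem}.tsv",
        testing_dir / f"test_results_{stem}.gene_unique.tsv",
        testing_dir / f"test_results_extended_{stem}.tsv",
    ]


# Hypothetical output directory and event type label.
for path in expected_result_paths("spladder_out", "exon_skip", 3):
    print(path)
```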
260+
261+ Each of these files is described in more detail below.
262+
263+ **Basic test output per event**
264+
265+ The basic outputs of testing are stored in the file ``test_results_C{C}_{ET}.tsv``. In addition to
266+ the header, the file contains one line per tested event. It contains 15 columns carrying the
267+ following information:
268+
269+ #. *event_id* -- ID of the event
270+ #. *chrm* -- event location: chromosome/contig
271+ #. *exon_pos* -- event location: exon position (start-stop:start-stop: ...)
272+ #. *alt_usage* -- list of binary values, indicating alternative usage of each exon (same order as in exon_pos)
273+ #. *gene_id* -- ID of gene
274+ #. *gene_name* -- Name of gene
275+ #. *p_val* -- raw p-value from differential test
276+ #. *p_val_adj* -- adjusted p-value from differential test
277+ #. *dPSI* -- delta PSI (absolute difference between mean-PSI of group A and mean-PSI of group B)
278+ #. *mean_event_count_A* -- mean support for tested splice path in group A
279+ #. *mean_event_count_B* -- mean support for tested splice path in group B
280+ #. *log2FC_event_count* -- log2 fold-change of mean support group A vs group B
281+ #. *mean_gene_exp_A* -- mean gene expression of gene in group A
282+ #. *mean_gene_exp_B* -- mean gene expression of gene in group B
283+ #. *log2FC_gene_exp* -- log2 fold-change of gene expression group A vs group B
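A file with this layout can be read with Python's standard ``csv`` module. The sketch below is illustrative only: it assumes that the header line uses the column names listed above, and it reads a made-up two-row excerpt (restricted to a few columns) from memory instead of a real result file. The helper name ``significant_events`` and the threshold of 0.05 are likewise assumptions.

```python
import csv
import io

# Hypothetical excerpt with only some of the 15 columns; the values are made up.
EXAMPLE_TSV = (
    "event_id\tgene_id\tp_val\tp_val_adj\tdPSI\n"
    "exon_skip_1\tgeneA\t0.0001\t0.004\t0.35\n"
    "exon_skip_2\tgeneB\t0.2\t0.6\t0.02\n"
)


def significant_events(handle, alpha=0.05):
    """Return the rows whose adjusted p-value is below alpha."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["p_val_adj"]) < alpha]


hits = significant_events(io.StringIO(EXAMPLE_TSV))
print([row["event_id"] for row in hits])
```

To apply the same function to a real file, pass an open file handle in place of the ``io.StringIO`` object.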
284+
285+ **Basic test output per gene**
286+ The file ``test_results_C{C}_{ET}.gene_unique.tsv`` contains essentially the same information as the
287+ basic test output per event, just made unique per gene. That is, if a gene contains multiple events
288+ of the same type, here only the most significant one is reported. The columns are the same.
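The per-gene reduction can be pictured with the following sketch. It assumes that "most significant" means the smallest adjusted p-value, which the text above does not state explicitly, so SplAdder's own criterion may differ. The function name and the example rows are hypothetical.

```python
def most_significant_per_gene(rows):
    """Keep, for each gene_id, the row with the smallest adjusted p-value."""
    best = {}
    for row in rows:
        gene = row["gene_id"]
        # Assumption: significance is ranked by p_val_adj (smaller is better).
        if gene not in best or float(row["p_val_adj"]) < float(best[gene]["p_val_adj"]):
            best[gene] = row
    return list(best.values())


# Hypothetical rows; geneA carries two events of the same type.
rows = [
    {"event_id": "exon_skip_1", "gene_id": "geneA", "p_val_adj": "0.2"},
    {"event_id": "exon_skip_2", "gene_id": "geneA", "p_val_adj": "0.01"},
    {"event_id": "exon_skip_3", "gene_id": "geneB", "p_val_adj": "0.5"},
]
unique_rows = most_significant_per_gene(rows)
print([row["event_id"] for row in unique_rows])
```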
289+
290+ **Extended test output per event**
291+ The file ``test_results_extended_C{C}_{ET}.tsv`` contains additional output for each tested event
292+ and can be used for debugging purposes. The number of columns is variable and depends on the size of
293+ the input groups used for testing. For the following explanation, we assume that input group A has
294+ size 2 and input group B has size 3. The first 15 columns are identical to the basic event
295+ output file. The additional columns are as follows:
296+
297+ 16. *event_count:group_A_sample1* -- support for tested splice path in group A sample 1
298+ #. *event_count:group_A_sample2* -- support for tested splice path in group A sample 2
299+ #. *event_count:group_B_sample1* -- support for tested splice path in group B sample 1
300+ #. *event_count:group_B_sample2* -- support for tested splice path in group B sample 2
301+ #. *event_count:group_B_sample3* -- support for tested splice path in group B sample 3
302+ #. *disp_raw* -- raw dispersion estimate for the tested event
303+ #. *disp_adj* -- corrected dispersion estimate for the tested event
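Because the column count depends on the group sizes, it can help to derive the expected layout before parsing. The sketch below reproduces the example above (group A with 2 samples, group B with 3). The function name is hypothetical, and the generated labels follow the pattern used in this list; the labels in an actual file may differ.

```python
BASIC_COLUMNS = 15  # columns shared with the basic per-event output


def extended_column_names(size_a, size_b):
    """List the additional columns for groups of the given sizes."""
    names = [f"event_count:group_A_sample{i}" for i in range(1, size_a + 1)]
    names += [f"event_count:group_B_sample{i}" for i in range(1, size_b + 1)]
    names += ["disp_raw", "disp_adj"]
    return names


extra = extended_column_names(2, 3)
total_columns = BASIC_COLUMNS + len(extra)
print(total_columns)  # 15 basic + 5 per-sample counts + 2 dispersion columns = 22
```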
304+
305+ Diagnose Plots
306+ ^^^^^^^^^^^^^^
307+
308+ The testing mode can generate diagnostic plots (via ``--diagnose-plots``) that can help you
309+ assess the data you are looking at. These plots are still in beta mode and might change in future
310+ versions of SplAdder.
311+
312+ The plots reside in the SplAdder output directory in the folder ``testing/plots``. Currently, the
313+ following plots are available:
314+
315+ Count distribution
316+     A plot showing the distribution of supporting counts and the gene expression over events per
317+     tested group / condition. The plot is available for raw counts and for log10-transformed counts.
318+
319+ MA plot
320+     A plot showing the log2 fold-change of each event over the mean normalized counts.
321+
322+ Dispersion
323+     Three plots showing the raw dispersion estimate, the dispersion fit, and the adjusted dispersion.
324+
325+ QQ plot
326+     Quantile-quantile plots comparing the distribution of p-values after testing against a uniform
327+     distribution to check for inflation. Available for raw and adjusted p-values.
328+
329+
330+ Files in PICKLE Format
331+ ^^^^^^^^^^^^^^^^^^^^^^
332+
333+ Similar to ``build`` mode, these files are for internal usage only and can be ignored.
245334
246335.. _ensembl: http://www.ensembl.org/info/website/upload/gff.html
247336.. _broad: http://www.broadinstitute.org/annotation/argo/help/gff3.html