Commit 5f4613f

Merge pull request #329 from petrelharp/stat_fun_docs
listed summary functions
2 parents 1fadc92 + bc8ad36 commit 5f4613f

1 file changed, +60 -0 lines changed

docs/stats.rst

Lines changed: 60 additions & 0 deletions
@@ -408,3 +408,63 @@ For site statistics, summary functions are applied to the total weight or number
associated with each allele; but if polarised, then the ancestral allele is left out of this sum.
For branch or node statistics, summary functions are applied to the total weight or number of samples
below and above each branch or node; if polarised, then only the weight below is used.

.. _sec_stat_functions:

*****************
Summary functions
*****************

For convenience, here are the summary functions used for many of the statistics.
Below, :math:`x` denotes the number of samples in a sample set below a node,
:math:`n` denotes the total size of a sample set,
and boolean expressions (e.g., :math:`(x > 0)`) are interpreted as 0/1.
For statistics of several sample sets, :math:`x_i` and :math:`n_i` denote
the same quantities for the :math:`i`-th sample set.

``diversity``
   :math:`f(x) = \frac{x (n - x)}{n (n-1)}`

``segregating_sites``
   :math:`f(x) = (x > 0) (1 - x / n)`

   (Note: this works because if :math:`\sum_{i=1}^k p_i = 1` then :math:`\sum_{i=1}^k (1-p_i) = k-1`.)

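The note on ``segregating_sites`` can be checked numerically. The following is a minimal sketch in plain Python, not library code: the function name and the allele counts are made up for illustration. At a site with :math:`k` alleles whose counts sum to :math:`n`, summing the summary function over the alleles gives :math:`k - 1`.

```python
from fractions import Fraction


def segregating_sites_summary(x, n):
    """f(x) = (x > 0) * (1 - x / n), with the boolean read as 0/1."""
    return int(x > 0) * (1 - Fraction(x, n))


# Hypothetical allele counts at one site: k = 3 alleles among n = 10 samples.
allele_counts = [5, 3, 2]
n = sum(allele_counts)

# Each allele contributes 1 - p_i, so the total is k - 1.
total = sum(segregating_sites_summary(x, n) for x in allele_counts)
print(total)  # 2
```

A site with a single allele (``allele_counts = [10]``) contributes 0, so only sites that actually segregate are counted.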
``Y1``
   :math:`f(x) = \frac{x (n - x) (n - x - 1)}{n (n-1) (n-2)}`

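One way to read the ``diversity`` and ``Y1`` functions is as probabilities: draw distinct samples from the sample set in order, and ask that the first is below the node and the rest are not. The sketch below checks that reading by brute force. It is an illustration only; the function names and the choice :math:`n = 6` are assumptions of this example.

```python
from fractions import Fraction
from itertools import permutations


def diversity_summary(x, n):
    """f(x) = x (n - x) / (n (n - 1))."""
    return Fraction(x * (n - x), n * (n - 1))


def y1_summary(x, n):
    """f(x) = x (n - x) (n - x - 1) / (n (n - 1) (n - 2))."""
    return Fraction(x * (n - x) * (n - x - 1), n * (n - 1) * (n - 2))


def brute_force(x, n, k):
    """Fraction of ordered k-tuples of distinct samples whose first member
    is below the node and whose remaining members are not."""
    below = set(range(x))  # label the x samples below the node 0..x-1
    tuples = list(permutations(range(n), k))
    hits = sum(
        1 for t in tuples
        if t[0] in below and all(s not in below for s in t[1:])
    )
    return Fraction(hits, len(tuples))


n = 6
for x in range(n + 1):
    assert diversity_summary(x, n) == brute_force(x, n, 2)
    assert y1_summary(x, n) == brute_force(x, n, 3)
```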
``divergence``
   :math:`f(x_1, x_2) = \frac{x_1 (n_2 - x_2)}{n_1 n_2}`,

   unless the two indices are the same, in which case the diversity function is used.

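The special case for equal indices can be sketched as follows. This is a hypothetical helper, not the library's implementation: here ``x`` and ``n`` are lists of per-sample-set counts and sizes, and ``i``, ``j`` pick out the two sample sets being compared.

```python
from fractions import Fraction


def diversity_summary(x, n):
    """f(x) = x (n - x) / (n (n - 1))."""
    return Fraction(x * (n - x), n * (n - 1))


def divergence_summary(x, n, i, j):
    """Divergence between sample sets i and j; falls back to diversity
    when both indices refer to the same sample set."""
    if i == j:
        return diversity_summary(x[i], n[i])
    return Fraction(x[i] * (n[j] - x[j]), n[i] * n[j])


# Hypothetical counts below a node for two sample sets of sizes 4 and 5.
x = [2, 1]
n = [4, 5]
print(divergence_summary(x, n, 0, 1))  # 2/5
print(divergence_summary(x, n, 0, 0))  # 1/3
```

The fallback matters because two draws from the same sample set must be distinct samples, which changes the denominator from :math:`n_1 n_2` to :math:`n (n - 1)`.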
``Y2``
   :math:`f(x_1, x_2) = \frac{x_1 (n_2 - x_2) (n_2 - x_2 - 1)}{n_1 n_2 (n_2 - 1)}`

``f2``
   :math:`f(x_1, x_2) = \frac{x_1 (x_1 - 1) (n_2 - x_2) (n_2 - x_2 - 1)}{n_1 (n_1 - 1) n_2 (n_2 - 1)}`

``Y3``
   :math:`f(x_1, x_2, x_3) = \frac{x_1 (n_2 - x_2) (n_3 - x_3)}{n_1 n_2 n_3}`

``f3``
   :math:`f(x_1, x_2, x_3) = \frac{x_1 (x_1 - 1) (n_2 - x_2) (n_3 - x_3)}{n_1 (n_1 - 1) n_2 n_3}`

``f4``
   :math:`f(x_1, x_2, x_3, x_4) = \frac{x_1 x_3 (n_2 - x_2) (n_4 - x_4)}{n_1 n_2 n_3 n_4}`

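As a reading aid, the multi-sample-set functions above can be transcribed directly into Python. This is a sketch that follows the formulas exactly as listed here; the function names, the tuple-based argument convention, and the example counts are assumptions of this illustration.

```python
from fractions import Fraction


def y2_summary(x, n):
    (x1, x2), (n1, n2) = x, n
    return Fraction(x1 * (n2 - x2) * (n2 - x2 - 1), n1 * n2 * (n2 - 1))


def f2_summary(x, n):
    (x1, x2), (n1, n2) = x, n
    return Fraction(
        x1 * (x1 - 1) * (n2 - x2) * (n2 - x2 - 1),
        n1 * (n1 - 1) * n2 * (n2 - 1),
    )


def y3_summary(x, n):
    (x1, x2, x3), (n1, n2, n3) = x, n
    return Fraction(x1 * (n2 - x2) * (n3 - x3), n1 * n2 * n3)


def f3_summary(x, n):
    (x1, x2, x3), (n1, n2, n3) = x, n
    return Fraction(
        x1 * (x1 - 1) * (n2 - x2) * (n3 - x3),
        n1 * (n1 - 1) * n2 * n3,
    )


def f4_summary(x, n):
    (x1, x2, x3, x4), (n1, n2, n3, n4) = x, n
    return Fraction(x1 * x3 * (n2 - x2) * (n4 - x4), n1 * n2 * n3 * n4)


# Hypothetical counts below a node for four sample sets of sizes 4, 5, 6, 3.
x = (2, 1, 3, 1)
n = (4, 5, 6, 3)
print(y2_summary(x[:2], n[:2]))  # 3/10
print(f2_summary(x[:2], n[:2]))  # 1/10
print(y3_summary(x[:3], n[:3]))  # 1/5
print(f3_summary(x[:3], n[:3]))  # 1/15
print(f4_summary(x, n))          # 2/15
```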
``trait_covariance``
   :math:`f(w) = \frac{w^2}{2 (n-1)^2}`,

   where :math:`w` is the sum of all trait values of the samples below the node.

``trait_correlation``
   :math:`f(w, x) = \frac{w^2}{2 x (1 - x/n) (n - 1)}`,

   where, as before, :math:`x` is the total number of samples below the node,
   and :math:`n` is the total number of samples.

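The two trait functions above can be written out as a short sketch. The function names and example values are hypothetical, and returning 0 when :math:`x = 0` or :math:`x = n` (where the ``trait_correlation`` denominator vanishes) is an assumption of this example rather than something stated above.

```python
def trait_covariance_summary(w, n):
    """f(w) = w^2 / (2 (n - 1)^2)."""
    return w ** 2 / (2 * (n - 1) ** 2)


def trait_correlation_summary(w, x, n):
    """f(w, x) = w^2 / (2 x (1 - x/n) (n - 1))."""
    if x == 0 or x == n:
        # Assumption of this sketch: the denominator is zero here,
        # so report no contribution instead of dividing by zero.
        return 0.0
    return w ** 2 / (2 * x * (1 - x / n) * (n - 1))


# Hypothetical node: trait values below it sum to w = 3.0,
# with x = 2 of the n = 5 samples below the node.
cov = trait_covariance_summary(3.0, 5)
corr = trait_correlation_summary(3.0, 2, 5)
print(cov, corr)
```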
``trait_regression``
   :math:`f(w, z, x) = \frac{1}{2}\left( \frac{w - \sum_{j=1}^k z_j v_j}{x - \sum_{j=1}^k z_j^2} \right)^2`,

   where :math:`w` and :math:`x` are as before,
   :math:`z_j` is the sum of the values of the :math:`j`-th normalised covariate over the samples below the node,
   and :math:`v_j` is the covariance of the trait with the :math:`j`-th covariate.
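
The ``trait_regression`` formula is easier to follow when its numerator and denominator are computed separately. The sketch below does that under stated assumptions: the function name, the list-based arguments ``z`` and ``v``, the example values, and the choice to return 0 when the denominator is zero are all hypothetical.

```python
def trait_regression_summary(w, z, v, x):
    """f(w, z, x) = (1/2) * ((w - sum_j z_j v_j) / (x - sum_j z_j^2))^2.

    w: sum of trait values below the node
    z: per-covariate sums of normalised covariate values below the node
    v: per-covariate covariances of the trait with each covariate
    x: number of samples below the node
    """
    numerator = w - sum(zj * vj for zj, vj in zip(z, v))
    denominator = x - sum(zj ** 2 for zj in z)
    if denominator == 0:
        # Assumption of this sketch: avoid dividing by zero.
        return 0.0
    return 0.5 * (numerator / denominator) ** 2


# Hypothetical node with k = 2 covariates.
value = trait_regression_summary(w=3.0, z=[1.0, 0.5], v=[2.0, 1.0], x=4)
print(value)
```

With these inputs the numerator is 0.5 and the denominator is 2.75, so the result is :math:`\frac{1}{2} (2/11)^2 = 2/121`.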
