@@ -408,3 +408,63 @@ For site statistics, summary functions are applied to the total weight or number
408 408 associated with each allele; if polarised, the ancestral allele is left out of this sum.
409 409 For branch or node statistics, summary functions are applied to the total weight or number of samples
410 410 below and above each branch or node; if polarised, only the weight below is used.
411+
412+ .. _sec_stat_functions:
413+
414+ *****************
415+ Summary functions
416+ *****************
417+
418+ For convenience, here are the summary functions used for many of the statistics.
419+ Below, :math:`x` denotes the number of samples in a sample set below a node,
420+ :math:`n` denotes the total size of a sample set,
421+ and boolean expressions (e.g., :math:`(x > 0)`) are interpreted as 0/1.
422+
423+ ``diversity``
424+    :math:`f(x) = \frac{x (n - x)}{n (n - 1)}`
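As an illustration, the ``diversity`` function can be read as the fraction of ordered pairs of distinct samples in which the first sample is below the node and the second is not. The sketch below checks this by enumeration; the names ``diversity`` and ``diversity_brute_force`` are hypothetical helpers written for this example, not part of any library API.

```python
from fractions import Fraction
from itertools import permutations


def diversity(x, n):
    """f(x) = x (n - x) / (n (n - 1)), computed exactly."""
    if n < 2:
        raise ValueError("need at least two samples")
    return Fraction(x * (n - x), n * (n - 1))


def diversity_brute_force(x, n):
    """Count ordered pairs (i, j), i != j, with i below the node and j not.

    Assumption for illustration: samples 0..x-1 are the ones below the node.
    """
    below = [i < x for i in range(n)]
    pairs = list(permutations(range(n), 2))
    hits = sum(1 for i, j in pairs if below[i] and not below[j])
    return Fraction(hits, len(pairs))


print(diversity(2, 5), diversity_brute_force(2, 5))
```

Exact fractions are used here only to make the comparison free of rounding error.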
425+
426+ ``segregating_sites``
427+    :math:`f(x) = (x > 0) (1 - x / n)`
428+
429+    (Note: this works because if :math:`\sum_{i=1}^k p_i = 1`, then :math:`\sum_{i=1}^k (1 - p_i) = k - 1`.)
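The identity in the note can be checked numerically. The following is a minimal sketch, assuming every sample carries exactly one allele so that the allele counts at a site sum to :math:`n`; ``site_total`` is a hypothetical helper introduced only for this example.

```python
from fractions import Fraction


def segregating_sites(x, n):
    """f(x) = (x > 0) (1 - x / n), with the boolean read as 0 or 1."""
    return int(x > 0) * (1 - Fraction(x, n))


def site_total(allele_counts):
    """Sum f over the alleles at one site (assumes the counts sum to n)."""
    n = sum(allele_counts)
    return sum(segregating_sites(x, n) for x in allele_counts)


# Three alleles with frequencies 0.6, 0.3 and 0.1 contribute k - 1 = 2.
print(site_total([6, 3, 1]))
```

Alleles carried by no sample contribute nothing because of the :math:`(x > 0)` factor, so only alleles actually present are counted in :math:`k`.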
430+
431+ ``Y1``
432+    :math:`f(x) = \frac{x (n - x) (n - x - 1)}{n (n - 1) (n - 2)}`
433+
434+ ``divergence``
435+    :math:`f(x_1, x_2) = \frac{x_1 (n_2 - x_2)}{n_1 n_2}`,
436+
437+    unless the two indices are the same, in which case the diversity function is used.
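The special case for equal indices can be sketched as follows. This is an illustration only: the signature ``divergence(x, n, i, j)``, with ``x`` and ``n`` holding one entry per sample set, is an assumption made for this example rather than the library's interface.

```python
from fractions import Fraction


def diversity(x, n):
    """f(x) = x (n - x) / (n (n - 1))."""
    return Fraction(x * (n - x), n * (n - 1))


def divergence(x, n, i, j):
    """Divergence between sample sets i and j (hypothetical signature).

    x[k] is the number of samples of set k below the node and n[k] is
    the size of set k.
    """
    if i == j:
        # Same set twice: the two samples are drawn without replacement,
        # so the denominator is n (n - 1) rather than n * n.
        return diversity(x[i], n[i])
    return Fraction(x[i] * (n[j] - x[j]), n[i] * n[j])


x, n = [2, 1], [4, 5]
print(divergence(x, n, 0, 1), divergence(x, n, 0, 0))
```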
438+
439+ ``Y2``
440+    :math:`f(x_1, x_2) = \frac{x_1 (n_2 - x_2) (n_2 - x_2 - 1)}{n_1 n_2 (n_2 - 1)}`
441+
442+ ``f2``
443+    :math:`f(x_1, x_2) = \frac{x_1 (x_1 - 1) (n_2 - x_2) (n_2 - x_2 - 1)}{n_1 (n_1 - 1) n_2 (n_2 - 1)}`
444+
445+ ``Y3``
446+    :math:`f(x_1, x_2, x_3) = \frac{x_1 (n_2 - x_2) (n_3 - x_3)}{n_1 n_2 n_3}`
447+
448+ ``f3``
449+    :math:`f(x_1, x_2, x_3) = \frac{x_1 (x_1 - 1) (n_2 - x_2) (n_3 - x_3)}{n_1 (n_1 - 1) n_2 n_3}`
450+
451+ ``f4``
452+    :math:`f(x_1, x_2, x_3, x_4) = \frac{x_1 x_3 (n_2 - x_2) (n_4 - x_4)}{n_1 n_2 n_3 n_4}`
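The multi-set functions above are products of one factor per sample set. As a sketch, ``f4`` equals the fraction of quadruples, one sample from each of the four sets, in which the samples from sets 1 and 3 are below the node and those from sets 2 and 4 are not. The helper names below are hypothetical and exist only for this check.

```python
from fractions import Fraction
from itertools import product


def f4(x, n):
    """f(x_1..x_4) = x_1 x_3 (n_2 - x_2) (n_4 - x_4) / (n_1 n_2 n_3 n_4)."""
    x1, x2, x3, x4 = x
    n1, n2, n3, n4 = n
    return Fraction(x1 * x3 * (n2 - x2) * (n4 - x4), n1 * n2 * n3 * n4)


def f4_brute_force(x, n):
    """Enumerate one sample per set and count the matching quadruples.

    Assumption for illustration: in set k, the first x[k] samples are
    the ones below the node.
    """
    below = [[i < xk for i in range(nk)] for xk, nk in zip(x, n)]
    quads = list(product(*below))
    hits = sum(1 for a, b, c, d in quads if a and c and not b and not d)
    return Fraction(hits, len(quads))


x, n = (2, 1, 3, 0), (3, 4, 5, 2)
print(f4(x, n), f4_brute_force(x, n))
```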
453+
454+ ``trait_covariance``
455+    :math:`f(w) = \frac{w^2}{2 (n - 1)^2}`,
456+
457+    where :math:`w` is the sum of all trait values of the samples below the node.
458+
459+ ``trait_correlation``
460+    :math:`f(w, x) = \frac{w^2}{2 x (1 - x/n) (n - 1)}`,
461+
462+    where, as before, :math:`x` is the total number of samples below the node
463+    and :math:`n` is the total number of samples.
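The two trait functions above translate directly into code. This is a minimal sketch under stated assumptions: the function signatures are hypothetical, and returning ``0.0`` when :math:`x = 0` or :math:`x = n` (where the ``trait_correlation`` denominator vanishes) is a choice made for this example, not documented behaviour.

```python
def trait_covariance(w, n):
    """f(w) = w^2 / (2 (n - 1)^2), with w the summed trait values below the node."""
    return w ** 2 / (2 * (n - 1) ** 2)


def trait_correlation(w, x, n):
    """f(w, x) = w^2 / (2 x (1 - x/n) (n - 1))."""
    denom = 2 * x * (1 - x / n) * (n - 1)
    if denom == 0:
        # Assumption of this sketch: treat nodes with x == 0 or x == n
        # as contributing nothing instead of dividing by zero.
        return 0.0
    return w ** 2 / denom


print(trait_covariance(3.0, 4), trait_correlation(3.0, 2, 4))
```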
464+
465+ ``trait_regression``
466+    :math:`f(w, z, x) = \frac{1}{2} \left( \frac{w - \sum_{j=1}^k z_j v_j}{x - \sum_{j=1}^k z_j^2} \right)^2`,
467+
468+    where :math:`w` and :math:`x` are as before,
469+    :math:`k` is the number of covariates,
469+    :math:`z_j` is the sum of the values of the :math:`j`-th normalised covariate below the node,
470+    and :math:`v_j` is the covariance of the trait with the :math:`j`-th covariate.
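For concreteness, the ``trait_regression`` formula can be evaluated as below. This is a sketch only: the documented function is written as :math:`f(w, z, x)` with the covariances :math:`v_j` held fixed, whereas this hypothetical helper takes ``v`` as an explicit argument so that the block is self-contained.

```python
def trait_regression(w, z, v, x):
    """0.5 * ((w - sum_j z_j v_j) / (x - sum_j z_j^2))^2.

    w: summed trait values below the node
    z: per-covariate sums of normalised covariate values below the node
    v: covariance of the trait with each covariate (passed explicitly here)
    x: number of samples below the node
    """
    if len(z) != len(v):
        raise ValueError("z and v must have one entry per covariate")
    numerator = w - sum(zj * vj for zj, vj in zip(z, v))
    denominator = x - sum(zj ** 2 for zj in z)
    return 0.5 * (numerator / denominator) ** 2


# numerator = 5 - (0.5 + 2.0) = 2.5, denominator = 10 - (1 + 4) = 5
print(trait_regression(5.0, [1.0, 2.0], [0.5, 1.0], 10))
```

The sketch does not guard against a zero denominator; how that case should be handled is left open here.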