Commit 1faa722

hyanwong authored and petrelharp committed
Remove "random" from doc wording
And reword divergence note
1 parent fbfdc13 commit 1faa722

1 file changed

+45
-38
lines changed

python/tskit/trees.py

Lines changed: 45 additions & 38 deletions
@@ -6589,28 +6589,32 @@ def diversity(
65896589
What is computed depends on ``mode``:
65906590
65916591
"site"
6592-
Mean pairwise genetic diversity: the average across distinct,
6593-
randomly chosen pairs of chromosomes, of the density of sites at
6592+
Mean pairwise genetic diversity: the average over all n choose 2 pairs of
6593+
sample nodes, of the density of sites at
65946594
which the two carry different alleles, per unit of chromosome length.
65956595
65966596
"branch"
6597-
Mean distance in the tree: the average across distinct, randomly chosen pairs
6598-
of chromosomes and locations in the window, of the mean distance in the tree
6599-
between the two samples (in units of time).
6597+
Mean distance in the tree: the average over all n choose 2 pairs of
6598+
sample nodes and locations in the window, of the mean distance in
6599+
the tree between the two samples (in units of time).
66006600
66016601
"node"
66026602
For each node, the proportion of genome on which the node is an ancestor to
6603-
only one of a random pair from the sample set, averaged over choices of pair.
6603+
only one of a pair of sample nodes from the sample set, averaged
6604+
over all n choose 2 pairs of sample nodes.
66046605
66056606
:param list sample_sets: A list of lists of Node IDs, specifying the
6606-
groups of nodes to compute the statistic with.
6607+
groups of nodes for which the statistic is computed. If any of the
6608+
sample sets contain only a single node, the returned diversity will be
6609+
NaN. If ``None`` (default), average over all n choose 2 pairs of distinct
6610+
sample nodes in the tree sequence.
66076611
:param list windows: An increasing list of breakpoints between the windows
66086612
to compute the statistic in.
66096613
:param str mode: A string giving the "type" of the statistic to be computed
66106614
(defaults to "site").
66116615
:param bool span_normalise: Whether to divide the result by the span of the
66126616
window (defaults to True).
6613-
:return: A numpy array.
6617+
:return: A numpy array whose length is equal to the number of sample sets.
66146618
"""
66156619
return self.__one_way_sample_set_stat(
66166620
self._ll_tree_sequence.diversity,
@@ -6628,9 +6632,9 @@ def divergence(
66286632
sets of nodes from ``sample_sets``.
66296633
This is the "average number of differences", usually referred to as "dxy";
66306634
a common citation for this definition is Nei and Li (1979), who called it
6631-
:math:`\pi_{XY}`. Note that computing the divergence of a population to itself
6632-
gives the mean pairwise nucleotide diversity within that population,
6633-
which is :meth:`diversity <.TreeSequence.diversity>`.
6635+
:math:`\pi_{XY}`. Note that the divergence of a sample set to itself
6636+
(computed by passing an index of the form ``(j, j)``) is its mean pairwise
6637+
nucleotide :meth:`diversity <.TreeSequence.diversity>` (see the note below).
66346638
66356639
Operates on ``k = 2`` sample sets at a time; please see the
66366640
:ref:`multi-way statistics <sec_stats_sample_sets_multi_way>`
@@ -6642,27 +6646,30 @@ def divergence(
66426646
:ref:`span normalise <sec_stats_span_normalise>`,
66436647
and :ref:`return value <sec_stats_output_format>`.
66446648
6645-
As a special case, an index ``(j, j)`` will compute the
6646-
:meth:`diversity <.TreeSequence.diversity>` of ``sample_set[j]``.
6649+
.. note::
6650+
To avoid unexpected results, sample sets should be nonoverlapping,
6651+
since comparisons of individuals to themselves are not removed when computing
6652+
divergence between distinct sample sets. (However, specifying an index
6653+
``(j, j)`` computes the :meth:`diversity <.TreeSequence.diversity>`
6654+
of ``sample_set[j]``, which removes self comparisons to provide
6655+
an unbiased estimate.)
66476656
66486657
What is computed depends on ``mode``:
66496658
66506659
"site"
6651-
Mean pairwise genetic divergence: the average across distinct,
6652-
randomly chosen pairs of chromosomes (one from each sample set), of
6653-
the density of sites at which the two carry different alleles, per
6654-
unit of chromosome length.
6660+
Mean pairwise genetic divergence: the average across every possible pair of
6661+
chromosomes (one from each sample set), of the density of sites at which
6662+
the two carry different alleles, per unit of chromosome length.
66556663
66566664
"branch"
6657-
Mean distance in the tree: the average across distinct, randomly
6658-
chosen pairs of chromosomes (one from each sample set) and locations
6659-
in the window, of the mean distance in the tree between the two
6660-
samples (in units of time).
6665+
Mean distance in the tree: the average across every possible pair of
6666+
chromosomes (one from each sample set) and locations in the window, of
6667+
the mean distance in the tree between the two samples (in units of time).
66616668
66626669
"node"
66636670
For each node, the proportion of genome on which the node is an ancestor to
6664-
only one of a random pair (one from each sample set), averaged over
6665-
choices of pair.
6671+
only one of a pair of chromosomes (one from each sample set), averaged
6672+
over all possible pairs.
66666673
66676674
:param list sample_sets: A list of lists of Node IDs, specifying the
66686675
groups of nodes to compute the statistic with.
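
The distinction drawn in the reworded divergence note can be illustrated with a small pure-Python sketch. This is not tskit's implementation: the haplotype strings and the helper names `toy_divergence` and `toy_diversity` are hypothetical, and the values are raw counts of differing sites rather than span-normalised densities. It only shows why averaging over every pair drawn from two overlapping sample sets includes self comparisons, whereas a diversity-style average over distinct pairs does not.

```python
from itertools import combinations, product

# Hypothetical toy data: one haplotype string per sample node ID.
haplotypes = {0: "0000", 1: "0101", 2: "1100"}


def num_differences(a, b):
    """Count the sites at which samples a and b carry different alleles."""
    return sum(x != y for x, y in zip(haplotypes[a], haplotypes[b]))


def toy_divergence(set_x, set_y):
    """Average over every pair (one sample from each set).

    Self comparisons are NOT removed, so overlapping sets pull the
    average towards zero.
    """
    pairs = list(product(set_x, set_y))
    return sum(num_differences(a, b) for a, b in pairs) / len(pairs)


def toy_diversity(sample_set):
    """Average over all n choose 2 pairs of distinct samples in one set."""
    pairs = list(combinations(sample_set, 2))
    return sum(num_differences(a, b) for a, b in pairs) / len(pairs)


everyone = [0, 1, 2]
print(toy_diversity(everyone))             # distinct pairs only
print(toy_divergence(everyone, everyone))  # includes three self comparisons
```

In this toy data every distinct pair differs at the same number of sites, so the second value is smaller than the first purely because of the zero-difference self comparisons.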
@@ -7373,8 +7380,8 @@ def Y3(
73737380
:ref:`span normalise <sec_stats_span_normalise>`,
73747381
and :ref:`return value <sec_stats_output_format>`.
73757382
7376-
What is computed depends on ``mode``. Each is an average across
7377-
randomly chosen trios of samples ``(a, b, c)``, one from each sample set:
7383+
What is computed depends on ``mode``. Each is an average across every
7384+
possible trio of samples ``(a, b, c)``, one chosen from each sample set:
73787385
73797386
"site"
73807387
The average density of sites at which ``a`` differs from ``b`` and
@@ -7425,7 +7432,7 @@ def Y2(
74257432
and :ref:`return value <sec_stats_output_format>`.
74267433
74277434
What is computed depends on ``mode``. Each is computed exactly as
7428-
``Y3``, except that the average across randomly chosen trios of samples
7435+
``Y3``, except that the average is across every possible trio of samples
74297436
``(a, b1, b2)``, where ``a`` is chosen from the first sample set, and
74307437
``b1, b2`` are chosen (without replacement) from the second sample set.
74317438
See :meth:`Y3 <.TreeSequence.Y3>` for more details.
@@ -7465,7 +7472,7 @@ def Y1(self, sample_sets, windows=None, mode="site", span_normalise=True):
74657472
Operates on ``k = 1`` sample set at a time.
74667473
74677474
What is computed depends on ``mode``. Each is computed exactly as
7468-
``Y3``, except that the average is across a randomly chosen trio of
7475+
``Y3``, except that the average is across every possible trio of
74697476
samples ``(a1, a2, a3)`` all chosen without replacement from the same
74707477
sample set. See :meth:`Y3 <.TreeSequence.Y3>` for more details.
74717478
@@ -7503,8 +7510,8 @@ def f4(
75037510
:ref:`span normalise <sec_stats_span_normalise>`,
75047511
and :ref:`return value <sec_stats_output_format>`.
75057512
7506-
What is computed depends on ``mode``. Each is an average across
7507-
randomly chosen set of four samples ``(a, b; c, d)``, one from each sample set:
7513+
What is computed depends on ``mode``. Each is an average across every possible
7514+
combination of four samples ``(a, b; c, d)``, one chosen from each sample set:
75087515
75097516
"site"
75107517
The average density of sites at which ``a`` and ``c`` agree but
@@ -7571,9 +7578,9 @@ def f3(
75717578
and :ref:`return value <sec_stats_output_format>`.
75727579
75737580
What is computed depends on ``mode``. Each works exactly as
7574-
:meth:`f4 <.TreeSequence.f4>`, except the average is across randomly
7575-
chosen set of four samples ``(a1, b; a2, c)``, with `a1` and `a2` both
7576-
chosen (without replacement) from the first sample set. See
7581+
:meth:`f4 <.TreeSequence.f4>`, except the average is across every possible
7582+
combination of four samples ``(a1, b; a2, c)`` where `a1` and `a2` have both
7583+
been chosen (without replacement) from the first sample set. See
75777584
:meth:`f4 <.TreeSequence.f4>` for more details.
75787585
75797586
:param list sample_sets: A list of lists of Node IDs, specifying the
@@ -7614,11 +7621,11 @@ def f2(
76147621
and :ref:`return value <sec_stats_output_format>`.
76157622
76167623
What is computed depends on ``mode``. Each works exactly as
7617-
:meth:`f4 <.TreeSequence.f4>`, except the average is across randomly
7618-
chosen set of four samples ``(a1, b1; a2, b2)``, with `a1` and `a2`
7619-
both chosen (without replacement) from the first sample set and ``b1``
7620-
and ``b2`` chosen randomly without replacement from the second sample
7621-
set. See :meth:`f4 <.TreeSequence.f4>` for more details.
7624+
:meth:`f4 <.TreeSequence.f4>`, except the average is across every possible
7625+
combination of four samples ``(a1, b1; a2, b2)`` where `a1` and `a2` have
7626+
both been chosen (without replacement) from the first sample set, and ``b1``
7627+
and ``b2`` have both been chosen (without replacement) from the second
7628+
sample set. See :meth:`f4 <.TreeSequence.f4>` for more details.
76227629
76237630
:param list sample_sets: A list of lists of Node IDs, specifying the
76247631
groups of nodes to compute the statistic with.
@@ -7848,7 +7855,7 @@ def get_pairwise_diversity(self, samples=None):
78487855
def pairwise_diversity(self, samples=None):
78497856
"""
78507857
Returns the pairwise nucleotide site diversity, the average number of sites
7851-
that differ between a randomly chosen pair of samples. If `samples` is
7858+
that differ between every possible pair of distinct samples. If `samples` is
78527859
specified, calculate the diversity within this set.
78537860
78547861
.. deprecated:: 0.2.0
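
As a rough illustration of the "every possible pair of distinct samples" wording, the sketch below computes a site-mode diversity by brute force. It is an assumption-laden toy, not the library's algorithm: `toy_site_diversity` is a hypothetical name, haplotypes are plain strings, and each string position is treated as one unit of sequence length for span normalisation. A single-sample input yields NaN here because there are no pairs to average over, mirroring the behaviour described in the updated `diversity` docstring.

```python
import math
from itertools import combinations


def toy_site_diversity(haplotypes, span_normalise=True):
    """Mean number of differing sites over all n choose 2 pairs.

    haplotypes: list of equal-length strings, one per sample (toy input).
    span_normalise: if True, divide by the sequence length, assuming one
    unit of length per string position.
    """
    if len(haplotypes) < 2:
        # No distinct pairs exist, so the average is undefined.
        return math.nan
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    mean_differences = total / len(pairs)
    if span_normalise:
        return mean_differences / len(haplotypes[0])
    return mean_differences


haps = ["0000", "0101", "1100"]
print(toy_site_diversity(haps, span_normalise=False))
print(toy_site_diversity(haps))
print(toy_site_diversity(["0000"]))
```

No random sampling is involved: the result is a deterministic average over all pairs, which is the point of removing "random" from the wording.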
