Commit 3b01b89

Edit trees.py to add docstring to ld_matrix()
1 parent 50c2898 commit 3b01b89

1 file changed: +79 -2 lines changed

python/tskit/trees.py

Lines changed: 79 additions & 2 deletions
@@ -10930,12 +10930,89 @@ def impute_unknown_mutations_time(
     def ld_matrix(
         self,
         sample_sets=None,
-        sites=None,
-        positions=None,
         mode="site",
         stat="r2",
+        sites=None,
+        positions=None,
         indexes=None,
     ):
+        r"""
+
+        Returns a matrix of the specified two-locus statistic (default
+        :math:`r^2`) computed from sample allelic states or branch lengths.
+        The resulting linkage disequilibrium (LD) matrix represents either the
+        two-locus statistic as computed between all pairs of specified
+        ``sites`` ("site" mode, producing a num_sites-by-num_sites sized
+        matrix), or as computed from the branch structures at marginal trees
+        between pairs of trees at all specified ``positions`` ("branch" mode,
+        producing a num_positions-by-num_positions sized matrix).
+
+        In the site mode, the sites under consideration can be restricted using
+        the ``sites`` argument. Sites can be passed as a list of lists,
+        specifying the ``[[row_sites], [col_sites]]``, resulting in a
+        rectangular matrix, or by specifying a single list of ``[sites]``, in
+        which case a square matrix will be produced (see
+        :ref:`sec_stats_two_locus_site` for examples).
+
+        Similarly, in the branch mode, the ``positions`` argument specifies
+        loci for which the expectation for the two-locus statistic is computed
+        over pairs of trees at those positions. LD statistics are computed between
+        trees whose ``[start, end)`` contains the given position (such that
+        repeats of trees are possible). Similar to the site mode, a nested list
+        of row and column positions can be specified separately (resulting in a
+        rectangular matrix) or a single list of specified positions results
+        in a square matrix (see :ref:`sec_stats_two_locus_branch` for
+        examples).
+
+        Some LD statistics are defined for two sample sets instead of within a
+        single set of samples. If the ``indexes`` argument is specified, at
+        least two sample sets must also be specified. ``indexes`` specifies the
+        sample set indexes between which to compute LD.
+
+        For more on how the ``indexes`` and ``sample_sets`` interact with the
+        output dimensions, see the :ref:`sec_stats_two_locus_sample_sets`
+        section.
+
+        **Available Stats** (use ``Stat Name`` in the ``stat`` keyword
+        argument).
+
+        ======================= ========== ================ ==============
+        Stat                    Polarised  Multi Sample Set Stat Name
+        ======================= ========== ================ ==============
+        :math:`r^2`             n          y                "r2"
+        :math:`r`               y          n                "r"
+        :math:`D^2`             n          y                "D2"
+        :math:`D`               y          n                "D"
+        :math:`D'`              y          n                "D_prime"
+        :math:`D_z`             n          n                "Dz"
+        :math:`\pi_2`           n          n                "pi2"
+        :math:`\widehat{D^2}`   n          y                "D2_unbiased"
+        :math:`\widehat{D_z}`   n          n                "Dz_unbiased"
+        :math:`\widehat{\pi_2}` n          n                "pi2_unbiased"
+        ======================= ========== ================ ==============
+
+        :param list sample_sets: A list of lists of sample Node IDs, specifying
+            the groups of nodes to compute the statistic with. Defaults to all
+            samples grouped by population.
+        :param str mode: A string giving the "type" of the statistic to be
+            computed. Defaults to "site", can be "site" or "branch".
+        :param str stat: A string giving the selected two-locus statistic to
+            compute. Defaults to "r2".
+        :param list sites: A list of sites over which to compute LD. Can be
+            specified as a list of lists to control the row and column sites.
+            Only applicable in site mode. Specify as
+            ``[[row_sites], [col_sites]]`` or ``[all_sites]``.
+        :param list positions: A list of genomic positions where expected LD is
+            computed. Only applicable in branch mode. Can be specified as a list
+            of lists to control the row and column positions. Specify as
+            ``[[row_positions], [col_positions]]`` or ``[all_positions]``.
+        :param list indexes: A list of 2-tuples or a single 2-tuple, specifying
+            the indexes of two sample sets over which to compute a two-way LD
+            statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}`
+            are implemented for two-way statistics.
+        :return: A 2D or 3D array of LD matrices.
+        :rtype: numpy.ndarray
+        """
         one_way_stats = {
             "D": self._ll_tree_sequence.D_matrix,
             "D2": self._ll_tree_sequence.D2_matrix,
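
The docstring says that, in site mode, a flat list of sites gives a square matrix and a nested ``[[row_sites], [col_sites]]`` list gives a rectangular one. The sketch below illustrates only that shape rule. It is plain Python over a toy 0/1 haplotype table and is not tskit's implementation. The names `r2`, `toy_ld_matrix` and `haps` are hypothetical, and the statistic is the textbook biallelic :math:`r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))`, assumed here for illustration.

```python
import math


def r2(haplotypes, site_a, site_b):
    """Textbook r^2 between two biallelic sites (rows are 0/1 haplotypes)."""
    n = len(haplotypes)
    p_a = sum(h[site_a] for h in haplotypes) / n
    p_b = sum(h[site_b] for h in haplotypes) / n
    p_ab = sum(h[site_a] * h[site_b] for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is monomorphic
    return math.nan if denom == 0 else d * d / denom


def toy_ld_matrix(haplotypes, sites):
    """Hypothetical helper mimicking the two accepted shapes of ``sites``."""
    if sites and isinstance(sites[0], (list, tuple)):
        # Nested form: [[row_sites], [col_sites]] -> rectangular matrix
        row_sites, col_sites = sites
    else:
        # Flat form: [sites] -> square matrix
        row_sites = col_sites = sites
    return [[r2(haplotypes, i, j) for j in col_sites] for i in row_sites]


# Four haplotypes, three sites: sites 0 and 1 are perfectly correlated,
# site 2 is independent of both.
haps = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]

square = toy_ld_matrix(haps, [0, 1, 2])    # 3-by-3
rect = toy_ld_matrix(haps, [[0], [1, 2]])  # 1-by-2
print(square)
print(rect)
```

In this toy table, the entry for sites 0 and 1 is 1.0 and the entry for sites 0 and 2 is 0.0, so `rect` is the single row `[1.0, 0.0]`.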
