Commit e5b38e4

Merge pull request #653 from daniel-goldstein/combinatorics-doc
Document enumerating tree topologies in combinatorics module
2 parents bac46d6 + 856080c commit e5b38e4

11 files changed

+252
-100
lines changed

docs/_static/topology_0_0.svg

Lines changed: 1 addition & 0 deletions

docs/_static/topology_1_0.svg

Lines changed: 1 addition & 0 deletions

docs/_static/topology_1_1.svg

Lines changed: 1 addition & 0 deletions

docs/_static/topology_1_2.svg

Lines changed: 1 addition & 0 deletions

docs/combinatorics.rst

Lines changed: 208 additions & 4 deletions
@@ -1,6 +1,210 @@
1+
.. currentmodule:: tskit
12
.. _sec_combinatorics:
23

3-
=====================================
4-
Ranking and Unranking Tree Topologies
5-
=====================================
6-
TODO
4+
=============
5+
Combinatorics
6+
=============
7+
tskit uses a combinatorial approach to identify unique topologies of
8+
rooted, leaf-labelled trees. It provides methods
9+
for enumerating all possible tree topologies, as well as converting
10+
back and forth between a tree and its position, or rank, in the
11+
enumeration of all possible topologies.
12+
These methods are not restricted to binary trees;
13+
rather, they cover general, rooted trees without unary nodes.
14+
15+
================================= =====================================
16+
:meth:`Tree.rank` Return the rank of this tree.
17+
:meth:`Tree.unrank` Return a Tree given its rank and
18+
a number of leaves.
19+
:func:`tskit.all_trees` Return a generator over all
20+
leaf-labelled trees of n leaves.
21+
:func:`tskit.all_tree_shapes` Return a generator over all
22+
tree shapes of n leaves.
23+
:func:`tskit.all_tree_labellings` Return a generator over all
24+
labellings of the given tree's shape.
25+
================================= =====================================
26+
27+
.. _sec_tree_ranks:
28+
29+
+++++++++++++++++++++++
30+
Interpreting Tree Ranks
31+
+++++++++++++++++++++++
32+
To understand tree ranks we must look at how leaf-labelled tree topologies
33+
are enumerated. For example, we can use :func:`tskit.all_trees`
34+
to generate all possible topologies of three leaves:
35+
36+
.. code-block:: python
37+
38+
for t in tskit.all_trees(num_leaves=3):
39+
display(SVG(t.draw(node_labels={0: 0, 1: 1, 2: 2}, order="tree")))
40+
41+
.. image:: _static/topology_0_0.svg
42+
:width: 24%
43+
.. image:: _static/topology_1_0.svg
44+
:width: 24%
45+
.. image:: _static/topology_1_1.svg
46+
:width: 24%
47+
.. image:: _static/topology_1_2.svg
48+
:width: 24%
49+
50+
In this sequence, there exist two distinct tree shapes and each shape
51+
can be labelled in at least one unique way. Given that topologies are
52+
ordered first by their shape and then by their labelling, a tree
53+
topology can be uniquely identified by
54+
55+
1.
56+
The shape of the tree
57+
2.
58+
The labelling of the tree's shape
59+
60+
We can refer to the first tree in the above enumeration as the
61+
first labelling of the first shape of trees with three leaves, or tree
62+
:math:`(0, 0)`. The second tree can be identified as the first labelling
63+
of the second shape, or :math:`(1, 0)`, and so on.
64+
This pair of indexes for the shape and labelling of a tree is referred
65+
to as the rank of the tree, and can be computed using the
66+
:meth:`Tree.rank` method.
67+
68+
.. code-block:: python
69+
70+
ranks = [t.rank() for t in tskit.all_trees(num_leaves=3)]
71+
print("Ranks of 3-leaf trees:", ranks)
72+
73+
::
74+
75+
Ranks of 3-leaf trees: [(0, 0), (1, 0), (1, 1), (1, 2)]
76+
77+
.. note::
78+
Ranks in combinatorics are typically natural numbers. However,
79+
we refer to this tuple of shape and label rank as a rank because
80+
it serves the same purpose of indexing trees in an enumeration.
81+
82+
For details on how shapes and labellings are ordered, see
83+
:ref:`sec_enumerating_topologies`.
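
To relate these tuple ranks to the natural-number ranks mentioned in the note, the sketch below flattens a ``(shape, label)`` pair into a single index. It assumes the number of labellings of each shape is already known; the helpers ``flatten_rank`` and ``unflatten_rank`` are hypothetical names used for illustration and are not part of tskit. The counts ``[1, 3]`` are those of the three-leaf example above.

```python
from bisect import bisect_right
from itertools import accumulate


def flatten_rank(rank, labellings_per_shape):
    """Map a (shape_rank, label_rank) pair to a single 0-based index."""
    shape_rank, label_rank = rank
    if not 0 <= label_rank < labellings_per_shape[shape_rank]:
        raise ValueError("label rank out of range for this shape")
    # Every labelling of every earlier shape comes first in the enumeration.
    return sum(labellings_per_shape[:shape_rank]) + label_rank


def unflatten_rank(index, labellings_per_shape):
    """Inverse of flatten_rank: recover (shape_rank, label_rank)."""
    # ends[i] is the number of trees whose shape rank is <= i.
    ends = list(accumulate(labellings_per_shape))
    if not 0 <= index < ends[-1]:
        raise ValueError("index out of range")
    shape_rank = bisect_right(ends, index)
    start = ends[shape_rank] - labellings_per_shape[shape_rank]
    return shape_rank, index - start


# Three leaves: the first shape has 1 labelling, the second has 3.
counts = [1, 3]
flat = [flatten_rank(r, counts) for r in [(0, 0), (1, 0), (1, 1), (1, 2)]]
```

Keeping the rank as a tuple avoids needing these per-shape counts up front, which is one plausible reason to prefer the pair over a single integer.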
84+
85+
We can also reconstruct a leaf-labelled tree given its rank. This process
86+
is known as unranking, and can be performed using the :meth:`Tree.unrank`
87+
method.
88+
89+
.. code-block:: python
90+
91+
for rank in [(0, 0), (1, 0), (1, 1), (1, 2)]:
92+
t = Tree.unrank(rank, num_leaves=3)
93+
display(SVG(t.draw(node_labels={0: 0, 1: 1, 2: 2}, order="tree")))
94+
95+
.. image:: _static/topology_0_0.svg
96+
:width: 24%
97+
.. image:: _static/topology_1_0.svg
98+
:width: 24%
99+
.. image:: _static/topology_1_1.svg
100+
:width: 24%
101+
.. image:: _static/topology_1_2.svg
102+
:width: 24%
103+
104+
++++++++
105+
Examples
106+
++++++++
107+
108+
One application of tree ranks is to count the different
109+
leaf-labelled topologies in a tree sequence. Since the ranks
110+
are just tuples, we can use a Python ``Counter`` to track them.
111+
Here, we count and unrank the most frequently seen
112+
topology in a tree sequence. For brevity, this example assumes
113+
samples are synonymous with leaves.
114+
115+
.. code-block:: python
116+
117+
rank_counts = collections.Counter(t.rank() for t in ts.trees())
118+
most_freq_rank, count = rank_counts.most_common(1)[0]
119+
Tree.unrank(most_freq_rank, num_leaves=ts.num_samples)
120+
121+
.. _sec_enumerating_topologies:
122+
123+
++++++++++++++++++++++
124+
Enumerating Topologies
125+
++++++++++++++++++++++
126+
127+
This section expands briefly on the approach used to enumerate
128+
tree topologies that serves as the basis for :meth:`Tree.rank`
129+
and :meth:`Tree.unrank`.
130+
To enumerate all rooted, leaf-labelled tree topologies, we first
131+
formulate a system of ordering and enumerating tree shapes. Then
132+
we define an enumeration of labellings given an arbitrary tree shape.
133+
134+
***********************
135+
Enumerating Tree Shapes
136+
***********************
137+
138+
Starting with :math:`n = 1`, we see that the only shape for a tree
139+
with a single leaf is a single root leaf. A tree with :math:`n > 1`
140+
leaves can be obtained by joining at least two trees whose leaf
141+
counts sum to :math:`n`.
142+
This maps very closely to the concept of integer partitions.
143+
Each tree shape of :math:`n` leaves can be represented by taking a
144+
nondecreasing integer partition of :math:`n` (elements of the partition
145+
are sorted in nondecreasing order) and recursively partitioning its
146+
elements. The order in which we select partitions of :math:`n` is
147+
determined by the efficient
148+
`rule_asc <http://jeromekelleher.net/generating-integer-partitions.html>`_
149+
algorithm for generating them.
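
As an illustration of the order in which partitions are visited, here is a minimal sketch of an ascending-composition generator in the style of the linked article. It is written from that description and is not taken from the tskit source; the function name ``rule_asc`` simply mirrors the article's terminology.

```python
def rule_asc(n):
    """Yield all partitions of n as nondecreasing lists of positive integers."""
    a = [0] * (n + 1)
    k = 1
    a[1] = n
    while k != 0:
        # Increase the second-to-last part and redistribute the remainder.
        x = a[k - 1] + 1
        y = a[k] - 1
        k -= 1
        # Fill with copies of x while the remainder can still hold one.
        while x <= y:
            a[k] = x
            y -= x
            k += 1
        # The final part absorbs whatever is left.
        a[k] = x + y
        yield a[: k + 1]


partitions_of_4 = list(rule_asc(4))
# The last partition yielded is [n] itself, which corresponds to a unary
# node and would be skipped when building tree shapes.
shape_partitions_of_4 = [p for p in partitions_of_4 if len(p) > 1]
```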
150+
151+
All tree shapes with four leaves, and the partitions that generate
152+
them, are:
153+
154+
.. image:: _static/four_leaf_tree_shapes.png
155+
:alt: All four-leaf tree shapes and their generating partitions
156+
157+
Note that the middle column reflects all tree shapes of three leaves
158+
in the right subtree!
159+
160+
`*` This excludes the partition [:math:`n`], since this would create a unary node
161+
and trees with unary nodes cannot be enumerated (there may be infinitely many of them).
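
The recursive partitioning described above also gives a way to count tree shapes. The following sketch is an assumption-laden illustration, not tskit's implementation: ``partitions`` and ``count_shapes`` are hypothetical helpers, and the partitions are generated by simple recursion rather than by rule_asc. For each partition with at least two parts, it counts the multisets of subtree shapes that can fill parts of equal size.

```python
from collections import Counter
from functools import lru_cache
from math import comb, prod


def partitions(n, smallest=1):
    """Yield nondecreasing partitions of n with all parts >= smallest."""
    if n == 0:
        yield []
        return
    for first in range(smallest, n + 1):
        for rest in partitions(n - first, first):
            yield [first] + rest


@lru_cache(maxsize=None)
def count_shapes(n):
    """Count rooted tree shapes with n leaves and no unary nodes."""
    if n == 1:
        return 1
    total = 0
    for parts in partitions(n):
        if len(parts) < 2:
            continue  # the partition [n] would create a unary node
        # Parts of equal size are interchangeable, so choose a multiset of
        # m shapes from the count_shapes(size) available ones.
        total += prod(
            comb(count_shapes(size) + m - 1, m)
            for size, m in Counter(parts).items()
        )
    return total


shape_counts = [count_shapes(n) for n in range(1, 7)]
```

For three leaves this gives two shapes, in agreement with the enumeration shown in :ref:`sec_tree_ranks`.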
162+
163+
.. note::
164+
Using nondecreasing integer partitions enforces a
165+
*canonical orientation* on the tree shapes, where children under a node are
166+
ordered by the number of leaves below them.
167+
This is important because it prevents us from repeating trees that are
168+
topologically the same but whose children are ordered differently.
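
A canonical orientation can be sketched with shapes written as nested tuples, where a leaf is the empty tuple. This representation and the helper ``canonical`` are assumptions made for illustration only. Ties between children with the same number of leaves are broken here by plain tuple comparison, which is not necessarily the order tskit uses.

```python
LEAF = ()


def num_leaves(shape):
    """Number of leaves below (and including) this node."""
    return 1 if shape == LEAF else sum(num_leaves(child) for child in shape)


def canonical(shape):
    """Return the shape with children sorted by the number of leaves below them."""
    children = [canonical(child) for child in shape]
    # Primary key: leaf count. Secondary key: the canonical child itself,
    # so that equal-sized children also end up in a fixed order.
    children.sort(key=lambda child: (num_leaves(child), child))
    return tuple(children)


CHERRY = (LEAF, LEAF)
# Two drawings of the same topology collapse to one canonical form.
same = canonical((CHERRY, LEAF)) == canonical((LEAF, CHERRY))
```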
169+
170+
*********************
171+
Labelling Tree Shapes
172+
*********************
173+
174+
Tree shapes are useful in and of themselves, but we can use the enumeration
175+
formulated above to go further and assign labels to the leaves of each shape.
176+
177+
Say we are given a tree :math:`T` with :math:`n` leaves, whose left-most
178+
subtree, :math:`T_l`, has :math:`k` leaves. For each of the :math:`n \choose k`
179+
ways to select labels to assign to :math:`T_l`, we produce a unique labelling
180+
of :math:`T`. This process of choosing labels is repeated for the other
181+
children of :math:`T` and then recursively for the subtrees.
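
The label-choosing process can be turned into a count of the distinct labellings of a shape. The sketch below assumes shapes are written as nested tuples in canonical form (a leaf is the empty tuple); ``count_labellings`` is a hypothetical helper, not a tskit function. It multiplies the binomial choices for each child in turn, then divides by the orderings of identical sibling shapes, which would otherwise be counted more than once.

```python
from collections import Counter
from math import comb, factorial

LEAF = ()


def num_leaves(shape):
    return 1 if shape == LEAF else sum(num_leaves(child) for child in shape)


def count_labellings(shape):
    """Count distinct leaf labellings of a canonical nested-tuple shape."""
    if shape == LEAF:
        return 1
    remaining = num_leaves(shape)
    total = 1
    for child in shape:
        k = num_leaves(child)
        # Choose k of the remaining labels for this child, then label it.
        total *= comb(remaining, k) * count_labellings(child)
        remaining -= k
    # Swapping identical sibling subtrees gives the same labelled tree.
    for multiplicity in Counter(shape).values():
        total //= factorial(multiplicity)
    return total


CHERRY = (LEAF, LEAF)
three_leaf_counts = [
    count_labellings((LEAF, LEAF, LEAF)),
    count_labellings((LEAF, CHERRY)),
]
```

The two three-leaf shapes have one and three labellings respectively, matching the four ranks listed in :ref:`sec_tree_ranks`.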
182+
183+
Looking back to the example from :ref:`sec_tree_ranks`, we can see
184+
the different unique ways to label a particular tree of three leaves.
185+
186+
.. image:: _static/topology_1_0.svg
187+
:width: 32%
188+
.. image:: _static/topology_1_1.svg
189+
:width: 32%
190+
.. image:: _static/topology_1_2.svg
191+
:width: 32%
192+
193+
The order of the tree labellings is a direct result of the way in which
194+
combinations of labels are chosen. The implementation in tskit uses a
195+
standard lexicographic ordering to choose labels. See how the trees
196+
are sorted by the order in which the left leaf's label was chosen.
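
As a rough illustration of lexicographic label selection, the sketch below labels the three-leaf shape consisting of a leaf and a two-leaf subtree. It relies on ``itertools.combinations``, which yields combinations in lexicographic order for sorted input; the function ``label_leaf_and_cherry`` and the nested-tuple output are hypothetical and are not how tskit represents trees.

```python
from itertools import combinations


def label_leaf_and_cherry(labels):
    """Yield labellings of the shape (leaf, (leaf, leaf)) as nested tuples."""
    labels = sorted(labels)
    # Choose the label of the left leaf first, in lexicographic order.
    for (left,) in combinations(labels, 1):
        rest = tuple(x for x in labels if x != left)
        # The two-leaf subtree is symmetric, so the remaining labels are
        # emitted once, with the smaller label on the left.
        yield (left, rest)


labellings = list(label_leaf_and_cherry(range(3)))
```

The three results are ordered by the label given to the left leaf, which is the pattern visible in the images above.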
197+
198+
.. note::
199+
There is a caveat here regarding symmetry, similar to that of repeating
200+
tree shapes. Symmetrical trees run the risk of creating redundant labellings
201+
if all combinations of labels were exhausted. To prevent redundant labellings
202+
we impose a *canonical labelling*. In the case of two symmetrical subtrees,
203+
the left subtree must receive the minimum label from the label set. Notice
204+
how this is the case in the right subtrees above.
205+
206+
These two enumerations create a complete ordering of topologies where trees are
207+
ordered first by size (number of leaves), then by shape, then by their minimum
208+
label. It is this canonical order that enables efficient ranking and unranking
209+
of topologies.
210+

docs/python-api.rst

Lines changed: 15 additions & 0 deletions
@@ -487,6 +487,21 @@ using a schema. See :ref:`sec_metadata`, :ref:`sec_metadata_api_overview` and
487487

488488
.. _sec_stats_api:
489489

490+
*************
491+
Combinatorics
492+
*************
493+
The following are generators for fully enumerating unique tree topologies.
494+
The position of a tree in the enumeration ``all_trees`` is given by
495+
:meth:`Tree.rank`. Conversely, a :class:`Tree` can be constructed from a
496+
position in the enumeration with :meth:`Tree.unrank`.
497+
See :ref:`sec_combinatorics` for details.
498+
499+
.. autofunction:: tskit.all_trees
500+
501+
.. autofunction:: tskit.all_tree_shapes
502+
503+
.. autofunction:: tskit.all_tree_labellings
504+
490505
**********************
491506
Linkage disequilibrium
492507
**********************

python/tests/test_combinatorics.py

Lines changed: 9 additions & 2 deletions
@@ -173,16 +173,23 @@ def test_all_labelled_trees_4(self):
173173
def test_generate_trees_roundtrip(self):
174174
n = 5
175175
all_rank_trees = RankTree.all_labelled_trees(n)
176-
all_tsk_trees = comb.all_trees(n)
176+
all_tsk_trees = tskit.all_trees(n)
177177
for rank_tree, tsk_tree in zip(all_rank_trees, all_tsk_trees):
178178
self.assertEqual(rank_tree, RankTree.from_tsk_tree(tsk_tree))
179179

180+
def test_all_shapes_roundtrip(self):
181+
n = 5
182+
all_rank_tree_shapes = RankTree.all_unlabelled_trees(n)
183+
all_tsk_tree_shapes = tskit.all_tree_shapes(n)
184+
for rank_tree, tsk_tree in zip(all_rank_tree_shapes, all_tsk_tree_shapes):
185+
self.assertTrue(rank_tree.shape_equal(RankTree.from_tsk_tree(tsk_tree)))
186+
180187
def test_all_labellings_roundtrip(self):
181188
n = 5
182189
rank_tree = RankTree.unrank((comb.num_shapes(n) - 1, 0), n)
183190
tsk_tree = rank_tree.to_tsk_tree()
184191
rank_tree_labellings = RankTree.all_labellings(rank_tree)
185-
tsk_tree_labellings = comb.all_tree_labellings(tsk_tree)
192+
tsk_tree_labellings = tskit.all_tree_labellings(tsk_tree)
186193
for rank_t, tsk_t in zip(rank_tree_labellings, tsk_tree_labellings):
187194
self.assertEqual(rank_t, RankTree.from_tsk_tree(tsk_t))
188195

python/tskit/__init__.py

Lines changed: 5 additions & 1 deletion
@@ -52,7 +52,11 @@
5252
from tskit.trees import * # NOQA
5353
from tskit.tables import * # NOQA
5454
from tskit.stats import * # NOQA
55-
from tskit.combinatorics import * # NOQA
55+
from tskit.combinatorics import ( # NOQA
56+
all_trees,
57+
all_tree_shapes,
58+
all_tree_labellings,
59+
)
5660
from tskit.exceptions import * # NOQA
5761
from tskit.util import * # NOQA
5862
from tskit.metadata import * # NOQA
