|
| 1 | +.. currentmodule:: tskit |
1 | 2 | .. _sec_combinatorics: |
2 | 3 |
|
3 | | -===================================== |
4 | | -Ranking and Unranking Tree Topologies |
5 | | -===================================== |
6 | | -TODO |
| 4 | +============= |
| 5 | +Combinatorics |
| 6 | +============= |
| 7 | +tskit uses a combinatorial approach to identify unique topologies of |
| 8 | +rooted, leaf-labelled trees. It provides methods |
| 9 | +for enumerating all possible tree topologies, as well as converting |
| 10 | +back and forth between a tree and its position, or rank, in the |
| 11 | +enumeration of all possible topologies. |
| 12 | +These methods are not restricted to binary trees; |
| 13 | +they cover any rooted tree that has no unary nodes. |
| 14 | + |
| 15 | +================================= ===================================== |
| 16 | +:meth:`Tree.rank` Return the rank of this tree. |
| 17 | +:meth:`Tree.unrank` Return a Tree given its rank and |
| 18 | + a number of leaves. |
| 19 | +:func:`tskit.all_trees` Return a generator over all |
| 20 | + leaf-labelled trees of n leaves. |
| 21 | +:func:`tskit.all_tree_shapes` Return a generator over all |
| 22 | + tree shapes of n leaves. |
| 23 | +:func:`tskit.all_tree_labellings` Return a generator over all |
| 24 | + labellings of the given tree's shape. |
| 25 | +================================= ===================================== |
| 26 | + |
| 27 | +.. _sec_tree_ranks: |
| 28 | + |
| 29 | ++++++++++++++++++++++++ |
| 30 | +Interpreting Tree Ranks |
| 31 | ++++++++++++++++++++++++ |
| 32 | +To understand tree ranks we must look at how leaf-labelled tree topologies |
| 33 | +are enumerated. For example, we can use :func:`tskit.all_trees` |
| 34 | +to generate all possible topologies of three leaves: |
| 35 | + |
| 36 | +.. code-block:: python |
| 37 | +
|
| 38 | +    for t in tskit.all_trees(num_leaves=3): |
| 39 | +        display(SVG(t.draw(node_labels={0: 0, 1: 1, 2: 2}, order="tree"))) |
| 40 | +
|
| 41 | +.. image:: _static/topology_0_0.svg |
| 42 | + :width: 24% |
| 43 | +.. image:: _static/topology_1_0.svg |
| 44 | + :width: 24% |
| 45 | +.. image:: _static/topology_1_1.svg |
| 46 | + :width: 24% |
| 47 | +.. image:: _static/topology_1_2.svg |
| 48 | + :width: 24% |
| 49 | + |
| 50 | +In this sequence, there exist two distinct tree shapes and each shape |
| 51 | +can be labelled in at least one unique way. Given that topologies are |
| 52 | +ordered first by their shape and then by their labelling, a tree |
| 53 | +topology can be uniquely identified by |
| 54 | + |
| 55 | +1. |
| 56 | + The shape of the tree |
| 57 | +2. |
| 58 | + The labelling of the tree's shape |
| 59 | + |
| 60 | +We can refer to the first tree in the above enumeration as the |
| 61 | +first labelling of the first shape of trees with three leaves, or tree |
| 62 | +:math:`(0, 0)`. The second tree can be identified as the first labelling |
| 63 | +of the second shape, or :math:`(1, 0)`, and so on. |
| 64 | +This pair of indexes for the shape and labelling of a tree is referred |
| 65 | +to as the rank of the tree, and can be computed using the |
| 66 | +:meth:`Tree.rank` method. |
| 67 | + |
| 68 | +.. code-block:: python |
| 69 | +
|
| 70 | + ranks = [t.rank() for t in tskit.all_trees(num_leaves=3)] |
| 71 | + print("Ranks of 3-leaf trees:", ranks) |
| 72 | +
|
| 73 | +:: |
| 74 | + |
| 75 | + Ranks of 3-leaf trees: [(0, 0), (1, 0), (1, 1), (1, 2)] |
| 76 | + |
| 77 | +.. note:: |
| 78 | + Ranks in combinatorics are typically natural numbers. However, |
| 79 | + we refer to this tuple of shape and label rank as a rank because |
| 80 | + it serves the same purpose of indexing trees in an enumeration. |
| 81 | + |
| 82 | +For details on how shapes and labellings are ordered, see |
| 83 | +:ref:`sec_enumerating_topologies`. |
| 84 | + |
| 85 | +We can also reconstruct a leaf-labelled tree given its rank. This process |
| 86 | +is known as unranking, and can be performed using the :meth:`Tree.unrank` |
| 87 | +method. |
| 88 | + |
| 89 | +.. code-block:: python |
| 90 | +
|
| 91 | +    for rank in [(0, 0), (1, 0), (1, 1), (1, 2)]: |
| 92 | +        t = tskit.Tree.unrank(rank, num_leaves=3) |
| 93 | +        display(SVG(t.draw(node_labels={0: 0, 1: 1, 2: 2}, order="tree"))) |
| 94 | +
|
| 95 | +.. image:: _static/topology_0_0.svg |
| 96 | + :width: 24% |
| 97 | +.. image:: _static/topology_1_0.svg |
| 98 | + :width: 24% |
| 99 | +.. image:: _static/topology_1_1.svg |
| 100 | + :width: 24% |
| 101 | +.. image:: _static/topology_1_2.svg |
| 102 | + :width: 24% |
| 103 | + |
| 104 | +++++++++ |
| 105 | +Examples |
| 106 | +++++++++ |
| 107 | + |
| 108 | +One application of tree ranks is to count the different |
| 109 | +leaf-labelled topologies in a tree sequence. Since the ranks |
| 110 | +are just tuples, we can use a Python ``Counter`` to track them. |
| 111 | +Here, we count and unrank the most frequently seen |
| 112 | +topology in a tree sequence. For brevity, this example assumes |
| 113 | +samples are synonymous with leaves. |
| 114 | + |
| 115 | +.. code-block:: python |
| 116 | +
|
| 117 | +    rank_counts = collections.Counter(t.rank() for t in ts.trees()) |
| 118 | +    most_freq_rank, count = rank_counts.most_common(1)[0] |
| 119 | +    tskit.Tree.unrank(most_freq_rank, num_leaves=ts.num_samples) |
| 120 | +
|
| 121 | +.. _sec_enumerating_topologies: |
| 122 | + |
| 123 | +++++++++++++++++++++++ |
| 124 | +Enumerating Topologies |
| 125 | +++++++++++++++++++++++ |
| 126 | + |
| 127 | +This section expands briefly on the approach used to enumerate |
| 128 | +tree topologies that serves as the basis for :meth:`Tree.rank` |
| 129 | +and :meth:`Tree.unrank`. |
| 130 | +To enumerate all rooted, leaf-labelled tree topologies, we first |
| 131 | +formulate a system of ordering and enumerating tree shapes. Then |
| 132 | +we define an enumeration of labellings given an arbitrary tree shape. |
| 133 | + |
| 134 | +*********************** |
| 135 | +Enumerating Tree Shapes |
| 136 | +*********************** |
| 137 | + |
| 138 | +Starting with :math:`n = 1`, we see that the only shape for a tree |
| 139 | +with a single leaf is a single root leaf. A tree with :math:`n > 1` |
| 140 | +leaves can be obtained by joining at least two trees whose leaf counts |
| 141 | +sum to :math:`n`. |
| 142 | +This maps very closely to the concept of integer partitions. |
| 143 | +Each tree shape of :math:`n` leaves can be represented by taking a |
| 144 | +nondecreasing integer partition of :math:`n` (elements of the partition |
| 145 | +are sorted in nondecreasing order) and recursively partitioning its |
| 146 | +elements. The order in which we select partitions of :math:`n` is |
| 147 | +determined by the efficient |
| 148 | +`rule_asc <http://jeromekelleher.net/generating-integer-partitions.html>`_ |
| 149 | +algorithm for generating them. |
| 150 | + |
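As a rough illustration of the ordering described above, the following sketch generates nondecreasing partitions following the ``rule_asc`` algorithm from the linked page and then drops the single-part partition. It is a standalone sketch, not tskit's internal code; ``root_partitions`` is a hypothetical helper name introduced here for illustration.

```python
def rule_asc(n):
    """Yield the partitions of n as nondecreasing lists, in rule_asc order."""
    a = [0] * (n + 1)
    k = 1
    a[1] = n
    while k != 0:
        x = a[k - 1] + 1
        y = a[k] - 1
        k -= 1
        # Fill the tail with as many copies of x as fit.
        while x <= y:
            a[k] = x
            y -= x
            k += 1
        a[k] = x + y
        yield a[: k + 1]


def root_partitions(n):
    """Hypothetical helper: partitions usable at the root of an n-leaf tree.

    The single-part partition [n] is skipped because it would
    correspond to a unary node.
    """
    return [p for p in rule_asc(n) if len(p) > 1]


print(root_partitions(4))
# [[1, 1, 1, 1], [1, 1, 2], [1, 3], [2, 2]]
```

Each part of a partition is then partitioned recursively in the same way, which is why the partition ``[1, 3]`` gives rise to one four-leaf shape per three-leaf shape.
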
| 151 | +All tree shapes with four leaves, and the partitions that generate |
| 152 | +them, are: |
| 153 | + |
| 154 | +.. image:: _static/four_leaf_tree_shapes.png |
| 155 | + :alt: All four-leaf tree shapes and their generating partitions |
| 156 | + |
| 157 | +Note that the middle column reflects all tree shapes of three leaves |
| 158 | +in the right subtree! |
| 159 | + |
| 160 | +`*` This excludes the partition [:math:`n`], since it would create a unary node, |
| 161 | +and trees with unary nodes cannot be enumerated (there are infinitely many of them). |
| 162 | + |
| 163 | +.. note:: |
| 164 | + Using nondecreasing integer partitions enforces a |
| 165 | + *canonical orientation* on the tree shapes, where children under a node are |
| 166 | + ordered by the number of leaves below them. |
| 167 | + This is important because it prevents us from repeating trees that are |
| 168 | + topologically the same but whose children are ordered differently. |
| 169 | + |
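One way to sanity-check the canonical orientation is to count shapes without building them. The sketch below assumes only what is described above: a shape is a nondecreasing partition of the leaves into at least two parts, with each part recursively replaced by a shape. Because equal-sized siblings are interchangeable, they are counted as multisets. The function names are hypothetical, and the simple recursive partition generator used here does not follow the ``rule_asc`` order, which does not matter for counting.

```python
from functools import lru_cache
from itertools import groupby
from math import comb


def ascending_partitions(n, smallest=1):
    """Yield nondecreasing partitions of n whose parts are all >= smallest."""
    yield [n]
    for first in range(smallest, n // 2 + 1):
        for rest in ascending_partitions(n - first, first):
            yield [first] + rest


@lru_cache(maxsize=None)
def num_shapes(n):
    """Count rooted tree shapes with n leaves and no unary nodes."""
    if n == 1:
        return 1
    total = 0
    for parts in ascending_partitions(n):
        if len(parts) < 2:
            continue  # the partition [n] would create a unary node
        ways = 1
        # Children with the same number of leaves are interchangeable,
        # so choose a multiset of m shapes from the num_shapes(size) options.
        for size, group in groupby(parts):
            m = len(list(group))
            ways *= comb(num_shapes(size) + m - 1, m)
        total += ways
    return total


print([num_shapes(n) for n in range(1, 6)])
# [1, 1, 2, 5, 12]
```

The counts of two shapes for three leaves and five shapes for four leaves agree with the examples in this section.
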
| 170 | +********************* |
| 171 | +Labelling Tree Shapes |
| 172 | +********************* |
| 173 | + |
| 174 | +Tree shapes are useful in and of themselves, but we can use the enumeration |
| 175 | +formulated above to go further and assign labels to the leaves of each shape. |
| 176 | + |
| 177 | +Say we are given a tree :math:`T` with :math:`n` leaves, whose left-most |
| 178 | +subtree, :math:`T_l`, has :math:`k` leaves. For each of the :math:`n \choose k` |
| 179 | +ways to select labels to assign to :math:`T_l`, we produce a unique labelling |
| 180 | +of :math:`T`. This process of choosing labels is repeated for the other |
| 181 | +children of :math:`T` and then recursively for the subtrees. |
| 182 | + |
| 183 | +Looking back to the example from :ref:`sec_tree_ranks`, we can see |
| 184 | +the different unique ways to label a particular tree of three leaves. |
| 185 | + |
| 186 | +.. image:: _static/topology_1_0.svg |
| 187 | + :width: 32% |
| 188 | +.. image:: _static/topology_1_1.svg |
| 189 | + :width: 32% |
| 190 | +.. image:: _static/topology_1_2.svg |
| 191 | + :width: 32% |
| 192 | + |
| 193 | +The order of the tree labellings is a direct result of the way in which |
| 194 | +combinations of labels are chosen. The implementation in tskit uses a |
| 195 | +standard lexicographic ordering to choose labels. See how the trees |
| 196 | +are sorted by the order in which the left leaf's label was chosen. |
| 197 | + |
| 198 | +.. note:: |
| 199 | + There is a caveat here regarding symmetry, similar to that of repeating |
| 200 | + tree shapes. Symmetrical trees run the risk of creating redundant labellings |
| 201 | + if all combinations of labels were exhausted. To prevent redundant labellings |
| 202 | + we impose a *canonical labelling*. In the case of two symmetrical subtrees, |
| 203 | + the left subtree must receive the minimum label from the label set. Notice |
| 204 | + how this is the case in the right subtrees above. |
| 205 | + |
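The labelling procedure and the canonical labelling rule can be sketched together in a few lines. This is an illustrative reimplementation under stated assumptions, not tskit's code: shapes are written as nested tuples with ``()`` for a leaf, children are assumed to already be in canonical orientation, and ``labellings`` and its helpers are hypothetical names. The order it produces for larger trees is not guaranteed to match tskit's label ranks.

```python
from itertools import combinations


def num_leaves(shape):
    """Number of leaves in a nested-tuple shape, where () is a leaf."""
    return 1 if shape == () else sum(num_leaves(child) for child in shape)


def labellings(shape, labels):
    """Yield each distinct labelling of shape using the sorted labels given."""
    if shape == ():
        yield labels[0]
        return
    for kids in _label_children(shape, 0, tuple(labels), None):
        yield tuple(kids)


def _label_children(shape, i, labels, prev_min):
    """Assign labels to children i, i + 1, ... of shape, left to right."""
    if i == len(shape):
        yield []
        return
    child = shape[i]
    same_as_prev = i > 0 and shape[i - 1] == child
    # combinations() emits label subsets in lexicographic order.
    for chosen in combinations(labels, num_leaves(child)):
        # Canonical labelling: of two identical siblings, the left one
        # must hold the smaller minimum label.
        if same_as_prev and chosen[0] < prev_min:
            continue
        rest = tuple(x for x in labels if x not in chosen)
        for sub in labellings(child, chosen):
            for tail in _label_children(shape, i + 1, rest, chosen[0]):
                yield [sub] + tail


three_leaf_shape = ((), ((), ()))  # a leaf joined to a two-leaf subtree
print(list(labellings(three_leaf_shape, (0, 1, 2))))
# [(0, (1, 2)), (1, (0, 2)), (2, (0, 1))]
```

The three results correspond to the three labellings pictured above, sorted by the label given to the left leaf. The symmetric two-leaf subtree contributes only one labelling each time because its left leaf must take the smaller label.
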
| 206 | +These two enumerations create a complete ordering of topologies where trees are |
| 207 | +ordered first by size (number of leaves), then by shape, then by their minimum |
| 208 | +label. It is this canonical order that enables efficient ranking and unranking |
| 209 | +of topologies. |
| 210 | + |