Commit dc45651

hyanwong authored and mergify[bot] committed
Clarify "canonical" in docs
And remove it where it was not being correctly used.
1 parent 31efb75 commit dc45651

1 file changed

+23
-16
lines changed

python/tskit/tables.py

Lines changed: 23 additions & 16 deletions
@@ -1488,7 +1488,10 @@ def squash(self):
 The new edge will have the same parent and child node, a left coordinate
 equal to the smallest left coordinate in the set, and a right coordinate
 equal to the largest right coordinate in the set.
-The new edge table will be sorted in the canonical order (P, C, L, R).
+The new edge table will be sorted in the order (P, C, L, R): if the node table
+is ordered by increasing node time, as is common, this order will meet the
+:ref:`sec_edge_requirements` for a valid tree sequence, otherwise you will need
+to call :meth:`.sort` on the entire :class:`TableCollection`.
 
 .. note::
 Note that this method will fail if any edges have non-empty metadata.
@@ -3345,13 +3348,13 @@ def copy(self):
 
 def tree_sequence(self):
 """
-Returns a :class:`TreeSequence` instance from the tables defined in
-this :class:`TableCollection`. If the table collection is not
-in canonical form (i.e., does not meet sorting requirements) or cannot be
-interpreted as a tree sequence an exception is raised. The
-:meth:`.sort` method may be used to ensure that input sorting requirements
-are met. If the table collection does not have indexes they will be
-built.
+Returns a :class:`TreeSequence` instance from the tables defined in this
+:class:`TableCollection`, building the required indexes if they have not yet
+been created by :meth:`.build_index`. If the table collection does not meet
+the :ref:`sec_valid_tree_sequence_requirements`, for example if the tables
+are not correctly sorted or if they cannot be interpreted as a tree sequence,
+an exception is raised. Note that in the former case, the :meth:`.sort`
+method may be used to ensure that sorting requirements are met.
 
 :return: A :class:`TreeSequence` instance reflecting the structures
 defined in this set of tables.
@@ -3610,14 +3613,17 @@ def sort_individuals(self):
 
 def canonicalise(self, remove_unreferenced=None):
 """
-This puts the tables in *canonical* form - to do this, the individual
+This puts the tables in *canonical* form, imposing a stricter order on the
+tables than :ref:`required <sec_valid_tree_sequence_requirements>` for
+a valid tree sequence. In particular, the individual
 and population tables are sorted by the first node that refers to each
-(see :meth:`TreeSequence.subset`) Then, the remaining tables are sorted
+(see :meth:`TreeSequence.subset`). Then, the remaining tables are sorted
 as in :meth:`.sort`, with the modification that mutations are sorted by
 site, then time, then number of descendant mutations (ensuring that
 parent mutations occur before children), then node, then original order
-in the tables. This ensures that any two tables with the same
-information should be identical after canonical sorting.
+in the tables. This ensures that any two tables with the same information
+and node order should be identical after canonical sorting (note
+that no canonical order exists for the node table).
 
 By default, the method removes sites, individuals, and populations that
 are not referenced (by mutations and nodes, respectively). If you wish
@@ -4139,10 +4145,11 @@ def ibd_segments(
 documentation for more details, and use this method only if you specifically need
 to work with a :class:`TableCollection` object.
 
-This method has the same input data requirements as
-:meth:`TableCollection.simplify`. In particular, the table collection must be
-sorted in the canonical order. To find out if this is needed, see
-:meth:`TableCollection.sort`. If the edge table contains any edges with identical
+This method has the same data requirements as
+:meth:`TableCollection.simplify`. In particular, the tables in the collection
+have :ref:`required <sec_valid_tree_sequence_requirements>` sorting orders.
+To enforce this, you can call :meth:`TableCollection.sort` before using this
+method. If the edge table contains any edges with identical
 parents and children over adjacent genomic intervals, any IBD intervals
 underneath the edges will also be split across the breakpoint(s). To prevent this
 behaviour in this situation, use :meth:`EdgeTable.squash` beforehand.
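
Illustration (not part of the commit): the squashing behaviour described in the `squash` docstring can be sketched in plain Python. This is a minimal sketch, not tskit's implementation. The `Edge` tuple and the `squash_edges` helper are hypothetical names introduced here, and the sketch assumes that "adjacent" means one edge's right coordinate equals the next edge's left coordinate.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    # Hypothetical stand-in for a row of an edge table.
    left: float
    right: float
    parent: int
    child: int


def squash_edges(edges: List[Edge]) -> List[Edge]:
    """Merge edges that share parent and child and abut on the genome."""
    # Sorting by (P, C, L, R) places mergeable edges next to each other.
    ordered = sorted(edges, key=lambda e: (e.parent, e.child, e.left, e.right))
    squashed: List[Edge] = []
    for edge in ordered:
        if squashed:
            last = squashed[-1]
            same_pair = (last.parent, last.child) == (edge.parent, edge.child)
            if same_pair and last.right == edge.left:
                # Extend the previous edge instead of adding a new one.
                squashed[-1] = last._replace(right=edge.right)
                continue
        squashed.append(edge)
    return squashed


edges = [
    Edge(left=5, right=10, parent=2, child=0),
    Edge(left=0, right=5, parent=2, child=0),
    Edge(left=0, right=10, parent=2, child=1),
]
result = squash_edges(edges)
for edge in result:
    print(edge)
```

The output is in (P, C, L, R) order. As the revised docstring notes, that order only satisfies the edge requirements of a valid tree sequence when the node table happens to be ordered by increasing node time. Otherwise the whole table collection still needs sorting.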
