Python 0.3.0
Major feature release
This release adds metadata schemas, set-like operations, mutation times, SVG drawing improvements and many other features. It also ships with wheels for Windows, macOS and Linux.
❤️ Many thanks go to the tskit community and contributors for their awesome work on this release. ❤️
Breaking changes
- The default display order for tree visualisations has been changed to `minlex` (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with `order="tree"`.
- File system operations such as dump/load now raise an appropriate `OSError` instead of `tskit.FileFormatError`. Loading from an empty file now raises an `EOFError`.
- Bad tree topologies are detected earlier, so that it is no longer possible to create a `TreeSequence` object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
- The `TableCollection` object no longer implements the iterator protocol. Previously `list(tables)` returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof `TableCollection.name_map` and `TreeSequence.tables_dict` attributes, which perform the same function (@jeromekelleher, #500, #694).
- The arguments to `TreeSequence.genotype_matrix`, `TreeSequence.haplotypes` and `TreeSequence.variants` must now be keyword arguments, not positional. This is to support the change from `impute_missing_data` to `isolated_as_missing` in the arguments to these methods (@benjeffery, #716, #794).
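The keyword-only requirement in the last item can be illustrated with a small stand-in. This is a sketch, not tskit's implementation: `genotype_matrix_stub` is a hypothetical function, and the bare `*` in its signature is simply the standard Python way to reject positional arguments, assumed here for illustration.

```python
def genotype_matrix_stub(*, isolated_as_missing=None, impute_missing_data=None):
    """Hypothetical stand-in for a method whose options are keyword-only."""
    # The bare `*` above makes every following parameter keyword-only.
    return {
        "isolated_as_missing": isolated_as_missing,
        "impute_missing_data": impute_missing_data,
    }


# Keyword call: accepted.
accepted = genotype_matrix_stub(isolated_as_missing=False)

# Positional call: rejected with TypeError before the function body runs.
try:
    genotype_matrix_stub(False)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Code that previously passed these options positionally would need to name them explicitly after upgrading.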
New features
- New methods to perform set operations on TableCollections and TreeSequences. `TableCollection.subset` subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). `TableCollection.union` forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point `time` column. If not specified, this defaults to a particular `NaN` value (`tskit.UNKNOWN_TIME`) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see mutation requirements. This release also adds the function `TableCollection.compute_mutation_times`. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
- Add support for trees with internal samples for the Kendall-Colijn tree distance metric (@daniel-goldstein, #610).
- Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
- Tables with a metadata column now have a `metadata_schema` that is used to validate and encode metadata that is passed to `add_row` and decode metadata on calls to `table[j]` and e.g. `tree_sequence.node(j)`. See metadata (@benjeffery, #491, #542, #543, #601).
- The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new `tskit.Tree.draw_svg()` and `tskit.TreeSequence.draw_svg()` methods. This also fixes #467 for duplicate SVG entity `id`s in Jupyter notebooks (@hyanwong, #555).
- Add a `to_nexus` function that outputs a tree sequence in Nexus format (@saunack, #550).
- Add an extension of the Kendall-Colijn tree distance metric to tree sequences, computed by `TreeSequence.kc_distance` (@daniel-goldstein, #548).
- Add an optional node traversal order in `tskit.Tree` that uses the minimum lexicographic order of leaf nodes visited. This ordering (`"minlex_postorder"`) adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
- Add an `order` argument to the tree visualisation functions which supports two node orderings: `"tree"` (the previous default) and `"minlex"`, which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to `"minlex"` (@brianzhang01, @jeromekelleher, #389, #566).
- Add `_repr_html_` to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
- Remove support for `kc_distance` on trees with unary nodes (@daniel-goldstein, #508).
- Improve the Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)), where n is the number of samples (@daniel-goldstein, #490).
- Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
- Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
- Allow sites with missing data to be output by the `haplotypes` method, by default replacing the missing value with `-`. Errors are no longer raised for missing data with `isolated_as_missing=True`; the error types raised for bad alleles (e.g. multiletter or non-ascii) have also changed from `_tskit.LibraryError` to `TypeError`, or `ValueError` if the missing data character clashes (@hyanwong, #426).
- Access the number of children of a node in a tree directly using `tree.num_children(u)` (@hyanwong, #436).
- User specified allele mapping for genotypes in `variants` and `genotype_matrix` (@jeromekelleher, #430).
- New `root_threshold` option for the `Tree` class, which allows us to efficiently iterate over 'real' roots when we have missing data (@jeromekelleher, #462).
- Add `tree.as_dict_of_dicts()` function to enable use with networkx. See the tutorial (@winni2k, #457).
- Add `tree_sequence.to_macs()` function to convert a tree sequence to MACS format (@winni2k, #727).
- Add a `keep_input_roots` option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
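The idea behind the minimum lexicographic ordering described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the tskit implementation: `minlex_postorder` here is a hypothetical standalone function, and the tree is given as a hypothetical dictionary mapping each node to a list of its children. Children are visited in order of the smallest leaf ID beneath them, so the result no longer depends on how the children happen to be stored.

```python
def minlex_postorder(children, root):
    """Return nodes in postorder, visiting children by smallest leaf ID first.

    `children` maps each internal node to a list of its child nodes
    (a hypothetical input format, not a tskit data structure).
    Nodes absent from the mapping are treated as leaves.
    """
    def visit(u):
        kids = children.get(u, [])
        if not kids:
            return u, [u]
        # Traverse each subtree, recording the minimum leaf ID it contains.
        results = [visit(v) for v in kids]
        # Sorting by minimum leaf removes dependence on the stored child order.
        results.sort(key=lambda r: r[0])
        order = [node for _, sub in results for node in sub]
        order.append(u)
        return results[0][0], order

    return visit(root)[1]


# Two encodings of the same topology with children listed in different orders.
tree_a = {6: [5, 4], 5: [3, 0], 4: [2, 1]}
tree_b = {6: [4, 5], 5: [0, 3], 4: [1, 2]}

order_a = minlex_postorder(tree_a, 6)
order_b = minlex_postorder(tree_b, 6)
```

Both encodings yield the same traversal, which is the property that makes drawings of topologically identical trees directly comparable.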
Bugfixes
- #453 - Fix `LibraryError` when `tree.newick()` is called with large node time values (@jeromekelleher, #637).
Deprecated
- The `sample_counts` feature has been deprecated and is now ignored. Sample counts are now always computed.
- For `TreeSequence.genotype_matrix`, `TreeSequence.haplotypes` and `TreeSequence.variants` the `impute_missing_data` argument is deprecated and replaced with `isolated_as_missing`. Note that to get the same behaviour `impute_missing_data=True` should be replaced with `isolated_as_missing=False` (@benjeffery, #716, #794).
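Because the two flags have opposite senses, the translation is easy to get backwards when migrating. The helper below is a minimal sketch of the mapping only. `resolve_isolated_as_missing` is a hypothetical name, and the default of `True` and the error raised when both flags are given are assumptions of this sketch rather than documented tskit behaviour.

```python
import warnings


def resolve_isolated_as_missing(isolated_as_missing=None, impute_missing_data=None):
    """Hypothetical helper: map the deprecated flag onto its replacement."""
    if impute_missing_data is not None:
        warnings.warn(
            "impute_missing_data is deprecated; use isolated_as_missing",
            DeprecationWarning,
            stacklevel=2,
        )
        if isolated_as_missing is not None:
            # Assumption: supplying both flags is treated as an error.
            raise ValueError("Specify only one of the two arguments")
        # The flags have opposite senses, so the value is inverted.
        return not impute_missing_data
    # Assumed default for this sketch: isolated samples count as missing.
    return True if isolated_as_missing is None else isolated_as_missing


# Old-style call: impute_missing_data=True maps to isolated_as_missing=False.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = resolve_isolated_as_missing(impute_missing_data=True)

# New-style call: the value is passed through unchanged.
new_style = resolve_isolated_as_missing(isolated_as_missing=False)
```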