Python 0.4.0 BETA 1
BETA RELEASE
- Install with `pip install --pre tskit`
- Please report any issues.
Breaking changes
- The `Tree.num_nodes` method is now deprecated with a warning, because it confusingly
  returns the number of nodes in the entire tree sequence, rather than in the tree. Text
  summaries of trees (e.g. `str(tree)`) now return the number of nodes in the tree,
  not in the entire tree sequence (@hyanwong, #1966, #1968)
- The CLI `info` command now gives more detailed information on the tree sequence
  (@benjeffery, #1611)
- 64 bits are now used to store the sizes of ragged table columns such as metadata,
  allowing them to hold more data. This change is fully backwards and forwards compatible
  for all tree sequences whose ragged column sizes fit into 32 bits. New tree sequences with
  large offset arrays that require 64 bits will fail to load in previous versions with
  the error `_tskit.FileFormatError: An incompatible type for a column was found in the file`.
  (@jeromekelleher, #343, #1527, #1528, #1530,
  #1554, #1573, #1589, #1598, #1628, #1571,
  #1579, #1585, #1590, #1602, #1618, #1620, #1652).
- The Tree class now conceptually has an extra node, the "virtual root", whose
  children are the roots of the tree. The quintuply linked tree arrays
  (`parent_array`, `left_child_array`, `right_child_array`, `left_sib_array` and
  `right_sib_array`) all have one extra element.
  (@jeromekelleher, #1691, #1704).
- Tree traversal orders returned by the `nodes` method have changed when there
  are multiple roots. Previously orders were defined locally for each root, but
  are now defined globally across all roots. (@jeromekelleher, #1704).
- Individuals are no longer guaranteed or required to be topologically sorted in a tree sequence.
  `TableCollection.sort` no longer sorts individuals.
  (@benjeffery, #1774, #1789)
- Metadata encoding errors now raise `MetadataEncodingError`
  (@benjeffery, #1505, #1827).
- For `TreeSequence.samples` all arguments after `population` are now keyword only
  (@benjeffery, #1715, #1831).
- Remove the method `TreeSequence.to_nexus` and replace with `TreeSequence.as_nexus`.
  As the old method was not generating standards-compliant output, it seems unlikely
  that it was used by anyone. Calls to `to_nexus` will result in a
  `NotImplementedError`, informing users of the change. See below for details on
  `as_nexus`.
- Change default value for `missing_data_char` in the `TreeSequence.haplotypes`
  method from "-" to "N". This is a more idiomatic usage to indicate
  missing data rather than a gap in an alignment. (@jeromekelleher,
  #1893, #1894)
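
The virtual root and the extra array element described above can be pictured with a small pure-Python sketch. It does not use tskit: `build_links` and `preorder` are hypothetical helpers written for illustration, and for simplicity every node without a parent is treated as a root, which may differ from how tskit decides which nodes are roots. Index `num_nodes` plays the role of the virtual root.

```python
NULL = -1  # sentinel meaning "no node"


def build_links(parent, num_nodes):
    """Build five linked arrays, each of length num_nodes + 1.

    The final slot (index num_nodes) stands in for the virtual root,
    whose children are the roots of the forest. Illustrative only.
    """
    virtual_root = num_nodes
    size = num_nodes + 1
    parent_array = list(parent) + [NULL]
    left_child = [NULL] * size
    right_child = [NULL] * size
    left_sib = [NULL] * size
    right_sib = [NULL] * size
    for u in range(num_nodes):
        # Roots are linked as children of the virtual root, but their
        # entry in parent_array stays NULL.
        p = parent[u] if parent[u] != NULL else virtual_root
        last = right_child[p]
        if last == NULL:
            left_child[p] = u
        else:
            right_sib[last] = u
            left_sib[u] = last
        right_child[p] = u
    return parent_array, left_child, right_child, left_sib, right_sib


def preorder(left_child, right_sib, start):
    """Iterative preorder traversal over the linked arrays."""
    order = []
    stack = [start]
    while stack:
        u = stack.pop()
        order.append(u)
        children = []
        c = left_child[u]
        while c != NULL:
            children.append(c)
            c = right_sib[c]
        # Push in reverse so the leftmost child is visited first.
        stack.extend(reversed(children))
    return order


# Two roots (4 and 5): nodes 0 and 1 sit under 4, nodes 2 and 3 under 5.
parent = [4, 4, 5, 5, NULL, NULL]
num_nodes = len(parent)
arrays = build_links(parent, num_nodes)
parent_array, left_child, right_child, left_sib, right_sib = arrays
order = preorder(left_child, right_sib, num_nodes)
print(order)
```

Starting the traversal at the virtual root reaches every root in a single pass. Code that indexes these arrays by node ID is unaffected, but code that assumes their length equals the number of nodes would need to allow for the extra element.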
Features
- Allow skipping of site and mutation tables in `TableCollection.sort`
  (@benjeffery, #1475, #1826).
- Add `TableCollection.sort_individuals` to sort the individuals as this is no longer done by the
  default sort (@benjeffery, #1774, #1789).
- Add `__setitem__` to all tables allowing single rows to be updated. For example
  `tables.nodes[0] = tables.nodes[0].replace(flags=tskit.NODE_IS_SAMPLE)`
  (@jeromekelleher, @benjeffery, #1545, #1600).
- Added a new parameter `time` to `TreeSequence.samples()` allowing selection of
  samples at a specific time point or time interval.
  (@mufernando, @petrelharp, #1692, #1700)
- Add `table.metadata_vector` to all table classes to allow easy extraction of a single
  metadata key into an array
  (@petrelharp, #1676, #1690).
- Add `time_units` to `TreeSequence` to describe the units of the time dimension of the
  tree sequence. This is then used to generate an error if `time_units` is `uncalibrated` when
  using the branch lengths in statistics. (@benjeffery, #1644, #1760, #1832)
- Add the `virtual_root` property to the Tree class (@jeromekelleher, #1704).
- Add the `num_edges` property to the Tree class (@jeromekelleher, #1704).
- Improved performance for tree traversal methods in the `nodes` iterator.
  Roughly a 10X performance increase for "preorder", "postorder", "timeasc"
  and "timedesc" (@jeromekelleher, #1704).
- Substantial performance improvement for `Tree.total_branch_length`
  (@jeromekelleher, #1794, #1799)
- Add the `discrete_genome` property to the TreeSequence class which is true if
  all coordinates are discrete (@jeromekelleher, #1144, #1819)
- Add a `random_nucleotides` function. (@jeromekelleher, #1825)
- Add the `TreeSequence.alignments` method. (@jeromekelleher, #1825)
- Add alignment export in the FASTA and nexus formats using the
  `TreeSequence.write_nexus` and `TreeSequence.write_fasta` methods.
  (@jeromekelleher, @hyanwong, #1894)
- Add the `discrete_time` property to the TreeSequence class which is true if
  all time coordinates are discrete or unknown (@benjeffery, #1839, #1890)
- Add the `skip_tables` option to `load` to support only loading
  top-level information from a file. Also add the `ignore_tables` option to
  `TableCollection.equals` and `TableCollection.assert_equals` to
  compare only top-level information. (@clwgg, #1882, #1854).
- Add the `skip_reference_sequence` option to `load`. Also add the
  `ignore_reference_sequence` option to `equals` to compare two table
  collections without comparing their reference sequence. (@clwgg,
  #2019, #1971).
- tskit now supports Python 3.10 (@benjeffery, #1895, #1949)
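
The row-update idiom enabled by `__setitem__` (read a row, derive a modified copy with `replace`, assign it back) can be sketched without tskit. Everything below is a toy model under stated assumptions: `ToyNodeRow` and `ToyNodeTable` are hypothetical classes, not tskit's implementation, and only the value of `NODE_IS_SAMPLE` is taken from tskit.

```python
import dataclasses

NODE_IS_SAMPLE = 1  # same numeric value as tskit.NODE_IS_SAMPLE


@dataclasses.dataclass(frozen=True)
class ToyNodeRow:
    """Immutable snapshot of one row (hypothetical stand-in for a table row)."""

    flags: int = 0
    time: float = 0.0

    def replace(self, **changes):
        # Return a new row with the given fields changed; self is untouched.
        return dataclasses.replace(self, **changes)


class ToyNodeTable:
    """Column-oriented toy table supporting row reads and single-row writes."""

    def __init__(self):
        self.flags = []
        self.time = []

    def add_row(self, flags=0, time=0.0):
        self.flags.append(flags)
        self.time.append(time)
        return len(self.flags) - 1

    def __len__(self):
        return len(self.flags)

    def __getitem__(self, index):
        return ToyNodeRow(flags=self.flags[index], time=self.time[index])

    def __setitem__(self, index, row):
        # Write every column of the row back into the column storage.
        self.flags[index] = row.flags
        self.time[index] = row.time


nodes = ToyNodeTable()
nodes.add_row(time=0.0)
nodes.add_row(time=1.0)

# Same shape as the tskit example: read, replace one field, write back.
nodes[0] = nodes[0].replace(flags=NODE_IS_SAMPLE)
print(nodes[0], nodes[1])
```

Because rows are immutable snapshots, the assignment is the only step that changes the table, which keeps each update explicit and limited to one row.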
Fixes
- `dump_tables` omitted individual parents. (@benjeffery, #1828, #1884)
- Add the `Tree.as_newick` method and deprecate `Tree.newick`. The
  `as_newick` method by default labels samples with the pattern `"n{node_id}"`
  which is much more useful than the behaviour of `Tree.newick` (which mimics
  `ms` output). (@jeromekelleher, #1671, #1838)
- Add the `as_nexus` and `write_nexus` methods to the TreeSequence class,
  replacing the broken `to_nexus` method (see above). This uses the same
  sample labelling pattern as `as_newick`.
  (@jeetsukumaran, @jeromekelleher, #1785, #1835,
  #1836, #1838)
- `load_text` created additional populations even if the population table was specified,
  and didn't strip newlines from input text (@hyanwong, #1909, #1910)
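
To make the `"n{node_id}"` labelling pattern concrete, here is a minimal pure-Python sketch of a newick writer that labels only sample nodes. It is an illustration rather than tskit's code: `to_newick` is a hypothetical helper, the tree is given as a plain dictionary of children, and branch lengths are left out to keep the example short.

```python
def to_newick(children, root, samples, label_format="n{node_id}"):
    """Return a topology-only newick string, labelling sample nodes.

    children maps a node ID to the list of its child IDs; nodes that are
    missing from the mapping are treated as leaves. Illustrative only.
    """

    def visit(u):
        label = label_format.format(node_id=u) if u in samples else ""
        kids = children.get(u, [])
        if not kids:
            return label
        return "(" + ",".join(visit(c) for c in kids) + ")" + label

    return visit(root) + ";"


# Node 5 is the root with children 4 and 2; node 4 has children 0 and 1.
children = {5: [4, 2], 4: [0, 1]}
newick = to_newick(children, root=5, samples={0, 1, 2})
print(newick)  # ((n0,n1),n2);
```

Labels built from the node ID let a reader map each leaf in the output straight back to its node in the tree.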