🎉 Version 1 is here!!! 🥳
tskit development doesn't end here, but this marks the point at which:
- Breaking changes will not be made, except where unavoidable to correct incorrect behaviour or where forced by external factors such as dependencies.
Full credit for this release and for tskit generally goes to the wonderful community of contributors, who you can see here: https://tskit.dev/software/tskit.html
Full changelog:
Breaking changes
- The `reference_sequence` argument to `TreeSequence.alignments` is now
  required to be the same length as the tree sequence. Previously it was
  required to be the length of the requested interval.
  (@benjeffery, #3317)
- `TreeSequence.tables` now returns a zero-copy immutable view of the tables.
  To get a mutable copy, use `TreeSequence.dump_tables()`.
  (@benjeffery, #3288, #760)
- For a tree sequence to be valid, the mutation parents in the table collection
  must be correct and consistent with the topology of the tree at each mutation site.
  `TableCollection.tree_sequence()` will raise a `_tskit.LibraryError` if this
  is not the case.
  (@benjeffery, #2729, #2732, #3212)
- Drop Python 3.9 support and require Python >= 3.10.
  (#3267, @benjeffery)
- `ltrim`, `rtrim`, `trim` and `shift` raise an error if they are
  used on a tree sequence containing a reference sequence.
  (@hyanwong, #3210, #2091)
Features
- Add `tskit.jit.numba.jitwrap` and `NumbaTreeSequence` to allow simplified
  use and development of Numba-jitted functions with tree sequences. See the
  documentation at https://tskit.dev/tskit/docs/stable/numba.html for details.
  (@andrewkern, #3295, #3294)
- `TreeSequence.map_to_vcf_model` now also returns the transformed positions and
  contig length. (@benjeffery, #3174, #3173)
- `draw_svg()` methods now associate tree branches with edge IDs.
  (@hyanwong, #3193, #557)
- `draw_svg()` methods now allow the y-axis to be placed on the right-hand side
  using `y_axis="right"`. (@hyanwong, #3201)
- Add `contig_id` and `isolated_as_missing` to `VcfModelMapping`.
  (@benjeffery, #3219, #3177)
- Add `TreeSequence.mutations_edge`, which returns the edge ID for each mutation's
  edge. (@benjeffery, #3226, #3189)
- Add `TreeSequence.sites_ancestral_state`, `TreeSequence.mutations_derived_state` and
  `TreeSequence.mutations_inherited_state` properties to return the ancestral state of sites,
  the derived state of mutations and the inherited state of mutations as NumPy arrays of
  the new NumPy 2.0 `StringDType`.
  (@benjeffery, #3228, #2632, #3276, #2631)
- Tskit now requires NumPy version 2 or later. However, you can still use
  tskit with NumPy 1.x by building tskit from source with NumPy 1.x using
  `pip install tskit --no-binary tskit`. With NumPy 1.x, any use of the new
  `StringDType` properties will result in a `RuntimeError`. If you try to
  use another Python module that was compiled against NumPy 1.x with NumPy 2.x
  you may see the error "A module that was compiled using NumPy 1.x cannot be
  run in NumPy 2.0.0 as it may crash.". If no newer version of the module is
  available you will have to use the NumPy 1.x build as above.
- Add `Mutation.inherited_state` property which returns the inherited state
  for a single mutation. (@benjeffery, #3277, #2631)
- Add `all_mutations` and `all_edges` options to `TreeSequence.union`,
  allowing greater flexibility in "disjoint union" situations.
  (@hyanwong, @petrelharp, #3181)
- Add `TreeSequence.divergence_matrix`, which was previously undocumented.
- `TreeSequence.variants`, `.genotype_matrix`, `.haplotypes`, and `.alignments` methods
  now fully support `isolated_as_missing` behaviour with internal nodes. `.alignments` is
  also around 10% faster.
  (@benjeffery, #3313, #3317, #1896)
Bugfixes
- In some tables with out-of-order mutations, `TableCollection.sort` did not re-order
  the mutations so that they formed a valid TreeSequence. `TableCollection.sort` and
  `TableCollection.canonicalise` now sort mutations by site, then time (if known),
  then the mutation's node's time, then number of descendant mutations
  (ensuring that parent mutations occur before children), then node, then
  their original order in the tables. (@benjeffery, #3257, #3253)
- Fix bug in `TreeSequence.genetic_relatedness_vector`, which previously ignored
  `span_normalise` (it was always set to `False`). The default is now `True`,
  in agreement with other statistics, so the returned values will change.
  (@petrelharp, #3300, #3241)
- Fix bug in `TreeSequence.pair_coalescence_counts` when `span_normalise=True`
  and a window breakpoint falls within an internal missing interval.
  (@nspope, #3176, #3175)
- Fix metadata schemas that are equal but have different byte representations not
  being considered equal when using `TableCollection.assert_equals` and
  `Table.assert_equals`.
  (@benjeffery, #3246, #3244)
- k-way statistics no longer require k sample sets, allowing in particular
  "self" comparisons for `TreeSequence.genetic_relatedness`. This changes the
  error code returned in some situations.
  (@andrewkern, @petrelharp, #3235, #3055)
- Fix `UnboundLocalError` in `draw_svg()` when using numeric `max_time`
  values with mutations over roots.
  (@benjeffery, #3274, #3273)
- Prevent iterating over a `TopologyCounter`.
  (@benjeffery, #3202, #1462)
- Fix `TreeSequence.concatenate()` to work with internal samples by using the
  `all_mutations` and `all_edges` parameters in `union()`.
  (@hyanwong, #3283, #3181)
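The mutation ordering described for `TableCollection.sort` can be pictured as a multi-part sort key. The following pure-Python sketch is an illustration only, not tskit's implementation. The `Mut` class and its fields are hypothetical. The sort directions (older times first, more descendants first) and the handling of unknown times (treated as ties, assuming times at a site are either all known or all unknown) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mut:
    """Hypothetical stand-in for one row of a mutation table."""
    site: int
    node: int
    time: Optional[float]  # None stands in for an unknown time
    node_time: float
    num_descendants: int  # mutations at the same site below this one


def sort_key(indexed):
    index, m = indexed
    # Assumption: older (larger) times sort first; unknown times tie.
    time_key = -m.time if m.time is not None else 0.0
    return (m.site, time_key, -m.node_time, -m.num_descendants, m.node, index)


def sorted_order(muts):
    """Return the original row indices in sorted order."""
    return [i for i, _ in sorted(enumerate(muts), key=sort_key)]


muts = [
    Mut(site=1, node=4, time=None, node_time=0.0, num_descendants=0),
    Mut(site=0, node=2, time=None, node_time=1.0, num_descendants=0),
    Mut(site=0, node=2, time=None, node_time=1.0, num_descendants=1),
    Mut(site=0, node=5, time=None, node_time=3.0, num_descendants=2),
]
order = sorted_order(muts)
print(order)
```

In this toy input, the two mutations on node 2 share a site and a node time, so the descendant count breaks the tie and places the parent mutation before its child.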