Releases: tskit-dev/tskit
Python 0.6.0
Breaking Changes
- The definitions of TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted have changed
to average over sample sets, rather than summing over them.
For computation with diploid sample sets, this will change the result
by a factor of four; for larger sample sets it will now produce
sensible values that are comparable between sample sets of different sizes.
The default for these methods is also changed to polarised=True,
but the output is unchanged for centre=True (the default).
See the documentation for these methods for more discussion.
(@petrelharp, @mmosmond, #1623)
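The factor of four for diploid sample sets follows from counting pairs: two sample sets of two nodes each contain 2 x 2 = 4 pairs, so a sum over pairs is four times the corresponding average. The sketch below illustrates only this arithmetic in plain Python. It does not call tskit, and the pairwise values and helper names are made up for illustration.

```python
from itertools import product

# Hypothetical pairwise relatedness values between sample nodes
# (made-up numbers, not output from tskit).
pairwise = {
    (0, 2): 0.5, (0, 3): 0.25,
    (1, 2): 0.75, (1, 3): 0.5,
}


def summed_over_pairs(set_a, set_b):
    """Old-style combination: add up the value for every pair (i, j)."""
    return sum(pairwise[(i, j)] for i, j in product(set_a, set_b))


def averaged_over_pairs(set_a, set_b):
    """New-style combination: divide the sum by the number of pairs."""
    return summed_over_pairs(set_a, set_b) / (len(set_a) * len(set_b))


# Two diploid sample sets, each holding two sample nodes.
diploid_a, diploid_b = [0, 1], [2, 3]
old_style = summed_over_pairs(diploid_a, diploid_b)
new_style = averaged_over_pairs(diploid_a, diploid_b)
print(old_style / new_style)
```

Because the divisor is the number of pairs, the averaged value stays on the same scale when the sample sets grow, which is why results become comparable between sample sets of different sizes.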
Bugfixes
- Fix to TreeSequence.genetic_relatedness with indexes=None and
proportion=True. (@petrelharp, #2984, #1623)
- Fix to TreeSequence.general_stat when using non-strict summary functions
in the presence of non-ancestral material (very rare).
(@petrelharp, #2983, #1623)
- Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather
than None, to avoid confusion (@hyanwong, #2720)
- Limit output HTML when a tree sequence with a large amount of metadata is displayed.
(@benjeffery, #2999)
- Fix warning in draw_svg to use the correct warnings module.
(@duncanMR, #2870, #2871)
Features
- Add the centre option to TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted.
(@petrelharp, @mmosmond, #1623)
- Edges now have an .interval attribute returning a tskit.Interval object.
(@hyanwong, #2531)
- Variants now have a states() method that returns the genotypes as an
(inefficient) array of strings, rather than integer indexes, to
aid comparison of genetic variation (@hyanwong, #2617)
- Added distance_between that calculates the total distance between two nodes in a tree.
(@Billyzhang1229, #2771)
- Added genetic_relatedness_matrix method to compute
pairwise genetic relatedness between sample sets.
(@jeromekelleher, @petrelharp, #2823)
- Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes
using recombination information, leading to unary nodes in many trees and
fewer edges. (@petrelharp, @hfr1tz3, @nspope,
@avabamf, #2651, #2938)
- Add Table.drop_metadata to make clearing metadata from tables easy.
(@jeromekelleher, #2944)
- Add Interval.mid and Tree.mid properties to return the midpoint of the interval.
(@currocam, #2960)
- Added genetic_relatedness_vector method to compute the product of the genetic relatedness
matrix and a weight vector.
(@petrelharp, #2980)
- Added pair_coalescence_counts method to calculate coalescence events per node or time
interval, pair_coalescence_quantiles method to estimate quantiles of pair
coalescence times using empirical CDF inversion, and pair_coalescence_rates method to
estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF.
(@nspope, #2915, #2976, #2985)
- Add provenance information to the HTML notebook representation of a tree sequence.
(@benjeffery, #3001)
- The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the
x-axis. (@hyanwong, #3002)
- Added a node_titles and a mutation_titles parameter to .draw_svg() methods
which assign a string to node and mutation symbols, commonly shown on mouseover. This
can reduce label clutter while retaining useful info (@hyanwong, #3007)
- Added (currently undocumented) use of the order parameter in Tree.draw_svg() to
pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option
pack_untracked_polytomies allows large polytomies involving untracked samples to
be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)
- Added a title parameter to .draw_svg() methods (@hyanwong, #3015)
- Add comma separation to all display numbers. (@benjeffery, #3017, #3018)
- Add resources section to provenance schema. (@benjeffery, #3016)
- Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance
between two trees. (@Billyzhang1229, #995, #2643, #3032)
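For readers unfamiliar with the unweighted Robinson-Foulds distance in the last item, the following is a minimal sketch of the idea on rooted trees, assuming the usual definition: count the clades (sets of leaves below a node) that appear in exactly one of the two trees. It uses nested tuples rather than tskit objects, the helper names are hypothetical, and tskit's own implementation may differ in details.

```python
def clades(tree):
    """Return the set of leaf sets, one per node, of a rooted tree.

    The tree is given as nested tuples; anything that is not a tuple is a leaf.
    """
    found = set()

    def visit(node):
        if isinstance(node, tuple):
            # An internal node's clade is the union of its children's clades.
            leaves = frozenset().union(*(visit(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def rf_distance_sketch(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


balanced = ((0, 1), (2, 3))
comb = (((0, 1), 2), 3)
print(rf_distance_sketch(balanced, comb))
```

Here the balanced tree contributes the clade {2, 3} and the comb tree contributes {0, 1, 2}; every other clade is shared, so the sketch reports a distance of 2.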
C API C_1.1.3
Features
- Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence
by extending edges into adjacent trees and thus creating unary nodes in those
trees (@petrelharp, @hfr1tz3, @avabamf, #2651, #2938).
Python 0.5.8
- Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)
Python 0.5.7
Breaking Changes
- The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with
position zero is encountered. The VCF spec does not allow zero position sites.
Suppress this error with the allow_position_zero argument.
(@benjeffery, #2901, #2838)
Bugfixes
- Fix to the folded, expected allele frequency spectrum (i.e.,
TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)),
which was half as big as it should have been. (@petrelharp,
@nspope, #2933)
Python 0.5.6
Breaking Changes
- tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
- Tree.tmrca now accepts >2 nodes and returns nicer errors
(@hyanwong, #2808, #2801, #2070, #2611)
- Add TreeSequence.genetic_relatedness_weighted stats method.
(@petrelharp, @brieuclehmann, @jeromekelleher,
#2785, #1246)
- Add TreeSequence.impute_unknown_mutations_time method to return an
array of mutation times based on the times of associated nodes
(@duncanMR, #2760, #2758)
- Add asdict to all dataclasses. These are returned when you access a row or
other tree sequence object. (@benjeffery, #2759, #2719)
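The multi-node case in the first item of this list can be pictured as repeatedly taking the most recent common ancestor of pairs. The sketch below does this in plain Python on a hypothetical tree stored as a parent map; the tree, node times, and helper names are assumptions for illustration and are not tskit's implementation.

```python
from functools import reduce

# Hypothetical tree: parent[u] is the parent of node u (None for the root).
parent = {0: 4, 1: 4, 2: 5, 3: 6, 4: 5, 5: 6, 6: None}
# Hypothetical node times (leaves at time 0, older nodes further back).
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.0}


def ancestors(u):
    """Return u followed by all of its ancestors, up to the root."""
    path = []
    while u is not None:
        path.append(u)
        u = parent[u]
    return path


def mrca_pair(u, v):
    """Most recent common ancestor of two nodes."""
    seen = set(ancestors(u))
    for node in ancestors(v):
        if node in seen:
            return node
    raise ValueError(f"nodes {u} and {v} share no ancestor")


def mrca(*nodes):
    """MRCA of two or more nodes, folding the pairwise MRCA over the list."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    return reduce(mrca_pair, nodes)


def tmrca(*nodes):
    """Time of the MRCA of the given nodes."""
    return time[mrca(*nodes)]


print(mrca(0, 1, 2), tmrca(0, 1, 2))
```

Folding the pairwise result is valid because the MRCA of a set of nodes is also the MRCA of any one of them with the MRCA of the rest.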
Bugfixes
- Fix incompatibility with jsonschema>4.18.6 which caused
AttributeError: module jsonschema has no attribute _validators
(@benjeffery, #2844, #2840)
Python 0.5.5
Performance improvements
- Methods like ts.at() which seek to a specified position on the sequence from
a new Tree instance are now much faster (@molpopgen, #2661).
Features
- Add __repr__ for variants to return a string representation of the raw data
without spewing megabytes of text (@chriscrsmith, #2695, #2694)
- Add keep_rows method to table classes to support efficient in-place
table subsetting (@jeromekelleher, #2700)
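As a rough picture of what in-place subsetting with a keep mask means, the sketch below compacts a plain Python list without building a second list. The function name is hypothetical, and returning an old-to-new ID map with -1 for dropped rows is a design choice of this sketch rather than a statement about the tskit method's return value.

```python
def keep_rows_sketch(rows, keep):
    """Filter `rows` in place using the boolean sequence `keep`.

    Returns a list mapping each old row index to its new index,
    or -1 if the row was dropped (an assumption of this sketch).
    """
    if len(keep) != len(rows):
        raise ValueError("keep must have one entry per row")
    id_map = []
    next_id = 0
    for old_id, flag in enumerate(keep):
        if flag:
            rows[next_id] = rows[old_id]  # compact kept rows towards the front
            id_map.append(next_id)
            next_id += 1
        else:
            id_map.append(-1)
    del rows[next_id:]  # truncate the tail in place
    return id_map


rows = ["a", "b", "c", "d"]
id_map = keep_rows_sketch(rows, [True, False, True, True])
print(rows, id_map)
```

An ID map of this kind is useful when other tables refer to rows by index and need to be updated after the filter.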
Bugfixes
- Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform.
(@benjeffery, #2754, #2737)
C API C_1.1.2
Performance improvements
- tsk_tree_seek is now much faster at seeking to arbitrary points along
the sequence from the null tree (@molpopgen, #2661).
Features
- The struct tsk_treeseq_t now has the variables min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. min_time and max_time can be accessed using the functions
tsk_treeseq_get_min_time and tsk_treeseq_get_max_time, respectively.
(@szhan, #2612, #2271)
- Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced
nodes to be kept in the output (@jeromekelleher, @hyanwong,
#2606, #2619).
- Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify, which ensures
no node sample flags are changed, allowing calling code to manage sample status.
(@jeromekelleher, #2662, #2663).
- Guarantee that unfiltered tables are not written to unnecessarily
during simplify (@jeromekelleher, #2619).
- Add x_table_keep_rows methods to provide efficient in-place table subsetting
(@jeromekelleher, #2700).
- Add tsk_tree_seek_index function
Python 0.5.4
Features
- A new Tree.is_root method avoids the need to search the potentially
large list of Tree.roots (@hyanwong, #2669, #2620)
- The TreeSequence object now has the attributes min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. (@szhan, #2612, #2271)
- The draw_svg methods now have a max_num_trees parameter to truncate
the total number of trees shown, giving a readable display for tree
sequences with many trees (@hyanwong, #2652)
- The draw_svg methods now accept a canvas_size parameter to allow
extra room on the canvas e.g. for long labels or repositioned graphical
elements (@hyanwong, #2646, #2645)
- The Tree object now has the method siblings to get
the siblings of a node. It returns an empty tuple if the node
has no siblings, is not a node in the tree, is the virtual root,
or is an isolated non-sample node.
(@szhan, #2618, #2616)
- The msprime.RateMap class has been ported into tskit: functionality should
be identical to the version in msprime, apart from minor changes in the formatting
of tabular text output (@hyanwong, @jeromekelleher, #2678)
- Tskit now supports and has wheels for Python 3.11. This Python version has a significant performance boost. (@benjeffery, #2624, #2248)
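The empty-tuple cases described for the siblings method above can be sketched on a hypothetical single-root tree stored as plain dictionaries. This covers only the simple cases (a node with siblings, the root, and a node that is not in the tree). The virtual root, isolated nodes, and trees with several roots are not modelled, and the helper name is an assumption.

```python
# Hypothetical single-root tree: children[u] lists the children of node u.
children = {4: [0, 1, 2], 5: [4, 3]}
# Invert the map so each non-root node knows its parent.
parent = {child: par for par, kids in children.items() for child in kids}


def siblings_sketch(u):
    """Return the other children of u's parent as a tuple.

    Nodes without a parent in this tree (the root, or an unknown ID)
    give an empty tuple.
    """
    par = parent.get(u)
    if par is None:
        return ()
    return tuple(child for child in children[par] if child != u)


print(siblings_sketch(0), siblings_sketch(3), siblings_sketch(5))
```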
Breaking Changes
Python 0.5.3
Fixes
- The Variant object can now be initialized with 64 bit numpy ints as
returned e.g. from np.where (@hyanwong, #2518, #2514)
- Fix tree.mrca for the case of a tree with multiple roots.
(@benjeffery, #2533, #2521)
Features
- The ts.nodes method now takes an order parameter so that nodes
can be visited in time order (@hyanwong, #2471, #2370)
- Add samples argument to TreeSequence.genotype_matrix.
Default is None, where all the sample nodes are selected.
(@szhan, #2493, #678)
- ts.draw and the draw_svg methods now have an optional omit_sites
parameter, aiding drawing large trees with many sites and mutations
(@hyanwong, #2519, #2516)
Breaking Changes
- Single statistics computed with TreeSequence.general_stat are now
returned as numpy scalars if windows=None and either samples is a single
list or None (for a 1-way stat), or indexes is None or a single list of
length k (instead of a list of length-k lists).
(@gtsambos, #2417, #2308)
- Accessor methods such as ts.edge(n) and ts.node(n) now allow negative
indexes (@hyanwong, #2478, #1008)
- ts.subset() produces valid tree sequences even if nodes are shuffled
out of time order (@hyanwong, #2479, #2473), and the
same for tables.subset() (@hyanwong, #2489). This involves
sorting the returned tables, potentially changing the returned edge order.
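For the negative-index item above, a minimal sketch of the usual Python convention is shown here, assuming (as with lists) that -1 refers to the last row. The helper name is hypothetical and the code does not call tskit.

```python
def resolve_index(index, num_rows):
    """Map a possibly negative index onto the range 0..num_rows-1."""
    if not -num_rows <= index < num_rows:
        raise IndexError(f"index {index} out of bounds for {num_rows} rows")
    # Negative indexes count back from the end, as for Python sequences.
    return index + num_rows if index < 0 else index


num_nodes = 5  # hypothetical table size
print(resolve_index(-1, num_nodes), resolve_index(0, num_nodes))
```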
Performance improvements
Python 0.5.2
Fixes
- Iterating over ts.variants() could cause a segfault in tree sequences
with large numbers of alleles or very long alleles
(@jeromekelleher, #2437, #2429).
- Various circular references fixed, lowering peak memory usage
(@jeromekelleher, #2424, #2423, #2427).
- Fix bugs in VCF output when there isn't a 1-1 mapping between individuals
and sample nodes (@jeromekelleher, #2442, #2257,
#2446, #2448).
Performance improvements
- TreeSequence.site position search performance greatly improved, with much lower
memory overhead (@jeromekelleher, #2424).
- TreeSequence.samples time/population search performance greatly improved, with
much lower memory overhead (@jeromekelleher, #2424, #1916).
- The timeasc and timedesc orders for Tree.nodes have much
improved performance and lower memory overhead
(@jeromekelleher, #2424, #2423).
Features
- Variant objects now have a .num_missing attribute and .counts() and
.frequencies methods (@hyanwong, #2390, #2393).
- Add the Tree.num_lineages(t) method to return the number of lineages present
at time t in the tree (@jeromekelleher, #386, #2422)
- Efficient array access to table data now provided via attributes like
TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
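Counting the lineages present at a time t can be pictured as counting the branches that span t. The sketch below does this on a hypothetical tree in plain Python. The half-open convention used here (a branch counts when the child's time is at most t and the parent's time is greater than t) is an assumption of this sketch, so check the method's documentation for the exact boundary behaviour.

```python
# Hypothetical tree as child -> parent branches, with made-up node times.
parent = {0: 4, 1: 4, 2: 5, 4: 5}
time = {0: 0.0, 1: 0.0, 2: 0.0, 4: 1.0, 5: 2.0}


def num_lineages_sketch(t):
    """Count branches whose time span [child time, parent time) contains t."""
    return sum(
        1 for child, par in parent.items() if time[child] <= t < time[par]
    )


print(num_lineages_sketch(0.5), num_lineages_sketch(1.5))
```

At t = 0.5 the three leaf branches are present; by t = 1.5 nodes 0 and 1 have coalesced into node 4, leaving two lineages.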
Breaking Changes
- Previously, accessing (e.g.) tables.edges returned a different instance of
EdgeTable each time. This has been changed to return the same instance
for the lifetime of a given TableCollection instance. This is technically
a breaking change, although it's difficult to see how code would depend
on the property that (e.g.) tables.edges is not tables.edges.
(@jeromekelleher, #2441, #2080).
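The difference between the old and new behaviour is one of object identity. The sketch below reproduces it with hypothetical stand-in classes in plain Python; none of these classes are tskit's, and the names are chosen only for illustration.

```python
class EdgeTableStandIn:
    """Stand-in for a table wrapper object (hypothetical, not tskit's class)."""


class OldStyleCollection:
    @property
    def edges(self):
        # A fresh wrapper object is created on every attribute access.
        return EdgeTableStandIn()


class NewStyleCollection:
    def __init__(self):
        # One wrapper is created up front and reused for the object's lifetime.
        self._edges = EdgeTableStandIn()

    @property
    def edges(self):
        return self._edges


old, new = OldStyleCollection(), NewStyleCollection()
old_is_same = old.edges is old.edges
new_is_same = new.edges is new.edges
print(old_is_same, new_is_same)
```

Code would only notice the change if it relied on identity checks like these, which matches the note above that such a dependence is hard to imagine.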