0.4.0
[0.4.0] - 2024-04-06
Changelog is relative to the last full release, 0.3.3.
Breaking Changes
- tsinfer 0.4.0 infers data from on-disk or in-memory vcf-zarr datasets, allowing users to leverage optimized
and parallel VCF parsing via the bio2zarr package. The SampleData file format and class are now deprecated.
- If a mismatch ratio is provided to the infer command, it only applies during the
match_samples phase (#980, #981, @hyanwong)
Features
- Add batch ancestor and sample matching APIs for splitting work across many independent jobs.
(#954, #917, @benjeffery)
Performance improvements
- Reduce memory usage when running match_samples against large cohorts
containing sequences with substantial amounts of error.
(#761, @jeromekelleher)
- truncate_ancestors no longer requires loading all the ancestors into RAM.
(#811, @benjeffery)
- Increase parallelisation of match_ancestors by generating parallel groups from
their implied dependency graph. (#828, #147, @benjeffery)
- Reduce memory requirements of the generate_ancestors function by providing
the genotype_encoding (#809) and mmap_temp_dir (#808) options
(@jeromekelleher).
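
The idea of deriving parallel groups from a dependency graph, mentioned for match_ancestors above, can be illustrated with a minimal sketch. This is not tsinfer's implementation: the function name parallel_groups and the integer ancestor IDs are hypothetical, and the dependency mapping is assumed to be given up front. Each group contains only items whose dependencies all sit in earlier groups, so the members of one group could be matched concurrently.

```python
def parallel_groups(dependencies):
    """Split items into ordered groups that can each be processed in parallel.

    `dependencies` maps each item to the set of items it depends on.
    (Hypothetical helper for illustration; not part of the tsinfer API.)
    """
    # Copy the input and make sure every referenced item has an entry.
    remaining = {item: set(deps) for item, deps in dependencies.items()}
    for deps in dependencies.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    groups = []
    done = set()
    while remaining:
        # An item is ready once all of its dependencies have been processed.
        ready = sorted(item for item, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        groups.append(ready)
        done.update(ready)
        for item in ready:
            del remaining[item]
    return groups


# Hypothetical example: ancestor 3 depends on 1 and 2, which depend on 0.
example = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}, 4: {0}}
print(parallel_groups(example))
```

In this toy example, items 1, 2 and 4 land in the same group because each depends only on item 0, while item 3 must wait for the group containing 1 and 2.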
Other Breaking Changes
- Removed the uuid field from SampleData; equality is now purely based on data
- A permissive JSON schema is now set on node table metadata
Fixes