
0.4.0


benjeffery released this 06 Mar 15:01

[0.4.0] - 2024-04-06

Changelog is relative to the last full release, 0.3.3.

Breaking Changes

  • tsinfer 0.4.0 infers tree sequences from on-disk or in-memory vcf-zarr datasets, allowing users to take
    advantage of optimized, parallel VCF parsing via the bio2zarr package. The SampleData file format and class are now deprecated.
  • If a mismatch ratio is provided to the infer command, it now applies only during the
    match_samples phase. (#980, #981, @hyanwong)

Features

  • Add batch ancestor and sample matching APIs for splitting work across many independent jobs.
    (#954, #917, @benjeffery)
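The batch APIs rest on the idea that matching can be divided into independent units of work that run as separate jobs. As a minimal, library-agnostic sketch of that partitioning step (the helper `split_into_batches` is hypothetical and is not part of tsinfer's API), the items to be matched might be split into contiguous, near-equal ranges, one per job:

```python
from typing import List, Tuple


def split_into_batches(num_items: int, num_jobs: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split range(num_items) into at most num_jobs
    contiguous half-open [start, stop) ranges whose sizes differ by at most one."""
    if num_items < 0 or num_jobs < 1:
        raise ValueError("num_items must be >= 0 and num_jobs >= 1")
    # Never create more jobs than there are items (but always at least one).
    num_jobs = min(num_jobs, num_items) or 1
    base, extra = divmod(num_items, num_jobs)
    batches = []
    start = 0
    for job in range(num_jobs):
        # The first `extra` jobs each take one additional item.
        size = base + (1 if job < extra else 0)
        batches.append((start, start + size))
        start += size
    return batches


print(split_into_batches(10, 3))
```

Each job would then process only the items in its own range. The actual tsinfer batch functions, and how their partial results are combined, are described in the tsinfer documentation.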

Performance improvements

  • Reduce memory usage when running match_samples against large cohorts
    containing sequences with substantial amounts of error.
    (#761, @jeromekelleher)
  • truncate_ancestors no longer requires loading all the ancestors into RAM.
    (#811, @benjeffery)
  • Increase parallelisation of match_ancestors by generating groups of ancestors that can be
    matched in parallel from their implied dependency graph. (#828, #147, @benjeffery)
  • Reduce memory requirements of the generate_ancestors function by providing
    the genotype_encoding (#809) and mmap_temp_dir (#808) options
    (@jeromekelleher).
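The match_ancestors change above derives parallel groups from a dependency graph. The general idea can be illustrated with a layered topological sort (Kahn's algorithm). This is a sketch, not tsinfer's implementation. `parallel_groups` is a hypothetical helper, and it assumes that an ancestor only has to wait for the older ancestors it depends on, given here as plain integer IDs:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parallel_groups(deps: Dict[int, Iterable[int]]) -> List[List[int]]:
    """Hypothetical helper: group nodes so that every node's dependencies
    lie in earlier groups. Nodes within one group do not depend on each
    other, so each group could be processed in parallel."""
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for node, parents in deps.items():
        for p in set(parents):  # ignore duplicate edges
            indegree[node] += 1
            children[p].append(node)
    # Nodes with no dependencies form the first group.
    current = sorted(n for n in nodes if indegree[n] == 0)
    groups = []
    seen = 0
    while current:
        groups.append(current)
        seen += len(current)
        nxt = []
        for n in current:
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:  # all of c's dependencies are now done
                    nxt.append(c)
        current = sorted(nxt)
    if seen != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return groups


# Ancestors 1, 2 and 4 depend only on 0; ancestor 3 depends on 1 and 2.
print(parallel_groups({1: [0], 2: [0], 3: [1, 2], 4: [0]}))  # [[0], [1, 2, 4], [3]]
```

Grouping by "depth" in this way lets every ancestor in a group run concurrently, with synchronization needed only between groups.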
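The genotype_encoding change above trades generality for memory. To see why the encoding matters, consider storing biallelic 0/1 genotypes at one bit each instead of one byte each, an eightfold reduction. This is a generic illustration only: `pack_biallelic` and `unpack_biallelic` are hypothetical helpers and do not describe the values actually accepted by the genotype_encoding option.

```python
from typing import List, Sequence


def pack_biallelic(genotypes: Sequence[int]) -> bytes:
    """Hypothetical helper: pack 0/1 genotypes at one bit each
    (8 per byte), most significant bit first."""
    out = bytearray((len(genotypes) + 7) // 8)
    for i, g in enumerate(genotypes):
        if g not in (0, 1):
            # One bit cannot represent missing data or a third allele.
            raise ValueError(f"genotype {g!r} at index {i} is not 0 or 1")
        if g:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_biallelic(packed: bytes, n: int) -> List[int]:
    """Recover the first n genotypes from pack_biallelic output."""
    return [(packed[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]


g = [1, 0, 1, 1, 0, 0, 0, 0, 1]
packed = pack_biallelic(g)
print(len(g), "genotypes stored in", len(packed), "bytes")  # 9 genotypes stored in 2 bytes
assert unpack_biallelic(packed, len(g)) == g
```

The cost of such a compact scheme is that it can only hold two states per site, so data with missing genotypes or multiallelic sites would need a wider encoding.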

Other Breaking Changes

  • Removed the uuid field from SampleData; equality of SampleData objects is now based purely on their data.
  • A permissive JSON schema is now set on node table metadata.

Fixes

  • Properly account for "N" as an unknown ancestral state, and disallow the empty
    string ("") as an ancestral state. (#963, @hyanwong)
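The ancestral-state fix above can be expressed as a small validation rule. The following is a minimal sketch, assuming ancestral states are plain strings. `normalise_ancestral_state` is a hypothetical helper rather than tsinfer code, and its use of `None` to represent "unknown" is an assumption made for illustration.

```python
from typing import Optional

UNKNOWN_ANCESTRAL_STATE = "N"


def normalise_ancestral_state(state: str) -> Optional[str]:
    """Hypothetical helper: return the ancestral state, or None if it is
    the unknown marker "N". The empty string is rejected outright."""
    if state == "":
        raise ValueError("the empty string is not a valid ancestral state")
    if state == UNKNOWN_ANCESTRAL_STATE:
        return None
    return state


print([normalise_ancestral_state(s) for s in ["A", "N", "T"]])  # ['A', None, 'T']
```

Rejecting "" explicitly, instead of treating it as just another allele, catches input files where the ancestral state field was left blank by mistake.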