Zarr v3 alternative by jeromekelleher · Pull Request #1110 · tskit-dev/tsinfer

jeromekelleher · 2026-03-04T14:32:35Z

No description provided.

zarr v3 removed the built-in LMDBStore. This implements the zarr v3 async Store ABC on top of LMDB, preserving the same flat key layout used by zarr v2's LMDBStore, so the on-disk format of .samples and .ancestors files is unchanged (at the storage level).

Update DataContainer for zarr v3 API
- Replace zarr.LMDBStore with our custom LMDBStore
- zarr.group() → zarr.group(zarr_format=2) for in-memory stores
- zarr.open_group(...) → add zarr_format=2 for new stores
- Replace zarr.copy_all/copy_store with a _copy_zarr_group() helper
- array.resize(n) → array.resize((n,)), since zarr v3 takes a shape tuple
- array.info → repr(array), since array.info was removed in zarr v3
- Use store.info()/stat() directly for the LMDB size shrink in finalise()
- Remove the _metadata_codec attribute (no longer used for array creation)

Update SampleData for zarr v3: JSON-encode object/ragged arrays
- Bump FORMAT_VERSION from (5, 1) to (6, 0)
- dtype=object (numcodecs.JSON object_codec) → dtype=str for populations/metadata, individuals/metadata, sites/alleles, sites/metadata, provenances/timestamp, provenances/record
- dtype="array:f8" (ragged float) → dtype=str + JSON encoding for individuals/location
- create_dataset() → create_array() with zarr_format=2 throughout
- Add json.dumps() in all write paths and json.loads() in all read paths
- Mark each on-disk change with an # ON-DISK COMPATIBILITY comment

Files written with FORMAT_VERSION < (6, 0) cannot be read by this code.
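The SampleData change described above amounts to storing each variable-length row as a single JSON string in a dtype=str column. The following is a minimal, stdlib-only sketch of that idea, assuming plain Python lists instead of the NumPy arrays tsinfer actually uses; encode_ragged, decode_ragged and check_format_version are hypothetical names, not tsinfer functions.

```python
from __future__ import annotations

import json

# Hypothetical constant mirroring the SampleData bump from (5, 1) to (6, 0).
FORMAT_VERSION = (6, 0)


def encode_ragged(rows: list[list[float]]) -> list[str]:
    """Store each variable-length row (e.g. one individual's location) as a JSON string."""
    return [json.dumps([float(x) for x in row]) for row in rows]


def decode_ragged(encoded: list[str]) -> list[list[float]]:
    """Inverse of encode_ragged: parse each JSON string back into a list of floats."""
    return [[float(x) for x in json.loads(s)] for s in encoded]


def check_format_version(file_version) -> None:
    """Reject files written before the JSON encoding was introduced."""
    # Versions read back from JSON attributes may arrive as lists, so normalise to a tuple.
    version = tuple(file_version)
    if version < FORMAT_VERSION:
        raise ValueError(
            f"format version {version} is older than {FORMAT_VERSION}; "
            "ragged/object columns in such files used numcodecs codecs"
        )


locations = [[0.5, 1.25], [], [3.0]]
encoded = encode_ragged(locations)
print(encoded)  # ['[0.5, 1.25]', '[]', '[3.0]']
check_format_version([6, 0])  # accepted: not older than (6, 0)
```

An empty row encodes to "[]", so individuals without a location need no separate sentinel value.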
Update AncestorData for zarr v3: JSON-encode focal_sites, update create_dataset wrapper
- Bump FORMAT_VERSION from (3, 0) to (4, 0)
- Update the create_dataset() wrapper to use create_array() with zarr_format=2
- dtype="array:i4" (ragged int32) → dtype=str + JSON encoding for sample_focal_sites (ancestors_focal_sites)
- add_ancestor(): json.dumps(focal_sites.tolist()) on write
- ancestor() / ancestors(): json.loads() → np.int32 array on read
- assert_data_equal(): decode JSON strings before np.testing.assert_array_equal
- truncate_ancestors(): buffer focal_sites as JSON-encoded strings
- Mark each on-disk change with an # ON-DISK COMPATIBILITY comment

Files written with FORMAT_VERSION < (4, 0) cannot be read by this code.

Update VariantData, test utility, and pyproject.toml for zarr v3
- Add zarr_format=2 to zarr.array() and zarr.zeros() in VariantData
- Fix tests/tsutil.py: zarr.group() → zarr.group(zarr_format=2); replace the removed zarr.convenience.copy_all() with _copy_zarr_group()
- pyproject.toml: zarr<3 → zarr>=3,<4

Fix add_ancestral_alleles(): create_dataset → create_array with zarr_format=2

Update test_variantdata.py for zarr v3 API
- zarr.group() → zarr.group(zarr_format=2)
- zarr.array() → zarr.array(..., zarr_format=2)
- create_dataset() → create_array() with zarr_format=2, data assigned separately
- individuals_metadata: object_codec=VLenBytes() → dtype=bytes + separate assignment
- TestAddAncestralStateArray: extract a _make_store() helper to DRY up setup
- Remove unused numcodecs import

Update test_formats.py for zarr v3 API
- zarr.LMDBStore → tsinfer._lmdb_store.LMDBStore
- Add zarr_format=2 to all zarr.array/ones/zeros/empty/empty_like calls
- Replace dtype="array:i4" tests with JSON-encoded dtype=str equivalents
- Replace dtype=object/object_codec=JSON() tests with dtype=str equivalents
- Remove the now-unnecessary object array comparison branch in verify_round_trip
- Remove unused 'import numcodecs' (blosc still imported as numcodecs.blosc)

Update zarr version guard: require zarr >=3,<4

Add missing abstract methods to LMDBStore: __eq__ and get_partial_values
zarr v3's Store ABC requires these to be implemented.

Remove zarr_format=2 from group.create_array() calls
zarr v3's Group.create_array() does not accept zarr_format; arrays inherit the format from their parent group (which is opened/created with zarr_format=2). Keep zarr_format=2 only on standalone zarr.array/zeros/empty and zarr.group/open_group calls.

Updates
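LMDB keeps its keys in sorted order, which is what allows a store built on it to keep zarr v2's flat key layout (for example sites/position/.zarray and sites/position/0) while still answering the prefix and directory listings that zarr v3's async Store ABC asks for. The sketch below is a hypothetical, dict-backed stand-in (FlatKeyStore is not a tsinfer class) that emulates an LMDB cursor seek with bisect. It omits zarr's buffer-prototype argument and byte-request types and models a partial read as a simple (start, length) pair; both simplifications are assumptions, not the LMDBStore API.

```python
from __future__ import annotations

import asyncio
import bisect
from typing import AsyncIterator


class FlatKeyStore:
    """Hypothetical stand-in for an LMDB-backed store using zarr v2's flat keys."""

    def __init__(self, items: dict[str, bytes]) -> None:
        self._data = dict(items)
        # LMDB iterates keys in sorted order; a sorted list emulates that here.
        self._keys = sorted(self._data)

    async def get(self, key: str, byte_range: tuple[int, int] | None = None) -> bytes | None:
        # A missing key yields None; byte_range is a simplified (start, length) partial read.
        value = self._data.get(key)
        if value is None or byte_range is None:
            return value
        start, length = byte_range
        return value[start:start + length]

    async def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        # Seek to the first key >= prefix (like an LMDB cursor set_range) ...
        i = bisect.bisect_left(self._keys, prefix)
        # ... then walk forward while keys still share the prefix; in sorted
        # order all keys with a common prefix are contiguous.
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1

    async def list_dir(self, prefix: str) -> AsyncIterator[str]:
        # Immediate children of a "directory": add a trailing slash so "sites"
        # cannot match a sibling such as "sites_extra", then keep the first
        # path component after the prefix.
        if prefix and not prefix.endswith("/"):
            prefix += "/"
        seen: set[str] = set()
        async for key in self.list_prefix(prefix):
            child = key[len(prefix):].split("/", 1)[0]
            if child not in seen:
                seen.add(child)
                yield child


store = FlatKeyStore({
    ".zgroup": b"{}",
    "sites/.zgroup": b"{}",
    "sites/position/.zarray": b"{}",
    "sites/position/0": b"\x00\x01\x02\x03",
    "sites/alleles/.zarray": b"{}",
})


async def demo() -> None:
    print([c async for c in store.list_dir("sites")])   # ['.zgroup', 'alleles', 'position']
    print(await store.get("sites/position/0", (1, 2)))  # b'\x01\x02'


if __name__ == "__main__":
    asyncio.run(demo())
```

Because the keys are unchanged from zarr v2's layout, only the listing and read logic has to adapt to the async API; the bytes on disk stay the same.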

codecov · 2026-03-04T14:34:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.82%. Comparing base (a72866c) to head (5e8cb38).

❗ There is a different number of reports uploaded between BASE (a72866c) and HEAD (5e8cb38).

HEAD has 6 fewer uploads than BASE.

Flag	BASE (a72866c)	HEAD (5e8cb38)
python-tests	6	0

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1110      +/-   ##
==========================================
- Coverage   90.22%   81.82%   -8.41%     
==========================================
  Files          19        7      -12     
  Lines        7092     2823    -4269     
  Branches     1170      474     -696     
==========================================
- Hits         6399     2310    -4089     
+ Misses        562      434     -128     
+ Partials      131       79      -52

Flag	Coverage Δ
C	`81.41% <ø> (ø)`
c-python	`59.43% <ø> (ø)`
python-tests	`?`

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
Python API	`∅ <ø> (∅)`
Python C interface	`66.66% <ø> (ø)`
C library	`88.58% <ø> (ø)`

jeromekelleher wants to merge 1 commit into tskit-dev:main from jeromekelleher:zarr-v3-second-pass
