Problem: writing out of order data (acquire-zarr allows writing invalid dimension order - due to annoying part of ngff spec) #171

@tlambert03

I'm pretty certain you're aware of this tension, but I haven't found an open issue to discuss it.

The problem is that acquire-zarr will happily write invalid OME-ZARR, where the dimension order is incorrect. As I'm sure you're painfully aware... the NGFF spec currently, and lamentably, states this:

The "axes" MUST contain 2 or 3 entries of "type:space" and MAY contain one additional entry of "type:time" and MAY contain one additional entry of "type:channel" or a null / custom type. The order of the entries MUST correspond to the order of dimensions of the zarr arrays. In addition, the entries MUST be ordered by "type" where the "time" axis must come first (if present), followed by the "channel" or custom axis (if present) and the axes of type "space". (emphasis mine)
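
To make the ordering rule concrete, here is a minimal sketch of the "ordered by type" check. It is not the code that yaozarrs or any other validator actually uses; the helper names `type_rank` and `axes_ordered_by_type` are hypothetical, and the sketch only covers the ordering clause, not the limits on how many axes of each type are allowed.

```python
def type_rank(axis_type):
    """Position of an axis type in the required order: time, channel/custom, space."""
    if axis_type == "time":
        return 0
    if axis_type == "space":
        return 2
    return 1  # "channel", a custom type, or no type at all


def axes_ordered_by_type(axes):
    """Return True if the axes are ordered time -> channel/custom -> space."""
    ranks = [type_rank(axis.get("type")) for axis in axes]
    return ranks == sorted(ranks)


tczyx = [
    {"name": "t", "type": "time"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
# Same axes, but with z and c swapped (the acquisition order used in this issue).
tzcyx = [tczyx[0], tczyx[2], tczyx[1], tczyx[3], tczyx[4]]

print(axes_ordered_by_type(tczyx))  # True
print(axes_ordered_by_type(tzcyx))  # False
```

A TZCYX layout fails this check because a space axis ("z") appears before the channel axis.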

(An example script that writes an invalid OME-Zarr store is at the bottom.)

Looks like this has come up in the past, with #89 ... which was "solved" in #90 ... but as @jeskesen points out in #90 (review) ... that really was a quick patch that fixed a specific use case, but left the harder (real) problem for the future:

[@jeskesen]: I'll be honest, I was hoping this would lead to a more general "writing out-of-order data" solution that could also be used in C++, but this fixes the issue at hand.

So this issue serves to track progress on that harder fix. Specifically:

Regardless of what the OME-NGFF spec wants... it should be possible for a microscope acquisition to go in some order other than TCZYX. (i.e. the storage spec shouldn't constrain the acquisition spec). The job of transposing the acquisition to match the storage spec is, unfortunately, acquire-zarr's responsibility.
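
As a rough illustration of the bookkeeping involved (not a proposal for how acquire-zarr should implement it), the permutation from acquisition order to a spec-compliant storage order can be derived from the dimension kinds alone. Everything below is a hypothetical sketch in plain Python: `storage_permutation` is an invented helper, and the string kinds stand in for `DimensionType` values.

```python
def storage_permutation(kinds):
    """Indices that reorder axes to time -> channel/custom -> space.

    sorted() is stable, so axes of the same kind keep their relative order
    (z, y, x stay z, y, x).
    """
    rank = {"time": 0, "space": 2}
    return sorted(range(len(kinds)), key=lambda i: rank.get(kinds[i], 1))


# Acquisition order from the example script in this issue: TZCYX.
acq_names = ["t", "z", "c", "y", "x"]
acq_kinds = ["time", "space", "channel", "space", "space"]
acq_shape = (10, 5, 2, 512, 512)

perm = storage_permutation(acq_kinds)
storage_names = [acq_names[i] for i in perm]
storage_shape = tuple(acq_shape[i] for i in perm)

print(perm)           # [0, 2, 1, 3, 4]
print(storage_names)  # ['t', 'c', 'z', 'y', 'x']
print(storage_shape)  # (10, 2, 5, 512, 512)

# A frame acquired at (t=3, z=4, c=1) belongs at (t=3, c=1, z=4) in storage.
frame_index_acq = {"t": 3, "z": 4, "c": 1}
frame_index_storage = tuple(frame_index_acq[name] for name in storage_names[:-2])
print(frame_index_storage)  # (3, 1, 4)
```

Computing the permutation is the easy part. The hard part, which this sketch does not address, is that frames arrive in acquisition order, so the writer has to place each frame at its storage position rather than simply appending it.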

I would expect the following script to work:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "acquire-zarr",
#     "numpy",
#     "yaozarrs[io]",
# ]
# ///

"""Minimal example: acquire-zarr writes an OME-Zarr store with an invalid axis order."""

from pathlib import Path

import numpy as np
import yaozarrs
from acquire_zarr import (
    ArraySettings,
    Dimension,
    DimensionType,
    StreamSettings,
    ZarrStream,
    ZarrVersion,
)

output = Path("az.zarr").expanduser()

# NOTE!!!
# TZCYX order
dtype = np.dtype("uint16")
data = np.random.randint(0, 65536, size=(10, 5, 2, 512, 512), dtype=dtype)
nt, nz, nc, ny, nx = data.shape

dimensions = [
    Dimension(
        name="t",
        kind=DimensionType.TIME,
        array_size_px=nt,
        chunk_size_px=1,
        shard_size_chunks=1,
    ),
    Dimension(
        name="z",
        kind=DimensionType.SPACE,
        array_size_px=nz,
        chunk_size_px=1,
        shard_size_chunks=1,
    ),
    Dimension(
        name="c",
        kind=DimensionType.CHANNEL,
        array_size_px=nc,
        chunk_size_px=1,
        shard_size_chunks=1,
    ),
    Dimension(
        name="y",
        kind=DimensionType.SPACE,
        array_size_px=ny,
        chunk_size_px=64,
        shard_size_chunks=1,
    ),
    Dimension(
        name="x",
        kind=DimensionType.SPACE,
        array_size_px=nx,
        chunk_size_px=64,
        shard_size_chunks=1,
    ),
]

settings = StreamSettings(
    arrays=[
        ArraySettings(
            dimensions=dimensions,
            data_type=dtype,
        )
    ],
    overwrite=True,
    store_path=str(output),
    version=ZarrVersion.V3,
)
stream = ZarrStream(settings)

for t, z, c in np.ndindex(nt, nz, nc):
    stream.append(data[t, z, c])
stream.close()


try:
    yaozarrs.validate_zarr_store(str(output))
except Exception as e:
    print("Zarr store validation FAILED:", e)
else:
    print("Zarr store validation PASSED")

but if you uv run that script:

➜ uv run example_bug.py
Installed 18 packages in 27ms
Zarr store validation FAILED: 1 validation error for OMEAttributes
ome.image.multiscales.0.axes
  Value error, Axes are not in the required order by type.
  Order must be [time,] [channel,] space.
  [type=value_error, input_value=[{'name': 't', 'type': 't...: 'x', 'type': 'space'}], input_type=list]
