Skip to content

Conversation

@mdsumner
Copy link
Contributor

Zarr V3: support numpy.datetime64 extension data type, fixes missing extension support of #13912

Adds an else if branch in ParseDtypeV3 that handles JSON object extension types with "name": "numpy.datetime64" or "numpy.timedelta64", mapping them to GDT_Int64.

Adds three test cases to autotest/gdrivers/zarr_driver.py:

  • test_zarr_read_numpy_datetime64_extension_zarr_v3 — Parametrized over both numpy.datetime64 and numpy.timedelta64. Creates a synthetic V3 store with the extension type (no fallback key, matching what zarr-python 3.x writes). Asserts it opens and maps to GDT_Int64.

  • test_zarr_read_numpy_datetime64_unsupported_extension_zarr_v3 — Verifies that an unrecognized extension type (without fallback) correctly raises an error.

What are related issues/pull requests?

Fixes #13912 and relates to similar extension enhancement request #13912

Tasklist

  • AI (Claude, chat and agent) supported my development of this PR
  • Make sure code is correctly formatted (cf pre-commit configuration)
  • Add test case(s)
  • Add documentation
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

"shape": [3],
"data_type": {
"name": extension_name,
"configuration": {"unit": "ns", "scale_factor": 1},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should likely expose those configuration attributes as "data_type.unit" and "data_type.scale_factor" attributes so users can make sense of the array values

And looking at https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/numpy.datetime64#fill-value-representation, the "NaT" fill_nodata value should be taken into account

It would also be worth enhancing the test to read actual (non-zero) array values to check that then endianness is correctly handled

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should likely expose those configuration attributes as "data_type.unit" and "data_type.scale_factor"

and also "data_type.name"

elt.gdalType = GDALExtendedDataType::Create(GDT_Int64);
elt.gdalSize = elt.gdalType.GetSize();
elt.nativeSize = elt.gdalSize;
if (elt.nativeSize > 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if() test can be removed: the condition is always true

@coveralls
Copy link
Collaborator

Coverage Status

coverage: 71.685% (-0.008%) from 71.693%
when pulling 24e8e3a on mdsumner:zarr-v3-numpy-datetime64-extension
into 7267c9f on OSGeo:master.

@mdsumner
Copy link
Contributor Author

Thanks Even! Addressed those in a second commit. I'm less comfortable with the second part so here is a review summary from the digital friend, which might be interesting in its own right:

  • NaT handling (lines 1666-1673) — correctly placed before ParseNoDataStringAsDouble which would choke on "NaT". Mapping to INT64_MIN matches the extension spec. The GDT_Int64 guard is appropriate.
  • oOrigDtype capture (line 1573) — correctly saved before the fallback substitution at line 1574, so the structural info block at line 1832 reads the original extension metadata regardless of whether a fallback key was present.
  • Structural info exposure (lines 1832-1856) — clean, defensive (checks validity/non-empty before each SetStructuralInfo), and placed alongside the existing SetStructuralInfo("COMPRESSOR", ...) pattern.
  • Test — thorough. Real chunk data, round-trip via multidim API, checks structural info, checks NaT nodata, removed the redundant @gdaltest.enable_exceptions() from the success test.

To consider:

The nativeSize > 1 guard removal at line 1196 is in the string-type branch, not the datetime branch. That affects bool, int8, uint8 — on big-endian systems, those 1-byte types now get needByteSwapping = true, which triggers an unnecessary decoded buffer allocation in NeedDecodedBuffer(). It's harmless (byte-swapping 1 byte is identity), but it's a drive-by change to unrelated code. The same removal at line 1210 (datetime branch) is fine since Int64 is always 8 bytes. Reviews might ask why the string branch was touched — I'd either restore the guard on line 1196 or split it into a separate cleanup commit so the datetime PR stays focused.

The scale_factor > 0 guard silently drops zero or missing values, which is correct for the spec. The NaT check being keyed on GDT_Int64 rather than specifically the datetime extension name is slightly broad, but any non-datetime Int64 with fill_value: "NaT" would be a nonsensical zarr.json anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zarr V3: support numpy.datetime64 extension data type

3 participants