Skip to content

[v3]: TypeError when updating Array.attrs, thanks to incorrect inferred fill_value #2153

@TomAugspurger

Description

@TomAugspurger

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

v3

Description

Discovered while looking at the xarray unit tests for Zarr.

Steps to reproduce

import zarr
import numpy as np
from zarr.api import asynchronous

store = zarr.store.MemoryStore({}, mode="w")
arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
arr.update_attributes({"key": "value"})

The .put call raises with

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 7
      5 store = zarr.store.MemoryStore({}, mode="w")
      6 arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
----> 7 arr.update_attributes({"k": "v"})

File ~/gh/zarr-developers/zarr-python/src/zarr/core/array.py:2055, in Array.update_attributes(self, new_attributes)
   2053 def update_attributes(self, new_attributes: dict[str, JSON]) -> Array:
   2054     return type(self)(
-> 2055         sync(
   2056             self._async_array.update_attributes(new_attributes),
   2057         )
   2058     )

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:91, in sync(coro, loop, timeout)
     88 return_result = next(iter(finished)).result()
     90 if isinstance(return_result, BaseException):
---> 91     raise return_result
     92 else:
     93     return return_result

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:50, in _runner(coro)
     45 """
     46 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     47 exception, the exception will be returned.
     48 """
     49 try:
---> 50     return await coro
     51 except Exception as ex:
     52     return ex

File ~/gh/zarr-developers/zarr-python/src/zarr/core/array.py:601, in AsyncArray.update_attributes(self, new_attributes)
    600 async def update_attributes(self, new_attributes: dict[str, JSON]) -> AsyncArray:
--> 601     new_metadata = self.metadata.update_attributes(new_attributes)
    603     # Write new metadata
    604     await self._save_metadata(new_metadata)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:323, in ArrayV3Metadata.update_attributes(self, attributes)
    322 def update_attributes(self, attributes: dict[str, JSON]) -> Self:
--> 323     return replace(self, attributes=attributes)

File ~/mambaforge/lib/python3.10/dataclasses.py:1454, in replace(obj, **changes)
   1447         changes[f.name] = getattr(obj, f.name)
   1449 # Create the new object, which calls __init__() and
   1450 # __post_init__() (if defined), using all of the init fields we've
   1451 # added and/or left in 'changes'.  If there are values supplied in
   1452 # changes that aren't fields, this will correctly raise a
   1453 # TypeError.
-> 1454 return obj.__class__(**changes)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:189, in ArrayV3Metadata.__init__(self, shape, data_type, chunk_grid, chunk_key_encoding, fill_value, codecs, attributes, dimension_names)
    187 chunk_key_encoding_parsed = ChunkKeyEncoding.from_dict(chunk_key_encoding)
    188 dimension_names_parsed = parse_dimension_names(dimension_names)
--> 189 fill_value_parsed = parse_fill_value_v3(fill_value, dtype=data_type_parsed)
    190 attributes_parsed = parse_attributes(attributes)
    191 codecs_parsed_partial = parse_codecs(codecs)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/metadata.py:656, in parse_fill_value_v3(fill_value, dtype)
    654             raise ValueError(msg)
    655     msg = f"Cannot parse non-string sequence {fill_value} as a scalar with type {dtype}."
--> 656     raise TypeError(msg)
    657 return dtype.type(fill_value)

TypeError: Cannot parse non-string sequence b'' as a scalar with type |S6.

Additional output

The Array.update_attributes call is a bit of a red herring. That just happens to go through Metadata.__init__, which validates the fill value for a given dtype. Only this type we use fill_value=np.bytes(b""), which is incompatible with the dtype np.dtype("S6").

So I think the shorter reproducer is something like

arr = zarr.open_array(store=store, path="a", shape=(), dtype=np.dtype("S6"))
assert arr.metadata.fill_value != np.bytes_("b"))

in other words, maybe(?) our inferred fill value is incorrect for this specific dtype.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugPotential issues with the zarr-python library

    Type

    No type

    Projects

    Status

    Done

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions