Description
What happened?
Only tested with zarr v2.
I believe I am running into some kind of race condition when rapidly reading from and writing to a zarr store in a celery multiprocessing environment. At one point, my code tries to open the dataset using `xr.open_zarr()`, but an error occurs:
TypeError: the JSON object must be str, bytes or bytearray, not dict
It occurs on line 400 in backends/zarr.py:
zarray_path = os.path.join(zarr_obj.path, ".zarray")
if _zarr_v3():
    import asyncio
    zarray_str = asyncio.run(zarr_obj.store.get(zarray_path)).to_bytes()
else:
    zarray_str = zarr_obj.store.get(zarray_path)
zarray = json.loads(zarray_str)
The only reason I can think of is that zarr (or perhaps Python) has some form of caching and returns the dictionary directly when `zarr_obj.store.get(zarray_path)` is called, so the `json` package cannot deserialize it. Alternatively, somewhere else in the xarray/zarr code, `zarr_obj.store[zarray_path]` is being overwritten with a dict.
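To make the hypothesis concrete, here is a minimal sketch of what tolerating both return types could look like. `decode_zarray_metadata` is a hypothetical helper, not part of xarray or zarr; it only shows that `json.loads` rejects a dict with exactly this message and that a type check would sidestep it.

```python
import json
from collections.abc import Mapping


def decode_zarray_metadata(raw):
    """Hypothetical helper: return .zarray metadata as a dict.

    Accepts raw JSON (str, bytes or bytearray) as well as an
    already-decoded mapping, which is what the store appears to
    return in the failing case.
    """
    if isinstance(raw, Mapping):
        # Already decoded somewhere upstream; copy it so the caller
        # cannot mutate a shared (possibly cached) object.
        return dict(raw)
    return json.loads(raw)


raw_bytes = b'{"chunks": [1000], "shape": [347], "zarr_format": 2}'
already_decoded = json.loads(raw_bytes)

# Both inputs yield the same metadata.
assert decode_zarray_metadata(raw_bytes) == decode_zarray_metadata(already_decoded)

# Passing the dict straight to json.loads reproduces the TypeError.
try:
    json.loads(already_decoded)
except TypeError as exc:
    message = str(exc)
```

This would only hide the symptom; it does not explain where the dict comes from.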
The incidence is really low: about 0.004% of tasks fail because of it. But that's still several times per day in my case.
I am also catching the exception and, in the except clause, loading and printing the contents of the file, but it looks normal:
{
    "chunks": [
        1000
    ],
    "compressor": null,
    "dimension_separator": ".",
    "dtype": "<f4",
    "fill_value": "NaN",
    "filters": null,
    "order": "C",
    "shape": [
        347
    ],
    "zarr_format": 2
}
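Since re-reading the file after the failure shows normal contents, it may be more informative to log the exact object that was handed to `json.loads` at the moment of failure. A rough sketch, assuming the store behaves like a mapping with a `.get` method; `load_json_with_diagnostics` and the logger name are hypothetical, and a plain dict stands in for the zarr store:

```python
import json
import logging

logger = logging.getLogger("zarr_debug")  # hypothetical logger name


def load_json_with_diagnostics(store, key):
    """Hypothetical wrapper: decode store.get(key), logging the raw value on failure."""
    raw = store.get(key)
    try:
        return json.loads(raw)
    except TypeError:
        # Log the offending object itself rather than a fresh read,
        # because a second read may no longer show the problem.
        logger.error("store.get(%r) returned %s: %r", key, type(raw).__name__, raw)
        raise


# A plain dict stands in for the store in this sketch.
fake_store = {
    "good/.zarray": b'{"shape": [347], "zarr_format": 2}',
    "bad/.zarray": {"shape": [347], "zarr_format": 2},
}

good = load_json_with_diagnostics(fake_store, "good/.zarray")

try:
    load_json_with_diagnostics(fake_store, "bad/.zarray")
    failed = False
except TypeError:
    failed = True
```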
What did you expect to happen?
I expected no errors when deserializing the JSON string into a dictionary.
Minimal Complete Verifiable Example
It's simply `xarray.open_zarr(store)`. The problem with race conditions is that it's hard, if not impossible, to write verifiable examples.
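As a stopgap while the cause is unknown, the call could be retried when it fails this way. A minimal sketch, assuming the failure is transient; `retry_on_type_error` and its parameters are hypothetical and not part of xarray:

```python
import time


def retry_on_type_error(func, attempts=3, delay=0.1):
    """Hypothetical stopgap: call func(), retrying if it raises TypeError."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TypeError:
            if attempt == attempts:
                # Out of retries: let the original error propagate.
                raise
            time.sleep(delay)


# Stand-in for a call that fails once and then succeeds.
calls = {"count": 0}


def flaky_open():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TypeError("the JSON object must be str, bytes or bytearray, not dict")
    return "dataset"


result = retry_on_type_error(flaky_open, attempts=3, delay=0.0)
```

In the real code the callable would be something like `lambda: xarray.open_zarr(store)`. This masks the problem rather than fixing it.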
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1513, in open_zarr
    ds = open_dataset(
         ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/api.py", line 715, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1604, in open_dataset
    ds = store_entrypoint.open_dataset(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/store.py", line 46, in open_dataset
    vars, attrs = filename_or_obj.load()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/common.py", line 312, in load
    (_decode_variable_name(k), v) for k, v in self.get_variables().items()
                                              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 826, in get_variables
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/core/utils.py", line 468, in FrozenDict
    return Frozen(dict(*args, **kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 826, in <genexpr>
    return FrozenDict((k, self.open_store_variable(k)) for k in self.array_keys())
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 785, in open_store_variable
    dimensions, attributes = _get_zarr_dims_and_attrs(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/xarray/backends/zarr.py", line 347, in _get_zarr_dims_and_attrs
    zarray = json.loads(zarray_str)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not dict
Anything else we need to know?
I guess it's always possible that another process is writing to the file while xarray tries to read from it. But then I don't understand why the error says that the input to json.loads() is in fact a dictionary...
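The reasoning here can be checked with the standard library alone: a half-written or empty file handed to `json.loads` fails with `json.JSONDecodeError`, not `TypeError`, so a plain concurrent write would not by itself produce the message above. A small sketch, where `classify_failure` is a hypothetical helper and the truncated bytes simulate a partial read:

```python
import json


def classify_failure(raw):
    """Hypothetical helper: report how json.loads reacts to a given input."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "JSONDecodeError"
    except TypeError:
        return "TypeError"
    return "ok"


complete = b'{"chunks": [1000], "shape": [347], "zarr_format": 2}'
truncated = complete[:20]  # simulate reading a half-written file

results = {
    "complete": classify_failure(complete),
    "truncated": classify_failure(truncated),
    "empty": classify_failure(b""),
    "dict": classify_failure({"shape": [347]}),
    "missing": classify_failure(None),  # what dict-style .get returns for an absent key
}
```

Only non-string inputs such as a dict or `None` end up in the `TypeError` branch, which is consistent with the store returning an already-decoded object rather than corrupted bytes.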
Environment
OS = Debian 12, inside a podman container