Description
What is your issue?
When creating an empty Zarr store whose variables have an object dtype, I got a MemoryError despite setting compute=False.
What did I expect to happen?
I expected to save the empty zarr to disk.
My use case is creating a Zarr dataset that will later be populated with variable-length UTF-8 strings. I know Zarr did not previously support these, but it now does. I think the issue can be explained by this comment in zarr.py:
File ~/miniconda3/lib/python3.13/site-packages/xarray/backends/zarr.py:531, in encode_zarr_variable(var, needs_copy, name)
510 """
511 Converts an Variable into an Variable which follows some
512 of the CF conventions:
(...) 527 A variable which has been encoded as described above.
528 """
530 var = conventions.encode_cf_variable(var, name=name)
--> 531 var = ensure_dtype_not_object(var, name=name)
533 # zarr allows unicode, but not variable-length strings, so it's both
534 # simpler and more compact to always encode as UTF-8 explicitly.
535 # TODO: allow toggling this explicitly via dtype in encoding.
536 # TODO: revisit this now that Zarr _does_ allow variable-length strings
537 coder = coding.strings.EncodedStringCoder(allows_unicode=True)
MCVE:
import dask.array
import numpy as np
import xarray as xr
dummies = dask.array.zeros((5000, 100, 2000, 50), chunks=(10, 10, 500, 50), dtype=np.dtypes.StringDType)
ds = xr.Dataset({"foo": (["x", "y", "z", "alpha"], dummies)}, coords={"x": np.arange(5000), "y": np.arange(100), "z": np.arange(2000), "alpha": np.arange(50)})
# NOTE: dsf is a second dataset that is not defined in this snippet.
bigZarr = xr.merge([ds, dsf])
bigZarr.to_zarr('myzarr.zarr', compute=False, consolidated=False)
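For a sense of scale, here is a rough back-of-the-envelope sketch of how large "foo" would be if it were ever fully loaded into memory. The shape and chunk sizes are taken from the MCVE. The 8 bytes per element is only an assumption (one pointer per element on a 64-bit machine); the real per-element cost of string data is likely higher.

```python
from math import prod

# Shape and chunking of "foo", copied from the MCVE.
shape = (5000, 100, 2000, 50)
chunks = (10, 10, 500, 50)

n_elements = prod(shape)
elements_per_chunk = prod(chunks)
# Ceiling division per axis, so partial edge chunks would be counted too.
n_chunks = prod(-(-s // c) for s, c in zip(shape, chunks))

# ASSUMPTION: 8 bytes per element (one 64-bit pointer), excluding the
# string payloads themselves. Treat this as a lower bound, not a measurement.
BYTES_PER_ELEMENT = 8
total_gb = n_elements * BYTES_PER_ELEMENT / 1e9

print(f"elements:           {n_elements:,}")
print(f"chunks:             {n_chunks:,}")
print(f"elements per chunk: {elements_per_chunk:,}")
print(f"lower-bound size:   {total_gb:,.0f} GB")
```

Under that assumption, any code path that materializes the whole variable at once, rather than working chunk by chunk, would need hundreds of gigabytes, which would be consistent with a MemoryError.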
- Minimal example: the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example: the example is self-contained, including all data and the text of any traceback.
- Verifiable example: the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue: a search of GitHub Issues suggests this is not a duplicate.
- Recent environment: the issue occurs with the latest version of xarray and its dependencies.
Also, in the meantime, if anyone can recommend a way to make my project work despite this limitation, I would be keen to hear it. Much appreciated.