Closed
Description
What happened:
Reading a zarr dataset and writing it back out to a different path, repeated more than once, changes bool dtype arrays to int8. I think this issue is related to #2937.
What you expected to happen:
My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.
Minimal Complete Verifiable Example:
import xarray as xr
import numpy as np
ds = xr.Dataset({
    "bool_field": xr.DataArray(
        np.random.randn(5) < 0.5,  # random boolean array
        dims=('g',),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode="w")
d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr("test2.zarr", mode="w")
d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)
The above snippet prints the following. In d3, the dtype of bool_field is int8, presumably because the write to test2.zarr reused d2's encoding, which records int8 as the dtype, even though the array itself has a bool dtype.
bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
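A quick way to spot affected variables is to compare each variable's in-memory dtype with the dtype recorded in its encoding. The sketch below is illustrative only: find_dtype_mismatches is a hypothetical helper, and the SimpleNamespace object is a stand-in for an xarray variable so the snippet runs without writing any files. It relies only on the .dtype and .encoding attributes shown in the output above.

```python
from types import SimpleNamespace


def find_dtype_mismatches(variables):
    """Return {name: (in_memory_dtype, encoded_dtype)} for every variable
    whose encoding records a dtype different from the array's own dtype."""
    mismatches = {}
    for name, var in variables.items():
        encoded = var.encoding.get("dtype")
        if encoded is not None and encoded != var.dtype:
            mismatches[name] = (var.dtype, encoded)
    return mismatches


# Hypothetical stand-in mimicking d2.bool_field from the output above.
fake_d2 = {
    "bool_field": SimpleNamespace(
        dtype="bool",
        encoding={"chunks": (5,), "filters": None, "dtype": "int8"},
    ),
}
print(find_dtype_mismatches(fake_d2))
```

With the real dataset this would presumably be called as find_dtype_mismatches(d2.data_vars), assuming each entry exposes .dtype and .encoding as in the printed output.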
Anything else we need to know?:
The current workaround is to set the encodings explicitly. This fixes the problem:
encoding = {k: {"dtype": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode="w", encoding=encoding)
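An alternative that may also work, though it has not been tested against the versions listed below, is to drop the stale dtype entry from each variable's encoding before writing, so that the writer no longer sees int8. The helper name without_dtype is hypothetical, and the sketch operates on a plain dict shaped like the encoding printed above rather than on a real dataset.

```python
def without_dtype(encoding):
    """Return a copy of an encoding dict with the 'dtype' key removed."""
    return {key: value for key, value in encoding.items() if key != "dtype"}


# Encoding shaped like the one printed for d2.bool_field (compressor omitted).
stale = {"chunks": (5,), "filters": None, "dtype": "int8"}
cleaned = without_dtype(stale)
print(cleaned)

# Assumed usage with the dataset from the example (not run here):
# for name in d2.data_vars:
#     d2[name].encoding = without_dtype(d2[name].encoding)
# d2.to_zarr("test2.zarr", mode="w")
```

Returning a copy instead of mutating in place leaves the original encoding available for inspection if the write still produces an unexpected dtype.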
Environment:
Output of xr.show_versions()
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'
In [2]: zarr.__version__
Out[2]: '2.6.1'
shaunc