Reading and writing a zarr dataset multiple times casts bools to int8 #4826

@amatsukawa

What happened:

Writing a dataset to zarr, reading it back, and writing it again to a different path changes bool arrays to int8. I think this issue is related to #2937.

What you expected to happen:

My array's dtype in numpy/dask should not change, even if certain storage backends store dtypes a certain way.

Minimal Complete Verifiable Example:

import xarray as xr
import numpy as np

ds = xr.Dataset({
    "bool_field": xr.DataArray(
        np.random.randn(5) < 0.5,
        dims=('g',),
        coords={'g': np.arange(5)}
    )
})
ds.to_zarr('test.zarr', mode="w")

d2 = xr.open_zarr('test.zarr')
print(d2.bool_field.dtype)
print(d2.bool_field.encoding)
d2.to_zarr("test2.zarr", mode="w")

d3 = xr.open_zarr('test2.zarr')
print(d3.bool_field.dtype)

The snippet above prints the following. In d3, bool_field has dtype int8, presumably because d2 carries the encoding read from the first store, which says int8 even though the in-memory array is bool, and that encoding is reused on the second write.

bool
{'chunks': (5,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('int8')}
int8
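To see which variables are affected, one could compare each variable's in-memory dtype with the dtype recorded in its encoding. The following is a minimal sketch that sets the encoding by hand to mimic the output printed above rather than reading from zarr; the helper name find_dtype_mismatches is hypothetical.

```python
import numpy as np
import xarray as xr


def find_dtype_mismatches(ds):
    """Map variable name -> (in-memory dtype, encoded dtype) where they differ."""
    found = {}
    for name, var in ds.variables.items():
        encoded = var.encoding.get("dtype")
        if encoded is not None and np.dtype(encoded) != var.dtype:
            found[name] = (var.dtype, np.dtype(encoded))
    return found


# Stand-in for d2: a bool array whose encoding claims int8 (set by hand here).
ds = xr.Dataset({"bool_field": ("g", np.array([True, False, True, True, False]))})
ds.variables["bool_field"].encoding["dtype"] = np.dtype("int8")

mismatches = find_dtype_mismatches(ds)
print(mismatches)
```

Any variable reported here is one whose dtype could change on the next write if the inherited encoding is reused.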

Anything else we need to know?:

The current workaround is to set the encoding explicitly on write, which fixes the problem:

encoding = {k: {"dtype": d2[k].dtype} for k in d2}
d2.to_zarr('test2.zarr', mode="w", encoding=encoding)
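A possible alternative, assuming the stale dtype entry in the inherited encoding is what triggers the cast, is to drop that entry before writing so the writer can choose an encoding from the in-memory dtype. This is a sketch only and has not been verified against the versions listed below; drop_stale_dtype_encoding is a hypothetical helper, and the encoding is set by hand to mimic d2.

```python
import numpy as np
import xarray as xr


def drop_stale_dtype_encoding(ds):
    """Remove encoding['dtype'] where it disagrees with the in-memory dtype.

    Mutates ds in place and returns the names of the variables it changed.
    """
    changed = []
    for name, var in ds.variables.items():
        encoded = var.encoding.get("dtype")
        if encoded is not None and np.dtype(encoded) != var.dtype:
            del var.encoding["dtype"]
            changed.append(name)
    return changed


# Stand-in for d2: bool data with an encoding inherited from an int8 store.
ds = xr.Dataset({"bool_field": ("g", np.array([True, False, True, True, False]))})
ds.variables["bool_field"].encoding.update({"chunks": (5,), "dtype": np.dtype("int8")})

dropped = drop_stale_dtype_encoding(ds)
print(dropped)
print(ds.variables["bool_field"].encoding)
```

Other encoding keys such as chunks are left alone, so the dataset could then be written with to_zarr as in the example, without passing encoding.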

Environment:

Output of xr.show_versions()
# I'll update with the full output of xr.show_versions() soon.
In [4]: xr.__version__
Out[4]: '0.16.2'

In [2]: zarr.__version__
Out[2]: '2.6.1'
