[v3] String support for v3 array #2268

@jhamman

Description

Zarr version

3.0.0.alpha

Numcodecs version

N/A

Python Version

N/A

Operating System

N/A

Installation

N/A

Description

Zarr Python 2.x supports writing string-type data to Zarr arrays. Unfortunately, strings were left out of the core dtypes in the v3 spec, so we have not implemented anything to replace this functionality in 3.0. There are discussions on how this could be done in the spec:

And some related work here:

Steps to reproduce

Writing strings to Zarr works with zarr_format=2:

In [1]: import zarr

In [2]: store = {}

In [3]: data = ["this", "is", "a", "1d", "string"]

In [4]: arr = zarr.array(shape=(5, ), store=store, zarr_format=2, data=data)

In [5]: store['.zarray'].to_bytes()
Out[5]: b'{\n  "shape": [\n    5\n  ],\n  "fill_value": "",\n  "zarr_format": 2,\n  "order": "C",\n  "filters": null,\n  "dimension_separator": ".",\n  "compressor": null,\n  "chunks": [\n    5\n  ],\n  "dtype": "<U6"\n}'

but not with zarr_format=3:

In [6]: arr = zarr.array(shape=(5, ), store=store, zarr_format=3, data=data)

...
ValueError: Invalid V3 data_type: <U6
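
To make the error concrete, here is a rough, standard-library-only sketch of why `<U6` is rejected. It is not zarr-python's actual validation code. The helper names (`numpy_style_unicode_dtype`, `is_v3_core_dtype`) are hypothetical, and the set of core data types is my reading of the v3 spec (bool, signed and unsigned integers, floats, complex, and raw bits `r*`).

```python
import re

# Core data type names as I understand them from the Zarr v3 spec
# (an assumption for illustration, not copied from zarr-python).
V3_CORE_DTYPES = {
    "bool",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    "complex64", "complex128",
}

# Raw-bits types look like r8, r16, ...; the bit count must be a multiple of 8.
RAW_BITS = re.compile(r"r(\d+)")


def numpy_style_unicode_dtype(strings):
    """Mimic the fixed-width dtype string NumPy infers for a non-empty list
    of str on a little-endian machine: '<U' plus the longest string length."""
    width = max(len(s) for s in strings)
    return f"<U{width}"


def is_v3_core_dtype(name):
    """Return True if `name` is one of the v3 core data type names above."""
    if name in V3_CORE_DTYPES:
        return True
    match = RAW_BITS.fullmatch(name)
    if match is None:
        return False
    bits = int(match.group(1))
    return bits > 0 and bits % 8 == 0


data = ["this", "is", "a", "1d", "string"]
dtype = numpy_style_unicode_dtype(data)
print(dtype)                        # <U6 ("string" is the longest element)
print(is_v3_core_dtype(dtype))      # False: no string type among the core dtypes
print(is_v3_core_dtype("float64"))  # True
```

The v2 path can store `<U6` because v2 metadata records the NumPy dtype string directly, as the `.zarray` output above shows. The v3 path has no core data type name to map it to.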

xref: pydata/xarray#9515

Labels: bug (Potential issues with the zarr-python library)
