Description
There's a v3 data type definition for a variable-length bytes data type (https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/bytes) that was not on my radar when I added variable-length bytes support in #2874.
The v3 `bytes` data type is incompatible with the `VariableLengthBytes` data type that I implemented in #2874. The differences are:
| data type | identifier | fill value |
|---|---|---|
| v3 `bytes` dtype | `"bytes"` | array of ints (one per byte) |
| Zarr Python `VariableLengthBytes` dtype | `"variable_length_bytes"` | string (base64-encoded bytes) |
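To make the fill-value difference in the table concrete, here is a minimal sketch that encodes the same bytes fill value in both JSON forms. The two function names are hypothetical helpers for illustration, not part of zarr-python's API.

```python
import base64
import json


def fill_value_to_v3_bytes_json(value: bytes) -> list[int]:
    """Hypothetical helper: v3 `bytes` form, one int (0-255) per byte."""
    return list(value)


def fill_value_to_vlen_bytes_json(value: bytes) -> str:
    """Hypothetical helper: current VariableLengthBytes form, base64 text."""
    return base64.b64encode(value).decode("ascii")


fill = b"\x00\xffab"
print(json.dumps(fill_value_to_v3_bytes_json(fill)))    # [0, 255, 97, 98]
print(json.dumps(fill_value_to_vlen_bytes_json(fill)))  # "AP9hYg=="
```

Both forms carry the same information, which is why reading the old form while writing only the new one is feasible.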
As an ecosystem, we should probably not have two nearly identical data types, which argues for consolidating them. Since the `VariableLengthBytes` data type doesn't have a spec, I think its current behavior should be deprecated, and we should either modify it to comply with the v3 `bytes` data type spec or introduce a brand-new data type class that complies with that spec.
Either way, we can stay compatible with older data by treating `"vlen-bytes"` as an alias for `"bytes"` and by reading (but not writing) the base64-encoded fill value.
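A rough sketch of that read/write asymmetry, assuming the data type name and the JSON fill value are handed to us separately. `DTYPE_ALIASES`, `read_bytes_dtype`, and `write_bytes_dtype` are hypothetical names, and the alias set simply mirrors the proposal above; exactly which legacy identifiers to accept is still an open detail.

```python
import base64
from typing import Any

# Hypothetical alias table following the proposal: legacy name -> v3 name.
DTYPE_ALIASES = {"vlen-bytes": "bytes"}


def read_bytes_dtype(name: str, fill_value: Any) -> tuple[str, bytes]:
    """Accept the v3 name or a legacy alias, plus either fill-value form."""
    canonical = DTYPE_ALIASES.get(name, name)
    if canonical != "bytes":
        raise ValueError(f"not a bytes data type: {name!r}")
    if isinstance(fill_value, str):
        # Legacy base64 form: accepted on read only, never written.
        return canonical, base64.b64decode(fill_value, validate=True)
    if isinstance(fill_value, list) and all(
        isinstance(b, int) and not isinstance(b, bool) and 0 <= b <= 255
        for b in fill_value
    ):
        # v3 spec form: one int per byte.
        return canonical, bytes(fill_value)
    raise ValueError(f"invalid fill value for bytes data type: {fill_value!r}")


def write_bytes_dtype(fill_value: bytes) -> tuple[str, list[int]]:
    """Always emit the v3 spec form: canonical name and a list of ints."""
    return "bytes", list(fill_value)


# Old metadata is upgraded on read; new metadata is always spec-compliant.
name, fill = read_bytes_dtype("vlen-bytes", "AP9hYg==")
print(write_bytes_dtype(fill))  # ('bytes', [0, 255, 97, 98])
```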
Any thoughts or preferences between these two options? Modifying the JSON form of the existing data type would prevent older versions of zarr-python from reading the data type metadata, but we did loudly warn about this with warnings emitted by the data type.