44 changes: 44 additions & 0 deletions codecs/vlen-bytes/README.md
@@ -0,0 +1,44 @@
# Vlen-bytes codec

Defines an `array -> bytes` codec that serializes variable-length byte string arrays.

## Codec name

The value of the `name` member in the codec object MUST be `vlen-bytes`.

## Configuration parameters

None.

## Example

For example, the array metadata below specifies that the array contains variable-length byte strings:

```json
{
  "data_type": "bytes",
  "codecs": [{
    "name": "vlen-bytes"
  }]
}
```
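
The accompanying `schema.json` accepts this codec in two equivalent spellings: an object like the one above (optionally with an empty `configuration` object), or the bare string shorthand `"vlen-bytes"`. The hypothetical helper below is a hand-written, standard-library-only mirror of that schema's `oneOf`, meant to show what it accepts. It is not part of zarr-python and is not a substitute for validating against the schema with a JSON Schema library.

```python
from typing import Any


def matches_vlen_bytes_schema(value: Any) -> bool:
    """Hypothetical mirror of codecs/vlen-bytes/schema.json (illustration only)."""
    # Shorthand form: the bare string "vlen-bytes".
    if isinstance(value, str):
        return value == "vlen-bytes"
    # Otherwise the value must be a JSON object.
    if not isinstance(value, dict):
        return False
    # "name" is required and must equal "vlen-bytes".
    if value.get("name") != "vlen-bytes":
        return False
    # additionalProperties: false -> only "name" and "configuration" are allowed.
    if not set(value) <= {"name", "configuration"}:
        return False
    # "configuration", if present, must be an object with no properties.
    if "configuration" in value:
        config = value["configuration"]
        if not isinstance(config, dict) or config:
            return False
    return True
```

Both `{"name": "vlen-bytes"}` and `"vlen-bytes"` pass, while a non-empty `configuration` or an extra top-level key is rejected, since the codec defines no configuration parameters.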

## Format and algorithm

This is an `array -> bytes` codec.

This codec is only compatible with the [`"bytes"`](../../data-types/bytes/README.md) data type.

In the encoded format, each chunk is prefixed with a 32-bit little-endian unsigned integer (u32le) that specifies the number of elements in the chunk.
This prefix is followed by the encoded elements of the chunk in lexicographical order of their indices (i.e., C order).
Each element is encoded as a u32le giving its length in bytes, followed by the bytes themselves.
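
The layout described above can be sketched with Python's standard `struct` module. This is a minimal illustration, not the zarr-python or numcodecs implementation: the function names `encode_vlen_bytes` and `decode_vlen_bytes` are hypothetical, and the sketch assumes the chunk has already been flattened into a list of `bytes` objects in lexicographical (C) order.

```python
import struct
from typing import List


def encode_vlen_bytes(elements: List[bytes]) -> bytes:
    """Hypothetical encoder: u32le element count, then (u32le length, raw bytes) per element."""
    # Header: number of elements in the chunk as a 32-bit little-endian unsigned integer.
    parts = [struct.pack("<I", len(elements))]
    for item in elements:
        # Each element: its length in bytes (u32le), followed by the bytes themselves.
        parts.append(struct.pack("<I", len(item)))
        parts.append(item)
    return b"".join(parts)


def decode_vlen_bytes(buf: bytes) -> List[bytes]:
    """Hypothetical decoder for the layout produced by encode_vlen_bytes."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    elements = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated vlen-bytes payload")
        elements.append(bytes(buf[offset:end]))
        offset = end
    return elements
```

For `[b"ab", b"", b"xyz"]` the encoder writes a count of 3, then `2, "ab"`, `0` (an empty element has a length prefix and no payload), and `3, "xyz"`, each length as 4 little-endian bytes.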

See https://numcodecs.readthedocs.io/en/stable/other/vlen.html#vlenbytes for details about the encoding.

## Change log

No changes yet.

## Current maintainers

* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)
20 changes: 20 additions & 0 deletions codecs/vlen-bytes/schema.json
@@ -0,0 +1,20 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "name": {
          "const": "vlen-bytes"
        },
        "configuration": {
          "type": "object",
          "additionalProperties": false
        }
      },
      "required": ["name"],
      "additionalProperties": false
    },
    { "const": "vlen-bytes" }
  ]
}
45 changes: 45 additions & 0 deletions codecs/vlen-utf8/README.md
@@ -0,0 +1,45 @@
# Vlen-utf8 codec

Defines an `array -> bytes` codec that serializes variable-length UTF-8 string arrays.

## Codec name

The value of the `name` member in the codec object MUST be `vlen-utf8`.

## Configuration parameters

None.

## Example

For example, the array metadata below specifies that the array contains variable-length UTF-8 strings:

```json
{
  "data_type": "string",
  "codecs": [{
    "name": "vlen-utf8"
  }]
}
```

## Format and algorithm

This is an `array -> bytes` codec.

This codec is only compatible with the [`"string"`](../../data-types/string/README.md) data type.

In the encoded format, each chunk is prefixed with a 32-bit little-endian unsigned integer (u32le) that specifies the number of elements in the chunk.
This prefix is followed by the encoded elements of the chunk in lexicographical order of their indices (i.e., C order).
Each element is encoded as a u32le giving its length in bytes, followed by the bytes themselves.
The bytes for each element are obtained by encoding the element as UTF-8.
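
A minimal Python sketch of this layout, using only the standard library. The function names `encode_vlen_utf8` and `decode_vlen_utf8` are hypothetical, the chunk is assumed to be already flattened into a list of `str` in lexicographical (C) order, and this is not the zarr-python or numcodecs implementation. The point it illustrates is that the length prefix counts UTF-8 bytes, not characters.

```python
import struct
from typing import List


def encode_vlen_utf8(elements: List[str]) -> bytes:
    """Hypothetical encoder: u32le count, then (u32le byte length, UTF-8 bytes) per string."""
    parts = [struct.pack("<I", len(elements))]
    for text in elements:
        data = text.encode("utf-8")
        # The prefix is the number of UTF-8 bytes, which can exceed the character count.
        parts.append(struct.pack("<I", len(data)))
        parts.append(data)
    return b"".join(parts)


def decode_vlen_utf8(buf: bytes) -> List[str]:
    """Hypothetical decoder; raises UnicodeDecodeError on invalid UTF-8."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    out = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated vlen-utf8 payload")
        out.append(buf[offset:end].decode("utf-8"))
        offset = end
    return out
```

For example, the one-character string `"é"` is written with a length prefix of 2, because its UTF-8 encoding is the two bytes `0xC3 0xA9`.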

See https://numcodecs.readthedocs.io/en/stable/other/vlen.html#vlenutf8 for details about the encoding.

## Change log

No changes yet.

## Current maintainers

* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)
20 changes: 20 additions & 0 deletions codecs/vlen-utf8/schema.json
@@ -0,0 +1,20 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "name": {
          "const": "vlen-utf8"
        },
        "configuration": {
          "type": "object",
          "additionalProperties": false
        }
      },
      "required": ["name"],
      "additionalProperties": false
    },
    { "const": "vlen-utf8" }
  ]
}
33 changes: 33 additions & 0 deletions data-types/bytes/README.md
@@ -0,0 +1,33 @@
# Bytes data type

Defines a data type for variable-length byte strings.

## Permitted fill values

The value of the `fill_value` metadata key must be a JSON array of byte values, i.e., integers in the range 0 to 255.
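
As an illustration, a reader could convert the JSON `fill_value` into a Python `bytes` object as below. The helper name `parse_bytes_fill_value` is hypothetical, and the explicit rejection of JSON booleans is an assumption of this sketch (Python treats `True` as the integer 1, which would otherwise slip through).

```python
import json
from typing import Any


def parse_bytes_fill_value(fill_value: Any) -> bytes:
    """Hypothetical helper: turn a `bytes` data type fill_value (JSON array) into bytes."""
    if not isinstance(fill_value, list):
        raise TypeError("fill_value for the bytes data type must be a JSON array")
    for v in fill_value:
        # Reject booleans explicitly: in Python, bool is a subclass of int.
        if isinstance(v, bool) or not isinstance(v, int) or not 0 <= v <= 255:
            raise ValueError(f"invalid byte value in fill_value: {v!r}")
    return bytes(fill_value)


metadata = json.loads('{"data_type": "bytes", "fill_value": [1, 2, 3]}')
print(parse_bytes_fill_value(metadata["fill_value"]))  # b'\x01\x02\x03'
```

An empty array is accepted and yields an empty byte string.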

## Example

For example, the array metadata below specifies that the array contains variable-length byte strings:

```json
{
  "data_type": "bytes",
  "fill_value": [1, 2, 3],
  "codecs": [{
    "name": "vlen-bytes"
  }]
}
```

## Notes

Currently, this data type is only compatible with the [`"vlen-bytes"`](../../codecs/vlen-bytes/README.md) codec.

## Change log

No changes yet.

## Current maintainers

* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)
20 changes: 20 additions & 0 deletions data-types/bytes/schema.json
@@ -0,0 +1,20 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "name": {
          "const": "bytes"
        },
        "configuration": {
          "type": "object",
          "additionalProperties": false
        }
      },
      "required": ["name"],
      "additionalProperties": false
    },
    { "const": "bytes" }
  ]
}
33 changes: 33 additions & 0 deletions data-types/string/README.md
@@ -0,0 +1,33 @@
# String data type

Defines a data type for variable-length UTF-8 strings.

## Permitted fill values

The value of the `fill_value` metadata key must be a Unicode string.

## Example

For example, the array metadata below specifies that the array contains variable-length UTF-8 strings:

```json
{
  "data_type": "string",
  "fill_value": "foo",
  "codecs": [{
    "name": "vlen-utf8"
  }]
}
```

## Notes

Currently, this data type is only compatible with the [`"vlen-utf8"`](../../codecs/vlen-utf8/README.md) codec.

## Change log

No changes yet.

## Current maintainers

* [zarr-python core development team](https://github.com/orgs/zarr-developers/teams/python-core-devs)
20 changes: 20 additions & 0 deletions data-types/string/schema.json
@@ -0,0 +1,20 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "name": {
          "const": "string"
        },
        "configuration": {
          "type": "object",
          "additionalProperties": false
        }
      },
      "required": ["name"],
      "additionalProperties": false
    },
    { "const": "string" }
  ]
}