Commit a0d021b

committed
WIP
1 parent 1f1a550 commit a0d021b

File tree

7 files changed

+268
-2
lines changed

docs/conf.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
2929
# ones.
3030
extensions = [
31+
'sphinxcontrib.mermaid'
3132
]
3233

3334
# Add any paths that contain templates here, relative to this directory.

docs/index.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,9 @@ Under construction.
1010
:caption: Contents:
1111

1212
protocol
13-
stores
1413
codecs
14+
stores
15+
storage_transformers
1516

1617

1718
Indices and tables

docs/protocol/core/v3.0.rst

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -383,6 +383,17 @@ conceptual model underpinning the Zarr protocol.
383383
interface`_ which is a common set of operations that stores may
384384
provide.
385385

386+
.. _storage transformer:
387+
.. _storage transformers:
388+
389+
*Storage transformer*
390+
391+
To enhance the storage capabilities, storage transformers may
392+
change the storage structure and behaviour of data coming from
393+
an array_ in the underlying store_. Upon retrieval the original data is
394+
restored within the transformer. Any number of `predefined storage
395+
transformers`_ can be registered and stacked.
396+
386397

387398
Node names
388399
==========
@@ -895,6 +906,8 @@ ignored if not understood::
895906
}
896907

897908

909+
.. _array-metadata:
910+
898911
Array metadata
899912
--------------
900913

@@ -1019,6 +1032,17 @@ The following names are optional:
10191032
specification. When the ``compressor`` name is absent, this means that no
10201033
compressor is used.
10211034

1035+
``storage_transformers``
1036+
1037+
Specifies a stack of storage transformers to be applied between the array
1038+
and the store. The value must be a list of objects, each containing the name
1039+
``storage_transformer`` whose value is a URI that identifies a storage transformer
1040+
and dereferences to a human-readable representation of its specification. Each
1041+
object may also contain a ``configuration`` object which consists of the
1042+
parameter names and values as defined by the corresponding storage transformer
1043+
specification. When the ``storage_transformers`` name is absent, this means that no
1044+
storage transformer is used.
1045+
10221046

10231047
All other names within the array metadata object are reserved for
10241048
future versions of this specification.
@@ -1499,6 +1523,39 @@ Let "+" be the string concatenation operator.
14991523
For listable store, ``list_dir(parent(P))`` can be an alternative.
15001524

15011525

1526+
Storage transformers
1527+
====================
1528+
1529+
A Zarr storage transformer allows zarr-compatible data to be changed before it is stored.
1530+
The stored transformed data is restored to its original state whenever data is requested
1531+
by the Array.
1532+
1533+
A storage transformer serves the same `Abstract store interface`_ as the store_.
1534+
However, it should not persistently store any information necessary to restore the original data,
1535+
but instead propagates this to the next storage transformer or the final store.
1536+
From the perspective of an Array or a previous-stage transformer, both store and storage transformer follow the same
1537+
protocol and are therefore interchangeable. The behaviour can still be different,
1538+
e.g. requests may be cached or the form of the underlying data can change.
1539+
1540+
Storage transformers may be stacked to combine different functionalities:
1541+
1542+
.. mermaid::
1543+
1544+
graph LR
1545+
Array --> t1
1546+
subgraph stack [Storage transformers]
1547+
t1[Transformer 1] --> t2[...] --> t3[Transformer N]
1548+
end
1549+
t3 --> Store
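
The stacking shown in the diagram can be sketched in Python. This is a minimal illustration, not part of the specification: the class names ``MemoryStore``, ``StorageTransformer`` and ``ReverseBytes`` are hypothetical, and only ``get``, ``set`` and ``delete`` of the abstract store interface are modelled. Each transformer exposes the same operations as the store it wraps and keeps no state of its own.

```python
from typing import Dict, Optional


class MemoryStore:
    """Hypothetical in-memory store offering get/set/delete."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class StorageTransformer:
    """Pass-through base class: same interface as the store it wraps."""

    def __init__(self, inner) -> None:
        # The next transformer in the stack, or the final store.
        self.inner = inner

    def get(self, key: str) -> Optional[bytes]:
        return self.inner.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(key, value)

    def delete(self, key: str) -> None:
        self.inner.delete(key)


class ReverseBytes(StorageTransformer):
    """Toy transformer (illustration only): stores values byte-reversed."""

    def get(self, key: str) -> Optional[bytes]:
        value = self.inner.get(key)
        return None if value is None else value[::-1]

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(key, value[::-1])


# Array --> pass-through transformer --> ReverseBytes --> Store
store = MemoryStore()
stack = StorageTransformer(ReverseBytes(store))
stack.set("example/key", b"abc")
```

With this stack, ``stack.get("example/key")`` returns the original ``b"abc"``, while the underlying store holds the transformed value ``b"cba"``. Because every layer follows the same protocol, layers can be added or removed without changing the caller.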
1550+
1551+
A fixed set of storage transformers is recommended for implementation with this protocol:
1552+
1553+
1554+
Predefined storage transformers
1555+
-------------------------------
1556+
1557+
- :ref:`sharding-storage-transformer-v1`
1558+
15021559
Protocol extensions
15031560
===================
15041561

docs/requirements.txt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
sphinx==2.0.1
22
pydata-sphinx-theme
3-
3+
sphinxcontrib-mermaid

docs/storage_transformers.rst

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
====================
2+
Storage Transformers
3+
====================
4+
5+
Under construction.
6+
7+
.. toctree::
8+
:maxdepth: 1
9+
:caption: Contents:
10+
11+
storage_transformers/sharding/v1.0
27.4 KB
Lines changed: 196 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,196 @@
1+
.. _sharding-storage-transformer-v1:
2+
3+
==========================================
4+
Sharding storage transformer (version 1.0)
5+
==========================================
6+
-----------------------------
7+
Editor's draft 18 02 2022
8+
-----------------------------
9+
10+
Specification URI:
11+
@@TODO
12+
http://purl.org/zarr/spec/storage_transformers/sharding/1.0
13+
Issue tracking:
14+
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/storage_transformers-sharding-v1.0>`_
15+
Suggest an edit for this spec:
16+
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/storage_transformers/sharding/v1.0.rst>`_
17+
18+
Copyright 2022 `Zarr core development
19+
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
20+
list institutions?). This work is licensed under a `Creative Commons
21+
Attribution 3.0 Unported
22+
License <https://creativecommons.org/licenses/by/3.0/>`_.
23+
24+
----
25+
26+
27+
Abstract
28+
========
29+
30+
This specification defines an implementation of the Zarr abstract
31+
storage transformer API introducing sharding.
32+
33+
34+
Motivation
35+
==========
36+
37+
Sharding decouples the concept of chunks from storage keys, which become shards.
38+
This is helpful when the requirements for those don't align:
39+
40+
- Compressible units of chunks often need to be read and written in smaller
41+
chunks, whereas
42+
- storage often is optimized for larger data per entry and fewer entries, e.g.
43+
as restricted by the file block size and maximum inode number for typical
44+
file systems.
45+
46+
This does not necessarily fit the access patterns of the data, so chunks might
47+
need to be smaller than one storage key. In those cases sharding decouples those
48+
entities. One shard corresponds to one storage key, but can contain multiple chunks:
49+
50+
.. image:: sharding.png
51+
52+
53+
Document conventions
54+
====================
55+
56+
Conformance requirements are expressed with a combination of
57+
descriptive assertions and [RFC2119]_ terminology. The key words
58+
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
59+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
60+
parts of this document are to be interpreted as described in
61+
[RFC2119]_. However, for readability, these words do not appear in all
62+
uppercase letters in this specification.
63+
64+
All of the text of this specification is normative except sections
65+
explicitly marked as non-normative, examples, and notes. Examples in
66+
this specification are introduced with the words "for example".
67+
68+
69+
Configuration
70+
=============
71+
72+
Sharding is configured as an entry in the ``storage_transformers`` list of the :ref:`array-metadata`, for example:
73+
74+
.. code-block::
75+
76+
{
77+
"storage_transformers": [
78+
{
79+
"storage_transformer": "https://purl.org/zarr/spec/storage_transformers/sharding/1.0",
80+
"configuration": {
81+
"format": "indexed",
82+
"chunks_per_shard": [
83+
2,
84+
2
85+
]
86+
}
}
87+
]
88+
}
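
A reader-side sketch of how such a configuration might be consumed, in Python. The helper names ``sharding_configuration`` and ``shard_of_chunk`` are hypothetical. The mapping from chunk coordinates to shard coordinates by integer division is an assumption made for illustration, since the sharding mechanism and key translation are still marked @@TODO in this draft.

```python
import json

SHARDING_URI = "https://purl.org/zarr/spec/storage_transformers/sharding/1.0"

metadata_text = """
{
    "storage_transformers": [
        {
            "storage_transformer": "https://purl.org/zarr/spec/storage_transformers/sharding/1.0",
            "configuration": {
                "format": "indexed",
                "chunks_per_shard": [2, 2]
            }
        }
    ]
}
"""


def sharding_configuration(array_metadata: dict) -> dict:
    """Return the configuration of the sharding transformer, validated."""
    for entry in array_metadata.get("storage_transformers", []):
        if entry.get("storage_transformer") != SHARDING_URI:
            continue
        config = entry.get("configuration", {})
        # "indexed" is the only binary format defined by this draft.
        if config.get("format") != "indexed":
            raise ValueError("unsupported shard format")
        chunks_per_shard = config.get("chunks_per_shard")
        if not chunks_per_shard or any(
            not isinstance(n, int) or n < 1 for n in chunks_per_shard
        ):
            raise ValueError("chunks_per_shard must be a list of positive integers")
        return config
    raise KeyError("no sharding storage transformer configured")


def shard_of_chunk(chunk_coords, chunks_per_shard):
    """ASSUMED mapping (not defined by the draft): integer division selects
    the shard, the remainder selects the position inside the shard."""
    shard = tuple(c // n for c, n in zip(chunk_coords, chunks_per_shard))
    local = tuple(c % n for c, n in zip(chunk_coords, chunks_per_shard))
    return shard, local


config = sharding_configuration(json.loads(metadata_text))
```

Under the assumed mapping, ``shard_of_chunk((3, 5), config["chunks_per_shard"])`` yields shard ``(1, 2)`` and local position ``(1, 1)``.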
89+
90+
91+
Sharding Mechanism
92+
=========================
93+
94+
@@TODO
95+
96+
97+
Binary shard format
98+
===================
99+
100+
The only binary format is the ``indexed`` format, as specified by the ``format``
101+
configuration key. Other binary formats might be added in future versions.
102+
103+
In the indexed binary format chunks are written successively in a shard, where
104+
unused space between them is allowed, followed by an index referencing them.
105+
The index holds an `offset, length` pair of little-endian uint64 values per chunk.
106+
The chunks are ordered in the index in row-major (C) order, e.g. for (2, 2) chunks
107+
per shard an index would look like:
108+
109+
.. code-block::
110+
111+
| chunk (0, 0) | chunk (0, 1) | chunk (1, 0) | chunk (1, 1) |
112+
| offset | length | offset | length | offset | length | offset | length |
113+
| uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 |
114+
115+
116+
Empty chunks are denoted by setting both offset and length to ``2^64 - 1``.
117+
The index always has the full shape of all possible chunks per shard,
118+
even if they are outside of the array size.
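
The index layout described above can be sketched in Python using the standard ``struct`` module. The function names are hypothetical. The sketch assumes that offsets are counted in bytes from the start of the shard, which this draft does not state explicitly, and it locates the index at the end of the shard as described above.

```python
import struct
from itertools import product

EMPTY = 2**64 - 1  # offset and length value that marks an empty chunk
ENTRY_SIZE = 16    # two little-endian uint64 values per chunk


def chunk_positions(chunks_per_shard):
    """All chunk positions inside a shard in row-major (C) order."""
    return list(product(*(range(n) for n in chunks_per_shard)))


def encode_index(entries, chunks_per_shard):
    """entries maps a position in the shard to (offset, length);
    positions that are missing are written as empty chunks."""
    out = bytearray()
    for position in chunk_positions(chunks_per_shard):
        offset, length = entries.get(position, (EMPTY, EMPTY))
        out += struct.pack("<QQ", offset, length)
    return bytes(out)


def decode_index(shard, chunks_per_shard):
    """Read the index from the end of the shard, skipping empty chunks."""
    positions = chunk_positions(chunks_per_shard)
    raw = shard[-ENTRY_SIZE * len(positions):]
    entries = {}
    for i, position in enumerate(positions):
        offset, length = struct.unpack_from("<QQ", raw, i * ENTRY_SIZE)
        if (offset, length) != (EMPTY, EMPTY):
            entries[position] = (offset, length)
    return entries


def read_chunk(shard, position, chunks_per_shard):
    """Return the bytes of one chunk, or None if the chunk is empty.
    ASSUMPTION: offsets are relative to the start of the shard."""
    entry = decode_index(shard, chunks_per_shard).get(position)
    if entry is None:
        return None
    offset, length = entry
    return shard[offset:offset + length]


# A (2, 2) shard holding two chunks, followed by its 64-byte index.
shard = b"AAAA" + b"BB" + encode_index({(0, 0): (0, 4), (1, 1): (4, 2)}, (2, 2))
```

Here ``read_chunk(shard, (1, 1), (2, 2))`` returns ``b"BB"`` and ``read_chunk(shard, (0, 1), (2, 2))`` returns ``None``, because that index entry carries the empty marker.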
119+
120+
The actual order of the chunk-content is not fixed and may be chosen by the implementation
121+
as all possible write orders are valid according to this specification and therefore can
122+
be read by any other implementation. When writing partial chunks into an existing shard no
123+
specific order of the existing chunks may be expected. Some possible writing strategies are:
124+
125+
* **Fixed order**: Specify a fixed order (e.g. row-, column-major or Morton order).
126+
When replacing an existing chunk, the new chunk may be written in-place if the existing chunk is of larger or equal size,
127+
leaving unused space up to an upper limit which an implementation might specify.
128+
Please note that for regular-sized uncompressed data all chunks have the same size and
129+
can therefore be replaced in-place.
130+
* **Append-only**: Any chunk to write is appended to the existing shard, followed by an updated index.
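
The append-only strategy can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: ``append_chunk`` and ``read_index`` are hypothetical names, offsets are assumed to be relative to the start of the shard, and the previous index is simply left in place as unused space, which the indexed format permits.

```python
import struct
from itertools import product

EMPTY = 2**64 - 1  # marks an empty chunk in the index
ENTRY_SIZE = 16    # offset and length, both little-endian uint64


def positions(chunks_per_shard):
    """Chunk positions inside a shard in row-major (C) order."""
    return list(product(*(range(n) for n in chunks_per_shard)))


def read_index(shard, chunks_per_shard):
    """Return {position: (offset, length)} for every chunk in the shard.
    An empty byte string is treated as a shard without any chunks."""
    all_positions = positions(chunks_per_shard)
    if not shard:
        return {p: (EMPTY, EMPTY) for p in all_positions}
    raw = shard[-ENTRY_SIZE * len(all_positions):]
    return {
        p: struct.unpack_from("<QQ", raw, i * ENTRY_SIZE)
        for i, p in enumerate(all_positions)
    }


def append_chunk(shard, position, data, chunks_per_shard):
    """Append one chunk to the existing shard, followed by an updated index.
    The old index (and any replaced chunk) remains as unused space."""
    index = read_index(shard, chunks_per_shard)
    if position not in index:
        raise KeyError("position outside of the shard")
    index[position] = (len(shard), len(data))
    new_index = b"".join(
        struct.pack("<QQ", *index[p]) for p in positions(chunks_per_shard)
    )
    return shard + data + new_index


shard = append_chunk(b"", (0, 0), b"abc", (2, 2))
shard = append_chunk(shard, (0, 0), b"xy", (2, 2))  # replaces chunk (0, 0)
```

After the second call the index entry for chunk ``(0, 0)`` points at the newly appended bytes, and the shard has grown by the chunk size plus one index. A real implementation would likely compact shards from time to time; that is outside the scope of this sketch.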
131+
132+
Any configuration parameters for the write strategy must not be part of the metadata document.
133+
Within a shard, Morton order is proposed as the default chunk order, but this can easily be changed and customized, since any order can be read.
134+
135+
136+
Key translation
137+
===============
138+
139+
The Zarr store interface is defined in terms of `keys` and `values`,
140+
where a `key` is a sequence of characters and a `value` is a sequence
141+
of bytes.
142+
143+
@@TODO
144+
145+
146+
Store API implementation
147+
========================
148+
149+
@@TODO
150+
151+
The section below defines an implementation of the Zarr abstract store
152+
interface (@@TODO link) in terms of the native operations of this
153+
storage system. Below ``fspath_to_key()`` is a function that
154+
translates file system paths to store keys, and ``key_to_fspath()`` is
155+
a function that translates store keys to file system paths, as defined
156+
in the section above.
157+
158+
* ``get(key) -> value`` : Read and return the contents of the file at
159+
file system path ``key_to_fspath(key)``.
160+
161+
* ``set(key, value)`` : Write ``value`` as the contents of the file at
162+
file system path ``key_to_fspath(key)``.
163+
164+
* ``delete(key)`` : Delete the file or directory at file system path
165+
``key_to_fspath(key)``.
166+
167+
* ``list()`` : Recursively walk the file system from the base
168+
directory, returning an iterator over keys obtained by calling
169+
``fspath_to_key(fp)`` for each descendant file path ``fp``.
170+
171+
* ``list_prefix(prefix)`` : Obtain a file system path by calling
172+
``key_to_fspath(prefix)``. If the result is a directory path,
173+
recursively walk the file system from this directory, returning an
174+
iterator over keys obtained by calling ``fspath_to_key(fp)`` for
175+
each descendant file path ``fp``.
176+
177+
* ``list_dir(prefix)`` : Obtain a file system path by calling
178+
``key_to_fspath(prefix)``. If the result is a directory path, list
179+
the directory children. Return a set of keys obtained by calling
180+
``fspath_to_key(fp)`` for each child file path ``fp``, and a set of
181+
prefixes obtained by calling ``fspath_to_key(dp)`` for each child
182+
directory path ``dp``.
183+
184+
185+
References
186+
==========
187+
188+
.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
189+
Requirement Levels. March 1997. Best Current Practice. URL:
190+
https://tools.ietf.org/html/rfc2119
191+
192+
193+
Change log
194+
==========
195+
196+
@@TODO
