Commit cae3ad3

Merge 'add-storage-transformers-and-sharding-v1.0' into core-protocol-v3.0-dev
As discussed during recent community meetings and steering council, merging this proposal into the dev branch as a common basis for discussions. The final list of features to be included in v3.0 is to be decided.
2 parents e889419 + 75d5e49 commit cae3ad3

8 files changed: +402 -3 lines changed

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
+    'sphinxcontrib.mermaid'
 ]

 # Add any paths that contain templates here, relative to this directory.

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ Under construction.
    protocol
    codecs
    stores
+   storage_transformers


 Indices and tables

docs/protocol/core/v3.0.rst

Lines changed: 107 additions & 0 deletions
@@ -384,6 +384,19 @@ conceptual model underpinning the Zarr protocol.
 interface`_ which is a common set of operations that stores may
 provide.

+.. _storage transformer:
+.. _storage transformers:
+
+*Storage transformer*
+
+To enhance the storage capabilities, storage transformers may
+change the storage structure and behaviour of data coming from
+an array_ in the underlying store_. Upon retrieval the original data is
+restored within the transformer. Any number of `predefined storage
+transformers`_ can be registered and stacked.
+See the `storage transformers details`_ below.
+
+.. _`storage transformers details`: #storage-transformers-1

 Node names
 ==========
@@ -896,6 +909,8 @@ ignored if not understood::
     }


+.. _array-metadata:
+
 Array metadata
 --------------

@@ -1026,6 +1041,17 @@ The following names are optional:
 specification. When the ``compressor`` name is absent, this means that no
 compressor is used.

+``storage_transformers``
+
+Specifies a stack of `storage transformers`_. Each value in the list must
+be an object containing the name ``storage_transformer`` whose value
+is a URI that identifies a storage transformer and dereferences to a
+human-readable representation of the storage transformer specification. The
+object may also contain a ``configuration`` object which consists of the
+parameter names and values as defined by the corresponding storage transformer
+specification. When the ``storage_transformers`` name is absent or its
+value is an empty list, no storage transformer is used.
+

 All other names within the array metadata object are reserved for
 future versions of this specification.
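
To illustrate the ``storage_transformers`` entry added in this hunk, here is a minimal Python sketch of an array metadata fragment together with a shape check. The URI, the ``example_parameter`` configuration name and the helper ``get_storage_transformers`` are hypothetical placeholders chosen for illustration; none of them is defined by the specification.

```python
# Hypothetical array metadata fragment. The URI and the configuration
# parameter are placeholders, not values defined by the specification.
array_metadata_fragment = {
    "storage_transformers": [
        {
            "storage_transformer": "https://example.invalid/storage_transformers/example/1.0",
            "configuration": {"example_parameter": 1},
        }
    ]
}


def get_storage_transformers(metadata):
    """Return the list of storage transformer entries.

    An absent name and an empty list both mean that no storage
    transformer is used, so both yield an empty list here.
    """
    entries = metadata.get("storage_transformers", [])
    if not isinstance(entries, list):
        raise ValueError("'storage_transformers' must be a list")
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError("each list value must be an object")
        if not isinstance(entry.get("storage_transformer"), str):
            raise ValueError("each object needs a 'storage_transformer' URI")
        if "configuration" in entry and not isinstance(entry["configuration"], dict):
            raise ValueError("'configuration' must be an object")
    return entries
```

The check only covers the structure described in the text above; what a given ``configuration`` may contain is left to the corresponding storage transformer specification.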
@@ -1148,6 +1174,9 @@ interface`_ subsection. The store interface can be implemented using a
 variety of underlying storage technologies, described in the
 subsection on `Store implementations`_.

+
+.. _abstract-store-interface:
+
 Abstract store interface
 ------------------------

@@ -1169,6 +1198,23 @@ one such pair for any given `key`. I.e., a store is a mapping from
 keys to values. It is also assumed that keys are case sensitive, i.e.,
 the keys "foo" and "FOO" are different.

+To read and write partial values, a `range` consists of two integers,
+`range_start` and `range_length`, which identify the part of the value
+starting at byte `range_start` (inclusive) and having a length of
+`range_length` bytes. `range_length` may be none, indicating all
+available data until the end of the referenced value. For example,
+the `range` ``[0, none]`` specifies the full value. Stores that do not
+support partial access can still answer such requests using cutouts
+of full values. It is recommended that implementing the
+``get_partial_values``, ``set_partial_values`` and
+``erase_values`` methods is optional, with fallbacks provided
+for them by default. However, it is recommended to supply these operations
+natively where possible for efficiency. Conversely, ``get``, ``set`` and
+``erase`` can easily be mapped onto their `partial_values` counterparts,
+so it is also recommended to supply fallbacks for those if the
+`partial_values` operations can be implemented.
+An entity containing those fallbacks could be named ``StoreWithPartialAccess``.
+
 The store interface also defines some operations involving
 `prefixes`. In the context of this interface, a prefix is a string
 containing only characters that are valid for use in `keys` and ending
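
The fallback idea from this hunk can be sketched in Python as follows. This is a minimal sketch, not the specification's implementation: the class name ``StoreWithPartialAccess`` is the one suggested in the text, while ``cut_out``, ``MemoryStore`` and the convention that ``get`` raises ``KeyError`` for a missing key are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (range_start, range_length); a range_length of None means "until the end".
ByteRange = Tuple[int, Optional[int]]


def cut_out(value: bytes, byte_range: ByteRange) -> bytes:
    """Return the part of ``value`` selected by ``byte_range``."""
    range_start, range_length = byte_range
    if range_length is None:
        return value[range_start:]
    return value[range_start:range_start + range_length]


class StoreWithPartialAccess:
    """Hypothetical base class with default fallbacks built on get/erase."""

    def get(self, key: str) -> bytes:
        raise NotImplementedError

    def erase(self, key: str) -> None:
        raise NotImplementedError

    def get_partial_values(
        self, key_ranges: Iterable[Tuple[str, ByteRange]]
    ) -> List[Optional[bytes]]:
        # Fallback: read each full value and cut the requested part out.
        values: List[Optional[bytes]] = []
        for key, byte_range in key_ranges:
            try:
                full_value = self.get(key)
            except KeyError:  # assumption: missing keys raise KeyError
                values.append(None)
            else:
                values.append(cut_out(full_value, byte_range))
        return values

    def erase_values(self, keys: Iterable[str]) -> None:
        # Fallback: erase the keys one by one.
        for key in keys:
            self.erase(key)


class MemoryStore(StoreWithPartialAccess):
    """Dict-backed store that only implements the non-partial operations."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._data[key]

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = bytes(value)

    def erase(self, key: str) -> None:
        self._data.pop(key, None)
```

A store with native range reads would override ``get_partial_values`` rather than rely on this fallback, which always fetches the full value.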
@@ -1180,23 +1226,46 @@ a store implementation to support all of these capabilities.

 A **readable store** supports the following operation:

+@@TODO add bundled & partial access
+
 ``get`` - Retrieve the `value` associated with a given `key`.

 | Parameters: `key`
 | Output: `value`

+``get_partial_values`` - Retrieve possibly partial `values` from given `key_ranges`.
+
+| Parameters: `key_ranges`: ordered set of `key`, `range` pairs,
+| a `key` may occur multiple times with different `ranges`
+| Output: list of `values`, in the order of the `key_ranges`, may contain none
+| for missing keys
+
 A **writeable store** supports the following operations:

 ``set`` - Store a (`key`, `value`) pair.

 | Parameters: `key`, `value`
 | Output: none

+``set_partial_values`` - Store `values` at a given `key`, starting at byte `range_start`.
+
+| Parameters: `key_start_values`: set of `key`,
+| `range_start`, `value` triples; a `key` may occur multiple
+| times with different `range_starts`, but the ranges given by
+| each `range_start` and the length of its `value` must not
+| overlap for the same `key`
+| Output: none
+
 ``erase`` - Erase the given key/value pair from the store.

 | Parameters: `key`
 | Output: none

+``erase_values`` - Erase the given key/value pairs from the store.
+
+| Parameters: `keys`: set of `keys`
+| Output: none
+
 ``erase_prefix`` - Erase all keys with the given prefix from the store:

 | Parameter: `prefix`
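
As a rough illustration of ``set_partial_values`` as described in this hunk, the following Python sketch implements it as a read-modify-write fallback over a plain ``dict`` and rejects overlapping ranges for the same key. The function names are hypothetical. Treating a missing key as an empty value and padding with zero bytes when ``range_start`` lies beyond the current end are assumptions of this sketch; the text above does not define either case.

```python
from collections import defaultdict


def check_no_overlap(key_start_values):
    """Raise ValueError if two writes to the same key overlap."""
    spans_by_key = defaultdict(list)
    for key, range_start, value in key_start_values:
        spans_by_key[key].append((range_start, range_start + len(value)))
    for key, spans in spans_by_key.items():
        spans.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_, previous_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < previous_end:
                raise ValueError(f"overlapping ranges for key {key!r}")


def set_partial_values(store, key_start_values):
    """Fallback: apply partial writes by rewriting full values in ``store``."""
    key_start_values = list(key_start_values)
    check_no_overlap(key_start_values)  # validate before changing anything
    for key, range_start, value in key_start_values:
        # Assumption: a missing key behaves like an empty value.
        current = bytearray(store.get(key, b""))
        if len(current) < range_start:
            # Assumption: gaps before range_start are filled with zero bytes.
            current.extend(b"\x00" * (range_start - len(current)))
        current[range_start:range_start + len(value)] = value
        store[key] = bytes(current)
```

Validating all triples before the first write keeps the store unchanged when the input is rejected.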
@@ -1314,6 +1383,8 @@ Note that any non-root hierarchy path will have ancestor paths that
 identify ancestor nodes in the hierarchy. For example, the path
 "/foo/bar" has ancestor paths "/foo" and "/".

+.. _storage-keys:
+
 Storage keys
 ------------

@@ -1505,6 +1576,42 @@ Let "+" be the string concatenation operator.
 For listable store, ``list_dir(parent(P))`` can be an alternative.


+Storage transformers
+====================
+
+A Zarr storage transformer can change Zarr-compatible data before it is stored.
+The transformed data is restored to its original state whenever the Array
+requests it. Storage transformers can be configured per array via the
+``storage_transformers`` name in the `array metadata`_. Storage transformers which do
+not change the storage layout (e.g. for caching) may be specified at runtime without
+adding them to the array metadata.
+
+A storage transformer serves the same `Abstract store interface`_ as the store_.
+However, it should not itself persistently store any information necessary to restore
+the original data; instead it propagates this information to the next storage transformer
+or the final store. From the perspective of an Array or a previous-stage transformer, a store
+and a storage transformer follow the same protocol and are interchangeable. Their behaviour
+can still differ, e.g. requests may be cached or the form of the underlying data can change.
+
+Storage transformers may be stacked to combine different functionalities:
+
+.. mermaid::
+
+    graph LR
+      Array --> t1
+      subgraph stack [Storage transformers]
+        t1[Transformer 1] --> t2[...] --> t3[Transformer N]
+      end
+      t3 --> Store
+
+A fixed set of storage transformers is recommended for implementation with this protocol:
+
+
+Predefined storage transformers
+-------------------------------
+
+- :ref:`sharding-storage-transformer-v1`
+
 Protocol extensions
 ===================
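
The stacking shown in the diagram of this hunk can be sketched in Python like this. It is a minimal sketch under assumptions: every class and the ``stack`` helper are hypothetical, the interface is reduced to ``get``, ``set`` and ``erase``, and the byte-reversing transformer exists only to make a layout change visible.

```python
class MemoryStore:
    """Final store at the end of the stack (dict-backed for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def erase(self, key):
        self._data.pop(key, None)


class StorageTransformer:
    """Serves the store interface and forwards to the next stage."""

    def __init__(self, inner):
        self.inner = inner  # next storage transformer or the final store

    def get(self, key):
        return self.inner.get(key)

    def set(self, key, value):
        self.inner.set(key, value)

    def erase(self, key):
        self.inner.erase(key)


class ReverseBytesTransformer(StorageTransformer):
    """Hypothetical layout-changing stage: stores values byte-reversed."""

    def get(self, key):
        return self.inner.get(key)[::-1]  # restore the original data

    def set(self, key, value):
        self.inner.set(key, value[::-1])


class CachingTransformer(StorageTransformer):
    """Hypothetical runtime-only stage: keeps a non-persistent read cache."""

    def __init__(self, inner):
        super().__init__(inner)
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self.inner.get(key)
        return self._cache[key]

    def set(self, key, value):
        self._cache[key] = value
        self.inner.set(key, value)

    def erase(self, key):
        self._cache.pop(key, None)
        self.inner.erase(key)


def stack(store, transformer_classes):
    """Wrap ``store``; the first class ends up closest to the Array."""
    stage = store
    for transformer_class in reversed(transformer_classes):
        stage = transformer_class(stage)
    return stage
```

Because each stage exposes the same operations as the store, an Array holding the result of ``stack(MemoryStore(), [CachingTransformer, ReverseBytesTransformer])`` does not need to know how many stages sit in between.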

docs/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 sphinx==2.0.1
 pydata-sphinx-theme
-
+sphinxcontrib-mermaid

docs/storage_transformers.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+====================
+Storage Transformers
+====================
+
+Under construction.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   storage_transformers/sharding/v1.0
