Commit 720febb

improve motivation, add reference
1 parent c7c5905 commit 720febb

2 files changed, +14 -9 lines changed

docs/protocol/core/v3.0.rst

Lines changed: 2 additions & 0 deletions
@@ -1366,6 +1366,8 @@ Note that any non-root hierarchy path will have ancestor paths that
 identify ancestor nodes in the hierarchy. For example, the path
 "/foo/bar" has ancestor paths "/foo" and "/".
 
+.. _storage-keys:
+
 Storage keys
 ------------
 

docs/storage_transformers/sharding/v1.0.rst

Lines changed: 12 additions & 9 deletions
@@ -30,19 +30,22 @@ Abstract
 This specification defines an implementation of the Zarr
 storage transformer protocol for sharding.
 
+Sharding co-locates multiple chunks within a storage object, bundling them in shards.
+
 
 Motivation
 ==========
 
-Sharding decouples the concept of chunks from storage keys, which become shards.
-This is helpful when the requirements for those don't align:
+In many cases it becomes inefficient or impractical to store a large number of chunks in
+single files or objects due to the design constraints of the underlying storage,
+for example as restricted by the file block size and maximum inode number for typical file systems.
 
-- Chunk sizes need to be small for read efficiency requirements, e.g. for data streaming in browser-based visualization software, whereas
-- it becomes inefficient or impractical to store a large number of chunks in single files or objects due to the design constraints of the underlying storage, e.g. as restricted by the file block size and maximum inode number for typical file systems.
+Increasing the chunk size works only up to a certain point, as chunk sizes need to be small for
+read efficiency requirements, for example to stream data in browser-based visualization software.
 
-This does not necessarily fit the access patterns of the data, so chunks might
-need to be smaller than the minimum size of one storage key. In those cases sharding decouples those
-entities. One shard corresponds to one storage key, but can contain multiple chunks:
+Therefore, chunks may need to be smaller than the minimum size of one storage key.
+In those cases it is required to store objects at a more coarse granularity than reading chunks.
+Sharding solves this by allowing to store multiple chunks in one storage key, which is called a shard:
 
 .. image:: sharding.png
 

@@ -115,8 +118,8 @@ where a `key` is a sequence of characters and a `value` is a sequence
 of bytes. A key-value pair is called `entry` in the following part.
 
 This sharding transformer only adapts entries where the key starts
-with `data/root`, as they indicate data keys for array chunks. All other
-entries are simply passed on.
+with `data/root`, as they indicate data keys for array chunks, see
+:ref:`storage-keys`. All other entries are simply passed on.
 
 Entries starting with ``data/root`` are grouped by their common shard, assuming
 storage keys from a regular chunk grid which may use a customly configured
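The routing rule in the last hunk (adapt entries whose key starts with `data/root`, pass every other entry through, and group chunk keys by their common shard on a regular chunk grid) can be sketched in Python. This is a minimal illustration under assumptions the diff does not state: chunk keys are taken to have the form `data/root/<array path>/c<i>/<j>/...` with `/` as separator, shard keys are assumed to reuse that form with shard coordinates, and `chunks_per_shard` is a hypothetical per-dimension shard shape counted in chunks. The names `parse_chunk_key`, `shard_key` and `group_entries` are illustrative and not part of the specification.

```python
from collections import defaultdict

# ASSUMPTION: chunk keys look like "data/root/<array path>/c<i>/<j>/..."
# with "/" as the chunk key separator; this layout is illustrative only.
DATA_PREFIX = "data/root"


def parse_chunk_key(key):
    """Split an assumed chunk key into (array path, chunk coordinates)."""
    # Under the assumed layout, the last "/c" marks the start of the coordinates.
    array_path, _, coords = key.rpartition("/c")
    return array_path, tuple(int(part) for part in coords.split("/"))


def shard_key(key, chunks_per_shard):
    """Map a chunk key to the key of the shard containing it.

    chunks_per_shard is a hypothetical per-dimension shard shape,
    counted in chunks, on a regular grid.
    """
    array_path, chunk_coords = parse_chunk_key(key)
    # On a regular grid, integer division gives the shard coordinates.
    shard_coords = (c // n for c, n in zip(chunk_coords, chunks_per_shard))
    return array_path + "/c" + "/".join(str(s) for s in shard_coords)


def group_entries(entries, chunks_per_shard):
    """Group data/root entries by shard; pass all other entries through."""
    shards = defaultdict(dict)
    passthrough = {}
    for key, value in entries.items():
        if key.startswith(DATA_PREFIX):
            shards[shard_key(key, chunks_per_shard)][key] = value
        else:
            passthrough[key] = value
    return dict(shards), passthrough


if __name__ == "__main__":
    entries = {
        "meta/root/foo.array.json": b"{}",
        "data/root/foo/c0/0": b"a",
        "data/root/foo/c0/1": b"b",
        "data/root/foo/c1/3": b"c",
    }
    shards, other = group_entries(entries, chunks_per_shard=(2, 2))
    # Chunks (0, 0) and (0, 1) share shard (0, 0); chunk (1, 3) lands in shard (0, 1).
    for key, chunks in sorted(shards.items()):
        print(key, sorted(chunks))
    print(other)
```

The integer division in `shard_key` only works because of the regular-grid assumption mentioned in the diff context; a non-regular chunk grid would need a different chunk-to-shard mapping.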
