docs/storage_transformers/sharding/v1.0.rst
12 additions & 9 deletions
@@ -30,19 +30,22 @@ Abstract
 This specification defines an implementation of the Zarr
 storage transformer protocol for sharding.
 
+Sharding co-locates multiple chunks within a storage object, bundling them in shards.
+
 
 Motivation
 ==========
 
-Sharding decouples the concept of chunks from storage keys, which become shards.
-This is helpful when the requirements for those don't align:
+In many cases it becomes inefficient or impractical to store a large number of chunks in
+single files or objects due to the design constraints of the underlying storage,
+for example as restricted by the file block size and maximum inode number for typical file systems.
 
-- Chunk sizes need to be small for read efficiency requirements, e.g. for data streaming in browser-based visualization software, whereas
-- it becomes inefficient or impractical to store a large number of chunks in single files or objects due to the design constraints of the underlying storage, e.g. as restricted by the file block size and maximum inode number for typical file systems.
+Increasing the chunk size works only up to a certain point, as chunk sizes need to be small for
+read efficiency requirements, for example to stream data in browser-based visualization software.
 
-This does not necessarily fit the access patterns of the data, so chunks might
-need to be smaller than the minimum size of one storage key. In those cases sharding decouples those
-entities. One shard corresponds to one storage key, but can contain multiple chunks:
+Therefore, chunks may need to be smaller than the minimum size of one storage key.
+In those cases it is required to store objects at a more coarse granularity than reading chunks.
+Sharding solves this by allowing to store multiple chunks in one storage key, which is called a shard:
 
 .. image:: sharding.png
 
@@ -115,8 +118,8 @@ where a `key` is a sequence of characters and a `value` is a sequence
 of bytes. A key-value pair is called `entry` in the following part.
 
 This sharding transformer only adapts entries where the key starts
-with `data/root`, as they indicate data keys for array chunks. All other
-entries are simply passed on.
+with `data/root`, as they indicate data keys for array chunks, see
+:ref:`storage-keys`. All other entries are simply passed on.
 
 Entries starting with ``data/root`` are grouped by their common shard, assuming
 storage keys from a regular chunk grid which may use a customly configured
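
The routing rule described in the second hunk can be illustrated with a minimal Python sketch. It is not part of the specification or of this change. The function name `split_entries`, the `chunks_per_shard` parameter, the example keys, and the assumed key layout (`data/root/<array path>/c<i>/<j>` with `/` as separator) are all hypothetical, chosen only to show keys outside `data/root` being passed on while chunk keys are grouped by a common shard.

```python
from collections import defaultdict

DATA_PREFIX = "data/root"  # prefix named in the spec text


def split_entries(keys, chunks_per_shard):
    """Separate pass-through keys from chunk keys and group the latter by shard.

    Assumption (hypothetical layout): chunk keys look like
    ``data/root/<array path>/c<i>/<j>/...`` on a regular chunk grid, and a
    shard covers ``chunks_per_shard[d]`` chunks along dimension ``d``.
    """
    passthrough = []
    shards = defaultdict(list)
    for key in keys:
        if not key.startswith(DATA_PREFIX):
            # Not a data key for array chunks: pass it on unchanged.
            passthrough.append(key)
            continue
        # Split "<array prefix>/c<grid indices>" at the last "/c".
        array_prefix, _, grid = key.rpartition("/c")
        chunk_index = tuple(int(i) for i in grid.split("/"))
        # Floor-divide each chunk index to find the shard it falls into.
        shard_index = tuple(c // n for c, n in zip(chunk_index, chunks_per_shard))
        shard_key = array_prefix + "/c" + "/".join(str(i) for i in shard_index)
        shards[shard_key].append(key)
    return passthrough, dict(shards)


if __name__ == "__main__":
    example_keys = [
        "meta/root/foo.array.json",  # hypothetical metadata key
        "data/root/foo/c0/0",
        "data/root/foo/c0/1",
        "data/root/foo/c2/3",
    ]
    passed, grouped = split_entries(example_keys, chunks_per_shard=(2, 2))
    print(passed)
    print(grouped)
```

In this sketch, chunks `c0/0` and `c0/1` land in the same shard because both indices floor-divide to `(0, 0)`, whereas `c2/3` maps to shard `(1, 1)`. How the actual transformer derives shard keys is defined by the specification itself, not by this example.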