docs/storage_transformers/sharding/v1.0.rst: 5 additions & 8 deletions
@@ -37,14 +37,11 @@ Motivation
 Sharding decouples the concept of chunks from storage keys, which become shards.
 This is helpful when the requirements for those don't align:

-- Compressible units of chunks often need to be read and written in smaller
-  chunks, whereas
-- storage often is optimized for larger data per entry and fewer entries, e.g.
-  as restricted by the file block size and maximum inode number for typical
-  file systems.
+- Chunk sizes need to be small for read efficiency requirements, e.g. for data streaming in browser-based visualization software, whereas
+- it becomes inefficient or impractical to store a large number of chunks in single files or objects due to the design constraints of the underlying storage, e.g. as restricted by the file block size and maximum inode number for typical file systems.

 This does not necessarily fit the access patterns of the data, so chunks might
-need to be smaller than one storage key. In those cases sharding decouples those
+need to be smaller than the minimum size of one storage key. In those cases sharding decouples those
 entities. One shard corresponds to one storage key, but can contain multiple chunks:

 .. image:: sharding.png
@@ -113,7 +110,7 @@ Key & value transformation
 The storage transformer protocol defines the abstract interface to be the same
 as the :ref:`abstract-store-interface`.

-The Zarr store interface is defined in terms of `keys` and `values`,
+The Zarr store interface is defined as a mapping of `keys` and `values`,
 where a `key` is a sequence of characters and a `value` is a sequence
 of bytes. A key-value pair is called `entry` in the following part.

@@ -143,7 +140,7 @@ configuration key. Other binary formats might be added in future versions.

 In the indexed binary format chunks are written successively in a shard, where
 unused space between them is allowed, followed by an index referencing them.
-The index is placed at the end of the file and has a length of 16 bytes per chunk
+The index is placed at the end of the file and has a length of 16 bytes multiplied by the number of chunks
 in a shard, for example ``16 bytes * 64 = 1024 bytes`` for ``chunks_per_shard=[32, 2]``.
 The index holds an `offset, length` pair of little-endian uint64 per chunk,
 the chunks-order in the index is row-major (C) order, for example for
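As a rough illustration of the index layout described in the last hunk (an index at the end of the shard, 16 bytes per chunk, one little-endian uint64 `offset, length` pair per chunk, entries in row-major order), the sketch below parses such an index with the Python standard library. The function name `read_shard_index` and the dictionary it returns are hypothetical and are not part of the spec or of any Zarr library. It assumes the whole shard is already in memory and that every chunk has an index entry.

```python
import itertools
import math
import struct


def read_shard_index(shard: bytes, chunks_per_shard):
    """Map chunk coordinates within a shard to (offset, length) pairs.

    Hypothetical helper for illustration only; assumes the complete
    shard is available as a bytes object.
    """
    n_chunks = math.prod(chunks_per_shard)
    index_size = 16 * n_chunks  # two 8-byte uint64 values per chunk
    if len(shard) < index_size:
        raise ValueError("shard is shorter than its index")

    # "<" selects little-endian, "Q" is an unsigned 64-bit integer.
    flat = struct.unpack(f"<{2 * n_chunks}Q", shard[-index_size:])

    # itertools.product varies the last dimension fastest,
    # which matches row-major (C) order.
    coords = itertools.product(*(range(d) for d in chunks_per_shard))
    pairs = zip(flat[0::2], flat[1::2])
    return dict(zip(coords, pairs))
```

For ``chunks_per_shard=[32, 2]`` the computed index size is ``16 * 64 = 1024`` bytes, and a chunk's bytes can then be sliced out of the shard as ``shard[offset:offset + length]``.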