
Commit 15291b2

jstriebel and normanrz authored
Apply suggestions from code review
Co-authored-by: Norman Rzepka <[email protected]>
1 parent 28a9023 commit 15291b2

docs/storage_transformers/sharding/v1.0.rst

Lines changed: 5 additions & 8 deletions
@@ -37,14 +37,11 @@ Motivation
 Sharding decouples the concept of chunks from storage keys, which become shards.
 This is helpful when the requirements for those don't align:
 
-- Compressible units of chunks often need to be read and written in smaller
-  chunks, whereas
-- storage often is optimized for larger data per entry and fewer entries, e.g.
-  as restricted by the file block size and maximum inode number for typical
-  file systems.
+- Chunk sizes need to be small for read efficiency requirements, e.g. for data streaming in browser-based visualization software, whereas
+- it becomes inefficient or impractical to store a large number of chunks in single files or objects due to the design constraints of the underlying storage, e.g. as restricted by the file block size and maximum inode number for typical file systems.
 
 This does not necessarily fit the access patterns of the data, so chunks might
-need to be smaller than one storage key. In those cases sharding decouples those
+need to be smaller than the minimum size of one storage key. In those cases sharding decouples those
 entities. One shard corresponds to one storage key, but can contain multiple chunks:
 
 .. image:: sharding.png
@@ -113,7 +110,7 @@ Key & value transformation
 The storage transformer protocol defines the abstract interface to be the same
 as the :ref:`abstract-store-interface`.
 
-The Zarr store interface is defined in terms of `keys` and `values`,
+The Zarr store interface is defined as a mapping of `keys` and `values`,
 where a `key` is a sequence of characters and a `value` is a sequence
 of bytes. A key-value pair is called `entry` in the following part.
 
@@ -143,7 +140,7 @@ configuration key. Other binary formats might be added in future versions.
 
 In the indexed binary format chunks are written successively in a shard, where
 unused space between them is allowed, followed by an index referencing them.
-The index is placed at the end of the file and has a length of 16 bytes per chunk
+The index is placed at the end of the file and has a length of 16 bytes multiplied by the number of chunks
 in a shard, for example ``16 bytes * 64 = 1024 bytes`` for ``chunks_per_shard=[32, 2]``.
 The index holds an `offset, length` pair of little-endian uint64 per chunk,
 the chunks-order in the index is row-major (C) order, for example for
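
The index layout described in the last hunk can be sketched in Python. This is only an illustration under stated assumptions, not the reference implementation: the function names ``index_nbytes``, ``index_position`` and ``read_index`` are hypothetical and not part of any Zarr API, and the shard is assumed to be fully available as a ``bytes`` object.

```python
import math
import struct


def index_nbytes(chunks_per_shard):
    """Size of the shard index: 16 bytes (two uint64) per chunk."""
    return 16 * math.prod(chunks_per_shard)


def index_position(chunk_coords, chunks_per_shard):
    """Row-major (C order) position of a chunk within the shard index."""
    pos = 0
    for coord, extent in zip(chunk_coords, chunks_per_shard):
        pos = pos * extent + coord
    return pos


def read_index(shard, chunks_per_shard):
    """Decode the trailing index into a list of (offset, length) pairs.

    Assumes the index occupies the last index_nbytes(...) bytes of the shard
    and holds little-endian uint64 values, as described in the text above.
    """
    n_chunks = math.prod(chunks_per_shard)
    raw = shard[len(shard) - 16 * n_chunks:]
    values = struct.unpack(f"<{2 * n_chunks}Q", raw)
    return [(values[2 * i], values[2 * i + 1]) for i in range(n_chunks)]


# Hypothetical two-chunk shard: chunk payloads followed by the index.
payload = b"abc" + b"de"
index = struct.pack("<4Q", 0, 3, 3, 2)  # (offset, length) for each chunk
shard = payload + index

entries = read_index(shard, [2, 1])
offset, length = entries[index_position((1, 0), [2, 1])]
second_chunk = shard[offset:offset + length]
```

For ``chunks_per_shard=[32, 2]`` the sketch gives ``index_nbytes([32, 2]) == 1024``, and the chunk at shard-local coordinates ``(1, 0)`` sits at index position 2.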
