performance docs

normanrz · normanrz · commit 827cff00c76a · 2025-01-04T14:41:18.000+01:00
diff --git a/docs/user-guide/arrays.rst b/docs/user-guide/arrays.rst
@@ -595,8 +595,8 @@ Sharded arrays can be created by providing the ``shards`` parameter to :func:`za
   Zarr format        : 3
   Data type          : DataType.uint8
   Shape              : (10000, 10000)
-  Chunk shape        : (100, 100)
   Shard shape        : (1000, 1000)
+  Chunk shape        : (100, 100)
   Order              : C
   Read-only          : False
   Store type         : LocalStore
diff --git a/docs/user-guide/performance.rst b/docs/user-guide/performance.rst
@@ -62,6 +62,43 @@ will be one single chunk for the array::
    >>> z5.chunks
    (10000, 10000)
 
+
+Sharding
+~~~~~~~~
+
+If you have large arrays but need small chunks to efficiently access the data, you can
+use sharding. Sharding provides a mechanism to store multiple chunks in a single
+storage object or file. This can be useful because traditional file systems and object
+storage systems may have issues with many small files.
+
+Picking a good combination of chunk shape and shard shape is important for performance.
+The chunk shape determines what unit of your data can be read independently, while the
+shard shape determines what unit of your data can be written efficiently.
+
+For example, suppose you have a 100 GB array and need to read small chunks of 1 MB.
+Without sharding, each chunk would be one file, resulting in 100,000 files. That can
+already cause performance issues on some file systems.
+With sharding, you could use a shard size of 1 GB. This would result in 1000 chunks per
+file and 100 files in total, which is manageable for most storage systems.
+You would still be able to read each 1 MB chunk independently, but you would need to
+write your data in 1 GB increments.
+
+To use sharding, you need to specify the ``shards`` parameter when creating the array::
+
+   >>> z6 = zarr.create_array(store={}, shape=(10000, 10000, 1000), shards=(1000, 1000, 1000), chunks=(100, 100, 100), dtype='uint8')
+   >>> z6.info
+   Type               : Array
+   Zarr format        : 3
+   Data type          : DataType.uint8
+   Shape              : (10000, 10000, 1000)
+   Shard shape        : (1000, 1000, 1000)
+   Chunk shape        : (100, 100, 100)
+   Order              : C
+   Read-only          : False
+   Store type         : MemoryStore
+   Codecs             : [{'chunk_shape': (100, 100, 100), 'codecs': ({'endian': <Endian.little: 'little'>}, {'level': 0, 'checksum': False}), 'index_codecs': ({'endian': <Endian.little: 'little'>}, {}), 'index_location': <ShardingCodecIndexLocation.end: 'end'>}]
+   No. bytes          : 100000000000 (93.1G)
+
 .. _user-guide-chunks-order:
 
 Chunk memory layout
diff --git a/src/zarr/core/_info.py b/src/zarr/core/_info.py
@@ -80,8 +80,8 @@ class ArrayInfo:
     _zarr_format: ZarrFormat
     _data_type: np.dtype[Any] | DataType
     _shape: tuple[int, ...]
-    _chunk_shape: tuple[int, ...] | None = None
     _shard_shape: tuple[int, ...] | None = None
+    _chunk_shape: tuple[int, ...] | None = None
     _order: Literal["C", "F"]
     _read_only: bool
     _store_type: str
@@ -97,14 +97,14 @@ def __repr__(self) -> str:
         Type               : {_type}
         Zarr format        : {_zarr_format}
         Data type          : {_data_type}
-        Shape              : {_shape}
-        Chunk shape        : {_chunk_shape}""")
+        Shape              : {_shape}""")
 
         if self._shard_shape is not None:
             template += textwrap.dedent("""
         Shard shape        : {_shard_shape}""")
 
         template += textwrap.dedent("""
+        Chunk shape        : {_chunk_shape}
         Order              : {_order}
         Read-only          : {_read_only}
         Store type         : {_store_type}""")
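
Note on the sizing arithmetic in the new Sharding section: the file counts can be checked with a few lines of plain Python, using the shapes from the ``z6`` example in ``performance.rst``. This is only a sketch. ``shard_layout`` is a hypothetical helper, not part of zarr, and it assumes uncompressed ``uint8`` data (one byte per element) and a shard shape that is an exact multiple of the chunk shape.

```python
import math


def shard_layout(shape, chunks, shards, itemsize=1):
    """Hypothetical helper (not a zarr API): count chunks and shards.

    Sizes are uncompressed; itemsize=1 corresponds to dtype='uint8'.
    """
    # Assumption: each shard holds a whole number of chunks along every axis.
    if any(s % c for s, c in zip(shards, chunks)):
        raise ValueError("shard shape must be a multiple of chunk shape")
    return {
        # Bytes in one chunk and in one shard.
        "chunk_bytes": math.prod(chunks) * itemsize,
        "shard_bytes": math.prod(shards) * itemsize,
        # Stored objects without sharding: one per chunk.
        "n_chunks": math.prod(math.ceil(n / c) for n, c in zip(shape, chunks)),
        # Stored objects with sharding: one per shard.
        "n_shards": math.prod(math.ceil(n / s) for n, s in zip(shape, shards)),
        "chunks_per_shard": math.prod(s // c for s, c in zip(shards, chunks)),
    }


layout = shard_layout(
    shape=(10000, 10000, 1000),
    chunks=(100, 100, 100),
    shards=(1000, 1000, 1000),
)
print(layout["n_chunks"])          # 100000 objects without sharding
print(layout["n_shards"])          # 100 objects with sharding
print(layout["chunks_per_shard"])  # 1000
```

With these shapes each chunk is 1 MB and each shard is 1 GB, which matches the 100 GB array described in the prose.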