@@ -62,6 +62,43 @@ will be one single chunk for the array::
6262 >>> z5.chunks
6363 (10000, 10000)
6464
65+
66+ Sharding
67+ ~~~~~~~~
68+
69+ If you have large arrays but need small chunks to efficiently access the data, you can
70+ use sharding. Sharding provides a mechanism to store multiple chunks in a single
71+ storage object or file. This is useful because traditional file systems and object
72+ storage systems often perform poorly when they hold many small files.
73+
74+ Picking a good combination of chunk shape and shard shape is important for performance.
75+ The chunk shape determines what unit of your data can be read independently, while the
76+ shard shape determines what unit of your data can be written efficiently.
77+
78+ For example, suppose you have a 100 GB array and need to read small chunks of 1 MB.
79+ Without sharding, each chunk would be one file, resulting in 100,000 files. That can
80+ already cause performance issues on some file systems.
81+ With sharding, you could use a shard size of 1 GB. This would result in 1000 chunks per
82+ file and 100 files in total, which seems manageable for most storage systems.
83+ You would still be able to read each 1 MB chunk independently, but you would need to
84+ write your data in 1 GB increments.
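
The file counts in this example can be checked with a few lines of arithmetic. This is only an illustrative sketch: it assumes decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes), and the variable names are made up for the illustration rather than taken from the Zarr API.

```python
# Sizes in bytes. Decimal units are an assumption of this sketch.
MB = 10**6
GB = 10**9

array_size = 100 * GB   # total size of the array
chunk_size = 1 * MB     # unit that can be read independently
shard_size = 1 * GB     # unit that is written as one file or object

# Without sharding, every chunk is stored as its own file.
files_without_sharding = array_size // chunk_size

# With sharding, every shard is one file that holds many chunks.
chunks_per_shard = shard_size // chunk_size
files_with_sharding = array_size // shard_size

print(files_without_sharding)  # 100000
print(chunks_per_shard)        # 1000
print(files_with_sharding)     # 100
```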
85+
86+ To use sharding, you need to specify the ``shards`` parameter when creating the array.
87+
88+ >>> z6 = zarr.create_array(store={}, shape=(10000, 10000, 1000), shards=(1000, 1000, 1000), chunks=(100, 100, 100), dtype='uint8')
89+ >>> z6.info
90+ Type : Array
91+ Zarr format : 3
92+ Data type : DataType.uint8
93+ Shape : (10000, 10000, 1000)
94+ Shard shape : (1000, 1000, 1000)
95+ Chunk shape : (100, 100, 100)
96+ Order : C
97+ Read-only : False
98+ Store type : MemoryStore
99+ Codecs : [{'chunk_shape': (100, 100, 100), 'codecs': ({'endian': <Endian.little: 'little'>}, {'level': 0, 'checksum': False}), 'index_codecs': ({'endian': <Endian.little: 'little'>}, {}), 'index_location': <ShardingCodecIndexLocation.end: 'end'>}]
100+ No. bytes : 100000000000 (93.1G)
101+
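Because a shard is the unit that can be written efficiently, it can help to write in regions that line up with shard boundaries. The following is a minimal sketch in plain Python, not part of the Zarr API: ``shard_regions`` is a hypothetical helper that yields one tuple of slices per shard for a given array shape and shard shape.

```python
from itertools import product


def shard_regions(shape, shards):
    """Yield one tuple of slices per shard, clipped to the array bounds.

    Hypothetical helper for illustration; not provided by Zarr.
    """
    # Start offsets of the shards along each axis.
    starts_per_axis = [range(0, dim, step) for dim, step in zip(shape, shards)]
    for starts in product(*starts_per_axis):
        yield tuple(
            slice(start, min(start + step, dim))
            for start, step, dim in zip(starts, shards, shape)
        )


# Shape and shard shape of the array created above.
regions = list(shard_regions((10000, 10000, 1000), (1000, 1000, 1000)))
print(len(regions))  # 100
print(regions[0])    # (slice(0, 1000, None), slice(0, 1000, None), slice(0, 1000, None))
```

Each yielded tuple could then be used as an index for a write, for example ``z6[region] = block`` with ``block`` being an array of the matching shape, so that every write covers exactly one shard.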
65102.. _user-guide-chunks-order:
66103
67104Chunk memory layout