This textbook gives scientists:

- a guide to designing parallel processing algorithms to work efficiently with chunked datasets
- a guide to exporting chunked datasets to other 'traditional' datasets
## Benchmarking for Zarr
We created a [set of benchmarks](https://github.com/HEFTIEProject/zarr-benchmarks) for reading and writing data to Zarr with a range of different configurations. These benchmarks provide guidance on how the choice of configuration affects data size and read/write performance.
The different parameters were:

- Type of image
  - Heart: HiP-CT scan of a heart from the Human Organ Atlas
  - Dense: segmented neurons from electron microscopy
  - Sparse: a few select segmented neurons from electron microscopy
- Software libraries
  - Tensorstore (fastest for both reading and writing data)
  - zarr-python version 3
  - zarr-python version 2 (slowest for both reading and writing data)
- Compressor
  - blosc-zstd provides the best compression ratio for both image and segmentation data (options were blosc-blosclz, blosc-lz4, blosc-lz4hc, blosc-zlib and blosc-zstd, as well as gzip and zstd)
- Compression level
  - Setting compression levels beyond ~3 results in slightly better data compression but much longer write times. Compression level does not affect read time.
- Shuffle
  - Setting the shuffle option increases data compression with no adverse effect on read/write times (the three options were shuffle, bitshuffle and noshuffle)
- Zarr format version
  - There was no noticeable difference between Zarr format 2 and Zarr format 3 data
- Chunk size
  - Setting a low chunk size (below around 90) has an adverse effect on read and write times.

## Tools for working with chunked datasets
Contributions have been made to the zarr-python repository:
- [Add CLI for converting v2 metadata to v3](https://github.com/zarr-developers/zarr-python/pull/3257)