Commit 8c71dc7

Merge pull request #1 from HEFTIEProject/rmg/benchmarks
Add summaries for benchmarking and tools for working with chunked datasets

2 parents 3c6c8a1 + 1a0983b
README.md

Lines changed: 40 additions & 0 deletions
@@ -13,8 +13,48 @@ This textbook gives scientists:

- a guide to designing parallel processing algorithms to work efficiently with chunked datasets
- a guide to exporting chunked datasets to other 'traditional' datasets

## Benchmarking for Zarr

We created a [set of benchmarks](https://github.com/HEFTIEProject/zarr-benchmarks) for reading and writing data to Zarr with a range of different configurations. These benchmarks provide guidance on how the choice of configuration affects data size and read/write performance; a configuration sketch based on these findings follows the parameter list below.

The parameters varied were:

- Type of image
  - Heart: HiP-CT scan of a heart from the Human Organ Atlas
  - Dense: segmented neurons from electron microscopy
  - Sparse: a few selected segmented neurons from electron microscopy
- Software libraries
  - Tensorstore (fastest for both reading and writing data)
  - zarr-python version 3
  - zarr-python version 2 (slowest for both reading and writing data)
- Compressor
  - blosc-zstd provides the best compression ratio for both image and segmentation data (the options were blosc-blosclz, blosc-lz4, blosc-lz4hc, blosc-zlib and blosc-zstd, as well as gzip and zstd)
- Compression level
  - Compression levels above ~3 give slightly better compression but much longer write times; compression level does not affect read time
- Shuffle
  - Enabling the shuffle option increases compression with no adverse effect on read/write times (the three options were shuffle, bitshuffle and noshuffle)
- Zarr format version
  - There was no noticeable difference between Zarr format 2 and Zarr format 3 data
- Chunk size
  - Low chunk sizes (below around 90) adversely affect read and write times
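
Putting these findings together, below is a minimal sketch of writing a volume with the settings the benchmarks favour: blosc-zstd with shuffle enabled, compression level 3, and chunk sizes well above the ~90 threshold. It assumes the zarr-python version 3 API; the store path, shape, and dtype are illustrative.

```python
import numpy as np
import zarr
from zarr.codecs import BloscCodec, BloscShuffle

# Illustrative 3D volume standing in for a real image.
image = np.random.randint(0, 2**16, size=(256, 256, 256), dtype="uint16")

# Benchmark-informed settings: blosc-zstd, compression level ~3,
# shuffle enabled, and chunk edges comfortably above ~90.
array = zarr.create_array(
    store="image.zarr",
    shape=image.shape,
    chunks=(128, 128, 128),
    dtype=image.dtype,
    compressors=BloscCodec(cname="zstd", clevel=3, shuffle=BloscShuffle.shuffle),
)
array[:] = image
```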
## Tools for working with chunked datasets

42+
Contributions have been made to the zarr-python repository:

- [Add CLI for converting v2 metadata to v3](https://github.com/zarr-developers/zarr-python/pull/3257)
- [Added ArrayNotFoundError](https://github.com/zarr-developers/zarr-python/pull/3367) (see the sketch after this list)
- [Better document acceptable values for StoreLike](https://github.com/zarr-developers/zarr-python/pull/3480)
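
The new exception makes a missing-array failure explicit and easy to catch. A small sketch, assuming the error is exported as `zarr.errors.ArrayNotFoundError` (per the linked PR); the store and path are hypothetical:

```python
import zarr
from zarr.errors import ArrayNotFoundError  # assumption: exported here per the linked PR

try:
    arr = zarr.open_array("image.zarr", path="does/not/exist", mode="r")
except ArrayNotFoundError:
    print("No array at that path; check the store layout")
```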
PRs have been opened in the zarr-python repository:

- [Prevent creation of arrays/groups under a parent array](https://github.com/zarr-developers/zarr-python/pull/3407) (see the sketch after this list)
- [Holding space - LRUStoreCache]
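
For context on the hierarchy PR: arrays are leaf nodes in a Zarr hierarchy, so nothing should be creatable beneath them. A hedged sketch of the pattern the PR rejects; the store path and node names are illustrative:

```python
import zarr

root = zarr.open_group("store.zarr", mode="w")
volume = root.create_array("volume", shape=(64, 64), dtype="uint8")

# Arrays are leaves, so a child node under "volume" is invalid; with the
# linked PR, a call like the following raises instead of silently creating
# nodes inside the array's prefix:
# root.create_group("volume/child")
```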
Issues have also been opened in other projects:

- [Document supported file formats for dask_image.imread](https://github.com/dask/dask-image/issues/407)
- [Document supported file formats for skimage.io](https://github.com/scikit-image/scikit-image/issues/7879)
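
For reference, a typical `dask_image.imread` call that such documentation would cover; the glob pattern is hypothetical:

```python
from dask_image.imread import imread

# Lazily read a stack of 2D slices into a single dask array; which file
# formats are accepted here is what the linked issue asks to document.
stack = imread("slices/*.tif")
print(stack.shape, stack.chunks)
```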
## Improvements to cloud visualisation

## Acknowledgements