@@ -5,6 +5,8 @@ Zarr - scalable storage of tensor data for parallel and distributed computing

Alistair Miles ([@alimanfoo](https://github.com/alimanfoo)) - SciPy 2019

+ These slides: @@TODO URL
+
====

@@TODO mosquito image
@@ -116,8 +118,8 @@ Align the chunks!
```python
import dask.array as da

- # @@ TODO check this works!
- x = da.from_array(storage)
+ a = ... # what goes here?
+ x = da.from_array(a)
y = (x - x.mean(axis=1)) / x.std(axis=1)
u, s, v = da.svd_compressed(y, 20)
u = u.compute()
@@ -556,6 +558,75 @@ class ZipStore(MutableMapping):

====

+ ## Parallel computing with Zarr
+
+ * A Zarr array can have multiple concurrent readers*
+ * A Zarr array can have multiple concurrent writers*
+ * Both multi-thread and multi-process parallelism are supported (see the sketch below)
+ * GIL is released during critical sections (compression and decompression)
+
+ <small>* depending on the store</small>
+
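+ A minimal sketch of the lock-free case (the array shape, chunking and worker
+ count here are illustrative, not from the original slides): each worker fills
+ a distinct, chunk-aligned block of the same Zarr array.
+
+ ```python
+ from concurrent.futures import ThreadPoolExecutor
+
+ import numpy as np
+ import zarr
+
+ # one chunk per block of 1000 rows
+ z = zarr.zeros((10000, 1000), chunks=(1000, 1000), dtype='f8')
+
+ def fill_block(i):
+     # each task writes a different set of whole chunks, so no locks are needed
+     z[i * 1000:(i + 1) * 1000] = np.random.random((1000, 1000))
+
+ # compression releases the GIL, so threads can make real progress in parallel
+ with ThreadPoolExecutor(max_workers=4) as pool:
+     list(pool.map(fill_block, range(10)))
+ ```
+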
+ ===
+
+ ### Dask + Zarr
+
+ ```python
+ import dask.array as da
+ import zarr
+
+ # set up input
+ store = ... # some Zarr store
+ root = zarr.group(store)
+ big = root['big']
+ big = da.from_array(big)
+
+ # define computation
+ output = big * 42 + ...
+
+ # if output is small, compute to memory
+ o = output.compute()
+
+ # if output is big, compute and write directly to Zarr
+ output.to_zarr(...)  # @@TODO destination store or path
+ ```
+
+ See docs for `da.from_array`, `da.from_zarr`, `da.to_zarr`. @@TODO links
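+
+ For reference, a sketch of the `da.from_zarr` / `da.to_zarr` round trip (the
+ paths and component names here are made up for illustration):
+
+ ```python
+ import dask.array as da
+
+ # lazily wrap an existing Zarr array; dask chunks follow the Zarr chunks
+ x = da.from_zarr('data/example.zarr', component='big')
+
+ y = (x - x.mean()) / x.std()
+
+ # compute and stream the result into a new Zarr array, chunk by chunk
+ da.to_zarr(y, 'data/output.zarr', component='standardised', overwrite=True)
+ ```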
+
+ ===
+
+ ### Write locks?
+
+ <p class="stretch"><img src="scipy-2019-files/nolock.png"></p>
+
+ * If each writer is writing to a different region of an array, and all
+ writes are **aligned with chunk boundaries**, then locking is **not
+ required**.
+
+ ===
+
+ ### Write locks?
+
+ <p class="stretch"><img src="scipy-2019-files/lock.png"></p>
+
+ * If each writer is writing to a different region of an array, and
+ writes are **not aligned** with chunk boundaries, then locking **is
+ required** to avoid contention and/or data loss.
+
+ ===
+
+ ### Write locks?
+
+ * Zarr does support chunk-level write locks for either multi-thread or
+ multi-process writes (see the sketch below).
+
+ * But generally easier and better to align writes with chunk
+ boundaries where possible.
+
+ @@TODO link to docs
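+
+ A rough sketch of how the built-in synchronizers are used (the paths, shape
+ and chunking are placeholders): `ThreadSynchronizer` for threads within one
+ process, `ProcessSynchronizer` (file-lock based) across processes.
+
+ ```python
+ import zarr
+
+ # thread-level chunk locks, for use within a single process
+ z = zarr.open_array('data/example.zarr', mode='w',
+                     shape=(10000, 10000), chunks=(1000, 1000), dtype='i4',
+                     synchronizer=zarr.ThreadSynchronizer())
+
+ # file-based chunk locks, for multiple processes writing to the same array
+ sync = zarr.ProcessSynchronizer('data/example.sync')
+ z = zarr.open_array('data/example.zarr', mode='r+', synchronizer=sync)
+ ```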
+
+ ====
+
## Compressors

===
@@ -566,15 +637,55 @@ class ZipStore(MutableMapping):

<small><a href="http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html">http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html</a></small>

+ ===
+
+ ### Available compressors (via numcodecs)
+
+ @@TODO
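+
+ A hedged sketch while the list above is pending: numcodecs includes Blosc,
+ Zstd, LZ4, Zlib, GZip, BZ2 and LZMA, among others, and a configured codec is
+ passed to Zarr as the `compressor` when creating an array. For example
+ (parameter values are illustrative):
+
+ ```python
+ import zarr
+ from numcodecs import Blosc
+
+ # Blosc wrapping Zstandard, with bit-shuffle
+ compressor = Blosc(cname='zstd', clevel=5, shuffle=Blosc.BITSHUFFLE)
+ z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4',
+                compressor=compressor)
+ ```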
+
+ ===
+
+ ### Compressor (codec) interface
+
+ @@TODO
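+
+ A rough sketch of the shape of the numcodecs codec interface (simplified,
+ not the actual source):
+
+ ```python
+ from abc import ABC, abstractmethod
+
+ class Codec(ABC):
+
+     # unique identifier, stored in array metadata so data can be decoded later
+     codec_id = None
+
+     @abstractmethod
+     def encode(self, buf):
+         """Encode (e.g., compress) buf, returning a bytes-like object."""
+
+     @abstractmethod
+     def decode(self, buf, out=None):
+         """Decode (e.g., decompress) buf, optionally into the buffer out."""
+
+     def get_config(self):
+         """Return a JSON-serializable configuration for this codec."""
+
+     @classmethod
+     def from_config(cls, config):
+         """Instantiate a codec from a configuration dict."""
+ ```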
+
+ ===
+
+ ### E.g., zlib implementation
+
+ @@TODO
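+
+ A simplified sketch of what a zlib codec can look like (the real numcodecs
+ implementation differs in detail, e.g., in buffer handling):
+
+ ```python
+ import zlib
+
+ from numcodecs.abc import Codec
+ from numcodecs.registry import register_codec
+
+ class ZlibExample(Codec):
+
+     codec_id = 'zlib_example'
+
+     def __init__(self, level=1):
+         self.level = level
+
+     def encode(self, buf):
+         # compress the raw bytes of the chunk
+         return zlib.compress(bytes(buf), self.level)
+
+     def decode(self, buf, out=None):
+         data = zlib.decompress(bytes(buf))
+         if out is not None:
+             # copy the decompressed bytes into the caller-supplied buffer
+             memoryview(out).cast('B')[:] = data
+             return out
+         return data
+
+ # make the codec available via its codec_id in array metadata
+ register_codec(ZlibExample)
+ ```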
+
====

- ## TODO
+ ## Zarr specification

- *
+ @@TODO image
+
+ ====
+
+ ## Integrations, applications and other implementations
+
+ * @@TODO dask, xarray, intake (e.g., Pangeo data catalog)
+ * @@TODO z5 - C++ implementation
+ * @@TODO Zarr.jl - native Julia implementation
+ * @@TODO Scala implementation
+ * @@TODO other implementations?
+ * @@TODO Unidata working on implementation in NetCDF C library
+ * @@TODO OME, microscopy
+ * @@TODO single cell examples
+ * @@TODO Met Office use cases

====

## Future

* Zarr/N5
* v3 protocol spec
+
+ Community!
+
+ ====
+
+ ## Acknowledgments
+
+ @@TODO