
Commit a4a600e

parallel computing
1 parent acd78f9 commit a4a600e

File tree

3 files changed: +115 -4 lines changed


slides/scipy-2019-files/lock.png (23.3 KB)

slides/scipy-2019-files/nolock.png (19.9 KB)

slides/scipy-2019.md

Lines changed: 115 additions & 4 deletions
@@ -5,6 +5,8 @@ Zarr - scalable storage of tensor data for parallel and distributed computing

 Alistair Miles ([@alimanfoo](https://github.com/alimanfoo)) - SciPy 2019

+These slides: @@TODO URL
+
 ====

 @@TODO mosquito image
@@ -116,8 +118,8 @@ Align the chunks!
 ```python
 import dask.array as da

-# @@TODO check this works!
-x = da.from_array(storage)
+a = ... # what goes here?
+x = da.from_array(a)
 y = (x - x.mean(axis=1)) / x.std(axis=1)
 u, s, v = da.svd_compressed(y, 20)
 u = u.compute()
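One minimal sketch of an answer to the `# what goes here?` question: any chunked, NumPy-like array works, for example a Zarr array opened from disk, with the Dask chunks chosen to line up with the storage chunks (the path below is illustrative):

```python
import dask.array as da
import zarr

# a chunked Zarr array on disk (path, shape and chunks are illustrative)
a = zarr.open_array('data/genotypes.zarr', mode='r')

# align the Dask chunks with the Zarr chunks so each task reads whole chunks
x = da.from_array(a, chunks=a.chunks)
```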
@@ -556,6 +558,75 @@ class ZipStore(MutableMapping):

 ====

+## Parallel computing with Zarr
+
+* A Zarr array can have multiple concurrent readers*
+* A Zarr array can have multiple concurrent writers*
+* Both multi-thread and multi-process parallelism are supported
+* GIL is released during critical sections (compression and decompression)
+
+<small>* depending on the store</small>
+
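A minimal sketch of the multi-threaded case, assuming the default in-memory store and chunk-aligned writes (shapes, chunk sizes and the worker count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import zarr

# a chunked array backed by the default (in-memory) store
z = zarr.zeros((100_000, 100), chunks=(10_000, 100), dtype='f8')

def fill_block(i):
    # each worker writes a different, chunk-aligned region, so no locks are needed;
    # the GIL is released while each chunk is compressed
    start, stop = i * 10_000, (i + 1) * 10_000
    z[start:stop] = np.random.random((10_000, 100))

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fill_block, range(10)))
```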
+===
+
+### Dask + Zarr
+
+```python
+import dask.array as da
+import zarr
+
+# set up input
+store = ... # some Zarr store
+root = zarr.group(store)
+big = root['big']
+big = da.from_array(big)
+
+# define computation
+output = big * 42 + ...
+
+# if output is small, compute to memory
+o = output.compute()
+
+# if output is big, compute and write directly to Zarr
+output.to_zarr(@@TODO)
+```
+
+See docs for `da.from_array`, `da.from_zarr`, `da.to_zarr`. @@TODO links
+
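A runnable variant of the sketch above, assuming an existing on-disk group at `data/example.zarr` containing an array named `big`, and writing the result to `data/output.zarr` (both paths are illustrative):

```python
import dask.array as da
import zarr

# set up input: open an existing group for reading
root = zarr.open_group('data/example.zarr', mode='r')
big = da.from_array(root['big'])
# equivalently: big = da.from_zarr('data/example.zarr', component='big')

# define computation lazily
output = big * 42 + 1

# if output is small, compute to memory as a NumPy array
o = output.compute()

# if output is big, compute and write directly to a new Zarr array
da.to_zarr(output, 'data/output.zarr')
```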
+===
+
+### Write locks?
+
+<p class="stretch"><img src="scipy-2019-files/nolock.png"></p>
+
+* If each writer is writing to a different region of an array, and all
+  writes are **aligned with chunk boundaries**, then locking is **not
+  required**.
+
+===
+
+### Write locks?
+
+<p class="stretch"><img src="scipy-2019-files/lock.png"></p>
+
+* If each writer is writing to a different region of an array, and
+  writes are **not aligned** with chunk boundaries, then locking **is
+  required** to avoid contention and/or data loss.
+
+===
+
+### Write locks?
+
+* Zarr does support chunk-level write locks for either multi-thread or
+  multi-process writes.
+
+* But generally easier and better to align writes with chunk
+  boundaries where possible.
+
+@@TODO link to docs
+
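A minimal sketch of switching those chunk-level locks on via a synchronizer (paths, shapes and dtypes are illustrative):

```python
import zarr

# chunk-level locks shared between threads in one process
zt = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4',
                synchronizer=zarr.ThreadSynchronizer())

# chunk-level locks shared between processes, via lock files in a shared directory
sync = zarr.ProcessSynchronizer('data/example.sync')
zp = zarr.open_array('data/example.zarr', mode='w', shape=(10000, 10000),
                     chunks=(1000, 1000), dtype='i4', synchronizer=sync)

# writes that straddle chunk boundaries are now serialised per chunk
zt[500:1500, :] = 42
zp[500:1500, :] = 42
```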
+====
+
 ## Compressors

 ===
@@ -566,15 +637,55 @@ class ZipStore(MutableMapping):

 <small><a href="http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html">http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html</a></small>

+===
+
+### Available compressors (via numcodecs)
+
+@@TODO
+
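For example, numcodecs provides Blosc, Zstd, LZ4, Zlib, GZip, BZ2 and LZMA codecs, among others. A minimal sketch of choosing one when creating an array (the settings shown are illustrative):

```python
import zarr
from numcodecs import Blosc, Zstd

# Blosc with the zstd backend and bit-shuffle filter
z1 = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4',
                compressor=Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE))

# plain Zstandard
z2 = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4',
                compressor=Zstd(level=5))

print(z1.compressor)
print(z2.compressor)
```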
+===
+
+### Compressor (codec) interface
+
+@@TODO
+
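The interface is small: every codec exposes `encode`, `decode` and a JSON-serialisable config. A minimal sketch of using it directly, taking the numcodecs `Zstd` codec as an example (the exact config contents shown in the comment are indicative):

```python
import numpy as np
from numcodecs import Zstd

codec = Zstd(level=5)

data = np.arange(100_000, dtype='i4')
compressed = codec.encode(data)      # compressed bytes
restored = codec.decode(compressed)  # decompressed bytes

assert np.frombuffer(restored, dtype='i4').shape == data.shape
print(codec.get_config())  # config dict stored in array metadata, e.g. {'id': 'zstd', 'level': 5}
```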
+===
+
+### E.g., zlib implementation
+
+@@TODO
+
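A simplified sketch of what a zlib codec can look like on top of that interface (numcodecs ships a full `Zlib` codec; the class and id below are hypothetical, and buffer-type normalisation is omitted):

```python
import zlib

from numcodecs.abc import Codec
from numcodecs.registry import register_codec


class SimpleZlib(Codec):
    """Simplified zlib codec for illustration only."""

    codec_id = 'simple-zlib'  # hypothetical id, not the real numcodecs 'zlib'

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # buf is a bytes-like chunk of data; return compressed bytes
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf, out=None):
        # copying into a caller-provided `out` buffer is omitted in this sketch
        return zlib.decompress(bytes(buf))


# make the codec discoverable by id
register_codec(SimpleZlib)

# quick round trip
codec = SimpleZlib(level=6)
raw = b'an example chunk of bytes' * 100
assert codec.decode(codec.encode(raw)) == raw
```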
 ====

-## TODO
+## Zarr specification

-*
+@@TODO image
+
+====
+
+## Integrations, applications and other implementations
+
+* @@TODO dask, xarray, intake (e.g., Pangeo data catalog)
+* @@TODO z5 - C++ implementation
+* @@TODO Zarr.jl - native Julia implementation
+* @@TODO Scala implementation
+* @@TODO other implementations?
+* @@TODO Unidata working on implementation in NetCDF C library
+* @@TODO OME, microscopy
+* @@TODO single cell examples
+* @@TODO Met office use cases

 ====

 ## Future

 * Zarr/N5
 * v3 protocol spec
+
+Community!
+
+====
+
+## Acknowledgments
+
+@@TODO
