* Multiple arrays (a.k.a. datasets) can be created and organised into
232
+
a hierarchy of groups.
233
+
234
+
* Each array is divided into regular shaped chunks.
235
+
236
+
* Each chunk is compressed before storage.
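
The chunk-then-compress idea in the bullets above can be sketched with the standard library alone. This is a minimal illustration, not Zarr's implementation: the helper names `chunk_and_compress` and `read_chunk`, the plain `dict` used as a store, and the choice of zlib are all assumptions made for the example, and the final chunk is simply left shorter here.

```python
import zlib
from array import array


def chunk_and_compress(values, chunk_len, level=1):
    """Split a 1-D sequence of ints into fixed-length chunks and compress each one.

    Hypothetical helper: returns a dict mapping chunk index (as a string)
    to the zlib-compressed bytes of that chunk.
    """
    store = {}
    for start in range(0, len(values), chunk_len):
        chunk = array("i", values[start:start + chunk_len])
        key = str(start // chunk_len)
        # Each chunk is compressed independently before it is stored.
        store[key] = zlib.compress(chunk.tobytes(), level)
    return store


def read_chunk(store, key):
    """Decompress a single chunk without touching any other chunk."""
    out = array("i")
    out.frombytes(zlib.decompress(store[key]))
    return out.tolist()


data = list(range(10))
store = chunk_and_compress(data, chunk_len=4)
second_chunk = read_chunk(store, "1")
```

Because every chunk is stored under its own key, reading one region only requires decompressing the chunks that overlap it.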
237
+
238
+
===
239
+
226
240
### Creating a hierarchy
227
241
228
242
```python
@@ -554,16 +568,16 @@ class ZipStore(MutableMapping):
554
568
yield key
555
569
```
556
570
557
-
<small>(Actual implementation is slightly more complicated, but this is the essence.)</small>
571
+
<small>(<a href="https://github.com/zarr-developers/zarr-python/blob/e61d6ae77f18e881be0b80e38b5366793f5a2860/zarr/storage.py#L1033">Actual implementation</a> is slightly more complicated, but this is the essence.)</small>
558
572
559
573
====
560
574
561
575
## Parallel computing with Zarr
562
576
563
-
* A Zarr array can have multiple concurrent readers*
564
-
* A Zarr array can have multiple concurrent writers*
565
-
* Both multi-thread and multi-process parallelism are supported
566
-
* GIL is released during critical sections (compression and decompression)
577
+
* A Zarr array can have multiple concurrent readers*.
578
+
* A Zarr array can have multiple concurrent writers*.
579
+
* Both multi-thread and multi-process parallelism are supported.
580
+
* GIL is released during critical sections (compression and decompression).
567
581
568
582
<small>* Depending on the store.</small>
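
As a rough sketch of the concurrent-writer pattern described above, the following uses a thread pool in which each worker compresses and writes a different chunk. The plain `dict` standing in for a store and the helper name `write_chunk` are assumptions for illustration only; whether a real store tolerates concurrent writers depends on the store, as noted.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: a plain dict stands in for a chunk store.
store = {}


def write_chunk(index):
    """Compress one chunk's worth of bytes and write it under its own key."""
    raw = bytes([index]) * 1000
    # Each worker writes a distinct key, so no two writers touch the same chunk.
    store[str(index)] = zlib.compress(raw)
    return index


with ThreadPoolExecutor(max_workers=4) as pool:
    written = list(pool.map(write_chunk, range(8)))

# Read everything back to check that each chunk round-trips.
restored = {key: zlib.decompress(value) for key, value in store.items()}
```

The design point is that writers aligned to chunk boundaries never need to coordinate with each other, since each one owns a separate key.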
569
583
@@ -588,10 +602,14 @@ output = big * 42 + ...
588
602
o = output.compute()
589
603
590
604
# if output is big, compute and write directly to Zarr
591
-
output.to_zarr(@@TODO)
605
+
da.to_zarr(output, store, component='output')
592
606
```
593
607
594
-
See docs for `da.from_array`, `da.from_zarr`, `da.to_zarr`. @@TODO links
The numcodecs Codec interface defines the API for filters and compressors for use with Zarr. Built around the Python buffer protocol.
681
+
<table class="stretch">
682
+
<tr>
683
+
<td style="vertical-align: top">
684
+
<p>
685
+
The numcodecs <a href="https://numcodecs.readthedocs.io/en/stable/abc.html">Codec API</a> defines the interface for filters and compressors for use with Zarr.
669
686
</p>
670
-
671
-
@@TODO link to buffer protocol
687
+
<p>
688
+
Built around the <a href="https://docs.python.org/3/c-api/buffer.html">Python buffer protocol</a>.
689
+
</p>
690
+
</td>
691
+
<td style="vertical-align: top">
692
+
<img src="scipy-2019-files/codec-api.png">
693
+
</td>
694
+
</tr>
695
+
</table>
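
To illustrate what "built around the buffer protocol" can mean in practice, here is a minimal standard-library sketch of an encode/decode pair that accepts any contiguous buffer-exporting object (`bytes`, `bytearray`, `array.array`, and so on). The standalone functions `encode` and `decode` are hypothetical stand-ins written for this example; they are not the numcodecs implementation.

```python
import zlib
from array import array


def encode(buf, level=1):
    """Compress any object that exports a contiguous buffer."""
    # memoryview gives a zero-copy view; cast("B") exposes it as raw bytes.
    view = memoryview(buf).cast("B")
    return zlib.compress(view, level)


def decode(buf, out=None):
    """Decompress into new bytes, or into a caller-supplied writable buffer."""
    dec = zlib.decompress(memoryview(buf))
    if out is None:
        return dec
    # Copy the decompressed bytes into the start of the output buffer.
    memoryview(out).cast("B")[:len(dec)] = dec
    return out


payload = array("i", [1, 2, 3, 4])
encoded = encode(payload)

# Decode into a preallocated buffer of the right size.
target = bytearray(len(payload.tobytes()))
decode(encoded, out=target)
```

Working against the buffer protocol rather than a specific array type is what lets the same codec handle data from different containers without an intermediate copy on the way in.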
672
696
673
697
===
674
698
@@ -684,7 +708,7 @@ class Zlib(Codec):
684
708
buf = ensure_contiguous_ndarray(buf)
685
709
686
710
# do compression
687
-
return _zlib.compress(buf, self.level)
711
+
return zlib.compress(buf, self.level)
688
712
689
713
def decode(self, buf, out=None):
690
714
@@ -694,7 +718,7 @@ class Zlib(Codec):
694
718
out = ensure_contiguous_ndarray(out)
695
719
696
720
# do decompression
697
-
dec = _zlib.decompress(buf)
721
+
dec = zlib.decompress(buf)
698
722
699
723
return ndarray_copy(dec, out)
700
724
@@ -710,12 +734,10 @@ class Zlib(Codec):
710
734
711
735
## Other Zarr implementations
712
736
713
-
* z5 - C++ implementation using xtensor
714
-
* Zarr.jl - native Julia implementation
715
-
* @@TODO - Scala implementation
716
-
* WIP: Zarr support in NetCDF C library
717
-
718
-
@@TODO links
737
+
* [z5](https://github.com/constantinpape/z5) - C++ implementation using xtensor
738
+
* [Zarr.jl](https://github.com/meggart/Zarr.jl) - native Julia implementation
<small>(Here's the [underlying data catalog entry](https://github.com/pangeo-data/pangeo-datastore/blob/aa3f12bcc3be9584c1a9071235874c9d6af94a4e/intake-catalogs/atmosphere.yaml#L6).)</small>
See [OME's position regarding file formats](https://blog.openmicroscopy.org/community/file-formats/2019/06/25/formats/).
741
786
742
787
===
743
788
744
789
### Single cell biology
745
790
746
-
@@TODO
791
+
* [Work by Laserson lab](https://github.com/lasersonlab/single-cell-experiments) using Zarr with [ScanPy](https://scanpy.readthedocs.io/en/stable/) and [AnnData](https://icb-anndata.readthedocs-hosted.com/en/stable/index.html) to scale single cell gene expression analyses.
792
+
* The [Human Cell Atlas](https://prod.data.humancellatlas.org/) data portal uses Zarr for [storage of gene expression matrices](https://prod.data.humancellatlas.org/pipelines/hca-pipelines/data-processing-pipelines/file-formats).
793
+
* Use Zarr for image-based transcriptomics ([starfish](https://spacetx-starfish.readthedocs.io/en/latest/))?