Commit 231f3ea

filled in some examples and links
1 parent e32460c commit 231f3ea

_posts/2019-05-02-zarr-2.3-release.md

Lines changed: 75 additions & 40 deletions
---
layout: post
title: "Zarr Python 2.3 release"
date: 2019-05-23
categories: zarr python release
---

Recently we released version 2.3 of the [Python Zarr
package](https://zarr.readthedocs.io/en/stable/), which implements the
Zarr protocol for storing N-dimensional typed arrays, and is designed
for use in distributed and parallel computing. This post provides an
overview of new features in this release, and some information about
future directions for Zarr.

## New storage options for distributed computing

A key feature of the Zarr protocol is that the underlying storage
system is decoupled from other components via a simple key/value
interface. In Python, this interface corresponds to the
[`MutableMapping`
interface](https://docs.python.org/3/glossary.html#term-mapping),
which is the interface that Python
[`dict`](https://docs.python.org/3/library/stdtypes.html#dict)
implements. The simplicity of this interface means it is relatively
straightforward to add support for a range of different storage
systems. The 2.3 release adds support for storage using
[SQLite](https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.SQLiteStore),
[Redis](https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.RedisStore),
[MongoDB](https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.MongoDBStore) and
[Azure Blob Storage](https://zarr.readthedocs.io/en/stable/api/storage.html#zarr.storage.ABSStore).
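
Because any `MutableMapping` can serve as a store, the simplest
possible store is a plain `dict`; here's a small illustrative sketch
(not from the release notes) showing the keys Zarr writes:

{% highlight python %}
import zarr
store = dict()  # any MutableMapping can act as a Zarr store
z = zarr.create(shape=(100,), chunks=(10,), store=store)
z[:] = 1
sorted(store)[:3]  # string keys mapping to bytes, e.g. ['.zarray', '0', '1']
{% endhighlight %}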

For example, here's code that creates an array using MongoDB:

{% highlight python %}
import zarr
store = zarr.MongoDBStore('localhost')
root = zarr.group(store=store, overwrite=True)
foo = root.create_group('foo')
bar = foo.create_dataset('bar', shape=(10000, 1000), chunks=(1000, 100))
bar[:] = 42
store.close()
{% endhighlight %}

To do the same thing but storing the data in the cloud via Azure
Blob Storage, replace the instantiation of the `store` object with:

{% highlight python %}
store = zarr.ABSStore(container='test', account_name='foo', account_key='bar')
{% endhighlight %}
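
The other new stores follow the same pattern. For instance, here are
sketches of the equivalent `store` lines for SQLite and Redis (the
file path and host are placeholders):

{% highlight python %}
store = zarr.SQLiteStore('example.sqldb')   # single local database file
store = zarr.RedisStore(host='localhost')   # keys/values held in a Redis server
{% endhighlight %}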

Support for other cloud object storage services was already available
via other packages, with Amazon S3 supported via the
[s3fs](http://s3fs.readthedocs.io/en/latest/) package, and Google
Cloud Storage supported via the
[gcsfs](https://gcsfs.readthedocs.io/en/latest/) package. Further
notes on using cloud storage are available from the [Zarr
tutorial](https://zarr.readthedocs.io/en/stable/tutorial.html#distributed-cloud-storage).
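
For example, here's a rough sketch of reading an existing array from
S3 via s3fs (the bucket and path are hypothetical):

{% highlight python %}
import s3fs
import zarr
# s3fs provides a MutableMapping view onto a bucket, which Zarr can use directly
s3 = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map(root='example-bucket/example.zarr', s3=s3)
z = zarr.open(store, mode='r')
{% endhighlight %}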

The attraction of cloud storage is that total I/O bandwidth scales
linearly with the size of a computing cluster, so there are no
technical limits to the size of the data or computation you can scale
up to. Here's a slide from a recent presentation by Ryan Abernathey
showing how I/O scales when using Zarr over Google Cloud Storage:

<script async class="speakerdeck-embed" data-slide="22" data-id="1621118c5987411fb55fdcf503cb331d" data-ratio="1.77777777777778" src="//speakerdeck.com/assets/embed.js"></script>

## Optimisations for cloud storage: consolidated metadata

One issue with using cloud object storage is that, although total I/O
throughput can be high, the latency involved in each request to read
the contents of an object can be >100 ms, even when reading from
compute nodes within the same data centre. This latency can add up
when reading metadata from many arrays, because in Zarr each array has
its own metadata stored in a separate object.

To work around this, the 2.3 release adds an experimental feature to
consolidate metadata for all arrays and groups within a hierarchy into
a single object. Although this is not suitable for rapidly changing
datasets, it can be good for large datasets which are relatively
static.

To use this feature, two new convenience functions have been
added. The
[`consolidate_metadata()`](https://zarr.readthedocs.io/en/stable/api/convenience.html#zarr.convenience.consolidate_metadata)
function performs the initial consolidation, reading all metadata and
combining them into a single object. Once you have done that and
deployed the data to a cloud object store, the
[`open_consolidated()`](https://zarr.readthedocs.io/en/stable/api/convenience.html#zarr.convenience.open_consolidated)
function can be used to read data, making use of the consolidated
metadata.
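As a rough sketch (the local path is hypothetical), the workflow looks
like this:

{% highlight python %}
import zarr
# one-off step after writing: combine all .zarray/.zgroup/.zattrs objects
store = zarr.DirectoryStore('example.zarr')
zarr.consolidate_metadata(store)
# readers then open the hierarchy via the consolidated object,
# avoiding one latency-bound request per array or group
root = zarr.open_consolidated(store)
{% endhighlight %}
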
Support for the new consolidated metadata feature is also now
available via
[xarray](http://xarray.pydata.org/en/stable/generated/xarray.open_zarr.html)
and
[intake-xarray](https://intake-xarray.readthedocs.io/en/latest/index.html)
(see [this blog
post](https://www.anaconda.com/intake-taking-the-pain-out-of-data-access/)
for an introduction to intake), and many of the datasets in [Pangeo's
cloud data catalog](https://pangeo-data.github.io/pangeo-datastore/)
use Zarr with consolidated metadata.
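
With xarray, for instance, opening a consolidated dataset might look
like this (a sketch; the bucket name is hypothetical):

{% highlight python %}
import gcsfs
import xarray as xr
gcs = gcsfs.GCSFileSystem(token='anon')
store = gcsfs.GCSMap('example-bucket/example.zarr', gcs=gcs)
ds = xr.open_zarr(store, consolidated=True)  # reads .zmetadata instead of per-array objects
{% endhighlight %}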

Here's an example of how to open a Zarr dataset from Pangeo's data
catalog via intake:

{% highlight python %}
import intake
cat_url = 'https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml'
cat = intake.Catalog(cat_url)
ds = cat.atmosphere.gmet_v1.to_dask()
{% endhighlight %}

...and [here's the underlying catalog
entry](https://github.com/pangeo-data/pangeo-datastore/blob/aa3f12bcc3be9584c1a9071235874c9d6af94a4e/intake-catalogs/atmosphere.yaml#L6).

## Compatibility with N5

Around the same time that development on Zarr was getting started, a
