
Commit 27f0726

joshmoore and alimanfoo authored

docs: replace usages of "Dataset" with "Array" (#571)

* hierarchy.py: Add blurb to create_dataset docs
* storage.py: replace 'dataset' with 'array' in docs
* utils.py: replace 'dataset' with 'array' in docs
* tutorial.rst: replace 'dataset' with 'array' in docs
* Update release.rst

Co-authored-by: Alistair Miles <[email protected]>

1 parent dc5e9fa · commit 27f0726

File tree

5 files changed: +28 -15 lines changed

docs/release.rst
docs/tutorial.rst
zarr/hierarchy.py
zarr/storage.py
zarr/util.py

docs/release.rst

Lines changed: 7 additions & 1 deletion
@@ -7,12 +7,18 @@ Next release
 
 * Fix minor bug in `N5Store`.
   By :user:`gsakkis`, :issue:`550`.
+
 * Improve error message in Jupyter when trying to use the ``ipytree`` widget
   without ``ipytree`` installed.
-  By :user:`Zain Patel <mzjp2>; :issue:`537`
+  By :user:`Zain Patel <mzjp2>`; :issue:`537`
+
 * Explicitly close stores during testing.
   By :user:`Elliott Sales de Andrade <QuLogic>`; :issue:`442`
 
+* Improve consistency of terminology regarding arrays and datasets in the
+  documentation.
+  By :user:`Josh Moore <joshmoore>`; :issue:`571`.
+
 
 .. _release_2.4.0:
 

docs/tutorial.rst

Lines changed: 7 additions & 7 deletions
@@ -863,13 +863,13 @@ Consolidating metadata
 
 Since there is a significant overhead for every connection to a cloud object
 store such as S3, the pattern described in the previous section may incur
-significant latency while scanning the metadata of the dataset hierarchy, even
+significant latency while scanning the metadata of the array hierarchy, even
 though each individual metadata object is small. For cases such as these, once
 the data are static and can be regarded as read-only, at least for the
-metadata/structure of the dataset hierarchy, the many metadata objects can be
+metadata/structure of the array hierarchy, the many metadata objects can be
 consolidated into a single one via
 :func:`zarr.convenience.consolidate_metadata`. Doing this can greatly increase
-the speed of reading the dataset metadata, e.g.::
+the speed of reading the array metadata, e.g.::
 
     >>> zarr.consolidate_metadata(store) # doctest: +SKIP
 
@@ -886,7 +886,7 @@ backend storage.
 
 Note that, the hierarchy could still be opened in the normal way and altered,
 causing the consolidated metadata to become out of sync with the real state of
-the dataset hierarchy. In this case,
+the array hierarchy. In this case,
 :func:`zarr.convenience.consolidate_metadata` would need to be called again.
 
 To protect against consolidated metadata accidentally getting out of sync, the
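
For context, a minimal sketch of the workflow this passage describes; the store path and array shape below are illustrative, not taken from the diff:

    import zarr

    # Build a small hierarchy in a directory store.
    store = zarr.DirectoryStore('data/example.zarr')
    root = zarr.group(store=store)
    root.zeros('foo/bar', shape=(100, 100), chunks=(10, 10))

    # Consolidate all metadata objects into a single '.zmetadata' key,
    # then reopen via that key: one read recovers the whole hierarchy.
    zarr.consolidate_metadata(store)
    root = zarr.open_consolidated(store)
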
@@ -930,8 +930,8 @@ copying a group named 'foo' from an HDF5 file to a Zarr group::
         └── baz (100,) int64
     >>> source.close()
 
-If rather than copying a single group or dataset you would like to copy all
-groups and datasets, use :func:`zarr.convenience.copy_all`, e.g.::
+If rather than copying a single group or array you would like to copy all
+groups and arrays, use :func:`zarr.convenience.copy_all`, e.g.::
 
     >>> source = h5py.File('data/example.h5', mode='r')
    >>> dest = zarr.open_group('data/example2.zarr', mode='w')
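
Condensed from the tutorial's doctest, a sketch of copying everything in one call (the file names are the tutorial's own examples; requires h5py):

    import sys
    import h5py
    import zarr

    # Copy every group and array under the HDF5 root into a Zarr group,
    # logging each copied item to stdout.
    source = h5py.File('data/example.h5', mode='r')
    dest = zarr.open_group('data/example2.zarr', mode='w')
    zarr.convenience.copy_all(source, dest, log=sys.stdout)
    source.close()
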
@@ -1004,7 +1004,7 @@ String arrays
 There are several options for storing arrays of strings.
 
 If your strings are all ASCII strings, and you know the maximum length of the string in
-your dataset, then you can use an array with a fixed-length bytes dtype. E.g.::
+your array, then you can use an array with a fixed-length bytes dtype. E.g.::
 
     >>> z = zarr.zeros(10, dtype='S6')
     >>> z
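
A minimal sketch of the fixed-length bytes approach the changed sentence refers to; the values are illustrative:

    import zarr

    # 'S6' means fixed-length bytes, at most 6 bytes per element.
    z = zarr.zeros(10, dtype='S6')
    z[0] = b'Hello'
    z[1] = b'world!'
    # As with NumPy's 'S' dtype, longer values would be truncated to 6 bytes.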

zarr/hierarchy.py

Lines changed: 9 additions & 2 deletions
@@ -746,6 +746,9 @@ def require_groups(self, *names):
     def create_dataset(self, name, **kwargs):
         """Create an array.
 
+        Arrays are known as "datasets" in HDF5 terminology. For compatibility
+        with h5py, Zarr groups also implement the require_dataset() method.
+
         Parameters
         ----------
         name : string
@@ -819,8 +822,12 @@ def _create_dataset_nosync(self, name, data=None, **kwargs):
         return a
 
     def require_dataset(self, name, shape, dtype=None, exact=False, **kwargs):
-        """Obtain an array, creating if it doesn't exist. Other `kwargs` are
-        as per :func:`zarr.hierarchy.Group.create_dataset`.
+        """Obtain an array, creating if it doesn't exist.
+
+        Arrays are known as "datasets" in HDF5 terminology. For compatibility
+        with h5py, Zarr groups also implement the create_dataset() method.
+
+        Other `kwargs` are as per :func:`zarr.hierarchy.Group.create_dataset`.
 
         Parameters
         ----------
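
To illustrate the h5py-compatible pair the new blurbs describe (the name, shape, and dtype here are hypothetical):

    import zarr

    g = zarr.group()

    # create_dataset() creates a new array (a "dataset" in HDF5 terms).
    d = g.create_dataset('foo', shape=(100,), chunks=(10,), dtype='i4')

    # require_dataset() returns the existing array if one with a compatible
    # shape and dtype is already present, and creates it otherwise.
    d2 = g.require_dataset('foo', shape=(100,), dtype='i4')
    assert d2 == d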

zarr/storage.py

Lines changed: 4 additions & 4 deletions
@@ -1656,7 +1656,7 @@ def __init__(self, path, buffers=True, **kwargs):
         import lmdb
 
         # set default memory map size to something larger than the lmdb default, which is
-        # very likely to be too small for any moderate dataset (logic copied from zict)
+        # very likely to be too small for any moderate array (logic copied from zict)
         map_size = (2**40 if sys.maxsize >= 2**32 else 2**28)
         kwargs.setdefault('map_size', map_size)
 
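For reference, a minimal sketch of using LMDBStore, where this default map_size applies (the path is illustrative; requires the lmdb package):

    import zarr

    # LMDB reserves a large memory map up front; no disk space is used
    # until data are written, so the default rarely needs changing.
    store = zarr.storage.LMDBStore('data/example.lmdb')
    root = zarr.group(store=store, overwrite=True)
    root.zeros('foo', shape=(100, 100), chunks=(10, 10))
    store.close()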

@@ -2464,14 +2464,14 @@ class ConsolidatedMetadataStore(MutableMapping):
     a single key.
 
     The purpose of this class, is to be able to get all of the metadata for
-    a given dataset in a single read operation from the underlying storage.
+    a given array in a single read operation from the underlying storage.
     See :func:`zarr.convenience.consolidate_metadata` for how to create this
     single metadata key.
 
     This class loads from the one key, and stores the data in a dict, so that
     accessing the keys no longer requires operations on the backend store.
 
-    This class is read-only, and attempts to change the dataset metadata will
+    This class is read-only, and attempts to change the array metadata will
     fail, but changing the data is possible. If the backend storage is changed
     directly, then the metadata stored here could become obsolete, and
     :func:`zarr.convenience.consolidate_metadata` should be called again and the class
@@ -2484,7 +2484,7 @@ class ConsolidatedMetadataStore(MutableMapping):
     Parameters
     ----------
     store: MutableMapping
-        Containing the zarr dataset.
+        Containing the zarr array.
     metadata_key: str
         The target in the store where all of the metadata are stored. We
         assume JSON encoding.
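
This class is normally constructed behind the scenes by zarr.convenience.open_consolidated; a sketch of the equivalent direct usage, assuming `store` already holds consolidated metadata under the default '.zmetadata' key:

    import zarr
    from zarr.storage import ConsolidatedMetadataStore

    # All metadata come from the single consolidated key, in one read;
    # chunk data still go through the original store.
    meta_store = ConsolidatedMetadataStore(store, metadata_key='.zmetadata')
    root = zarr.open(store=meta_store, chunk_store=store, mode='r+')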

zarr/util.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ def normalize_shape(shape):
 
 def guess_chunks(shape, typesize):
     """
-    Guess an appropriate chunk layout for a dataset, given its shape and
+    Guess an appropriate chunk layout for an array, given its shape and
     the size of each element in bytes. Will allocate chunks only as large
     as MAX_SIZE. Chunks are generally close to some power-of-2 fraction of
     each axis, slightly favoring bigger values for the last index.
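
A quick illustration of the helper; the shape and element size are hypothetical:

    from zarr.util import guess_chunks

    # Suggest a chunk shape for a 2-D array of 4-byte elements; each axis
    # ends up near a power-of-2 fraction of its length.
    print(guess_chunks((10000, 10000), 4))  # e.g. (625, 625)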
