
Commit 27f0726

joshmoore and alimanfoo authored

docs: replace usages of "Dataset" with "Array" (#571)

* hierarchy.py: Add blurb to create_dataset docs
* storage.py: replace 'dataset' with 'array' in docs
* utils.py: replace 'dataset' with 'array' in docs
* tutorial.rst: replace 'dataset' with 'array' in docs
* Update release.rst

Co-authored-by: Alistair Miles <[email protected]>

1 parent dc5e9fa · commit 27f0726

File tree

5 files changed: +28 -15 lines changed

docs/release.rst
docs/tutorial.rst
zarr/hierarchy.py
zarr/storage.py
zarr/util.py

docs/release.rst

Lines changed: 7 additions & 1 deletion
@@ -7,12 +7,18 @@ Next release
 
 * Fix minor bug in `N5Store`.
   By :user:`gsakkis`, :issue:`550`.
+
 * Improve error message in Jupyter when trying to use the ``ipytree`` widget
   without ``ipytree`` installed.
-  By :user:`Zain Patel <mzjp2>; :issue:`537`
+  By :user:`Zain Patel <mzjp2>`; :issue:`537`
+
 * Explicitly close stores during testing.
   By :user:`Elliott Sales de Andrade <QuLogic>`; :issue:`442`
 
+* Improve consistency of terminology regarding arrays and datasets in the
+  documentation.
+  By :user:`Josh Moore <joshmoore>`; :issue:`571`.
+
 
 .. _release_2.4.0:
 

docs/tutorial.rst

Lines changed: 7 additions & 7 deletions
@@ -863,13 +863,13 @@ Consolidating metadata
 
 Since there is a significant overhead for every connection to a cloud object
 store such as S3, the pattern described in the previous section may incur
-significant latency while scanning the metadata of the dataset hierarchy, even
+significant latency while scanning the metadata of the array hierarchy, even
 though each individual metadata object is small. For cases such as these, once
 the data are static and can be regarded as read-only, at least for the
-metadata/structure of the dataset hierarchy, the many metadata objects can be
+metadata/structure of the array hierarchy, the many metadata objects can be
 consolidated into a single one via
 :func:`zarr.convenience.consolidate_metadata`. Doing this can greatly increase
-the speed of reading the dataset metadata, e.g.::
+the speed of reading the array metadata, e.g.::
 
     >>> zarr.consolidate_metadata(store) # doctest: +SKIP
 
@@ -886,7 +886,7 @@ backend storage.
 
 Note that, the hierarchy could still be opened in the normal way and altered,
 causing the consolidated metadata to become out of sync with the real state of
-the dataset hierarchy. In this case,
+the array hierarchy. In this case,
 :func:`zarr.convenience.consolidate_metadata` would need to be called again.
 
 To protect against consolidated metadata accidentally getting out of sync, the
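
For context, a minimal sketch of the workflow this passage describes; the store path and array shape below are illustrative, not taken from the diff:

    import zarr

    # Build a small hierarchy in a directory store.
    store = zarr.DirectoryStore('data/example.zarr')
    root = zarr.group(store=store)
    root.zeros('foo/bar', shape=(100, 100), chunks=(10, 10))

    # Consolidate all metadata objects into a single '.zmetadata' key,
    # then reopen via that key: one read recovers the whole hierarchy.
    zarr.consolidate_metadata(store)
    root = zarr.open_consolidated(store)
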
@@ -930,8 +930,8 @@ copying a group named 'foo' from an HDF5 file to a Zarr group::
         └── baz (100,) int64
     >>> source.close()
 
-If rather than copying a single group or dataset you would like to copy all
-groups and datasets, use :func:`zarr.convenience.copy_all`, e.g.::
+If rather than copying a single group or array you would like to copy all
+groups and arrays, use :func:`zarr.convenience.copy_all`, e.g.::
 
     >>> source = h5py.File('data/example.h5', mode='r')
    >>> dest = zarr.open_group('data/example2.zarr', mode='w')
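
Condensed from the tutorial's doctest, a sketch of copying everything in one call (the file names are the tutorial's own examples; requires h5py):

    import sys
    import h5py
    import zarr

    # Copy every group and array under the HDF5 root into a Zarr group,
    # logging each copied item to stdout.
    source = h5py.File('data/example.h5', mode='r')
    dest = zarr.open_group('data/example2.zarr', mode='w')
    zarr.convenience.copy_all(source, dest, log=sys.stdout)
    source.close()
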
@@ -1004,7 +1004,7 @@ String arrays
 There are several options for storing arrays of strings.
 
 If your strings are all ASCII strings, and you know the maximum length of the string in
-your dataset, then you can use an array with a fixed-length bytes dtype. E.g.::
+your array, then you can use an array with a fixed-length bytes dtype. E.g.::
 
     >>> z = zarr.zeros(10, dtype='S6')
     >>> z
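
A minimal sketch of the fixed-length bytes approach the changed sentence refers to; the values are illustrative:

    import zarr

    # 'S6' means fixed-length bytes, at most 6 bytes per element.
    z = zarr.zeros(10, dtype='S6')
    z[0] = b'Hello'
    z[1] = b'world!'
    # As with NumPy's 'S' dtype, longer values would be truncated to 6 bytes.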

zarr/hierarchy.py

Lines changed: 9 additions & 2 deletions
@@ -746,6 +746,9 @@ def require_groups(self, *names):
     def create_dataset(self, name, **kwargs):
         """Create an array.
 
+        Arrays are known as "datasets" in HDF5 terminology. For compatibility
+        with h5py, Zarr groups also implement the require_dataset() method.
+
         Parameters
         ----------
         name : string
@@ -819,8 +822,12 @@ def _create_dataset_nosync(self, name, data=None, **kwargs):
         return a
 
     def require_dataset(self, name, shape, dtype=None, exact=False, **kwargs):
-        """Obtain an array, creating if it doesn't exist. Other `kwargs` are
-        as per :func:`zarr.hierarchy.Group.create_dataset`.
+        """Obtain an array, creating if it doesn't exist.
+
+        Arrays are known as "datasets" in HDF5 terminology. For compatibility
+        with h5py, Zarr groups also implement the create_dataset() method.
+
+        Other `kwargs` are as per :func:`zarr.hierarchy.Group.create_dataset`.
 
         Parameters
         ----------
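
To illustrate the h5py-compatible pair the new blurbs describe (the name, shape, and dtype here are hypothetical):

    import zarr

    g = zarr.group()

    # create_dataset() creates a new array (a "dataset" in HDF5 terms).
    d = g.create_dataset('foo', shape=(100,), chunks=(10,), dtype='i4')

    # require_dataset() returns the existing array if one with a compatible
    # shape and dtype is already present, and creates it otherwise.
    d2 = g.require_dataset('foo', shape=(100,), dtype='i4')
    assert d2 == d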

zarr/storage.py

Lines changed: 4 additions & 4 deletions
@@ -1656,7 +1656,7 @@ def __init__(self, path, buffers=True, **kwargs):
         import lmdb
 
         # set default memory map size to something larger than the lmdb default, which is
-        # very likely to be too small for any moderate dataset (logic copied from zict)
+        # very likely to be too small for any moderate array (logic copied from zict)
         map_size = (2**40 if sys.maxsize >= 2**32 else 2**28)
         kwargs.setdefault('map_size', map_size)
 
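For reference, a minimal sketch of using LMDBStore, where this default map_size applies (the path is illustrative; requires the lmdb package):

    import zarr

    # LMDB reserves a large memory map up front; no disk space is used
    # until data are written, so the default rarely needs changing.
    store = zarr.storage.LMDBStore('data/example.lmdb')
    root = zarr.group(store=store, overwrite=True)
    root.zeros('foo', shape=(100, 100), chunks=(10, 10))
    store.close()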

@@ -2464,14 +2464,14 @@ class ConsolidatedMetadataStore(MutableMapping):
     a single key.
 
     The purpose of this class, is to be able to get all of the metadata for
-    a given dataset in a single read operation from the underlying storage.
+    a given array in a single read operation from the underlying storage.
     See :func:`zarr.convenience.consolidate_metadata` for how to create this
     single metadata key.
 
     This class loads from the one key, and stores the data in a dict, so that
     accessing the keys no longer requires operations on the backend store.
 
-    This class is read-only, and attempts to change the dataset metadata will
+    This class is read-only, and attempts to change the array metadata will
     fail, but changing the data is possible. If the backend storage is changed
     directly, then the metadata stored here could become obsolete, and
     :func:`zarr.convenience.consolidate_metadata` should be called again and the class
@@ -2484,7 +2484,7 @@ class ConsolidatedMetadataStore(MutableMapping):
     Parameters
     ----------
     store: MutableMapping
-        Containing the zarr dataset.
+        Containing the zarr array.
     metadata_key: str
         The target in the store where all of the metadata are stored. We
         assume JSON encoding.
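
This class is normally constructed behind the scenes by zarr.convenience.open_consolidated; a sketch of the equivalent direct usage, assuming `store` already holds consolidated metadata under the default '.zmetadata' key:

    import zarr
    from zarr.storage import ConsolidatedMetadataStore

    # All metadata come from the single consolidated key, in one read;
    # chunk data still go through the original store.
    meta_store = ConsolidatedMetadataStore(store, metadata_key='.zmetadata')
    root = zarr.open(store=meta_store, chunk_store=store, mode='r+')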

zarr/util.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ def normalize_shape(shape):
 
 def guess_chunks(shape, typesize):
     """
-    Guess an appropriate chunk layout for a dataset, given its shape and
+    Guess an appropriate chunk layout for an array, given its shape and
     the size of each element in bytes. Will allocate chunks only as large
     as MAX_SIZE. Chunks are generally close to some power-of-2 fraction of
     each axis, slightly favoring bigger values for the last index.
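
A quick illustration of the helper; the shape and element size are hypothetical:

    from zarr.util import guess_chunks

    # Suggest a chunk shape for a 2-D array of 4-byte elements; each axis
    # ends up near a power-of-2 fraction of its length.
    print(guess_chunks((10000, 10000), 4))  # e.g. (625, 625)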
