Commit 569a4da

gerritholl, dcherian, and mathause authored
Handle scale_factor and add_offset as scalar (#4485)
* Handle scale_factor and add_offset as scalar

  The h5netcdf engine exposes single-valued attributes as arrays of shape (1,), which is correct according to the NetCDF standard, but may cause a problem when reading a value of shape () before the scale_factor and add_offset have been applied. This PR adds a check for the dimensionality of add_offset and scale_factor and ensures they are scalar before they are used for further processing, adds a unit test to verify that this works correctly, and a note to the documentation to warn users of this difference between the h5netcdf and netcdf4 engines. Fixes #4471.

* DOC: Add whats-new entry for fixing 4471

  Add a whats-new entry for the fix to issue #4471, corresponding to PR #4485.

* Update doc/io.rst

Co-authored-by: Mathias Hauser <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Mathias Hauser <[email protected]>
1 parent 2f96fad commit 569a4da

4 files changed: 33 additions, 0 deletions
doc/io.rst

Lines changed: 6 additions & 0 deletions
@@ -105,6 +105,12 @@ Dataset and DataArray objects, and no array values are loaded into memory until
 you try to perform some sort of actual computation. For an example of how these
 lazy arrays work, see the OPeNDAP section below.
 
+There may be minor differences in the :py:class:`Dataset` object returned
+when reading a NetCDF file with different engines. For example,
+single-valued attributes are returned as scalars by the default
+``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
+``engine=h5netcdf``.
+
 It is important to note that when you modify values of a Dataset, even one
 linked to files on disk, only the in-memory copy you are manipulating in xarray
 is modified: the original file on disk is never touched.
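
The engine difference described in this documentation note matters because of NumPy broadcasting. The following is a minimal sketch using plain NumPy only; the values `raw`, `scale_scalar`, and `scale_vector` are hypothetical stand-ins for a data value and for the attribute as each engine might return it, not output from either engine.

```python
import numpy as np

# A single data value of shape (), as obtained by indexing one element.
raw = np.array(5, dtype="int16")

# Hypothetical attribute values standing in for what each engine might return.
scale_scalar = 0.5              # attribute exposed as a scalar
scale_vector = np.array([0.5])  # attribute exposed as an array of shape (1,)

# Scaling by a scalar keeps the shape (); scaling by a (1,) array broadcasts
# the result up to shape (1,), which no longer matches the requested shape.
shape_scalar = np.shape(raw * scale_scalar)
shape_vector = np.shape(raw * scale_vector)

print(shape_scalar, shape_vector)
```

A result whose shape differs from the shape of the value that was requested is consistent with the failure described in the commit message, which only appears when reading a value of shape ().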

doc/whats-new.rst

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ New Features
 Bug fixes
 ~~~~~~~~~
 
+- Fix bug where reading a scalar value from a NetCDF file opened with the ``h5netcdf`` backend would raise a ``ValueError`` when ``decode_cf=True`` (:issue:`4471`, :pull:`4485`).
+  By `Gerrit Holl <https://github.com/gerritholl>`_.
 - Fix bug where datetime64 times are silently changed to incorrect values if they are outside the valid date range for ns precision when provided in some other units (:issue:`4427`, :pull:`4454`).
   By `Andrew Pauling <https://github.com/andrewpauling>`_
 - Fix silently overwriting the `engine` key when passing :py:func:`open_dataset` a file object

xarray/coding/variables.py

Lines changed: 4 additions & 0 deletions
@@ -269,6 +269,10 @@ def decode(self, variable, name=None):
             scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
             add_offset = pop_to(attrs, encoding, "add_offset", name=name)
             dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
+            if np.ndim(scale_factor) > 0:
+                scale_factor = scale_factor.item()
+            if np.ndim(add_offset) > 0:
+                add_offset = add_offset.item()
             transform = partial(
                 _scale_offset_decoding,
                 scale_factor=scale_factor,
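
The check added in this hunk can be tried in isolation. The sketch below wraps the same `np.ndim` test in a helper; the name `as_scalar` is hypothetical and not part of xarray, and the `np.asarray` call is an addition here so that plain lists are accepted too, whereas the commit calls `.item()` on the attribute directly.

```python
import numpy as np


def as_scalar(value):
    """Return a 0-d value unchanged, or the single element of a 1-element array.

    Hypothetical helper mirroring the check in the hunk above; not xarray API.
    """
    if np.ndim(value) > 0:
        # .item() converts a size-1 array to a Python scalar and raises
        # ValueError if the array holds more than one element.
        return np.asarray(value).item()
    return value


from_vector = as_scalar(np.array([2.0]))  # shape (1,) attribute
from_scalar = as_scalar(3)                # already a scalar, passed through

print(from_vector, from_scalar)
```

Note that an attribute holding more than one element is not silently truncated: `.item()` raises a `ValueError` in that case.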

xarray/tests/test_backends.py

Lines changed: 21 additions & 0 deletions
@@ -4670,3 +4670,24 @@ def test_extract_zarr_variable_encoding():
     actual = backends.zarr.extract_zarr_variable_encoding(
         var, raise_on_invalid=True
     )
+
+
+@requires_h5netcdf
+def test_load_single_value_h5netcdf(tmp_path):
+    """Test that numeric single-element vector attributes are handled fine.
+
+    At present (h5netcdf v0.8.1), h5netcdf exposes single-valued numeric variable
+    attributes as arrays of length 1, as opposed to scalars for the NetCDF4
+    backend.  This was leading to a ValueError upon loading a single value from
+    a file, see #4471.  Test that loading causes no failure.
+    """
+    ds = xr.Dataset(
+        {
+            "test": xr.DataArray(
+                np.array([0]), dims=("x",), attrs={"scale_factor": 1, "add_offset": 0}
+            )
+        }
+    )
+    ds.to_netcdf(tmp_path / "test.nc")
+    with xr.open_dataset(tmp_path / "test.nc", engine="h5netcdf") as ds2:
+        ds2["test"][0].load()
