New h5netcdf engine may create files which netCDF can't read #10819

@pp-mo

What happened?

Shortly after cutting a new release of ncdata, I found that tests were breaking because of the xarray '2025.09.1' release, which has only just appeared on conda-forge.

My "test_xarray_load_and_save_equivalence:test_save_direct_vs_viancdata" test compares netcdf files saved directly from ncdata against files produced by converting the same data to xarray datasets and saving with '.to_netcdf'.

The results from the new engine differ in multiple details, all of which I can fix by passing "engine='netcdf4'".

However, some of those failures are not simply changes in structural content: the output file is actually unreadable by netcdf
(both netCDF4.Dataset() and the "ncdump" command fail).

What did you expect to happen?

I would expect any output from the 'h5netcdf' engine to at least be compatible with the standard netCDF library.

Minimal Complete Verifiable Example

import datetime
import numpy as np
import xarray as xr
import netCDF4 as nc

t0 = np.array(datetime.datetime(2016,5,16,12), dtype='datetime64[ns]')

ds = xr.Dataset()
ds = ds.assign_coords(xr.Coordinates({'time': t0}))  # add scalar time coord

# Add a data array, including an 'x' coord
ds['a'] = xr.DataArray(np.zeros(5), coords=xr.Coordinates({'x': np.arange(5)}))

# Set the 'x' dim to be unlimited
ds.encoding['unlimited_dims'] = ['x']

# Save
ds.to_netcdf('tmp.nc')

# Attempt reload with netCDF4
ncds = nc.Dataset('tmp.nc')

#
# NOTE: typical error result
#
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "src/netCDF4/_netCDF4.pyx", line 2521, in netCDF4._netCDF4.Dataset.__init__
#   File "src/netCDF4/_netCDF4.pyx", line 2158, in netCDF4._netCDF4._ensure_nc_success
# OSError: [Errno -51] NetCDF: Unknown file format: 'tmp.nc'

Steps to reproduce

As code above.
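
When reproducing, it may help to check what kind of container the failing file actually is, since the error only says "Unknown file format". The helper below is a rough sketch using only the standard library: 'sniff_container' is a hypothetical name, and it only looks at offset 0, so it assumes the HDF5 superblock is not preceded by a user block. If it reports "HDF5" for 'tmp.nc', the rejection presumably comes from some detail inside the HDF5 file rather than from the container itself.

```python
# Leading bytes that identify the container format of a netCDF-style file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CLASSIC_MAGIC = {
    b"CDF\x01": "netCDF classic",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF CDF-5",
}


def sniff_container(path):
    """Return a coarse label for the file at `path`, based on its first bytes.

    Assumption: any HDF5 signature sits at offset 0 (no user block).
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_SIGNATURE:
        return "HDF5"
    return CLASSIC_MAGIC.get(head[:4], "unknown")


# Example use after running the code above:
#   print(sniff_container("tmp.nc"))
```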

The necessary dataset attributes were deduced from a dataset loaded from a weather-data file.
Several files from the iris test data fail this way when re-saved, e.g. notably this one

  • It seems to be essential that we have a 'normal' dimension which is explicitly set to save as 'unlimited'.
  • I'm not certain that the 'a' variable needs to be there. But I used this to ensure that there is an 'x' dimension.
  • I'm also not sure if the time coord must be scalar for the problem to appear.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

Since I am loading data from netcdf files and re-saving, the results should ideally be the same as the original, though there are multiple caveats to that.
However, the behaviour of the new engine as a whole seems not just incompatible with previous behaviour: it also produces output that differs much more from the original file.
I'm not going to raise that as a separate issue just now, as I've run out of time to investigate further. But it does seem to me to be a potentially serious issue from a number of angles. Cf #10657 (comment)

Environment

partial "conda list"
package    version   build                     channel
numpy      2.3.2     py312h33ff503_0           conda-forge
libnetcdf  4.9.2     nompi_h0134ee8_117        conda-forge
netcdf4    1.7.2     nompi_py312h3805cb1_102   conda-forge
hdf5       1.14.6    nompi_h6e4c0c1_103        conda-forge
xarray     2025.9.1  pyhd8ed1ab_0              conda-forge
