Description
What happened?
If you create a dataset with xr.concat along a new dimension of strings, e.g.
ds0 = xr.Dataset(data_vars={'dat': ('x', np.arange(3))},
                 coords={'x': np.arange(3)})
ds1 = xr.Dataset(data_vars={'dat': ('x', np.arange(3)*10)},
                 coords={'x': np.arange(3)})
ds = xr.concat([ds0, ds1], dim=pd.Index(['run0', 'run1'], name='run'))
the dataset is created successfully, but writing it to netCDF raises a ValueError:
ds.to_netcdf('test.nc')
> ValueError: unsupported dtype for netCDF4 variable: StringDType(na_object=nan)
However, the netCDF file is actually created and can be opened without issue, which makes me think the ValueError is spurious. The issue seems related to those in #10553, but it doesn't seem to have been fixed by #11152.
I've tested this with fresh environments using xarray v2026.2.0 and pandas 3.0.1, or xarray 2026.1.0 and pandas 3.0.0, and get the same error in both. But this kind of concatenation used to work for me in older environments.
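A possible workaround, sketched here as an untested assumption rather than a confirmed fix: label the new dimension with an xr.DataArray of fixed-width NumPy strings instead of a pd.Index. The idea is that the coordinate then keeps a NumPy unicode dtype rather than picking up the pandas string dtype. Printing the dtype shows what the netCDF4 backend will be asked to encode.

```python
import numpy as np
import xarray as xr

ds0 = xr.Dataset(data_vars={"dat": ("x", np.arange(3))},
                 coords={"x": np.arange(3)})
ds1 = xr.Dataset(data_vars={"dat": ("x", np.arange(3) * 10)},
                 coords={"x": np.arange(3)})

# Assumption: a DataArray of NumPy strings (dtype "<U4") used as the concat
# dimension keeps that dtype on the resulting "run" coordinate, whereas a
# pd.Index under pandas 3 carries the new pandas string dtype.
run = xr.DataArray(np.array(["run0", "run1"]), dims="run", name="run")
ds = xr.concat([ds0, ds1], dim=run)

# Inspect the coordinate dtype before attempting ds.to_netcdf(...).
print(ds["run"].dtype)
```

If the printed dtype is a fixed-width unicode type rather than StringDType, the write path that raised the error above should not be reached, though I have not verified this against the affected versions.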
What did you expect to happen?
Expected result: the dataset writes to netCDF without complaining.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
import pandas as pd
import numpy as np
ds0 = xr.Dataset(data_vars={'dat': ('x', np.arange(3))},
                 coords={'x': np.arange(3)})
ds1 = xr.Dataset(data_vars={'dat': ('x', np.arange(3)*10)},
                 coords={'x': np.arange(3)})
runlist = np.array(['run0','run1'])
ds = xr.concat([ds0,ds1],dim=pd.Index(runlist,name='run'))
# this step raises a ValueError
ds.to_netcdf('test.nc')
# if you comment out the above lines and just open the file,
# it is present and the contents look as expected
ds = xr.open_dataset('test.nc')
print(ds)
Steps to reproduce
No response
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Traceback (most recent call last):
File "/fs/homeu3/eccc/crd/cccma/rud001/scrap.py", line 20, in <module>
ds.to_netcdf('test.nc')
~~~~~~~~~~~~^^^^^^^^^^^
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/core/dataset.py", line 2123, in to_netcdf
return to_netcdf( # type: ignore[return-value] # mypy cannot resolve the overloads:(
self,
...<10 lines>...
auto_complex=auto_complex,
)
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/writers.py", line 441, in to_netcdf
dump_to_store(
~~~~~~~~~~~~~^
dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/writers.py", line 491, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/common.py", line 533, in store
self.set_variables(
~~~~~~~~~~~~~~~~~~^
variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/common.py", line 571, in set_variables
target, source = self.prepare_variable(
~~~~~~~~~~~~~~~~~~~~~^
name, v, check, unlimited_dims=unlimited_dims
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/netCDF4_.py", line 633, in prepare_variable
datatype = _get_datatype(
variable, self.format, raise_on_invalid_encoding=check_encoding
)
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/netCDF4_.py", line 165, in _get_datatype
return _nc4_dtype(var)
File "/home/ords/crd/cccma/rud001/envs/invproc/lib/python3.14/site-packages/xarray/backends/netCDF4_.py", line 186, in _nc4_dtype
raise ValueError(f"unsupported dtype for netCDF4 variable: {var.dtype}")
ValueError: unsupported dtype for netCDF4 variable: StringDType(na_object=nan)
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.14.3 | packaged by conda-forge | (main, Feb 9 2026, 21:56:02) [GCC 14.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.0-427.100.1.el9_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2026.2.0
pandas: 3.0.1
numpy: 2.3.5
scipy: 1.17.0
netCDF4: 1.7.4
pydap: 3.5.8
h5netcdf: 1.8.1
h5py: 3.15.1
zarr: 3.1.5
cftime: 1.6.5
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2026.1.2
distributed: 2026.1.2
matplotlib: 3.10.8
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.4
fsspec: 2026.2.0
cupy: None
pint: None
sparse: 0.18.0
flox: 0.11.1
numpy_groupies: 0.11.3
setuptools: None
pip: 26.0.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None