Description
What happened?
Calling .values on a DataArray that has a _FillValue attribute causes a segmentation fault on Python 3.14.2 on Linux. The crash occurs in _apply_mask() (xarray/coding/variables.py) during the CF decoding step that replaces _FillValue entries with NaN.
The same code, same file, same machine works perfectly on Python 3.12 and Python 3.13.
We initially discovered this while reading Sentinel-3 OLCI satellite data (4091×4865 float32 arrays). Sub-sampled reads (e.g. var[::10, ::10].values) worked fine, but full-size reads crashed. Since small arrays didn't crash, we wrote a binary search script to find the exact threshold using synthetic data. The result: the crash first occurs at 512×512 (262,144 = 2^18 elements), which may point to a numpy internal dispatch threshold.
Binary search output
Python: 3.14.2 [Clang 21.1.4 ]
Testing array sizes to find segfault threshold...
2x2 ( 4 elements, 0.0 MB) OK
10x10 ( 100 elements, 0.0 MB) OK
50x50 ( 2,500 elements, 0.0 MB) OK
100x100 ( 10,000 elements, 0.0 MB) OK
200x200 ( 40,000 elements, 0.2 MB) OK
500x500 ( 250,000 elements, 1.0 MB) OK
750x750 ( 562,500 elements, 2.1 MB) SEGFAULT
--- Binary search between 500 and 750 ---
625x625 ( 390,625 elements, 1.5 MB) SEGFAULT
562x562 ( 315,844 elements, 1.2 MB) SEGFAULT
531x531 ( 281,961 elements, 1.1 MB) SEGFAULT
515x515 ( 265,225 elements, 1.0 MB) SEGFAULT
507x507 ( 257,049 elements, 1.0 MB) OK
511x511 ( 261,121 elements, 1.0 MB) OK
513x513 ( 263,169 elements, 1.0 MB) SEGFAULT
512x512 ( 262,144 elements, 1.0 MB) SEGFAULT
Threshold: crashes at 512x512 (262,144 elements, 1.0 MB)
Last OK: 511x511 (261,121 elements, 1.0 MB)
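The binary search script itself was not included. A minimal sketch of the same bisection is below, assuming the crash is monotonic in array size (the output above is consistent with that). find_threshold and fake_crashes are hypothetical names; fake_crashes stands in for a predicate that would run the reproducer in a child process and check for a segfault. With that stand-in, a plain integer bisection probes the same sizes as the log: 625, 562, 531, 515, 507, 511, 513, 512.

```python
from typing import Callable, List, Tuple


def find_threshold(
    crashes: Callable[[int], bool], lo: int, hi: int
) -> Tuple[int, List[int]]:
    """Return the smallest side length that crashes, and the sizes probed.

    Assumes crashes(lo) is False, crashes(hi) is True, and that the failure
    is monotonic in size between the two bounds.
    """
    probed: List[int] = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probed.append(mid)
        if crashes(mid):
            hi = mid  # mid crashed: the threshold is at or below mid
        else:
            lo = mid  # mid was OK: the threshold is above mid
    return hi, probed


if __name__ == "__main__":
    # Hypothetical stand-in predicate for illustration only; a real one would
    # run the reproducer in a child process and report whether it hit SIGSEGV.
    def fake_crashes(size: int) -> bool:
        return size >= 512

    threshold, probed = find_threshold(fake_crashes, lo=500, hi=750)
    print(threshold, probed)  # 512 [625, 562, 531, 515, 507, 511, 513, 512]
```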
The threshold, 2^18 elements, is a power of 2, which suggests that the crash hits an internal numpy buffer or dispatch boundary (SIMD strategy, ufunc buffer size, or similar). This is likely a numpy bug on Python 3.14 that xarray exposes through _apply_mask → np.where.
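The arithmetic behind the threshold is easy to confirm: 512×512 float32 elements occupy exactly 2^20 bytes (1 MiB), while 511×511 falls just short. numpy's default ufunc buffer size, reported by np.getbufsize(), is 8192, so the threshold does not line up with that buffer size directly. This snippet only checks sizes; it does not touch the crashing code path.

```python
import numpy as np

# Side lengths on either side of the observed threshold.
for side in (511, 512):
    arr = np.zeros((side, side), dtype=np.float32)
    print(side, arr.size, arr.nbytes, arr.size == 2**18, arr.nbytes == 2**20)
# 511 261121 1044484 False False
# 512 262144 1048576 True True

# numpy's ufunc buffer size (NPY_BUFSIZE); 8192 unless changed via setbufsize.
print(np.getbufsize())
```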
$ python -X faulthandler test_crash.py
Fatal Python error: Segmentation fault
Current thread 0x0000793fc3d60740 [python] (most recent call first):
File ".../xarray/coding/variables.py", line 132 in _apply_mask
File ".../xarray/coding/common.py", line 80 in get_duck_array
File ".../xarray/coding/common.py", line 80 in get_duck_array
File ".../xarray/core/indexing.py", line 924 in get_duck_array
File ".../xarray/core/indexing.py", line 970 in get_duck_array
File ".../xarray/core/indexing.py", line 604 in __array__
File ".../xarray/core/variable.py", line 336 in _as_array_or_item
File ".../xarray/core/variable.py", line 556 in values
File ".../xarray/core/dataarray.py", line 798 in values
File ".../test_crash.py", line 12 in main
The crash path is: DataArray.values -> Variable.values -> _as_array_or_item() -> np.asarray() -> _ElementwiseFunctionArray.get_duck_array() -> _apply_mask() -> np.where() segfaults.
_apply_mask is wired up during CFMaskCoder.decode() via functools.partial and lazy_elemwise_func whenever the variable has _FillValue or missing_value attributes. The masking is deferred until .values is accessed, at which point np.where(condition, decoded_fill_value, data) is called on the full array, and this is where it crashes on Python 3.14.
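The deferred masking described above can be sketched as follows. This is a simplified re-implementation for illustration, not xarray's code: apply_mask_sketch and LazyElementwise are hypothetical names that loosely mirror _apply_mask and _ElementwiseFunctionArray, and the real code path also handles indexing and chunked (dask) arrays. At 512×512 it exercises the same np.where(condition, scalar, array) pattern as the crash.

```python
import functools

import numpy as np


def apply_mask_sketch(data, encoded_fill_values, decoded_fill_value, dtype):
    """Hypothetical, simplified stand-in for xarray's _apply_mask."""
    data = np.asarray(data, dtype=dtype)
    condition = np.zeros(data.shape, dtype=bool)
    for fv in encoded_fill_values:
        condition |= data == fv  # mark every element equal to a fill value
    # The call that segfaults in the report: scalar fill value vs. full array.
    return np.where(condition, decoded_fill_value, data)


class LazyElementwise:
    """Hypothetical wrapper that defers `func` until the array is requested,
    loosely modelled on xarray's _ElementwiseFunctionArray."""

    def __init__(self, array, func):
        self.array = array
        self.func = func

    def __array__(self, dtype=None, copy=None):
        result = self.func(self.array)
        return result if dtype is None else result.astype(dtype)


raw = np.random.rand(512, 512).astype(np.float32)
raw[0:5, 0:5] = 65534.0
transform = functools.partial(
    apply_mask_sketch,
    encoded_fill_values=[np.float32(65534.0)],
    decoded_fill_value=np.nan,
    dtype=np.float32,
)
lazy = LazyElementwise(raw, transform)  # nothing is masked yet
values = np.asarray(lazy)  # masking (and np.where) runs only here
print(int(np.isnan(values).sum()))  # 25
```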
Cross-references
This is most likely a numpy bug exposed through xarray. A parallel issue should be opened on numpy. Related: numpy#28197 (segfault on free-threaded build).
What did you expect to happen?
.values should return a numpy array with _FillValue entries replaced by NaN, without crashing.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
# your reproducer code ...
import numpy
size = 512 # 511 works, 512 segfaults
data = numpy.random.rand(size, size).astype(numpy.float32)
data[0:5, 0:5] = 65534.0
da = xr.DataArray(data, dims=["rows", "columns"])
da.encoding["_FillValue"] = numpy.float32(65534.0)
da.to_netcdf("/tmp/test_fill.nc")
ds = xr.open_dataset("/tmp/test_fill.nc")
var = ds["__xarray_dataarray_variable__"]
print(var.values)  # segfaults on Python 3.14 Linux
Steps to reproduce
- Use Python 3.14.2 on Linux (tested on Ubuntu 24.04)
- Install xarray and h5netcdf (or netCDF4) via pip or uv
- Run the MVCE above
- Observe: 511×511 arrays work, 512×512 arrays segfault
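To check whether netCDF file I/O plays any part, the same decoding can probably be exercised in memory with xr.decode_cf, which applies the CF mask coders to a Dataset whose _FillValue is still in attrs. This is a suggested variant, not something tested in the report; the variable name var is arbitrary, and we assume decode_cf routes through the same CFMaskCoder path that open_dataset uses. If this also crashes at 512 but not at 511, the backend (h5netcdf or netCDF4) can be ruled out.

```python
import numpy as np
import xarray as xr

size = 512  # 511 worked and 512 crashed in the report
data = np.random.rand(size, size).astype(np.float32)
data[0:5, 0:5] = 65534.0

# Put _FillValue in attrs (as it appears in an undecoded file) and let
# decode_cf set up the CF masking, with no file written or read.
raw = xr.Dataset(
    {"var": (("rows", "columns"), data, {"_FillValue": np.float32(65534.0)})}
)
decoded = xr.decode_cf(raw)

values = decoded["var"].values  # _FillValue entries become NaN here
print(int(np.isnan(values).sum()))  # 25
```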
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
python -X faulthandler -c "
import numpy, xarray
data = numpy.random.rand(512, 512).astype(numpy.float32)
data[0:5, 0:5] = 65534.0
da = xarray.DataArray(data, dims=['rows', 'columns'])
da.encoding['_FillValue'] = numpy.float32(65534.0)
da.to_netcdf('/tmp/test.nc')
ds = xarray.open_dataset('/tmp/test.nc')
print(ds['__xarray_dataarray_variable__'].values)
"
Fatal Python error: Segmentation fault
Current thread 0x0000789b39442740 [python] (most recent call first):
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/coding/variables.py", line 132 in _apply_mask
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/coding/common.py", line 80 in get_duck_array
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 924 in get_duck_array
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 970 in get_duck_array
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 604 in __array__
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/variable.py", line 336 in _as_array_or_item
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/variable.py", line 556 in values
File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/dataarray.py", line 798 in values
File "<string>", line 9 in <module>
Current thread's C stack trace (most recent call first):
[1] 106458 segmentation fault (core dumped)
Anything else we need to know?
The 512×512 threshold (2^18 elements) is most likely a numpy internal boundary, such as a SIMD dispatch threshold or a ufunc buffer size (NPY_BUFSIZE). The crash is in np.where(condition, scalar, large_array), called from _apply_mask. We suspect this is ultimately a numpy bug on the Python 3.14 + Linux + Clang 21 combination, but we're reporting it here since xarray is where it surfaces.
It may be worth testing whether a bare np.where(arr > 0.5, np.nan, arr) on a 512×512 float32 array also segfaults on Python 3.14 Linux; if so, the issue should be filed on numpy directly.
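One way to run that check is sketched below. It is a suggested helper, not something from the report: run_case and CHILD are hypothetical names. The script executes the bare np.where call in a child interpreter (numpy only, no xarray), so a segfault shows up as a negative return code (-11 for SIGSEGV on Linux) instead of killing the parent process.

```python
import signal
import subprocess
import sys

# Child program: numpy only, running the bare np.where call suggested above.
CHILD = """
import sys
import numpy as np

size = int(sys.argv[1])
arr = np.random.rand(size, size).astype(np.float32)
out = np.where(arr > 0.5, np.nan, arr)
print(out.shape)
"""


def run_case(size: int) -> str:
    """Run CHILD for a size x size array in a fresh interpreter."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", CHILD, str(size)],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "OK"
    # On POSIX, a child killed by a signal has returncode == -signum.
    if proc.returncode == -signal.SIGSEGV:
        return "SEGFAULT"
    return f"FAILED (exit code {proc.returncode})"


if __name__ == "__main__":
    print(sys.version)
    for size in (511, 512):
        print(f"{size}x{size}: {run_case(size)}")
```

If 512 reports SEGFAULT here while 511 reports OK, xarray is not needed to trigger the crash.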