Segfault in _apply_mask / np.where when materializing DataArray with _FillValue on Python 3.14 (Linux) - threshold at 512×512 elements #11205

@polymood

Description

What happened?

Calling .values on a DataArray that has a _FillValue attribute causes a segmentation fault on Python 3.14.2 on Linux. The crash occurs in _apply_mask() (xarray/coding/variables.py) during the CF decoding step that replaces _FillValue entries with NaN.

The same code, same file, same machine works perfectly on Python 3.12 and Python 3.13.

We initially discovered this while reading Sentinel-3 OLCI satellite data (4091×4865 float32 arrays). Sub-sampled reads (e.g. var[::10, ::10].values) worked fine, but full-size reads crashed. Because small arrays did not crash, we wrote a binary search script to find the exact threshold with synthetic data. The result: the crash first occurs at exactly 512×512 (262,144 = 2^18 elements), which may point to an internal numpy dispatch threshold.

Binary search output

Python: 3.14.2 [Clang 21.1.4 ]
Testing array sizes to find segfault threshold...

      2x2      (           4 elements,      0.0 MB)  OK
     10x10     (         100 elements,      0.0 MB)  OK
     50x50     (       2,500 elements,      0.0 MB)  OK
    100x100    (      10,000 elements,      0.0 MB)  OK
    200x200    (      40,000 elements,      0.2 MB)  OK
    500x500    (     250,000 elements,      1.0 MB)  OK
    750x750    (     562,500 elements,      2.1 MB)  SEGFAULT

--- Binary search between 500 and 750 ---

    625x625    (     390,625 elements,      1.5 MB)  SEGFAULT
    562x562    (     315,844 elements,      1.2 MB)  SEGFAULT
    531x531    (     281,961 elements,      1.1 MB)  SEGFAULT
    515x515    (     265,225 elements,      1.0 MB)  SEGFAULT
    507x507    (     257,049 elements,      1.0 MB)  OK
    511x511    (     261,121 elements,      1.0 MB)  OK
    513x513    (     263,169 elements,      1.0 MB)  SEGFAULT
    512x512    (     262,144 elements,      1.0 MB)  SEGFAULT

Threshold: crashes at 512x512 (262,144 elements, 1.0 MB)
Last OK:   511x511 (261,121 elements, 1.0 MB)

The threshold of 2^18 elements is a power of 2, which could mean the crash hits an internal numpy buffer or dispatch boundary (SIMD strategy, ufunc buffer size, or similar). This is likely a numpy bug on Python 3.14 that xarray exposes through _apply_mask -> np.where.
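For readers who want to repeat the threshold search, here is a minimal sketch of a crash-safe bisection. It is not the script that produced the output above: `survives` and `first_failing` are hypothetical names, and the child code is whatever reproducer you pass in. The sketch assumes that sizes fail monotonically (everything at or above the threshold crashes) and that each probe runs in a fresh interpreter so a segfault only kills the child.

```python
import subprocess
import sys


def survives(n, code):
    """Run `code` in a child interpreter with n as its first argument.

    Returns True only if the child exits with status 0. On POSIX a child
    killed by a signal reports a negative return code (for example -11
    for SIGSEGV), so a segfault shows up here as False.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code, str(n)],
        capture_output=True,
    )
    return proc.returncode == 0


def first_failing(lo, hi, ok):
    """Bisect for the smallest failing size.

    Assumes ok(lo) is True, ok(hi) is False, and failures are monotonic.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid  # mid still works, threshold is above it
        else:
            hi = mid  # mid fails, threshold is at or below it
    return hi
```

Starting from the bracket 500 (OK) and 750 (SEGFAULT), this probes 625, 562, 531, 515, 507, 511, 513 and 512, the same sequence as in the output above. A call would look like `first_failing(500, 750, ok=lambda n: survives(n, child_code))`, where `child_code` reads the size from `sys.argv[1]` and runs the reproducer.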

$ python -X faulthandler test_crash.py

Fatal Python error: Segmentation fault
Current thread 0x0000793fc3d60740 [python] (most recent call first):
  File ".../xarray/coding/variables.py", line 132 in _apply_mask
  File ".../xarray/coding/common.py", line 80 in get_duck_array
  File ".../xarray/coding/common.py", line 80 in get_duck_array
  File ".../xarray/core/indexing.py", line 924 in get_duck_array
  File ".../xarray/core/indexing.py", line 970 in get_duck_array
  File ".../xarray/core/indexing.py", line 604 in __array__
  File ".../xarray/core/variable.py", line 336 in _as_array_or_item
  File ".../xarray/core/variable.py", line 556 in values
  File ".../xarray/core/dataarray.py", line 798 in values
  File ".../test_crash.py", line 12 in main

The crash path is: DataArray.values -> Variable.values -> _as_array_or_item() -> np.asarray() -> _ElementwiseFunctionArray.get_duck_array() -> _apply_mask() -> np.where() segfaults.

_apply_mask is wired up during CFMaskCoder.decode() via functools.partial and lazy_elemwise_func whenever the variable has _FillValue or missing_value attributes. The masking is deferred until .values is accessed, at which point np.where(condition, decoded_fill_value, data) is called on the full array - and this is where it crashes on Python 3.14.
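The deferred masking described above can be illustrated with a small sketch. It is a simplified stand-in, not xarray's actual code: `apply_mask` and `LazyElementwise` are hypothetical names that mimic the roles of `_apply_mask` and the lazy wrapper, and only the `np.where(condition, decoded_fill_value, data)` call is taken from the description.

```python
import functools

import numpy as np


def apply_mask(data, fill_value, decoded_fill_value=np.nan):
    """Replace entries equal to fill_value (simplified stand-in)."""
    data = np.asarray(data)
    condition = data == fill_value
    # Same call shape as in the crash: where(condition, scalar, array).
    return np.where(condition, decoded_fill_value, data)


class LazyElementwise:
    """Hypothetical wrapper that defers `func` until the array is materialized."""

    def __init__(self, array, func):
        self.array = array
        self.func = func

    def __array__(self, dtype=None, copy=None):
        out = self.func(self.array)
        return out if dtype is None else out.astype(dtype)


raw = np.array([[1.0, 65534.0], [3.0, 4.0]], dtype=np.float32)
mask = functools.partial(apply_mask, fill_value=np.float32(65534.0))
lazy = LazyElementwise(raw, mask)  # nothing is computed yet

decoded = np.asarray(lazy)  # masking runs here, like accessing .values
```

Nothing touches the data until `np.asarray(lazy)` is called, which is why the segfault only appears when `.values` is accessed and not when the dataset is opened.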

Cross-references

This is most likely a numpy bug exposed through xarray. A parallel issue should be opened on numpy. Related: numpy#28197 (segfault on free-threaded build).

What did you expect to happen?

.values should return a numpy array with _FillValue entries replaced by NaN, without crashing.

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

import numpy
import xarray as xr

xr.show_versions()

size = 512  # 511 works, 512 segfaults
data = numpy.random.rand(size, size).astype(numpy.float32)
data[0:5, 0:5] = 65534.0

da = xr.DataArray(data, dims=["rows", "columns"])
da.encoding["_FillValue"] = numpy.float32(65534.0)
da.to_netcdf("/tmp/test_fill.nc")

ds = xr.open_dataset("/tmp/test_fill.nc")
var = ds["__xarray_dataarray_variable__"]
print(var.values)  # segfaults on Python 3.14 Linux

Steps to reproduce

  1. Use Python 3.14.2 on Linux (tested on Ubuntu 24.04)
  2. Install xarray and h5netcdf (or netCDF4) via pip or uv
  3. Run the MVCE above
  4. Observe: 511×511 arrays work, 512×512 arrays segfault

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

python -X faulthandler -c "
import numpy, xarray
data = numpy.random.rand(512, 512).astype(numpy.float32)
data[0:5, 0:5] = 65534.0
da = xarray.DataArray(data, dims=['rows', 'columns'])
da.encoding['_FillValue'] = numpy.float32(65534.0)
da.to_netcdf('/tmp/test.nc')
ds = xarray.open_dataset('/tmp/test.nc')
print(ds['__xarray_dataarray_variable__'].values)
"

Fatal Python error: Segmentation fault

Current thread 0x0000789b39442740 [python] (most recent call first):
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/coding/variables.py", line 132 in _apply_mask
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/coding/common.py", line 80 in get_duck_array
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 924 in get_duck_array
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 970 in get_duck_array
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/indexing.py", line 604 in __array__
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/variable.py", line 336 in _as_array_or_item
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/variable.py", line 556 in values
  File "/mnt/d/poly/dask_tests/.venv314/lib/python3.14/site-packages/xarray/core/dataarray.py", line 798 in values
  File "<string>", line 9 in <module>

Current thread's C stack trace (most recent call first):
[1]    106458 segmentation fault (core dumped)

Anything else we need to know?

The 512×512 threshold (2^18 elements) most likely corresponds to a numpy internal boundary, such as NPY_BUFSIZE, a SIMD dispatch threshold, or a ufunc buffer size. The crash is in np.where(condition, scalar, large_array) called from _apply_mask. We suspect this is ultimately a numpy bug on the Python 3.14 + Linux + Clang 21 combination, but we're reporting here since xarray is where it surfaces.

It may be worth testing whether a bare np.where(arr > 0.5, np.nan, arr) on a 512×512 float32 array also segfaults on Python 3.14 on Linux; if so, the issue should be filed on numpy directly.
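A possible way to run that numpy-only check without taking down the calling process is sketched below. The helper names `run_isolated` and `describe` are hypothetical; the child code is the bare `np.where` call suggested above, run under `-X faulthandler` so any crash still prints a traceback. It has not been run on the affected setup, so no result is claimed here.

```python
import subprocess
import sys

# Child code: the bare numpy call, with no xarray involved.
BARE_WHERE = """
import numpy as np
arr = np.random.rand(512, 512).astype(np.float32)
out = np.where(arr > 0.5, np.nan, arr)
print(out.shape, out.dtype)
"""


def run_isolated(code):
    """Run `code` in a child interpreter with faulthandler enabled."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: negative return code means the child died from a signal.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"
```

Calling `run_isolated(BARE_WHERE)` and passing the return code to `describe` separates three cases: a clean run, a crash in numpy alone (a signal, 11 for SIGSEGV), or an ordinary Python error. If the bare call succeeds while the xarray reproducer still crashes, the trigger is more specific than `np.where` on a large float32 array.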

Environment

python: 3.14.2 (main, Jan 27 2026, 23:59:57) [Clang 21.1.4]
OS: Ubuntu 24.04 LTS
xarray==2026.2.0
numpy==2.4.2

Labels: bug, needs triage