
Fix RecursionError in spatial aggregation from deepcopy of Shapely-containing xarray datasets #568

Open
Copilot wants to merge 3 commits into develop from copilot/fix-recursion-error-spatial-aggregation

Copilot AI commented Feb 23, 2026

In newer xarray versions, __deepcopy__ iterates more deeply through object-dtype arrays, causing a RecursionError when Shapely geometry objects are present. This broke all 13 spatial aggregation tests.

The root cause was an unnecessary blanket deepcopy(xarray_datasets) in aggregate_based_on_sub_to_sup_region_id_dict — wasteful because "Geometry" and all "Input" component datasets are fully replaced downstream anyway.

Changes

  • fine/aggregations/spatialAggregation/aggregation.py
    • Replace deepcopy(xarray_datasets) with a targeted construction of aggregated_xr_dataset:
      • "Parameters" — shallow xarray .copy() + deepcopy of .attrs only (the only part with mutable objects mutated in-place: set, pd.DataFrame)
      • "Input" — shallow dict copy (all component datasets are fully replaced in the loop below)
      • "Geometry" — omitted entirely (assigned fresh from aggregate_geometries() on the next line)
# Before
aggregated_xr_dataset = deepcopy(xarray_datasets)

# After
parameters_copy = xarray_datasets.get("Parameters").copy()
parameters_copy.attrs = deepcopy(xarray_datasets.get("Parameters").attrs)
aggregated_xr_dataset = {
    "Parameters": parameters_copy,
    "Input": {
        comp_class: dict(comp_dict)
        for comp_class, comp_dict in xarray_datasets.get("Input").items()
    },
}

Shapely geometries are immutable by design — deep-copying them has never been necessary.
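
The immutability argument can be illustrated with a small standard-library sketch. `FrozenPoint` and its `translated` method are hypothetical stand-ins, not Shapely classes; the snippet only shows why sharing references to immutable objects between an original and a shallow copy is safe.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    """Hypothetical stand-in for an immutable geometry (not a Shapely class)."""

    x: float
    y: float

    def translated(self, dx: float, dy: float) -> "FrozenPoint":
        # Operations return a new object instead of modifying this one.
        return FrozenPoint(self.x + dx, self.y + dy)


original = {"region_a": FrozenPoint(1.0, 2.0)}
shared = dict(original)  # shallow copy: both dicts reference the same point

# Rebinding the key in the copy does not touch the original dict.
shared["region_a"] = shared["region_a"].translated(1.0, 0.0)

try:
    original["region_a"].x = 99.0  # in-place mutation is rejected
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Because the shared object cannot change underneath either holder, a deep copy would add cost without adding isolation.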

Original prompt

Fix RecursionError in spatial aggregation caused by deepcopy on xarray datasets containing Shapely geometry objects

Problem

All 13 spatial aggregation tests fail with RecursionError: maximum recursion depth exceeded when using newer versions of xarray (the tests pass with xarray 2024.9.0 but fail with later versions).

The crash originates in fine/aggregations/spatialAggregation/aggregation.py, line 439:

aggregated_xr_dataset = deepcopy(xarray_datasets)

The xarray_datasets dict contains a "Geometry" key (added in manager.py via xr_datasets["Geometry"] = geom_xr). This geom_xr xarray Dataset holds Shapely geometry objects and centroids stored as object-dtype arrays. In newer xarray versions, __deepcopy__ iterates more deeply through object-dtype arrays, triggering Python's recursion limit on Shapely's complex internal structure.
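
As a rough analogy for this failure mode, the sketch below shows `copy.deepcopy` raising `RecursionError` on a plain nested list. It does not model xarray or Shapely internals, and the real traceback looks different; it only demonstrates that `deepcopy` recurses once per nesting level and can exhaust the interpreter's recursion limit.

```python
import sys
from copy import deepcopy

# Build a list nested slightly deeper than the current recursion limit.
depth = sys.getrecursionlimit() + 100
nested = []
for _ in range(depth):
    nested = [nested]

try:
    deepcopy(nested)  # recurses through every nesting level
    hit_limit = False
except RecursionError:
    hit_limit = True
```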

Root Cause Analysis

Examining aggregate_based_on_sub_to_sup_region_id_dict, the deepcopy of the full xarray_datasets is wasteful and broken:

  • "Geometry" — fully replaced by aggregate_geometries() (which builds brand-new Shapely objects via unary_union). The deep copy of its contents is entirely wasted.
  • "Input" datasets — every component's dataset is fully replaced by a freshly-built xr.Dataset(). The deep copy is entirely wasted.
  • "Parameters".attrs — this actually needs a deep copy: it contains mutable Python objects (set, pd.DataFrame, etc.) that are modified in-place. It contains no Shapely objects and is safe to deepcopy.

Shapely geometry objects are immutable by design — every operation returns a new object. They never need to be deep-copied.
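
The reason `"Parameters".attrs` still needs a deep copy can be seen with plain dictionaries. This is a minimal sketch, assuming a shallow copy leaves the attribute values shared between original and copy; the `"locations"` key and region names are made up for illustration.

```python
from copy import deepcopy

original_attrs = {"locations": {"reg_1", "reg_2", "reg_3"}}

# Shallow copy: the dict is new, but the set inside it is shared.
shallow_attrs = dict(original_attrs)
shallow_attrs["locations"].discard("reg_3")
leaked = "reg_3" not in original_attrs["locations"]  # original was changed too

original_attrs["locations"].add("reg_3")  # restore the original for the next step

# Deep copy: the set is duplicated, so in-place edits stay local.
deep_attrs = deepcopy(original_attrs)
deep_attrs["locations"].discard("reg_3")
isolated = "reg_3" in original_attrs["locations"]
```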

Fix

Replace the blanket deepcopy(xarray_datasets) with a targeted approach:

  1. Deep-copy only "Parameters".attrs — the only part that is mutated in-place and needs isolation from the original.
  2. Do not copy "Geometry" — it is fully replaced by aggregate_geometries() below.
  3. Build "Input" as a shallow structure — every component dataset is fully replaced, so no deep copy is needed.
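
Step 3 can be sketched with plain dictionaries: copying only the two dict levels is enough when every leaf value is replaced afterwards. The component classes, component names, and string placeholders below are hypothetical and stand in for the real datasets.

```python
original_input = {
    "Source": {"wind": "original wind dataset"},
    "Sink": {"demand": "original demand dataset"},
}

# New outer and inner dicts; the leaf values are still shared at this point.
aggregated_input = {
    comp_class: dict(comp_dict)
    for comp_class, comp_dict in original_input.items()
}

# Replacing each leaf rebinds keys in the new inner dicts only.
for comp_class, comp_dict in aggregated_input.items():
    for comp_name in comp_dict:
        comp_dict[comp_name] = f"aggregated {comp_name} dataset"
```

The original mapping keeps pointing at its own leaf objects, so nothing is lost by skipping the deep copy as long as leaves are replaced and not mutated in place.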

Concretely, in fine/aggregations/spatialAggregation/aggregation.py, replace:

# Make a copy of xarray_dataset
aggregated_xr_dataset = deepcopy(xarray_datasets)

with something like:

# Build output dict explicitly to avoid deepcopy of Shapely object-dtype arrays
# (which causes RecursionError in newer xarray versions).
# Only "Parameters".attrs needs a real deep copy — it contains mutable Python
# objects (set, pd.DataFrame) that are modified in-place below.
# "Geometry" and "Input" are fully replaced further down, so no copy is needed.
parameters_copy = xarray_datasets.get("Parameters").copy()
parameters_copy.attrs = deepcopy(xarray_datasets.get("Parameters").attrs)
aggregated_xr_dataset = {
    "Parameters": parameters_copy,
    "Input": {
        comp_class: dict(comp_dict)
        for comp_class, comp_dict in xarray_datasets.get("Input").items()
    },
}

The "Geometry" key is then assigned on line 463 as before:

aggregated_xr_dataset["Geometry"] = aggregate_geometries(...)

Failing Tests (all 13 in spatial aggregation)

FAILED test/aggregations/spatialAggregation/test_manager.py::test_esm_to_xr_and_back_during_spatial_aggregation[False] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_esm_to_xr_and_back_during_spatial_aggregation[True] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_string_based - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_parameter_based[3-None] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_parameter_based[3-aggregation_function_dict1] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_parameter_based[2-None] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_parameter_based[2-aggregation_function_dict1] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_distance_based[None-None-2] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_distance_based[skip_regions1-None-3] - RecursionError: maximum recursion depth exceeded
FAILED test/aggregations/spatialAggregation/test_manager.py::test_spatial_aggregation_distance_based[None-enforced_...


*This pull request was created from Copilot chat.*

…targeted copy

Co-authored-by: julian-belina <56728940+julian-belina@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix RecursionError in spatial aggregation tests due to deepcopy" to "Fix RecursionError in spatial aggregation from deepcopy of Shapely-containing xarray datasets" Feb 23, 2026
Copilot AI requested a review from julian-belina February 23, 2026 21:22
julian-belina marked this pull request as ready for review February 23, 2026 21:49