Fix RecursionError in spatial aggregation from deepcopy of Shapely-containing xarray datasets #568
Open
…targeted copy Co-authored-by: julian-belina <56728940+julian-belina@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Fix RecursionError in spatial aggregation tests due to deepcopy" to "Fix RecursionError in spatial aggregation from deepcopy of Shapely-containing xarray datasets" on Feb 23, 2026
julian-belina approved these changes on Feb 23, 2026
In newer xarray versions, `__deepcopy__` iterates more deeply through `object`-dtype arrays, causing a `RecursionError` when Shapely geometry objects are present. This broke all 13 spatial aggregation tests.

The root cause was an unnecessary blanket `deepcopy(xarray_datasets)` in `aggregate_based_on_sub_to_sup_region_id_dict`. The copy was wasteful because `"Geometry"` and all `"Input"` component datasets are fully replaced downstream anyway.

Changes

`fine/aggregations/spatialAggregation/aggregation.py`: replaced `deepcopy(xarray_datasets)` with a targeted construction of `aggregated_xr_dataset`:

- `"Parameters"`: shallow xarray `.copy()` plus `deepcopy` of `.attrs` only (the only part holding mutable objects that are mutated in place: `set`, `pd.DataFrame`)
- `"Input"`: shallow dict copy (all component datasets are fully replaced in the loop below)
- `"Geometry"`: omitted entirely (assigned fresh from `aggregate_geometries()` on the next line)

Shapely geometries are immutable by design, so deep-copying them has never been necessary.

Original prompt
Fix `RecursionError` in spatial aggregation caused by `deepcopy` on xarray datasets containing Shapely geometry objects

Problem
All 13 spatial aggregation tests fail with `RecursionError: maximum recursion depth exceeded` when using newer versions of xarray (the tests pass with xarray 2024.9.0 but fail with later versions).

The crash originates in `fine/aggregations/spatialAggregation/aggregation.py`, line 439, at the blanket `deepcopy(xarray_datasets)` call.

The `xarray_datasets` dict contains a `"Geometry"` key (added in `manager.py` via `xr_datasets["Geometry"] = geom_xr`). This `geom_xr` xarray Dataset holds Shapely geometry objects and centroids stored as `object`-dtype arrays. In newer xarray versions, `__deepcopy__` iterates more deeply through `object`-dtype arrays, triggering Python's recursion limit on Shapely's complex internal structure.

Root Cause Analysis
Examining `aggregate_based_on_sub_to_sup_region_id_dict`, the `deepcopy` of the full `xarray_datasets` is both wasteful and broken:

- `"Geometry"`: fully replaced by `aggregate_geometries()` (which builds brand-new Shapely objects via `unary_union`). The deep copy of its contents is entirely wasted.
- `"Input"` datasets: every component's dataset is fully replaced by a freshly built `xr.Dataset()`. The deep copy is entirely wasted.
- `"Parameters".attrs`: this actually needs a deep copy. It contains mutable Python objects (`set`, `pd.DataFrame`, etc.) that are modified in place. It contains no Shapely objects and is safe to `deepcopy`.

Shapely geometry objects are immutable by design: every operation returns a new object. They never need to be deep-copied.

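The reason `.attrs` is the one part that still needs a deep copy can be illustrated with the standard library alone. The snippet below is a minimal sketch, not code from the repository: the `attrs` dict and its keys (`"locations"`, `"note"`) are hypothetical stand-ins for the real attribute contents.

```python
import copy

# Hypothetical attrs-like dict holding a mutable object (a set).
attrs = {"locations": {"region_a", "region_b"}, "note": "unchanged"}

shallow = copy.copy(attrs)      # new dict, but the same set object inside
deep = copy.deepcopy(attrs)     # new dict and a new, independent set

# Mutating the deep copy in place leaves the original alone.
deep["locations"].add("region_c")
assert "region_c" not in attrs["locations"]

# Mutating the shallow copy in place leaks into the original.
shallow["locations"].add("region_d")
assert "region_d" in attrs["locations"]
```

Parts that are replaced wholesale rather than mutated in place do not have this problem, which is why a shallow copy is enough for them.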
Fix
Replace the blanket `deepcopy(xarray_datasets)` with a targeted approach:

- Deep-copy only `"Parameters".attrs`: the only part that is mutated in place and needs isolation from the original.
- Skip `"Geometry"`: it is fully replaced by `aggregate_geometries()` below.
- Copy `"Input"` as a shallow structure: every component dataset is fully replaced, so no deep copy is needed.

Concretely, in `fine/aggregations/spatialAggregation/aggregation.py`, replace the `deepcopy(xarray_datasets)` call with a targeted construction of `aggregated_xr_dataset`.

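A minimal sketch of such a targeted construction follows. It is not the code from this PR: `DatasetStandIn`, `GeometryStandIn`, and `targeted_copy` are hypothetical names, the stand-in classes only mimic a dataset with `.copy()` and `.attrs`, and `"Input"` is modelled as a flat mapping from component name to dataset. If the real layout is nested, each dict level that gets reassigned would need its own shallow copy.

```python
from copy import deepcopy


class DatasetStandIn:
    """Hypothetical stand-in for an xarray Dataset: some data plus an attrs dict."""

    def __init__(self, data, attrs=None):
        self.data = data
        self.attrs = {} if attrs is None else attrs

    def copy(self):
        # Shallow copy: new wrapper and new attrs dict, but the same values inside.
        return DatasetStandIn(self.data, dict(self.attrs))


class GeometryStandIn:
    """Hypothetical stand-in for a dataset that cannot be deep-copied safely."""

    def __deepcopy__(self, memo):
        raise RecursionError("simulated: maximum recursion depth exceeded")


def targeted_copy(xarray_datasets):
    """Build the working copy without deep-copying "Geometry" or "Input"."""
    parameters = xarray_datasets["Parameters"].copy()
    # Only attrs holds objects that are mutated in place, so isolate just those.
    parameters.attrs = deepcopy(xarray_datasets["Parameters"].attrs)
    return {
        "Parameters": parameters,
        # Shallow copy: each entry is expected to be replaced by a new dataset.
        "Input": dict(xarray_datasets["Input"]),
        # "Geometry" is left out; the caller assigns a fresh one afterwards.
    }


original = {
    "Parameters": DatasetStandIn(None, {"locations": {"region_a", "region_b"}}),
    "Input": {"component_x": DatasetStandIn([1.0, 2.0])},
    "Geometry": GeometryStandIn(),
}

# The blanket deep copy trips over the geometry entry.
try:
    deepcopy(original)
    blanket_copy_failed = False
except RecursionError:
    blanket_copy_failed = True

# The targeted copy never touches it.
aggregated = targeted_copy(original)
aggregated["Parameters"].attrs["locations"].add("region_c")  # original untouched
aggregated["Input"]["component_x"] = DatasetStandIn([3.0])   # original untouched
```

The design choice is to copy at the granularity of what is actually mutated: in-place edits get a deep copy, wholesale replacements get a shallow one, and anything rebuilt from scratch is not copied at all.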
The `"Geometry"` key is then assigned on line 463 as before, from the result of `aggregate_geometries()`.

Failing Tests (all 13 in spatial aggregation)