Skip to content

Modify ensure_aggregatable_across_cases to ensure NetCDF cubes are correctly merged#1199

Merged
jfrost-mo merged 15 commits intomainfrom
merging_2d_time_coord
Mar 4, 2025
Merged

Modify ensure_aggregatable_across_cases to ensure NetCDF cubes are correctly merged#1199
jfrost-mo merged 15 commits intomainfrom
merging_2d_time_coord

Conversation

@jfrost-mo
Copy link
Member

@jfrost-mo jfrost-mo commented Feb 24, 2025

Fixes #1114

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

  • Documentation has been updated to reflect change.
  • New code has tests, and affected old tests have been updated.
  • All tests and CI checks pass.
  • Ensured the pull request title is descriptive.
  • Conda lock files have been updated if dependencies have changed.
  • Attributed any Generative AI, such as GitHub Copilot, used in this PR.
  • Marked the PR as ready to review.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 24, 2025

Coverage

@jfrost-mo jfrost-mo force-pushed the merging_2d_time_coord branch from a30a9fb to 5deec84 Compare February 24, 2025 15:04
@jfrost-mo jfrost-mo changed the title Modify ensure_aggregatable_across_cases to ensure NetCDF cubes are correctly merged. Modify ensure_aggregatable_across_cases to ensure NetCDF cubes are correctly merged Feb 24, 2025
@jfrost-mo jfrost-mo marked this pull request as ready for review February 24, 2025 15:21
@jfrost-mo jfrost-mo marked this pull request as draft February 24, 2025 15:21
@jfrost-mo

This comment was marked as resolved.

@jfrost-mo jfrost-mo marked this pull request as ready for review February 24, 2025 16:26
daflack

This comment was marked as resolved.

@jfrost-mo jfrost-mo force-pushed the merging_2d_time_coord branch 2 times, most recently from 5cddad3 to 6f663c0 Compare February 27, 2025 12:13
@jfrost-mo jfrost-mo requested a review from daflack February 27, 2025 12:43
Copy link
Contributor

@daflack daflack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a run, there are positive signs of improvement with some LFRic and some difference plot aggregation occurring. However, there are some issues that require changes:

  • aggregation by validity time - all attempts (regardless of model) fail due to lack of time coordinate
  • some fields are still being considered as cubelists therefore slices_over fails this has particular problems for histogram aggregation
  • hour of day difference aggregation fails (difference operator needs a time coordinate - relates to #1204)
  • Lightning difference aggregation fails with `Attribute Error: "NoneType" object has no attribute "attributes"

@daflack
Copy link
Contributor

daflack commented Feb 28, 2025

Further notes on the errors - the second bullet point is not due to having both time processed fields and instantaneous fields in the output as the error around cubelists occur for fields that do not appear in the time processed files.

@jfrost-mo jfrost-mo force-pushed the merging_2d_time_coord branch from a0a97ba to 65c519b Compare February 28, 2025 15:13
@jfrost-mo
Copy link
Member Author

I think we now have a fix, or at least an explanation, for each of those failure cases.

@daflack daflack self-requested a review March 3, 2025 12:26
Copy link
Contributor

@daflack daflack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and only fails with OOM - so happy for it to be merged once tests are a bit more comprehensive for the parts added.

@jfrost-mo jfrost-mo requested a review from daflack March 3, 2025 14:38
@jfrost-mo
Copy link
Member Author

Tests have now been added, so ready for final review.

As the difference operator requires just two cubes this must be done
beforehand. For MEAN collapsing, as these recipes do, this should give
exactly the same result.
aggregate.ensure_aggregatable_across_cases has been modified to ensure
it always tries to fix the provided cubes.

To ensure cubes merge properly we also remove time_origin attribute from
the time coordinate at load time, taking care to only remove it once we
don't need it.
Remove collapse_by_lead_time as it can be replicated with collapse and
remove unnecessary tests.
The only substantive change should be that we now remove the
equalised_validity_time coordinate that is added during the operator.
We can only equalise the time coordinate when it actually exists, so we
skip when it does not.

Fixes #1204
It isn't useful to collapse over validity time for cubes that don't
overlap, so now we provide a descriptive error message instead.
Test difference of cubes without time coordinates and error when trying
to take difference of non-overlapping cubes.
@jfrost-mo jfrost-mo force-pushed the merging_2d_time_coord branch from 6662849 to 2d75e08 Compare March 3, 2025 14:39
@jfrost-mo jfrost-mo removed the request for review from jwarner8 March 3, 2025 15:52
Copy link
Contributor

@daflack daflack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple of questions around the tests. I'm not sure if you have deleted ones that we should still keep in and also wondering if there is an unnecessary if statement?

It significantly increased the testing burden and usage complexity of
the operator for little benefit. It only affected whether to do a
pre-collapse by lead time ahead of the coord categorisation, and we can
automatically detect that.
Copy link
Member Author

@jfrost-mo jfrost-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now ensured the tests cover everything changed in this PR.

As part of this I did remove the multi_case argument to collapse_by_hour_of_day. It significantly increased the complexity of testing, as well as general usage of the operator for very little benefit. I'm not convinced over any traceability benefits, as it only affects whether their is a collapse by lead time prior to categorising the coordinates, and the coordinates that are generated from that are removed within the operator.

@jfrost-mo jfrost-mo requested a review from daflack March 4, 2025 11:09
Copy link
Contributor

@daflack daflack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing those changes. I'm happy with everything so feel free to merge.

@jfrost-mo jfrost-mo merged commit 302580c into main Mar 4, 2025
8 checks passed
@jfrost-mo jfrost-mo deleted the merging_2d_time_coord branch March 4, 2025 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loading in multiple forecasts in netcdf format creates multiple cubes rather than a single cube

2 participants