Allow grouping input cubes by date (instead of filename) for fix_metadata
#2551
Here's a small recipe to test this for rsut (I also included rsdt to ensure that this still works fine):

    # ESMValTool
    ---
    documentation:
      title: test
      description: test
      authors:
        - schlund_manuel
    datasets:
      - {project: native6, dataset: ERA5, type: reanaly, version: v1, tier: 3, timerange: 2000/2001}
    diagnostics:
      test:
        variables:
          rsut:
            mip: Amon
          rsdt:
            mip: Amon
        scripts: null

Input files need to be arranged like this:

@bouweandela do you think this approach is a reasonable solution to this problem? As mentioned in the description, it doesn't solve the problem for all cases, but a different grouping will be necessary for all of them. And it is fully sufficient for the ERA5 netCDF case.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2551      +/-   ##
==========================================
- Coverage   95.20%   95.12%   -0.08%
==========================================
  Files         259      259
  Lines       15211    15232      +21
==========================================
+ Hits        14481    14489       +8
- Misses        730      743      +13
     fixed_cubes = CubeList()

-    # Group cubes by input file and apply all fixes to each group element
-    # (i.e., each file) individually
-    by_file = defaultdict(list)
-    for cube in cubes:
-        by_file[cube.attributes.get("source_file", "")].append(cube)
-
-    for cube_list in by_file.values():
-        cube_list = CubeList(cube_list)
+    # Group cubes and apply all fixes to each group element individually. There
+    # are two options for grouping:
+    # (1) By input file name (default).
+    # (2) By time range (can be enabled by setting the attribute
+    #     GROUP_CUBES_BY_DATE=True for the fix class; see
+    #     _fixes.native6.era5.Rsut for an example).
+    grouped_cubes = _group_cubes(fixes, cubes)
+    for cube_list in grouped_cubes.values():
         for fix in fixes:
             cube_list = fix.fix_metadata(cube_list)
It may be nicer to define the grouping operation on the Fix object, so this code would look like:
    fixed_cubes = CubeList(cubes)
    for fix in fixes:
        fixed_cubes = CubeList(
            cube
            for group in fix.group_input_for_fix_metadata(fixed_cubes)
            for cube in fix.fix_metadata(group)
        )

If it works for you, it should be fine
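To make the suggestion above concrete, here is a minimal, self-contained sketch of what grouping on the fix object could look like. It is not the ESMValCore implementation: plain dicts stand in for iris cubes, and the classes `Fix` and `MergeAllFix` as well as the helper `fix_all` are hypothetical. Only the method name `group_input_for_fix_metadata` is taken from the review comment.

```python
from collections import defaultdict


class Fix:
    """Stand-in for a fix class (assumption, not the ESMValCore class)."""

    def group_input_for_fix_metadata(self, cubes):
        """Default grouping: one group per source file."""
        groups = defaultdict(list)
        for cube in cubes:
            groups[cube.get("source_file", "")].append(cube)
        return list(groups.values())

    def fix_metadata(self, cubes):
        """Default fix: return the cubes unchanged."""
        return cubes


class MergeAllFix(Fix):
    """Hypothetical fix that needs to see all cubes at once."""

    def group_input_for_fix_metadata(self, cubes):
        # A single group containing every cube.
        return [list(cubes)]

    def fix_metadata(self, cubes):
        # Record the size of the group in which each cube was fixed.
        return [{**cube, "group_size": len(cubes)} for cube in cubes]


def fix_all(cubes, fixes):
    """Apply each fix in turn, letting the fix decide how to group."""
    fixed = list(cubes)
    for fix in fixes:
        fixed = [
            cube
            for group in fix.group_input_for_fix_metadata(fixed)
            for cube in fix.fix_metadata(group)
        ]
    return fixed


cubes = [
    {"var": "rsdt", "source_file": "a.nc"},
    {"var": "ssr", "source_file": "b.nc"},
]
result = fix_all(cubes, [Fix(), MergeAllFix()])
```

With this layout each fix regroups the output of the previous fix, so a fix that needs several files can coexist with fixes that work per file.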
…_for_fix_metadata
Description
This PR allows grouping the input cubes for our fix_metadata functions by date (i.e., all files with the same date range are passed to the fix simultaneously). This can be enabled by setting the class variable GROUP_CUBES_BY_DATE = True in the corresponding fix class. This allows implementing fixes where variables from multiple input files are necessary (for example, to derive rsut for ERA5).

This solution only works for projects where the input files are located in the same directory and the input file pattern is flexible enough to find all files. This is fine for the native ERA5 data in netCDF format (which we need to download manually and put into the corresponding directories). For other projects where files are stored in different directories, further changes are necessary (potentially in local.py). However, this PR is a prerequisite to make these other cases work.

By default, input cubes are grouped by filename for fix_metadata (i.e., each fix_metadata call operates only on a single file):

ESMValCore/esmvalcore/cmor/fix.py
Lines 197 to 204 in fd82b43

Note that this is fully backwards-compatible since the new functionality needs to be explicitly enabled.
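
The difference between the two grouping modes can be illustrated with a small sketch. This is not the code from this PR: the `Cube` class is a minimal stand-in for an iris cube, the function `group_cubes`, the fix classes `FixByFile` and `FixByDate`, and the file names are hypothetical, and extracting the date range from the file name with a regular expression is an assumption made for illustration. Only the attribute name GROUP_CUBES_BY_DATE and the `source_file` cube attribute come from the PR.

```python
import re
from collections import defaultdict

# Assumption: the date range appears in the file name, e.g. 200001-200012.
DATE_RANGE = re.compile(r"(\d{4,8})-(\d{4,8})")


class Cube:
    """Minimal stand-in for an iris cube (only attributes are modelled)."""

    def __init__(self, var_name, source_file):
        self.var_name = var_name
        self.attributes = {"source_file": source_file}


class FixByFile:
    """Hypothetical fix that keeps the default grouping."""

    GROUP_CUBES_BY_DATE = False


class FixByDate:
    """Hypothetical fix that opts in to grouping by date."""

    GROUP_CUBES_BY_DATE = True


def group_cubes(fixes, cubes):
    """Group cubes by file name, or by date range if any fix asks for it."""
    by_date = any(getattr(fix, "GROUP_CUBES_BY_DATE", False) for fix in fixes)
    grouped = defaultdict(list)
    for cube in cubes:
        source_file = cube.attributes.get("source_file", "")
        key = source_file
        if by_date:
            match = DATE_RANGE.search(source_file)
            if match:
                # Files covering the same period share one group.
                key = match.group(0)
        grouped[key].append(cube)
    return dict(grouped)


cubes = [
    Cube("rsdt", "era5_rsdt_200001-200012.nc"),
    Cube("ssr", "era5_ssr_200001-200012.nc"),
    Cube("rsdt", "era5_rsdt_200101-200112.nc"),
]
by_file = group_cubes([FixByFile()], cubes)
by_date = group_cubes([FixByDate()], cubes)
```

In the by-file case every cube ends up in its own group, while in the by-date case the two cubes covering the same period are passed to the fix together. Because the by-date mode is only used when a fix sets the attribute, existing fixes see no change in behaviour.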
Closes #1806
Link to documentation: TBA
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: