Conversation
updated mapping of cmip6 to cmip6plus for all variables
|
@tomvothecoder when you have a chance, could you help to evaluate if we can consider merge this branch authored by Yemi into main branch? |
There was a problem hiding this comment.
Hey @chengzhuzhang, here is my initial, quick review. I left some PR review comments.
Overall Analysis
The implemented logic to support a configurable mip_era does seem like it can work, but it does not scale well. The logic selects the CMIP table for each variable handler based on the mip_era argument with the assumption that variable handlers order tables as [CMIP6, CMIP6+]. If we add more tables or support more mip_era options, it will break.
Suggestions
One suggestion is to extend the util.py definition for frequencies, tables, etc. using a mapping table to the mip_era argument. Then we can perform the correct parsing logic in a scalable way.
Here's a rough example demonstrating this concept:
MIP_ERA_METADATA =
{
cmip6: {
frequencies: [...],
tables: {
"atmos": [...],
"land": [...],
...
}
},
cmip6plus: {
frequencies: [...],
tables: {
"atmos": [...],
"land": [...],
...
}
}
}
# We can keep how it is implemented now, but it is cleaner to extend the above mapping tables to do this for us.
FREQUENCY_TO_CMIP_TABLES = {
...
}Of course, this will involve downstream refactoring for code to reference MIP_ERA_METADATA.
I'll think more for better solutions if there are any and am open to ideas too.
Questions
- Do we plan on supporting
mip_erafor atm/land handlers at some point too (in addition to MPAS handlers in this PR)? - Is there already a mapping table that maps frequencies and tables to
mip_era? I recall repos such as https://github.com/E3SM-Project/CMIP6-Metadata used bye3sm_to_cmip.
| for var_handlers in var_handler: | ||
| if type(var_handlers["table"]) is list: | ||
| if mip_era == "cmip6" and len(var_handlers["table"]) > 1: | ||
| var_handlers["table"] = str(var_handlers["table"][0]) | ||
|
|
||
| elif mip_era == "cmip6plus" and len(var_handlers["table"]) > 1: | ||
| var_handlers["table"] = str(var_handlers["table"][1]) |
There was a problem hiding this comment.
Duplicate logic from load_all_handlers(). Can extract as a re-usable private function.
| for var_handler in var_handlers: | ||
| if type(var_handler["table"]) is list: | ||
| if mip_era == "cmip6" and len(var_handler["table"]) > 1: | ||
| var_handler["table"]= str(var_handler["table"][0]) | ||
|
|
||
| elif mip_era == "cmip6plus" and len(var_handler["table"]) > 1: | ||
| var_handler["table"] = str(var_handler["table"][1]) |
There was a problem hiding this comment.
Duplicate logic from load_all_handlers(). Can extract as a re-usable private function.
| if type(table_name) is list: | ||
| if mip_era == "cmip6" and len(table_name) > 1: | ||
| table_name = str(table_name[0]) | ||
|
|
||
| elif mip_era == "cmip6plus" and len(table_name) > 1: | ||
| table_name = str(table_name[1]) |
There was a problem hiding this comment.
Duplicate logic as mentioned earlier, can be extracted to re-usable private function.
| for var_handlers in var_handler: | ||
| if type(var_handlers["table"]) is list: | ||
| if mip_era == "cmip6" and len(var_handlers["table"]) > 1: | ||
| var_handlers["table"] = str(var_handlers["table"][0]) | ||
|
|
||
| elif mip_era == "cmip6plus" and len(var_handlers["table"]) > 1: | ||
| var_handlers["table"] = str(var_handlers["table"][1]) |
There was a problem hiding this comment.
The logical assumption that the first table is a CMIP6 table and the second table is a CMIP6+ table can easily break and does not scale well.
It is a good idea to further categorize CMIP tables by mip_era.
| @@ -32,16 +67,53 @@ | |||
| "CMIP6_fx.json", | |||
| ] | |||
|
|
|||
| LAND_TABLES = ["CMIP6_Lmon.json", "CMIP6_LImon.json"] | |||
| OCEAN_TABLES = ["CMIP6_Omon.json", "CMIP6_Ofx.json"] | |||
| SEAICE_TABLES = ["CMIP6_SImon.json"] | |||
| LAND_TABLES = ["MIP_LI3hrPt.json", "MIP_LI6hrPt.json", "MIP_LIday.json", "MIP_LIfx.json", "MIP_LImon.json", "MIP_LIsubhrPtSite.json", "MIP_LP3hr.json", "MIP_LP3hrPt.json", "MIP_LP6hrPt.json", "MIP_LPday.json", "MIP_LPfx.json", "MIP_LPmon.json", "MIP_LPyr.json", "MIP_LPyrPt.json", "CMIP6_Lmon.json", "CMIP6_LImon.json"] | |||
| OCEAN_TABLES = ["MIP_OBday.json", "MIP_OBmon.json", "MIP_OBmonLev.json", "MIP_OByr.json", "MIP_OByrLev.json", "MIP_OP3hrPt.json", "MIP_OPday.json", "MIP_OPdec.json", "MIP_OPdecLev.json", "MIP_OPdecZ.json", "MIP_OPfx.json", "MIP_OPmonClim.json", "MIP_OPmonClimLev.json", "MIP_OPmon.json", "MIP_OPmonLev.json", "MIP_OPmonZ.json", "MIP_OPyr.json", "MIP_OPyrLev.json", "CMIP6_Omon.json", "CMIP6_Ofx.json"] | |||
| SEAICE_TABLES = ["MIP_SIday.json", "MIP_SImon.json", "MIP_SImonPt.json", "CMIP6_SImon.json"] | |||
|
|
|||
| # Mapping from CMIP6 table names to their associated frequency. | |||
| # This dictionary helps determine the frequency category (e.g., "mon", "day", etc.) | |||
| # for a given CMIP6 table name when deriving or validating variable handlers. | |||
| # Keys are CMIP6 table filenames, values are their frequency strings. | |||
| FREQUENCY_TO_CMIP_TABLES = { | |||
| "yr": [ | |||
| "MIP_LPyr.json", | |||
| "MIP_LPyrPt.json", | |||
| "MIP_OByr.json", | |||
| "MIP_OByrLev.json", | |||
| "MIP_OPyr.json", | |||
| "MIP_OPyrLev.json", | |||
| ], | |||
| "mon": [ | |||
| "MIP_ACmon.json", | |||
| "MIP_ACmonZ.json", | |||
| "MIP_AEmon.json", | |||
| "MIP_AEmonLev.json", | |||
| "MIP_AEmonZ.json", | |||
| "MIP_APmonClim.json", | |||
| "MIP_APmonClimLev.json", | |||
| "MIP_APmonDiurnal.json", | |||
| "MIP_APmon.json", | |||
| "MIP_APmonLev.json", | |||
| "MIP_APmonZ.json", | |||
| "MIP_APfx.json", | |||
| "MIP_LPmon.json", | |||
| "MIP_LPfx.json", | |||
| "MIP_LImon.json", | |||
| "MIP_LIfx.json", | |||
| "MIP_OBmon.json", | |||
| "MIP_OBmonLev.json", | |||
| "MIP_OPmonClim.json", | |||
| "MIP_OPdec.json", | |||
| "MIP_OPdecLev.json", | |||
| "MIP_OPdecZ.json", | |||
| "MIP_OPmonClimLev.json", | |||
| "MIP_OPmon.json", | |||
| "MIP_OPmonLev.json", | |||
| "MIP_OPmonZ.json", | |||
| "MIP_OPfx.json", | |||
| "MIP_SImon.json", | |||
| "MIP_SImonPt.json", | |||
| "CMIP6_Amon.json", | |||
| "CMIP6_Lmon.json", | |||
| "CMIP6_LImon.json", | |||
| @@ -53,26 +125,64 @@ | |||
| "CMIP6_SImon.json", | |||
| ], | |||
| "day": [ | |||
| "MIP_AEday.json", | |||
| "MIP_APday.json", | |||
| "MIP_APdayLev.json", | |||
| "MIP_APdayZ.json", | |||
| "MIP_LPday.json", | |||
| "MIP_LIday.json", | |||
| "MIP_OBday.json", | |||
| "MIP_OPday.json", | |||
| "MIP_SIday.json", | |||
| "CMIP6_day.json", | |||
| "CMIP6_CFday.json", | |||
| "CMIP6_AERday.json", | |||
| ], | |||
| "3hr": [ | |||
| "MIP_AE1hr.json", | |||
| "MIP_AE3hrPt.json", | |||
| "MIP_AE3hrPtLev.json", | |||
| "MIP_AP3hr.json", | |||
| "MIP_AP3hrPt.json", | |||
| "MIP_AP3hrPtLev.json", | |||
| "MIP_LP3hr.json", | |||
| "MIP_LP3hrPt.json", | |||
| "MIP_LI3hrPt.json", | |||
| "MIP_OP3hrPt.json", | |||
| "CMIP6_3hr.json", | |||
| "CMIP6_CF3hr.json", | |||
| ], | |||
| "6hrLev": [ | |||
| "MIP_AE6hr.json", | |||
| "MIP_AP6hr.json", | |||
| "CMIP6_6hrLev.json", | |||
| ], | |||
| "6hrPlev": [ | |||
| "MIP_AE6hrPt.json", | |||
| "MIP_AP6hrPt.json", | |||
| "MIP_LP6hrPt.json", | |||
| "MIP_LI6hrPt.json", | |||
| "CMIP6_6hrPlev.json", | |||
| ], | |||
| "6hrPlevPt": [ | |||
| "MIP_AE6hrPtLev.json", | |||
| "MIP_AP6hrPtLev.json", | |||
| "MIP_AP6hrPtZ.json", | |||
| "CMIP6_6hrPlevPt.json", | |||
| ], | |||
| "1hr": [ | |||
| "MIP_AE1hr.json", | |||
| "MIP_AP1hr.json", | |||
| "MIP_AP1hrPt.json", | |||
| "CMIP6_AERhr.json", | |||
| ], | |||
| "subhr": [ | |||
| "MIP_AEsubhrPt.json", | |||
| "MIP_AEsubhrPtSite.json", | |||
| "MIP_APsubhrPt.json", | |||
| "MIP_APsubhrPtLev.json", | |||
| "MIP_LIsubhrPtSite.json", | |||
| ] | |||
| } | |||
There was a problem hiding this comment.
Combining CMIP6 tables and frequencies with CMIP6+ can work, but it makes downstream parsing logic more convoluted.
As mentioned here, it is a good idea to categorize CMIP tables by MIP ERA. Otherwise, we pass mip_era then make the assumption that the first table in a variable handler is CMIP6 and the second is CMIP6+ to get the correct table.
|
@tomvothecoder Thanks for reviewing the PR request. To answer your question about supporting atm/lnd handlers. Yes code changes for atm/lnd handlers are included in this PR. The changes in the cmor_handlers/utils.py file include changes for the atm/lnd files. I have also added changes to the handler.yaml file to map all the atm/lnd CMIP6 variables in the file to include their CMIP6Plus equivalents. The mapping between CMIP6 and CMIP6Plus variables can be found here: https://github.com/PCMDI/mip-cmor-tables/blob/main/src/Mapping%20between%20CMIP6%20and%20CMIP6Plus%20MIP%20tables.ipynb |
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Thank you @ogaruba for the work and I appreciate your clarification! I will do another review and will try to fit in time to extend this PR with the mip era mapping. @chengzhuzhang Any particular deadline/priority for this PR? |
Description
This PR is created from @ogaruba's fork. Yemi worked on cmorizing LESFMIP simulations and published them to CMIP6Plus project.
This update may be useful for ecosystem projects and for the transition to support CMIP7.
Checklist
If applicable: