-
Notifications
You must be signed in to change notification settings - Fork 26
Description
Is your feature request related to a problem? Please describe.
The UserMetadata class already contains logic to recognize date and time
keywords like date, time, and step and uses them to automatically
construct e.g. a forecast_reference_time and base_datetime.
However, this currently does not yet work for user metadata for hindcast and
reforecast datasets that use a hdate keyword to indicate the start time for
which the forecast is made, whereas date stands for the version of the model
with which the forecast is made (hdate documentation).
It would be nice to fix this so this works automatically.
Describe the solution you'd like
One example where the current behavior is wrong and how the correct behavior would look like
is captured by the following test:
diff --git a/tests/utils/metadata/test_dict.py b/tests/utils/metadata/test_dict.py
new file mode 100644
index 0000000..7475cab
--- /dev/null
+++ b/tests/utils/metadata/test_dict.py
@@ -0,0 +1,27 @@
+import datetime
+from earthkit.data.utils.metadata.dict import UserMetadata
+
+
+def test_user_metadata_recognizes_reforecast_mars_keywords():
+ metadata = UserMetadata(
+ {
+ "class": "od",
+ "expver": "0001",
+ "stream": "enfh",
+ "type": "cf",
+ "levtype": "sfc",
+ "param": "167.128",
+ "date": "20180830", # Model version
+ "hdate": "20100830", # Start date of the forecasts
+ "time": "0000", # Forecast starts at 0am
+ "step": 12, # Forecast 12 hours ahead
+ }
+ )
+
+ # Desired behavior
+ assert metadata.base_datetime() == datetime.datetime(2010, 8, 30, 0, 0)
+ assert metadata.valid_datetime() == datetime.datetime(2010, 8, 30, 12, 0)
+
+ # Current behavior (wrong)
+ # assert metadata.base_datetime() == datetime.datetime(2018, 8, 30, 0, 0)
+ # assert metadata.valid_datetime() == datetime.datetime(2018, 8, 30, 12, 0)Describe alternatives you've considered
I temporarily "fixed" this for my ongoing work by renaming the hdate key to date in my metadata dictionary before constructing the UserMetadata object. However, I think this should be dealt with automatically by earthkit-data. Not only because this is a common use-case (and e.g. works out of the box when constructing sources from grib files) but also because this eliminates useful information (ideally, the date representing the model version is included as an attribute in the xarray dataset when the UserMetadata is used e.g. in a list_of_dicts source).
As something that's closer to an actual fix, I then made the following change to the UserMetadata class:
diff --git a/src/earthkit/data/utils/metadata/dict.py b/src/earthkit/data/utils/metadata/dict.py
index a598d8c..44e029c 100644
--- a/src/earthkit/data/utils/metadata/dict.py
+++ b/src/earthkit/data/utils/metadata/dict.py
@@ -416,6 +416,10 @@ class UserMetadata(Metadata):
v = to_datetime(v)
return v
+ v = self._datetime("hdate", "time")
+ if v is not None:
+ return v
+
v = self._datetime("date", "time")
if v is not None:
return vThis e.g. makes the test I mentioned above pass. However, since this would mean that the hdate keyword would always be prioritized to the date keyword and might have unintended side effects, I didn't yet create a PR and wanted to discuss if there's already other ongoing work that's related to my problem.
Additional context
This issues is motivated by a use-case we have for ongoing work in
#689.
Organisation
No response