Gauge Interest & Use Cases: Support creation of anemoi-datasets using Forecast Data #412
Replies: 4 comments 1 reply
-
Current reforecast datasets for downscaling have their own particularities. Training on ensemble forecasts could also be useful, not as an ensemble like AIFS-CRPS but simply to increase the sample size. This would add a fourth dimension to the indexing tuple. Other use cases may introduce further dimensions. The result is multiple timestamps for the same valid time. To address this, the feature/fake-hindcasts branch in anemoi-datasets (thanks Baudouin) implements abstract indices, or “fake dates”, for indexing. These indices do not map directly to valid times but to tuples of information. The approach works but is hacky. A cleaner implementation of abstract indexing would be great. It should ensure safe pairing of data across datasets, e.g. matching low- and high-resolution inputs. At training time, one should be able to retrieve the correct tuple, for example (reference time = 2015-08-02, lead time = 24h, model time = 2023-08-02), consistently across datasets.
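To make the abstract-index idea concrete, here is a minimal sketch in plain Python. It is not the code on the feature/fake-hindcasts branch and not anemoi-datasets API: `ForecastKey`, `AbstractIndex` and `paired_positions` are hypothetical names, and the assumption that valid time equals reference time plus lead time is mine. It shows integer positions mapping to tuples, two tuples sharing one valid time, and pairing across two datasets that store the same samples in a different order.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ForecastKey(NamedTuple):
    """Hypothetical sample key; the fields mirror the example tuple above."""

    reference_time: datetime
    lead_time: timedelta
    model_time: datetime

    @property
    def valid_time(self) -> datetime:
        # Assumption: valid time is reference time plus lead time.
        return self.reference_time + self.lead_time


class AbstractIndex:
    """Maps integer positions to keys and keys back to positions."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.position = {key: i for i, key in enumerate(self.keys)}
        if len(self.position) != len(self.keys):
            raise ValueError("duplicate keys in index")

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, i):
        return self.keys[i]

    def locate(self, key):
        return self.position[key]


def paired_positions(low_res, high_res):
    """Return (i_low, i_high) for every key present in both indexes."""
    return [
        (i, high_res.position[key])
        for i, key in enumerate(low_res.keys)
        if key in high_res.position
    ]


model_time = datetime(2023, 8, 2)
refs = [datetime(2015, 8, 1), datetime(2015, 8, 2)]
leads = [timedelta(hours=24), timedelta(hours=48)]

# Same samples, different storage order in the two datasets.
low = AbstractIndex(ForecastKey(r, l, model_time) for r in refs for l in leads)
high = AbstractIndex(ForecastKey(r, l, model_time) for l in leads for r in refs)

key = ForecastKey(datetime(2015, 8, 2), timedelta(hours=24), model_time)
pairs = paired_positions(low, high)

# Two different keys can share one valid time, so valid time alone is not a key.
same_valid = [k for k in low.keys if k.valid_time == datetime(2015, 8, 3)]
```

Because pairing goes through the key rather than the integer position, the two datasets do not need to agree on ordering, only on how the tuple is defined.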
-
For nowcasting, I had to flatten forecast data into slices of valid times [t_0, t_0+6h] for anemoi-datasets creation, as the objective was to interpolate between t_0 and t_0+6h every 10 min. I would second what was said above and adopt a convention like (reference_time, lead_time, validity_time).
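As a rough illustration of that convention, the sketch below enumerates one slice as (reference_time, lead_time, validity_time) tuples. The function name `nowcast_slice`, the inclusion of both endpoints, and the 6-hourly spacing of t_0 in the usage lines are assumptions made for the example, not a description of my actual pipeline.

```python
from datetime import datetime, timedelta


def nowcast_slice(t0, horizon=timedelta(hours=6), step=timedelta(minutes=10)):
    """List (reference_time, lead_time, validity_time) for one slice.

    Both endpoints t0 and t0 + horizon are included (an assumption).
    """
    n_steps = horizon // step  # timedelta // timedelta gives an int
    return [(t0, k * step, t0 + k * step) for k in range(n_steps + 1)]


first = nowcast_slice(datetime(2023, 8, 2, 0))
second = nowcast_slice(datetime(2023, 8, 2, 6))

# Flattening consecutive slices repeats the shared boundary time,
# so validity_time alone no longer identifies a sample.
flattened = first + second
validity_times = [validity for _, _, validity in flattened]
n_duplicates = len(validity_times) - len(set(validity_times))
```

With the full tuple as the key, the boundary sample at t_0+6h of one slice and t_0 of the next stay distinguishable.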
-
Thanks for opening this discussion @anaprietonem. Emulating a forecasting model and predicting (re-)analysis-to-(re-)analysis changes are slightly different tasks. Predicting analysis-to-analysis changes essentially tries to improve upon existing forecasting models, whereas forecasting model emulation seeks to answer the question, "What would the forecasting model have predicted, had it seen this initial state?"
We have been thinking of this data as indexed by a tuple of two times
-
Hello! Thanks for starting the discussion! I’m very interested in this functionality to produce data for forecast verification. At KNMI, we work with UWC-West reforecast data, including both forecasts and analyses. The analyses and NWP forecasts are initially stored as GRIB2 files (one file per date and lead time) on a remote machine in Iceland with no internet access. For AI model inference, the NetCDF format has been chosen so far. This functionality would allow us to build all the necessary NWP forecasts into Zarr files, which is ideal since the verification package we use (developed by RMI and mainly maintained by @mpvginde) already supports the Zarr format (using xarray and dask). You can find the verification package here: rmai-verification on GitHub. Using indexes as those already suggested Thanks!
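For the verification use case, a small sketch of how forecasts indexed by (reference_time, lead_time) could be matched against analyses by valid time. This is plain Python with made-up scalar values, not the rmai-verification API; `group_by_lead_time` and `mean_absolute_error` are hypothetical helpers written only for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_lead_time(forecasts, analyses):
    """Pair each forecast with the analysis valid at reference + lead time.

    forecasts: {(reference_time, lead_time): value}
    analyses:  {valid_time: value}
    Returns {lead_time: [(forecast_value, analysis_value), ...]}.
    """
    pairs = defaultdict(list)
    for (reference_time, lead_time), forecast_value in forecasts.items():
        valid_time = reference_time + lead_time
        if valid_time in analyses:  # skip forecasts with no matching analysis
            pairs[lead_time].append((forecast_value, analyses[valid_time]))
    return dict(pairs)


def mean_absolute_error(pairs):
    return sum(abs(f - a) for f, a in pairs) / len(pairs)


h24, h48 = timedelta(hours=24), timedelta(hours=48)

# Toy values, invented for illustration only.
analyses = {datetime(2015, 8, 2): 10.0, datetime(2015, 8, 3): 12.0}
forecasts = {
    (datetime(2015, 8, 1), h24): 11.0,
    (datetime(2015, 8, 1), h48): 15.0,
    (datetime(2015, 8, 2), h24): 12.5,
    (datetime(2015, 8, 2), h48): 9.0,  # valid 2015-08-04, no analysis available
}

grouped = group_by_lead_time(forecasts, analyses)
mae_by_lead = {lead: mean_absolute_error(p) for lead, p in grouped.items()}
```

Keeping reference time and lead time as separate index components makes scores per lead time straightforward, which is harder once everything is flattened onto valid time.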
-
Hello!
anemoi-datasets currently focuses on generating datasets for analysis, usually with one timestamp per valid time. We’re exploring the idea of also supporting forecast data. This could be useful, but it’s trickier: we need to decide how the data should be stored and shaped before we build anything.
👉 We’d love your input:
Would this be useful for you?
What would your use cases look like, and how would you expect the data to be structured?
Your feedback will help us decide if and how to move forward with this feature. Thanks!