Commit 57a2895

b8raoult, flyIchtus, pre-commit-ci[bot] and floriankrb authored
feat: abstracting accumulation (#326)
## Description

This PR revisits the accumulation source to extend it to work with other backends apart from MARS. As part of this work, the main outcomes and changes are:

- **Interface change:** The accumulation source is now named 'accumulate' and can work with both MARS and GRIB files. This new source takes 4 inputs: source, availability, patch and period.
- The PR includes docs with more details on the above (docs/building/sources/accumulate.rst) and a recipe example (docs/howtos/create/05-create-accumulations.rst).

Future work will extend this source to other backends. The new design will make this much easier, as new backends can simply be plugged into the source.

----

📚 Documentation preview 📚: https://anemoi-datasets--326.org.readthedocs.build/en/326/

---------

Co-authored-by: flyIchtus <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ClémentBrochet <[email protected]>
Co-authored-by: Florian Pinault <[email protected]>
Co-authored-by: Florian Pinault <[email protected]>
1 parent 8cb57d4 commit 57a2895

53 files changed, +2992 -2155 lines changed

docs/building/sources.rst

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ The following `sources` are currently available:
 .. toctree::
    :maxdepth: 1
 
-   sources/accumulations
+   sources/accumulate
    sources/anemoi-dataset
    sources/cds
    sources/eccc-fstd

docs/building/sources/accumulate.rst

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
###############
accumulate
###############

.. note::

   The ``accumulate`` source was previously named ``accumulations``.
   The API has changed in the following ways:

   - The parameter ``accumulation_period`` has been renamed to ``period``.
   - The source no longer has to be ``mars`` (supported sources are ``mars`` and ``grib-index``);
     it must now be explicitly specified as a nested dictionary under the ``source`` key.
   - The (optional) available accumulation intervals can now be specified using the ``availability`` key.

Accumulations and flux variables, such as precipitation, are often
forecast fields, which are archived for a given base date (or reference
time) and a forecast time (or step). These fields are valid at the
forecast time and are accumulated over a given period of time, with the
relation: :math:`valid\_date = base\_date + step`.
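
As a minimal illustration of this relation, the sketch below computes the valid date from a base date and a step using only the Python standard library. The helper name ``valid_date`` is hypothetical and is not part of the package.

```python
from datetime import datetime, timedelta


def valid_date(base_date, step_hours):
    """Return base_date + step, with the step given in hours."""
    return base_date + timedelta(hours=step_hours)


# A forecast started on 2019-12-31 at 18:00, taken at step 6h,
# is valid on 2020-01-01 at 00:00.
print(valid_date(datetime(2019, 12, 31, 18), 6))
```

The same date arithmetic underlies the ECMWF and ERA5 examples given later in this page.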

Because the package builds datasets according to the valid date of the
fields, it must be able to reconstruct the requested accumulation period
from the available data in the source dataset. Furthermore, some fields
are accumulated since the beginning of the forecast (e.g. ECMWF
operational forecast), while others are accumulated since the last time
step (e.g. ERA5).

The ``accumulate`` source takes the following parameters:

- **period**: The requested accumulation period (e.g., ``6h``, ``12h``, ``24h``, ``1d``).
  This can be specified as a string with units, such as ``"6h"``.
  Periods shorter than one hour, such as ``"30min"``, are not supported yet.
- **source**: The data source configuration. Currently only ``mars`` and ``grib-index`` sources are supported.
- **availability**: Information about how accumulations are stored in
  the data source. This allows the package to determine which intervals to use
  for reconstructing the requested accumulation period (see below).
- **patch** (optional): Patches to apply to fields returned by the source to fix metadata issues.
  The default patch sets ``startStep`` to ``0`` when ``startStep==endStep``.
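
The default patch described above can be pictured with the following sketch. It assumes, purely for illustration, that field metadata is available as a plain dictionary with ``startStep`` and ``endStep`` keys. The function name ``default_patch`` is hypothetical and does not reflect how the package applies patches internally.

```python
def default_patch(metadata):
    """Return a copy of the metadata with startStep set to 0
    when startStep and endStep are both present and equal."""
    patched = dict(metadata)  # leave the caller's dictionary untouched
    if (
        "startStep" in patched
        and "endStep" in patched
        and patched["startStep"] == patched["endStep"]
    ):
        patched["startStep"] = 0
    return patched


# A field reporting startStep == endStep == 6 is treated as covering 0-6.
print(default_patch({"startStep": 6, "endStep": 6}))
# A field with a proper interval is left unchanged.
print(default_patch({"startStep": 1, "endStep": 2}))
```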

.. warning::

   If the data provided by the source does not match the definition provided
   in the ``availability`` parameter, the package will attempt to check the
   metadata of the source dataset and fail if the accumulation periods cannot
   be reconstructed.
   Both the choice of intervals used to reconstruct the requested accumulation period
   and the validity checks on the accumulation rely on the metadata provided by the data source.
   **If the metadata is incomplete or inconsistent, the package may produce incorrect results.**


Specifying the ``availability`` of accumulation intervals
=========================================================

Data accumulation methods differ between datasets. Two common methods are to
accumulate data either from the start of the forecast or from the previous time step.

- For ECMWF operational forecasts, the data is accumulated from the
  beginning of the forecast. For example, if the accumulation period is
  6h and the valid date is 2020-01-01 00:00, the source will use the
  forecast [1]_ of 2019-12-31 18:00 at step 6h.

- For ERA5, the data is accumulated since the last time step (hourly
  accumulations), and forecasts are only available at 06Z and 18Z. For a
  6h accumulation with valid date 2020-01-01 13:00, the source will sum
  the fields from the forecast of 2020-01-01 06:00 at steps 1-2h, 2-3h,
  3-4h, 4-5h, 5-6h, and 6-7h.
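
The ERA5 example can be reproduced with a few lines of date arithmetic. The sketch below assumes hourly accumulations and a simple selection rule: take the latest base time that is at or before the start of the requested window. The package may select intervals differently, and the function name ``previous_step_intervals`` is hypothetical.

```python
from datetime import datetime, timedelta


def previous_step_intervals(valid, period_hours, basetimes=(6, 18)):
    """Return (base_date, [(start_step, end_step), ...]) covering the
    window [valid - period, valid] with hourly accumulations."""
    start = valid - timedelta(hours=period_hours)
    # Candidate base times on the day of the window start and the day before.
    candidates = [
        datetime(start.year, start.month, start.day, hour) - timedelta(days=day)
        for day in (0, 1)
        for hour in basetimes
    ]
    # Assumption: use the latest forecast that starts at or before the window.
    base = max(c for c in candidates if c <= start)
    first = int((start - base).total_seconds() // 3600)
    steps = [(s, s + 1) for s in range(first, first + period_hours)]
    return base, steps


base, steps = previous_step_intervals(datetime(2020, 1, 1, 13), 6)
print(base)   # forecast of 2020-01-01 06:00
print(steps)  # hourly intervals 1-2, 2-3, ..., 6-7
```

For data accumulated from the start of the forecast, the ECMWF example needs no summation: the field at step 6h of the 2019-12-31 18:00 forecast already covers the requested 6h window.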

There are multiple ways to specify the ``availability`` parameter:

- `Option 1: Type-based availability`_
- `Option 2: Availability over fixed periods`_
- `Option 3: Automatic detection for well-known datasets`_
- `Option 4: Finer control using explicit list of interval`_


Option 1: Type-based availability
---------------------------------

For explicit control, use the **type** parameter with ``accumulated-from-start``
or ``accumulated-from-previous-step``, along with **basetime**, **frequency**, and **last_step**.

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - ECMWF operational (accumulated from start)
     - ERA5 (accumulated from previous step)
   * - .. literalinclude:: yaml/accumulations-from-start-mars-ecmwf-operational-forecast-2.yaml
          :language: yaml
     - .. literalinclude:: yaml/accumulations-from-previous-step-mars-era5-2.yaml
          :language: yaml

Option 2: Availability over fixed periods
-----------------------------------------

If the source provides data accumulated over a fixed period, set ``availability``
to that period: ``availability: "1h"`` for hourly accumulated data, ``"3h"`` for
3-hourly accumulated data, etc.

This approach should be used when all accumulation intervals for the fixed period are available
for all base times.

Additionally, the period provided in ``availability`` must be compatible with the requested accumulation period,
i.e., it must be a divisor of the requested period in ``period``.
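
The divisor rule can be checked with a short sketch. The helpers ``parse_hours`` and ``is_compatible`` are hypothetical names introduced here for illustration. They only understand the ``h`` and ``d`` suffixes used in this page and are not the package's own parsing code.

```python
def parse_hours(period):
    """Convert a period string such as "6h" or "1d" to a number of hours."""
    factors = {"h": 1, "d": 24}
    value, unit = period[:-1], period[-1]
    if unit not in factors:
        raise ValueError(f"Unsupported period unit in {period!r}")
    return int(value) * factors[unit]


def is_compatible(availability, period):
    """True when the available period divides the requested period exactly."""
    return parse_hours(period) % parse_hours(availability) == 0


print(is_compatible("1h", "6h"))  # hourly data can build a 6h accumulation
print(is_compatible("4h", "6h"))  # 4h does not divide 6h
```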

.. literalinclude:: yaml/accumulations-grib-index.yaml
   :language: yaml

Option 3: Automatic detection for well-known datasets
-----------------------------------------------------

The simplest approach is to use ``availability: auto``. The package will try to
infer the availability from the ``mars`` source parameters (class, stream, origin).
Supported combinations are:

- ERA5 reanalysis (class ``ea``, stream ``oper``)
- ERA5 ensemble data assimilation (class ``ea``, stream ``enda``)
- ECMWF operational forecasts (class ``od``, stream ``oper``)
- ECMWF operational ensemble data assimilation (class ``od``, stream ``elda``)
- Regional reanalysis (class ``rr``, stream ``oper``, origin ``se-al-ec``)
- ERA5-Land (class ``l5``, stream ``oper``)

Automatic detection is not supported for the ``grib-index`` source.

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - ECMWF operational (accumulated from start)
     - ERA5 (accumulated from previous step)
   * - .. literalinclude:: yaml/accumulations-from-start-mars-ecmwf-operational-forecast-1.yaml
          :language: yaml
     - .. literalinclude:: yaml/accumulations-from-previous-step-mars-era5-1.yaml
          :language: yaml


Option 4: Finer control using explicit list of interval
-------------------------------------------------------

For full control, provide an explicit list of ``(basetime, steps)`` pairs.

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - ECMWF operational (accumulated from start)
     - ERA5 (accumulated from previous step)
   * - .. literalinclude:: yaml/accumulations-from-start-mars-ecmwf-operational-forecast-3.yaml
          :language: yaml
     - .. literalinclude:: yaml/accumulations-from-previous-step-mars-era5-3.yaml
          :language: yaml

These two examples are equivalent to those shown in Option 1 above.
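
To make this equivalence concrete, the sketch below expands a type-based description (``type``, ``basetime``, ``frequency``, ``last_step``) into the explicit ``(basetime, steps)`` form. The function ``expand_availability`` is a hypothetical helper written for this page. It assumes frequencies and steps are whole hours and is not the package's implementation.

```python
def expand_availability(kind, basetimes, frequency_hours, last_step):
    """Expand a type-based availability into [basetime, "a-b/c-d/..."] pairs."""
    ends = range(frequency_hours, last_step + 1, frequency_hours)
    if kind == "accumulated-from-start":
        # Every interval starts at step 0.
        intervals = [(0, end) for end in ends]
    elif kind == "accumulated-from-previous-step":
        # Every interval starts where the previous one ended.
        intervals = [(end - frequency_hours, end) for end in ends]
    else:
        raise ValueError(f"Unknown availability type: {kind!r}")
    steps = "/".join(f"{a}-{b}" for a, b in intervals)
    return [[basetime, steps] for basetime in basetimes]


# ECMWF operational: accumulated from start, every 6h up to step 18.
print(expand_availability("accumulated-from-start", [0, 12], 6, 18))
# ERA5: hourly accumulations from base times 06 and 18, up to step 18.
print(expand_availability("accumulated-from-previous-step", [6, 18], 1, 18))
```

Under these assumptions the first call yields ``0-6/0-12/0-18`` for each base time, and the second yields the eighteen hourly intervals ``0-1`` to ``17-18``, matching the explicit lists in the Option 4 recipes.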

.. [1]

   For ECMWF forecasts, the forecasts at 00Z and 12Z are from the stream
   ``oper`` while the forecasts at 06Z and 18Z are from the stream ``scda``.

docs/building/sources/accumulations.rst

Lines changed: 0 additions & 67 deletions
This file was deleted.
docs/building/sources/yaml/accumulations-from-previous-step-mars-era5-1.yaml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
accumulate:
  period: 6h
  availability: auto
  source:
    mars:
      expver: "0001"
      class: ea
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-from-previous-step-mars-era5-2.yaml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
accumulate:
  period: 6h
  availability:
    type: accumulated-from-previous-step
    basetime: [6, 18]
    frequency: 1h
    last_step: 18
  source:
    mars:
      expver: "0001"
      class: ea
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-from-previous-step-mars-era5-3.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
accumulate:
  period: 6h
  availability:
    - [6, "0-1/1-2/2-3/3-4/4-5/5-6/6-7/7-8/8-9/9-10/10-11/11-12/12-13/13-14/14-15/15-16/16-17/17-18"]
    - [18, "0-1/1-2/2-3/3-4/4-5/5-6/6-7/7-8/8-9/9-10/10-11/11-12/12-13/13-14/14-15/15-16/16-17/17-18"]
  source:
    mars:
      expver: "0001"
      class: ea
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-from-start-mars-ecmwf-operational-forecast-1.yaml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
accumulate:
  period: 6h
  availability: auto
  source:
    mars:
      expver: "0001"
      class: od
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-from-start-mars-ecmwf-operational-forecast-2.yaml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
accumulate:
  period: 6h
  availability:
    type: accumulated-from-start
    basetime: [0, 12]
    frequency: 6h
    last_step: 18
  source:
    mars:
      expver: "0001"
      class: od
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-from-start-mars-ecmwf-operational-forecast-3.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
accumulate:
  period: 6h
  availability:
    - [0, "0-6/0-12/0-18"]
    - [12, "0-6/0-12/0-18"]
  source:
    mars:
      expver: "0001"
      class: od
      stream: oper
      type: fc
      grid: 20./20.
      levtype: sfc
      param: [tp, cp]

docs/building/sources/yaml/accumulations-grib-index.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
accumulate:
  period: 6h
  availability: 1h
  grib-index:
    index-db: /path/to/grib/index.db
    param: [ tp, cp, sf ]
    levtype: sfc
