_episodes/05-preprocessor.md
Lines changed: 56 additions & 59 deletions
Original file line number
Diff line number
Diff line change
@@ -29,17 +29,16 @@ Underneath the hood, each preprocessor is a modular python function that receive
29
29
30
30
Each preprocessor section includes a preprocessor name, a list of preprocessor steps to be executed and any arguments needed by the preprocessor steps.
31
31
32
-
>~~~YAML
33
-
> preprocessors:
34
-
>   prep_timeseries:
35
-
>     annual_statistics:
36
-
>       operator: mean
37
-
>~~~
38
-
{: .source}
32
+
~~~yaml
33
+
preprocessors:
34
+
  prep_timeseries:
35
+
    annual_statistics:
36
+
      operator: mean
37
+
~~~
39
38
40
-
For instance, the 'annual_statistics' preprocessor with the 'operator: mean' argument receives an iris cube, takes the annual average for each year of data in the cube, and returns the processed cube.
39
+
For instance, the 'annual_statistics' preprocessor with the 'operator: mean' argument receives an iris cube, takes the annual average for each year of data in the cube, and returns the processed cube.
41
40
42
-
You may use one or more of several preprocessors listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly, like Lego blocks. Almost any conceivable order of preprocessing operations can be performed using ESMValTool preprocessors.
41
+
You may use one or more of several preprocessors listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly, like Lego blocks. Almost any conceivable order of preprocessing operations can be performed using ESMValTool preprocessors.
43
42
44
43
> ## The 'custom order' command.
45
44
>
@@ -56,15 +55,15 @@ You may use one or more of several preprocessors listed in the [documentation](h
56
55
> Changing the order of preprocessors can also speed up your processing. For instance, if you want to extract a two-dimensional layer from a 3D field and re-grid it, the layer extraction should be done first. If you did it the other way around, then the regridding function would be applied to all the layers of your 3D cube and it would take much more time.
57
56
{: .callout}
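
The ordering advice in the callout above can be sketched as a preprocessor section. This is a minimal sketch, not taken from the lesson's recipes: the preprocessor name `prep_single_layer_map` and the level value are hypothetical, and the step names and arguments (`custom_order`, `extract_levels`, `regrid`) are assumed to match the ESMValCore preprocessor documentation linked earlier, so check them against your installed version.

~~~yaml
preprocessors:
  prep_single_layer_map:   # hypothetical name, chosen for illustration
    custom_order: true     # run the steps in the order they are written
    extract_levels:        # first reduce the 3D field to one layer
      levels: 0.0          # assumed target level; adapt to your data
      scheme: nearest
    regrid:                # then regrid only the extracted layer
      target_grid: 1x1
      scheme: linear
~~~

With `custom_order: true`, the layer extraction runs before the regridding, so the regridding step only has to process a single two-dimensional layer.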
58
57
59
-
Some preprocessor modules are always applied and do not need to be called. This includes the preprocessors that load the data, apply any fixes and save the data file afterwards. These do not need to be explicitly included in recipes.
58
+
Some preprocessor modules are always applied and do not need to be called. This includes the preprocessors that load the data, apply any fixes and save the data file afterwards. These do not need to be explicitly included in recipes.
60
59
61
60
> ## Exercise: Adding more preprocessor steps
62
61
>
63
62
> Edit the [example recipe](LINK to episode #4) to first change the variable to thetao, then add preprocessors to average over the latitude and longitude dimensions, and finally average over the depth. Now run the recipe.
64
63
>
65
64
>> ## Solution
66
-
>>
67
-
>>~~~YAML
65
+
>>
66
+
>>~~~yaml
68
67
>> preprocessors:
69
68
>>   prep_timeseries:
70
69
>>     annual_statistics:
@@ -73,7 +72,7 @@ Some preprocessor modules are always applied and do not need to be called. This
73
72
>>       operator: mean
74
73
>>     depth_integration:
75
74
>>~~~
76
-
>>{: .source}
75
+
>>
77
76
>{: .solution}
78
77
{: .challenge}
79
78
@@ -82,7 +81,7 @@ Some preprocessor modules are always applied and do not need to be called. This
82
81
You can also define different preprocessors with several preprocessor sections (setting different preprocessor names). In the variable section you call the specific preprocessor which should be applied.
83
82
84
83
> ## Example
85
-
>~~~YAML
84
+
>~~~yaml
86
85
> preprocessors:
87
86
>   prep_timeseries_1:
88
87
>     annual_statistics:
@@ -95,7 +94,7 @@ You can also define different preprocessors with several preprocessor sections (
@@ -105,9 +104,9 @@ You can also define different preprocessors with several preprocessor sections (
105
104
> short_name: thetaoga
106
105
> preprocessor: prep_timeseries_1
107
106
> scripts:
108
-
> timeseries_diag:
107
+
> timeseries_diag:
109
108
> script: ocean/diagnostic_timeseries.py
110
-
>
109
+
>
111
110
> diag_timeseries_temperature_2:
112
111
> description: simple_time_series
113
112
> variables:
@@ -118,29 +117,28 @@ You can also define different preprocessors with several preprocessor sections (
118
117
> timeseries_diag:
119
118
> script: ocean/diagnostic_timeseries.py
120
119
>~~~
121
-
>{: .source}
122
120
{: .solution}
123
121
124
122
>## Challenge : How to write a recipe with multiple preprocessors
125
123
> We now know that a recipe can have more than one diagnostic, variable or preprocessor. As we saw in the examples so far, we can group preprocessor steps under a single user-defined name, and a recipe can contain more than one such preprocessor group. Write two different preprocessors: one to regrid the data to a 1x1 resolution, and a second to mask out sea and ice grid cells before regridding to the same resolution. In the second case, make sure the masking is performed before the regridding (hint: custom order your operations). Now, use the two preprocessors in different diagnostics within the same recipe. You may use any variable(s) of your choice. Once you succeed, try to add new datasets to the same recipe. Placeholders for the different components are provided below:
@@ -203,22 +201,22 @@ You can also define different preprocessors with several preprocessor sections (
203
201
>> preprocessor: prep_map
204
202
>> mip: Amon
205
203
>> grid: gn #can change for variables from the same model
206
-
>> start_year: 1970
204
+
>> start_year: 1970
207
205
>> end_year: 2000
208
206
>> scripts: null
209
-
>>
207
+
>>
210
208
>> diag_land_only_plot:
211
209
>> description: #preprocess a variable for a 2D land only plot
212
210
>> variables:
213
211
>> tas: # surface temperature
214
212
>> preprocessor: prep_map_land
215
213
>> mip: Amon
216
214
>> grid: gn #can change for variables from the same model
217
-
>> start_year: 1970
215
+
>> start_year: 1970
218
216
>> end_year: 2000
219
217
>> scripts: null
220
218
>> ~~~
221
-
>> {: .source}
219
+
>>
222
220
> {: .solution}
223
221
{: .challenge}
224
222
@@ -227,20 +225,20 @@ You can also define different preprocessors with several preprocessor sections (
227
225
Sometimes, we may want to include specific datasets only for certain variables. An example is when we use observations for two different variables in a diagnostic. While the CMIP dataset details for the two variables may be common, the observations will likely not be so. It would be useful to know how to include different datasets for different variables. Here is an example of a simple preprocessor and diagnostic setup for that:
> version: 1, tier: 2} #dataset specific to the temperature variable
271
-
>
269
+
>
272
270
> scripts: null
273
271
>~~~
274
-
>{: .source}
272
+
>
275
273
{: .solution}
276
274
277
275
## Creating variable groups
278
276
279
277
Variable grouping can be used to preprocess different clusters of data for the same variable. For instance, the example below illustrates how we can compute separate multimodel means for CMIP5 and CMIP6 data given the same variable. We can also preprocess observed data for evaluation.
280
278
281
279
> ## Example
282
-
>~~~YAML
280
+
>~~~yaml
283
281
>
284
282
>preprocessors:
285
283
> prep_mmm:
@@ -300,8 +298,8 @@ Variable grouping can be used to preprocess different clusters of data for the s
300
298
>
301
299
># note that there is no field called datasets anymore
302
300
># note how multiple ensembles are added by using (1:4)
> additional_datasets: *cmip6_datasets #nothing changes from cmip5 except the data set
335
333
> scripts: null
336
334
>~~~
337
-
>{: .source}
335
+
>
338
336
{: .solution}
339
337
340
338
You should be able to see the variables grouped in separate subdirectories of your output preproc directory. In your diagnostic, the different groupings can be accessed through the value of the variable_group field, such as tas_cmip5, tas_cmip6 or tas_obs.
@@ -347,4 +345,3 @@ You should be able to see the variables grouped under different subdirectories u
347
345
>
348
346
> A full list of all CMIP named variables is available here: [http://clipc-services.ceda.ac.uk/dreq/index/CMORvar.html](http://clipc-services.ceda.ac.uk/dreq/index/CMORvar.html).