
Commit 3717977

Render yaml code blocks with syntax highlighting
1 parent 2c28577 commit 3717977

1 file changed, +56 -59 lines changed

_episodes/05-preprocessor.md

Lines changed: 56 additions & 59 deletions
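The edits in this diff follow a mostly mechanical pattern: `~~~YAML` fence tags become `~~~yaml`, the kramdown `{: .source}` attribute line after a closing fence is dropped (inside a blockquote the bare `>` markers are usually kept), and some trailing whitespace is removed. A minimal Python sketch of that transformation, assuming the change could be automated this way (it may just as well have been made by hand). `normalize_fences` is a hypothetical helper, and it does not reproduce the first hunk's removal of blockquote markers around the whole example:

```python
import re

# A tilde fence, possibly inside a blockquote: prefix, "~~~", optional language tag.
FENCE = re.compile(r"^(?P<prefix>[> ]*)~~~(?P<lang>\w*)\s*$")
# A kramdown attribute line such as "{: .source}", ">{: .source}" or ">> {: .source}".
SOURCE_ATTR = re.compile(r"^(?P<prefix>[> ]*)\{:\s*\.source\}\s*$")


def normalize_fences(text: str) -> str:
    """Lower-case fence language tags and drop "{: .source}" attribute lines."""
    out = []
    for line in text.splitlines():
        line = line.rstrip()  # also strips trailing whitespace
        attr = SOURCE_ATTR.match(line)
        if attr:
            prefix = attr.group("prefix").rstrip()
            if prefix:
                out.append(prefix)  # keep the bare blockquote marker(s)
            continue  # a top-level "{: .source}" line disappears entirely
        fence = FENCE.match(line)
        if fence and fence.group("lang"):
            line = f"{fence.group('prefix')}~~~{fence.group('lang').lower()}"
        out.append(line)
    return "\n".join(out) + "\n"


sample = ">>~~~YAML\n>> preprocessors:\n>>~~~\n>>{: .source}\n>{: .solution}\n"
print(normalize_fences(sample))
```

Other attribute lines such as `{: .solution}` or `{: .callout}` are left untouched, matching the diff.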
```diff
@@ -29,17 +29,16 @@ Underneath the hood, each preprocessor is a modular python function that receive
 
 Each preprocessor section includes a preprocessor name, a list of preprocessor steps to be executed and any arguments needed by the preprocessor steps.
 
->~~~YAML
-> preprocessors:
->   prep_timeseries:
->     annual_statistics:
->       operator: mean
->~~~
-{: .source}
+~~~yaml
+preprocessors:
+  prep_timeseries:
+    annual_statistics:
+      operator: mean
+~~~
 
-For instance, the 'annual_statistics' with the 'operation: mean' argument preprocessor receives an iris cube, takes the annual average for each year of data in the cube, and returns the processed cube.
+For instance, the 'annual_statistics' with the 'operation: mean' argument preprocessor receives an iris cube, takes the annual average for each year of data in the cube, and returns the processed cube.
 
-You may use one or more of several preprocessors listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly - like lego blocks. Almost any conceivable preprocessing order of operations can be performed using ESMValTool preprocessors.
+You may use one or more of several preprocessors listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly - like lego blocks. Almost any conceivable preprocessing order of operations can be performed using ESMValTool preprocessors.
 
 > ## The 'custom order' command.
 >
```
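The paragraph in this hunk describes `annual_statistics` with `operator: mean` as producing one averaged value per year of data. The real preprocessor works on iris cubes and their time coordinate; the plain-Python sketch below only illustrates the arithmetic, assuming data arrive as hypothetical `(year, value)` pairs and using an unweighted mean (a simplification; it makes no claim about how ESMValCore weights months):

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def annual_means(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average all values that share a year; one output value per year."""
    by_year = defaultdict(list)
    for year, value in samples:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Twelve made-up monthly values for each of two years.
monthly = [(2000, float(m)) for m in range(12)] + [(2001, 1.0) for _ in range(12)]
print(annual_means(monthly))  # {2000: 5.5, 2001: 1.0}
```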
```diff
@@ -56,15 +55,15 @@ You may use one or more of several preprocessors listed in the [documentation](h
 > Changing the order of preprocessors can also speed up your processing. For instance, if you want to extract a two-dimensional layer from a 3D field and re-grid it, the layer extraction should be done first. If you did it the other way around, then the regridding function would be applied to all the layers of your 3D cube and it would take much more time.
 {: .callout}
 
-Some preprocessor modules are always applied and do not need to be called. This includes the preprocessors that load the data, apply any fixes and save the data file afterwards. These do not need to be explicitly included in recipes.
+Some preprocessor modules are always applied and do not need to be called. This includes the preprocessors that load the data, apply any fixes and save the data file afterwards. These do not need to be explicitly included in recipes.
 
 > ## Exercise: Adding more preprocessor steps
 >
 > Edit the [example recipe](LINK to episode #4) to first change the variable thetao, then add preprocessors to average over the latitude and longitude dimensions and finally average over the depth. Now run the recipe.
 >
 >> ## Solution
->>
->>~~~YAML
+>>
+>>~~~yaml
 >> preprocessors:
 >>   prep_timeseries:
 >>     annual_statistics:
```
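The speed argument in the 'custom order' callout can be illustrated with a toy pipeline that applies steps strictly in the order written, which is what `custom_order: true` asks for. `make_regrid`, `extract_level` and `run` are hypothetical stand-ins operating on nested lists, not ESMValCore functions, and the "regrid" is just pairwise averaging. Both orders give the same 2D result, but regridding first touches every level:

```python
from typing import Callable, Dict, List

Field = List[List[float]]  # one list of grid-cell values per depth level
Step = Callable[[Field], Field]


def make_regrid(counter: Dict[str, int]) -> Step:
    """Toy 'regrid': coarsen every level by averaging neighbouring cell pairs."""
    def regrid(field: Field) -> Field:
        counter["levels_regridded"] += len(field)  # work grows with the level count
        return [[(lvl[i] + lvl[i + 1]) / 2 for i in range(0, len(lvl) - 1, 2)]
                for lvl in field]
    return regrid


def extract_level(index: int) -> Step:
    """Toy level extraction: keep a single depth level."""
    return lambda field: [field[index]]


def run(steps: List[Step], field: Field) -> Field:
    """Apply the steps strictly in the order given."""
    for step in steps:
        field = step(field)
    return field


# Made-up field: 50 depth levels with 8 cells each.
field = [[float(level * 10 + cell) for cell in range(8)] for level in range(50)]

slow, fast = {"levels_regridded": 0}, {"levels_regridded": 0}
regrid_first = run([make_regrid(slow), extract_level(0)], field)
extract_first = run([extract_level(0), make_regrid(fast)], field)
print(regrid_first == extract_first, slow["levels_regridded"], fast["levels_regridded"])
# True 50 1
```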
```diff
@@ -73,7 +72,7 @@ Some preprocessor modules are always applied and do not need to be called. This
 >>       operator: mean
 >>     depth_integration:
 >>~~~
->>{: .source}
+>>
 >{: .solution}
 {: .challenge}
 
```
```diff
@@ -82,7 +81,7 @@ Some preprocessor modules are always applied and do not need to be called. This
 You can also define different preprocessors with several preprocessor sections (setting different preprocessor names). In the variable section you call the specific preprocessor which should be applied.
 
 > ## Example
->~~~YAML
+>~~~yaml
 > preprocessors:
 >   prep_timeseries_1:
 >     annual_statistics:
```
```diff
@@ -95,7 +94,7 @@ You can also define different preprocessors with several preprocessor sections (
 >     depth_integration:
 > ---
 > diagnostics:
->   # --------------------------------------------------
+>   # --------------------------------------------------
 >   # Time series diagnostics
 >   # --------------------------------------------------
 >   diag_timeseries_temperature_1:
```
```diff
@@ -105,9 +104,9 @@ You can also define different preprocessors with several preprocessor sections (
 >         short_name: thetaoga
 >         preprocessor: prep_timeseries_1
 >     scripts:
->       timeseries_diag:
+>       timeseries_diag:
 >         script: ocean/diagnostic_timeseries.py
->
+>
 >   diag_timeseries_temperature_2:
 >     description: simple_time_series
 >     variables:
```
```diff
@@ -118,29 +117,28 @@ You can also define different preprocessors with several preprocessor sections (
 >       timeseries_diag:
 >         script: ocean/diagnostic_timeseries.py
 >~~~
->{: .source}
 {: .solution}
 
 >## Challenge : How to write a recipe with multiple preprocessors
 > We now know that a recipe can have more than one diagnostic, variable or preprocessor. As we saw in the examples so far, we can group preprocessors with a single user defined name and can have more than one such preprocessor group in the recipe as well. Write two different preprocessors - one to regrid the data to a 1x1 resolution and the second preprocessor to mask out sea and ice grid cells before regridding to the same resolution. In the second case, ensure you perform the masking first before regridding (hint: custom order your operations). Now, use the two preprocessors in different diagnostics within the same recipe. You may use any variable(s) of your choice. Once you succeed, try to add new datasets to the same recipe. Placeholders for the different components are provided below:
 >
 >> ## Recipe
 >>
->>~~~YAML
+>>~~~yaml
 >>
 >> datasets:
 >>   - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical,
 >>      ensemble: r1i1p1f2} #single dataset as an example
->>
+>>
 >> preprocessors:
 >>   prep_map: #preprocessor to just regrid data
 >>     #fill preprocessor details here
->>
+>>
 >>   prep_map_land: #preprocessor to mask grid cells and then regrid
 >>     #fill preprocessor details here including ordering
->>
+>>
 >> diagnostics:
->>   # --------------------------------------------------
+>>   # --------------------------------------------------
 >>   # Two Simple diagnostics that illustrate the use of
 >>   # different preprocessors
 >>   # --------------------------------------------------
```
```diff
@@ -149,50 +147,50 @@ You can also define different preprocessors with several preprocessor sections (
 >>     variables:
 >>       # put your variable of choice here
 >>         # apply the first preprocessor i.e. name your preprocessor
->>         # edit the following 4 lines for mip, grid and time
+>>         # edit the following 4 lines for mip, grid and time
 >>         # based on your variable choice
 >>         mip: Amon
 >>         grid: gn #can change for variables from the same model
->>         start_year: 1970
+>>         start_year: 1970
 >>         end_year: 2000
 >>     scripts: null #no scripts called
 >>   diag_land_only_plot: #second diagnostic
 >>     description: #preprocess a variable for a 2D land only plot
 >>     variables:
->>       # include a variable and information
->>       # as in the previous diagnostic and
+>>       # include a variable and information
+>>       # as in the previous diagnostic and
 >>       # include your second preprocessor (masking and regridding)
 >>     scripts: null # no scripts
 >>~~~
->>{: .source}
+>>
 >{: .solution}
 >
->> ## Solution:
->>
+>> ## Solution:
+>>
 >> Here is one solution to complete the challenge above using two different preprocessors
->>
->>~~~YAML
+>>
+>>~~~yaml
 >>
 >> datasets:
 >>   - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical,
 >>      ensemble: r1i1p1f2} #single dataset as an example
->>
+>>
 >> preprocessors:
 >>   prep_map:
 >>     regrid: #apply the preprocessor to regrid
 >>       target_grid: 1x1 # target resolution
 >>       scheme: linear #how to interpolate for regridding
->>
+>>
 >>   prep_map_land:
 >>     custom_order: true #ensure that given order of preprocessing is followed
 >>     mask_landsea: #apply a mask
 >>       mask_out: sea #mask out sea grid cells
 >>     regrid: # now apply the preprocessor to regrid
 >>       target_grid: 1x1 # target resolution
 >>       scheme: linear #how to interpolate for regridding
->>
+>>
 >> diagnostics:
->>   # --------------------------------------------------
+>>   # --------------------------------------------------
 >>   # Two Simple diagnostics that illustrate the use of
 >>   # different preprocessors
 >>   # --------------------------------------------------
```
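The challenge insists on masking before regridding in `prep_map_land`. A toy one-dimensional sketch of why the order matters, assuming a made-up grid of alternating land and sea cells and a "regrid" that averages neighbouring pairs. `mask_sea` and `regrid_pairs` are hypothetical stand-ins; the real `mask_landsea` and `regrid` preprocessors use land-sea fraction data and proper interpolation schemes such as `linear`:

```python
from typing import List, Optional

Cell = Optional[float]  # None marks a masked cell


def mask_sea(values: List[Cell], is_land: List[bool]) -> List[Cell]:
    """Toy mask: replace sea cells with None."""
    return [v if land else None for v, land in zip(values, is_land)]


def regrid_pairs(values: List[Cell]) -> List[Cell]:
    """Toy 'regrid': average each pair of neighbouring cells, ignoring masked ones."""
    out: List[Cell] = []
    for i in range(0, len(values) - 1, 2):
        valid = [v for v in values[i:i + 2] if v is not None]
        out.append(sum(valid) / len(valid) if valid else None)
    return out


values = [10.0, 30.0, 12.0, 30.0]    # land, sea, land, sea
is_land = [True, False, True, False]

mask_first = regrid_pairs(mask_sea(values, is_land))
regrid_first = regrid_pairs(values)
print(mask_first)    # [10.0, 12.0]
print(regrid_first)  # [20.0, 21.0]
```

With regridding first, sea values are already averaged into coarse cells that still contain land, so a mask applied afterwards can only hide whole coarse cells; it cannot remove the sea contribution.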
```diff
@@ -203,22 +201,22 @@ You can also define different preprocessors with several preprocessor sections (
 >>         preprocessor: prep_map
 >>         mip: Amon
 >>         grid: gn #can change for variables from the same model
->>         start_year: 1970
+>>         start_year: 1970
 >>         end_year: 2000
 >>     scripts: null
->>
+>>
 >>   diag_land_only_plot:
 >>     description: #preprocess a variable for a 2D land only plot
 >>     variables:
 >>       tas: # surface temperature
 >>         preprocessor: prep_map_land
 >>         mip: Amon
 >>         grid: gn #can change for variables from the same model
->>         start_year: 1970
+>>         start_year: 1970
 >>         end_year: 2000
 >>     scripts: null
 >> ~~~
->> {: .source}
+>>
 > {: .solution}
 {: .challenge}
 
```
```diff
@@ -227,20 +225,20 @@ You can also define different preprocessors with several preprocessor sections (
 Sometimes, we may want to include specific datasets only for certain variables. An example is when we use observations for two different variables in a diagnostic. While the CMIP dataset details for the two variables may be common, the observations will likely not be so. It would be useful to know how to include different datasets for different variables. Here is an example of a simple preprocessor and diagnostic setup for that:
 
 > ## Example
->~~~YAML
+>~~~yaml
 >
 > datasets:
->   - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical,
+>   - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical,
 >      ensemble: r1i1p1f2} #common to both variables discussed below
->
+>
 > preprocessors:
 >   prep_regrid: # regrid to get all data to the same resolution
 >     regrid: #apply the preprocessor to regrid
 >       target_grid: 2.5x2.5 # target resolution
 >       scheme: linear #how to interpolate for regridding
->
+>
 > diagnostics:
->   # --------------------------------------------------
+>   # --------------------------------------------------
 >   # Simple diagnostic to illustrate use of different
 >   # datasets for different variables
 >   # --------------------------------------------------
```
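In this example every variable gets the recipe-wide `datasets` entry (UKESM1-0-LL), and each variable's `additional_datasets` list adds observations that only that variable uses. A minimal sketch of that combination, assuming the recipe has already been parsed into Python dicts; `datasets_for_variable` is a hypothetical helper, not ESMValTool's own code:

```python
from typing import Dict, List

Dataset = Dict[str, object]


def datasets_for_variable(recipe_datasets: List[Dataset], variable: Dict) -> List[Dataset]:
    """Hypothetical helper: recipe-wide datasets plus the variable's own extras."""
    return list(recipe_datasets) + list(variable.get("additional_datasets", []))


recipe_datasets = [{"dataset": "UKESM1-0-LL", "project": "CMIP6", "exp": "historical",
                    "ensemble": "r1i1p1f2"}]
tas = {"preprocessor": "prep_regrid", "mip": "Amon",
       "additional_datasets": [{"dataset": "HadCRUT4", "project": "OBS",
                                "type": "ground", "version": 1, "tier": 2}]}

print([d["dataset"] for d in datasets_for_variable(recipe_datasets, tas)])
# ['UKESM1-0-LL', 'HadCRUT4']
```

A variable without `additional_datasets` simply falls back to the recipe-wide list.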
```diff
@@ -251,35 +249,35 @@ Sometimes, we may want to include specific datasets only for certain variables.
 >         preprocessor: prep_regrid
 >         mip: Amon
 >         grid: gn #can change for variables from the same model
->         start_year: 1970
->         end_year: 2000 # start and end years for a 30 year period,
+>         start_year: 1970
+>         end_year: 2000 # start and end years for a 30 year period,
 >         # we assume this is common and exists for all
->         # model and obs data
+>         # model and obs data
 >         additional_datasets:
->           - {dataset: GPCP-SG, project: obs4mips, level: L3,
+>           - {dataset: GPCP-SG, project: obs4mips, level: L3,
 >              version: v2.2, tier: 1} #dataset specific to this variable
->
+>
 >       tas: #second variable is surface temperature
 >         preprocessor: prep_regrid
 >         mip: Amon
 >         grid: gn #can change for variables from the same model
 >         start_year: 1970 #some 30 year period
 >         end_year: 2000
 >         additional_datasets:
->           - {dataset: HadCRUT4, project: OBS, type: ground,
+>           - {dataset: HadCRUT4, project: OBS, type: ground,
 >              version: 1, tier: 2} #dataset specific to the temperature variable
->
+>
 >     scripts: null
 >~~~
->{: .source}
+>
 {: .solution}
 
 ## Creating variable groups
 
 Variable grouping can be used to preprocess different clusters of data for the same variable. For instance, the example below illustrates how we can compute separate multimodel means for CMIP5 and CMIP6 data given the same variable. Additionally we can also preprocess observed data for evaluation.
 
 > ## Example
->~~~YAML
+>~~~yaml
 >
 >preprocessors:
 >  prep_mmm:
```
```diff
@@ -300,8 +298,8 @@ Variable grouping can be used to preprocess different clusters of data for the s
 >
 ># note that there is no field called datasets anymore
 ># note how multiple ensembles are added by using (1:4)
->cmip5_datasets: &cmip5_datasets
->  - {dataset: CanESM2, ensemble: "r(1:4)i1p1", project: CMIP5}
+>cmip5_datasets: &cmip5_datasets
+>  - {dataset: CanESM2, ensemble: "r(1:4)i1p1", project: CMIP5}
 >  - {dataset: MPI-ESM-LR, ensemble: "r(1:2)i1p1", project: CMIP5}
 >
 >cmip6_datasets: &cmip6_datasets
```
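The `(1:4)` notation in the `ensemble` field is shorthand for a range of ensemble members; as we understand it, the range is inclusive, so `r(1:4)i1p1` stands for `r1i1p1` through `r4i1p1`. A minimal sketch of such an expansion, written as a hypothetical helper (`expand_ensemble` is not ESMValCore's own function, and the recursion for several ranges in one name is our assumption):

```python
import re
from typing import List

# "(start:end)" inside an ensemble name, e.g. "r(1:4)i1p1".
RANGE = re.compile(r"\((\d+):(\d+)\)")


def expand_ensemble(pattern: str) -> List[str]:
    """Expand every "(start:end)" range, inclusive of both ends."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = pattern[:match.start()], pattern[match.end():]
    # Recurse so that any further ranges in the name are expanded too.
    return [name
            for number in range(start, end + 1)
            for name in expand_ensemble(f"{head}{number}{tail}")]


print(expand_ensemble("r(1:4)i1p1"))  # ['r1i1p1', 'r2i1p1', 'r3i1p1', 'r4i1p1']
print(expand_ensemble("r(1:2)i1p1"))  # ['r1i1p1', 'r2i1p1']
```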
```diff
@@ -330,11 +328,11 @@ Variable grouping can be used to preprocess different clusters of data for the s
 >          - {dataset: HadCRUT4, project: OBS, type: ground, version: 1, tier: 2}
 >      tas_cmip6:
 >        <<: *variable_settings
->        tag: TAS_CMIP6
+>        tag: TAS_CMIP6
 >        additional_datasets: *cmip6_datasets #nothing changes from cmip5 except the data set
 >    scripts: null
 >~~~
->{: .source}
+>
 {: .solution}
 
 You should be able to see the variables grouped under different subdirectories under your output preproc directory. The different groupings can be accessed in your diagnostic by selecting the key name of the field variable_group such as tas_cmip5, tas_cmip6 or tas_obs.
```
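Selecting data by `variable_group` inside a diagnostic amounts to grouping the per-file metadata records by that key. A plain-Python sketch of the grouping step, assuming each preprocessed file is described by a dict that carries a `variable_group` entry; `group_by` is a hypothetical helper and the records are made up from the dataset names in the recipe:

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def group_by(records: List[Record], key: str) -> Dict[str, List[Record]]:
    """Collect records that share the same value for `key`, keeping input order."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


# Made-up metadata for three preprocessed files.
records = [
    {"variable_group": "tas_cmip5", "dataset": "CanESM2"},
    {"variable_group": "tas_cmip5", "dataset": "MPI-ESM-LR"},
    {"variable_group": "tas_obs", "dataset": "HadCRUT4"},
]
for group, members in group_by(records, "variable_group").items():
    print(group, [m["dataset"] for m in members])
# tas_cmip5 ['CanESM2', 'MPI-ESM-LR']
# tas_obs ['HadCRUT4']
```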
```diff
@@ -347,4 +345,3 @@ You should be able to see the variables grouped under different subdirectories u
 >
 > A full list of all CMIP named variables is available here: [http://clipc-services.ceda.ac.uk/dreq/index/CMORvar.html](http://clipc-services.ceda.ac.uk/dreq/index/CMORvar.html).
 {: .callout}
-
```