
Commit 330137f

Made changes from Birgit's review, added section on grouping variables, altered timing numbers at the top
1 parent 4b04e61 commit 330137f

File tree: 1 file changed (+83, -20 lines)


_episodes/05-preprocessor.md

Lines changed: 83 additions & 20 deletions
@@ -1,17 +1,18 @@
 ---
 title: "Working with preprocessors"
-teaching: 20?
-exercises: 15?
+teaching: 20
+exercises: 20
 questions:
 - "How do I set up a preprocessor?"
 - "Can I use different preprocessors for different variables?"
 - "Can I use different datasets for different variables?"
+- "How can I combine different preprocessor functions?"
 objectives:
 - "Create a recipe with multiple preprocessors"
 - "Use different preprocessors for different variables"
 - "Run a recipe with variables from different datasets"
 keypoints:
-- "A recipe can run different preprocessors at the same time."
+- "A recipe can work with different preprocessors at the same time."
 - "The setting additional_datasets can be used to add a different dataset."
 - "Variable groups are useful for defining different settings for different variables."
 ---
@@ -38,7 +39,7 @@ Each preprocessor section includes a preprocessor name, a list of preprocessor s
 For instance, the 'annual_statistics' preprocessor with the 'operator: mean' argument receives an iris cube, takes the annual average for each year of data in the cube, and returns the processed cube.

-You could use several preprocessor steps listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly - like lego blocks. Almost any conceivable preprocessing order of operations can be performed using ESMValTool preprocessors.
+You may use one or more of the preprocessors listed in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/recipe/preprocessor.html). The standardised interface between the preprocessors allows them to be used modularly - like lego blocks. Almost any conceivable preprocessing order of operations can be performed using ESMValTool preprocessors.
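In recipe YAML, such a step sits under a named preprocessor section. A minimal sketch (the name `prep_annual` is illustrative, not part of this commit):

~~~YAML
preprocessors:
  prep_annual:            # user-chosen preprocessor name
    annual_statistics:    # preprocessor step
      operator: mean      # annual mean of each year in the cube
~~~

A variable then selects it with `preprocessor: prep_annual`, as the examples below show.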
 > ## The 'custom order' command.
 >
@@ -50,16 +51,16 @@ You could use several preprocessor steps listed in the [documentation](https://d
 >
 > The default preprocessor order is listed in the [ESMValCore preprocessor API page](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/api/esmvalcore.preprocessor.html).
 >
-> Also note that not preprocessor operations aren't always commutative - meaning that the order of operations matters. For instance, if you run the preprocessor 'extract_volume' to extract the top 100m of the ocean surface, then 'volume_statistics' to calculate the volume-weighted mean of the data, then your result will differ depending on the order of these two preprocessors. In fact, the 'extract_volume' preprocessor will fail if you try to run it on a 2D dataset.
+> Also note that preprocessor operations aren't always commutative, meaning that the order of operations matters. For instance, if you run the two preprocessors 'extract_volume' (to extract the top 100m of the ocean surface) and 'volume_statistics' (to calculate the volume-weighted mean of the data), your result will differ depending on the order in which they are applied. In fact, the 'extract_volume' preprocessor will fail if you try to run it on a 2D dataset.
 >
 > Changing the order of preprocessors can also speed up your processing. For instance, if you want to extract a two-dimensional layer from a 3D field and re-grid it, the layer extraction should be done first. If you did it the other way around, then the regridding function would be applied to all the layers of your 3D cube and it would take much more time.
 {: .callout}
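The ordering discussed in the callout can be made explicit with `custom_order`. A minimal sketch of a preprocessor that forces the extraction before the statistics (the preprocessor name and the depth bounds are illustrative):

~~~YAML
preprocessors:
  prep_top100m_mean:
    custom_order: true    # apply the steps in the order written here
    extract_volume:       # first keep only the top 100m of the ocean
      z_min: 0.
      z_max: 100.
    volume_statistics:    # then take the volume-weighted mean
      operator: mean
~~~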

-Some preprocessor stages are always applied and do not need to be called. This includes the preprocessors that load the data, apply the fixes, and save the data file afterwards. They do not need to be explicitly included in recipes.
+Some preprocessor modules are always applied and do not need to be called. This includes the preprocessors that load the data, apply any fixes and save the data file afterwards. These do not need to be explicitly included in recipes.
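For example (a sketch with an illustrative diagnostic name; a complete recipe would also need datasets and documentation), a variable entry that names no preprocessor still gets these automatic steps:

~~~YAML
diagnostics:
  minimal_diag:
    variables:
      tas:                 # no 'preprocessor' key: the data is only
        mip: Amon          # loaded, fixed/CMOR-checked and saved again
        start_year: 1970
        end_year: 2000
    scripts: null
~~~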

 > ## Exercise: Adding more preprocessor steps
 >
-> Edit the [example recipe](LINK to episode #4) to change the variable in ?thetao? and to add preprocessors which take the average over the latitude and longitude dimensions and the average over the depth. And then run the recipe.
+> Edit the [example recipe](LINK to episode #4) to first change the variable to thetao, then add preprocessors to average over the latitude and longitude dimensions and finally over the depth. Now run the recipe.
 >
 >> ## Solution
 >>
@@ -78,7 +79,7 @@ Some preprocessor stages are always applied and do not need to be called. This i

 ## Using different preprocessors for different variables

-You could define different preprocessors with several preprocessor sections (setting different preprocessor names). In the variable section you call the specific preprocessor which should be applied.
+You can also define different preprocessors with several preprocessor sections (setting different preprocessor names). In the variable section you then name the specific preprocessor that should be applied.

 > ## Example
 >~~~YAML
@@ -91,7 +92,7 @@ You could define different preprocessors with several preprocessor sections (set
 >      operator: mean
 >    area_statistics:
 >      operator: mean
->    Depth_integration:
+>    depth_integration:
 > ---
 > diagnostics:
 > # --------------------------------------------------
@@ -114,14 +115,14 @@ You could define different preprocessors with several preprocessor sections (set
 >        short_name: thetao
 >        preprocessor: prep_timeseries_2
 >    scripts:
->    timeseries_diag:
+>      timeseries_diag:
 >        script: ocean/diagnostic_timeseries.py
 >~~~
 >{: .source}
 {: .solution}

 > ## Challenge: How to write a recipe with multiple preprocessors
-> We now know that a recipe can have more than one diagnostic, variable or preprocessor. As we saw in the examples so far, we can group preprocessors with a single user defined name and can have more than one such preprocessor group in the recipe as well. Write two different preprocessors - one to regrid the data to a 1x1 resolution and the second preprocessor to mask out sea and ice grid cells before regridding to the saem resolution. In the second case, ensure you perform the masking first before regridding (hint: custom order your operations). Now, use the two preprocessors in different diagnostics within the same recipe. You may use any variable(s) of your choice. Once you succeed, try to add new datasets to the same recipe. Placeholders for the different components are provided below:
+> We now know that a recipe can have more than one diagnostic, variable or preprocessor. As we saw in the examples so far, we can group preprocessors with a single user defined name and can have more than one such preprocessor group in the recipe as well. Write two different preprocessors - one to regrid the data to a 1x1 resolution and the second preprocessor to mask out sea and ice grid cells before regridding to the same resolution. In the second case, ensure you perform the masking first before regridding (hint: custom order your operations). Now, use the two preprocessors in different diagnostics within the same recipe. You may use any variable(s) of your choice. Once you succeed, try to add new datasets to the same recipe. Placeholders for the different components are provided below:
 >
 >> ## Recipe
 >>
@@ -168,8 +169,7 @@ You could define different preprocessors with several preprocessor sections (set
 >
 >> ## Solution:
 >>
->> Here is one possible way to use two different preprocessors including a
->> group of preprocessors on different variables.
+>> Here is one solution to complete the challenge above using two different preprocessors:
 >>
 >>~~~YAML
 >>
@@ -184,7 +184,7 @@ You could define different preprocessors with several preprocessor sections (set
 >>      scheme: linear #how to interpolate for regridding
 >>
 >>  prep_map_land:
->>    custom_order: true #ensure order follows a given
+>>    custom_order: true #ensure that the given order of preprocessing is followed
 >>    mask_landsea: #apply a mask
 >>      mask_out: sea #mask out sea grid cells
 >>    regrid: # now apply the preprocessor to regrid
@@ -205,7 +205,6 @@ You could define different preprocessors with several preprocessor sections (set
 >>        grid: gn #can change for variables from the same model
 >>        start_year: 1970
 >>        end_year: 2000
->>        additional_datasets:
 >>    scripts: null
 >>
 >>  diag_land_only_plot:
@@ -217,7 +216,6 @@ You could define different preprocessors with several preprocessor sections (set
 >>        grid: gn #can change for variables from the same model
 >>        start_year: 1970
 >>        end_year: 2000
->>        additional_datasets:
 >>    scripts: null
 >> ~~~
 >> {: .source}
@@ -226,7 +224,7 @@ You could define different preprocessors with several preprocessor sections (set

 ## Adding different datasets for different variables

-Sometimes, we may want to include specific datasets for certain variables. An example is when we use observations for two different variables in a diagnostic. While the CMIP dataset details for the two variables may be common, the observations will likely not be so. It would be useful to know how to include different datasets for different variables. Here is an example of a simple preprocessor and diagnostic setup for that:
+Sometimes, we may want to include specific datasets only for certain variables. An example is when we use observations for two different variables in a diagnostic. While the CMIP dataset details for the two variables may be common, the observations will likely not be so. It would be useful to know how to include different datasets for different variables. Here is an example of a simple preprocessor and diagnostic setup for that:

 > ## Example
 >~~~YAML
@@ -254,7 +252,7 @@ Sometimes, we may want to include specific datasets for certain variables. An ex
 >        mip: Amon
 >        grid: gn #can change for variables from the same model
 >        start_year: 1970
->        end_year: 2000 # start and end years for a30 year period,
+>        end_year: 2000 # start and end years for a 30 year period,
 >                       # we assume this is common and exists for all
 >                       # model and obs data
 >        additional_datasets:
@@ -269,16 +267,81 @@ Sometimes, we may want to include specific datasets for certain variables. An ex
 >        end_year: 2000
 >        additional_datasets:
 >          - {dataset: HadCRUT4, project: OBS, type: ground,
->             version: 1, tier: 2} #dataset specific to the temperature var
+>             version: 1, tier: 2} #dataset specific to the temperature variable
 >
 >    scripts: null
 >~~~
 >{: .source}
 {: .solution}

+## Creating variable groups
+
+Variable grouping can be used to preprocess different clusters of data for the same variable. For instance, the example below illustrates how we can compute separate multi-model means for CMIP5 and CMIP6 data for the same variable. Additionally, we can preprocess observed data for evaluation.
+
+> ## Example
+>~~~YAML
+>
+>preprocessors:
+>  prep_mmm:
+>    custom_order: true
+>    regrid:
+>      target_grid: 2.5x2.5
+>      scheme: linear
+>    multi_model_statistics:
+>      span: full
+>      statistics: [mean, median]
+>
+>  prep_obs:
+>    mask_landsea:
+>      mask_out: sea
+>    regrid:
+>      target_grid: 2.5x2.5
+>      scheme: linear
+>
+> #note that there is no field called datasets anymore
+> #note how multiple ensembles are added by using (1:4)
+>cmip5_datasets: &cmip5_datasets
+>  - {dataset: CanESM2, ensemble: "r(1:4)i1p1", project: CMIP5}
+>  - {dataset: MPI-ESM-LR, ensemble: "r(1:2)i1p1", project: CMIP5}
+>
+>cmip6_datasets: &cmip6_datasets
+>  - {dataset: UKESM1-0-LL, ensemble: "r(1:4)i1p1f2", grid: gn, project: CMIP6}
+>  - {dataset: CanESM5, ensemble: "r(1:4)i1p2f1", grid: gn, project: CMIP6}
+>
+>diagnostics:
+>
+>  diag_variable_groups:
+>    description: Demonstrate the use of variable groups.
+>    variables:
+>      tas_cmip5: &variable_settings # need a key name for the grouping
+>        short_name: tas # specify variable to look for
+>        preprocessor: prep_mmm
+>        mip: Amon
+>        exp: historical
+>        start_year: 2000
+>        end_year: 2005
+>        tag: TAS_CMIP5 #tag is optional if you are using these settings just once
+>        additional_datasets: *cmip5_datasets
+>      tas_obs:
+>        <<: *variable_settings
+>        preprocessor: prep_obs
+>        tag: TAS_OBS
+>        additional_datasets:
+>          - {dataset: HadCRUT4, project: OBS, type: ground, version: 1, tier: 2}
+>      tas_cmip6:
+>        <<: *variable_settings
+>        tag: TAS_CMIP6
+>        additional_datasets: *cmip6_datasets #nothing changes from cmip5 except the datasets
+>    scripts: null
+>~~~
+>{: .source}
+{: .solution}
+
+You should see the variables grouped into different subdirectories under your preproc output directory. The different groupings can be accessed in your diagnostic by selecting the key name of the variable_group field, such as tas_cmip5, tas_cmip6 or tas_obs.
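As a rough sketch of the resulting layout (the run directory and file names will differ), each variable group gets its own subdirectory under preproc:

~~~
preproc/
  diag_variable_groups/
    tas_cmip5/   # regridded multi-model statistics for the CMIP5 group
    tas_obs/     # masked and regridded observations
    tas_cmip6/   # regridded multi-model statistics for the CMIP6 group
~~~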
+>
 > ## How to find what CMIP data is available?
 >
-> [CMIP5](https://pcmdi.llnl.gov/mips/cmip5/index.html) and [CMIP6](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html) data obey the [CF-conventions](http://cfconventions.org/). Available variables could be found under the [CMIP5 data request](https://pcmdi.llnl.gov/mips/cmip5/docs/standard_output.pdf?id=28) and the [CMIP6 Data Request](http://clipc-services.ceda.ac.uk/dreq/index.html).
+> [CMIP5](https://pcmdi.llnl.gov/mips/cmip5/index.html) and [CMIP6](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html) data obey the [CF-conventions](http://cfconventions.org/). Available variables can be found under the [CMIP5 data request](https://pcmdi.llnl.gov/mips/cmip5/docs/standard_output.pdf?id=28) and the [CMIP6 Data Request](http://clipc-services.ceda.ac.uk/dreq/index.html).
 >
 > CMIP data is widely available via the Earth System Grid Federation ([ESGF](https://esgf.llnl.gov/)) and is accessible to users either via download from the ESGF portal or through the ESGF data nodes hosted by large computing facilities (like [CEDA-Jasmin](https://esgf-index1.ceda.ac.uk/), [DKRZ](https://esgf-data.dkrz.de/), etc). The ESGF also hosts observations for Model Intercomparison Projects (obs4MIPs) and reanalyses data (ana4MIPs).
 >
