Commit e5f9856

Merge pull request #151 from ESMValGroup/improve_configuration
Add more exercises to Configuration episode
2 parents 57616e6 + e645581 commit e5f9856

1 file changed

+191
-126
lines changed

_episodes/03-configuration.md

Lines changed: 191 additions & 126 deletions
Original file line number | Diff line number | Diff line change
@@ -12,8 +12,9 @@ objectives:
1212

1313
keypoints:
1414
- The ``config-user.yml`` tells ESMValTool where to find input data.
15-
- "``rootpath`` defines the root directory for the input data."
1615
- "``output_dir`` defines the destination directory."
16+
- "``rootpath`` defines the root path of the data."
17+
- "``drs`` defines the directory structure of the data."
1718

1819
---
1920

@@ -22,10 +23,8 @@ keypoints:
2223
The ``config-user.yml`` configuration file contains all the global level
2324
information needed by ESMValTool to run.
2425
This is a [YAML file](https://yaml.org/spec/1.2/spec.html).
25-
An example configuration file can be found in the ESMValCore repository:
26-
[config-user-example.yml](https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/config-user.yml).
2726

28-
You can generate the default configuration file by running:
27+
You can get the default configuration file by running:
2928

3029
~~~bash
3130
esmvaltool config get_config_user
@@ -36,7 +35,8 @@ path to your home directory. Note that files and directories starting with a
3635
period are "hidden", to see the `.esmvaltool` directory in the terminal use
3736
`ls -la ~`.
3837

39-
We run a text editor called ``nano`` to have a look inside the configuration file:
38+
We run a text editor called ``nano`` to have a look inside the configuration file
39+
and then modify it if needed:
4040

4141
~~~bash
4242
nano ~/.esmvaltool/config-user.yml
@@ -63,42 +63,29 @@ This file contains the information for:
6363

6464
## Output settings
6565

66-
These settings are used to inform ESMValTool about your preference about
67-
specific actions. You can turn on or off the setting by ``true`` or ``false``
68-
values. Most of these settings are fairly self-explanatory, ie:
66+
The configuration file starts with output settings that
67+
inform ESMValTool about your preference for output.
68+
You can turn each setting on or off with ``true`` or ``false``
69+
values. Most of these settings are fairly self-explanatory.
70+
For example, `write_plots: true` means that diagnostics create plots.
6971

70-
```yaml
71-
# Diagnostics create plots? [true]/false
72-
write_plots: true
73-
# Diagnositcs write NetCDF files? [true]/false
74-
write_netcdf: true
75-
# Set the console log level debug, [info], warning, error
76-
log_level: info
77-
# Exit on warning (only for NCL diagnostic scripts)? true/[false]
78-
exit_on_warning: false
79-
# Plot file format? [png]/pdf/ps/eps/epsi
80-
output_file_type: png
81-
82-
...
83-
84-
# Use netCDF compression true/[false]
85-
compress_netcdf: false
86-
# Save intermediary cubes in the preprocessor true/[false]
87-
save_intermediary_cubes: false
88-
# Remove the preproc dir if all fine
89-
remove_preproc_dir: true
90-
91-
...
92-
93-
# Path to custom config-developer file, to customise project configurations.
94-
# See config-developer.yml for an example. Set to [null] to use the default
95-
config_developer_file: null
96-
# Get profiling information for diagnostics
97-
# Only available for Python diagnostics
98-
profile_diagnostic: false
99-
```
100-
101-
In general there is no need to change the settings listed above.
72+
> ## Saving preprocessed data
73+
>
74+
> Later in this tutorial, we will want to look at the contents of the `preproc` folder.
75+
> This folder contains preprocessed data and is removed by default when ESMValTool is run.
76+
> In the configuration file, which settings can be modified to prevent this from happening?
77+
>
78+
>> ## Solution
79+
>>
80+
>> If the option ``remove_preproc_dir`` is set to ``false``, then the
81+
>> ``preproc/`` directory contains all the pre-processed data and the
82+
>> metadata interface files.
83+
>> If the option ``save_intermediary_cubes`` is set to ``true``
84+
>> then data will also be saved after each preprocessor step in the folder
85+
>> ``preproc``. Note that saving all intermediate results to file will result
86+
>> in a considerable slowdown, and can quickly fill your disk.
87+
> {: .solution}
88+
{: .challenge}
10289

10390
## Destination directory
10491

@@ -108,12 +95,6 @@ a new output folder determined by recipe name, and date and time using
10895
the format: YYYYMMDD_HHMMSS.
10996
This folder contains four further subfolders: ``plots``, ``preproc``, ``run``, ``work``.
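
As a rough illustration of the naming scheme described above, the following Python sketch builds a timestamped folder name. The recipe name `recipe_example` and the underscore that joins the recipe name to the timestamp are assumptions made for illustration; this is not ESMValTool's own code.

```python
from datetime import datetime


def output_folder_name(recipe_name: str, when: datetime) -> str:
    """Join a recipe name and a YYYYMMDD_HHMMSS timestamp.

    The underscore separator is an assumption for this sketch.
    """
    return f"{recipe_name}_{when.strftime('%Y%m%d_%H%M%S')}"


# Hypothetical recipe name and run time
print(output_folder_name("recipe_example", datetime(2020, 7, 1, 14, 5, 9)))
```

Because the timestamp is part of the name, each run ends up in its own folder rather than overwriting an earlier one.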
11097

111-
Let's name our destination directory ``esmvaltool_output`` in the working directory:
112-
113-
```yaml
114-
output_dir: ./esmvaltool_output
115-
```
116-
11798
> ## Content of subfolders
11899
>
119100
> - ``plots``: the location for all plots, split by individual diagnostics and fields.
@@ -130,40 +111,22 @@ are not plots, e.g. files in NetCDF format (depends on the diagnostic script).
130111
[lesson]({{ page.root }}{% link _episodes/04-recipe.md %})
131112
{: .callout}
132113

133-
## Auxiliary data directory
134-
135-
The ``auxiliary_data_dir`` setting is the path where any required additional
136-
auxiliary data files are stored. This location allows us to tell the diagnostic
137-
script where to find the files if they can not be downloaded at runtime. This
138-
option should not be used for model or observational datasets, but for data
139-
files (e.g. shape files) used in plotting such as coastline descriptions and so
140-
on.
141-
142-
```yaml
143-
auxiliary_data_dir: ~/auxiliary_data
144-
```
145-
146-
## Number of parallel tasks
147-
148-
This option enables you to perform parallel processing.
149-
You can choose the number of tasks in parallel as
150-
1/2/3/4/... or you can set it to ``null``. That tells
151-
ESMValTool to use the maximum number of available CPUs:
152-
153-
```yaml
154-
155-
max_parallel_tasks: null
156-
```
157-
158-
> ## Set the number of tasks
114+
> ## Set the destination directory
159115
>
160-
> If you run out of memory, try setting ``max_parallel_tasks`` to 1.
161-
Then, check the amount of memory you need for that by inspecting
162-
the file ``run/resource_usage.txt`` in the output directory.
163-
Using the number there you can increase the number of parallel tasks
164-
again to a reasonable number for the amount of memory available in your system.
165-
{: .callout}
166-
116+
> Let's name our destination directory ``esmvaltool_output`` in the working directory.
117+
> ESMValTool should write the output to this path.
118+
> How should we modify `config-user.yml`?
119+
>
120+
>> ## Solution
121+
>>
122+
>> We use the `output_dir` entry in the `config-user.yml` file as:
123+
>>```yaml
124+
>> output_dir: ./esmvaltool_output
125+
>>```
126+
>>
127+
>> If the `esmvaltool_output` directory does not exist, ESMValTool will create it for you.
128+
> {: .solution}
129+
{: .challenge}
167130
168131
## Rootpath to input data
169132
@@ -175,6 +138,7 @@ We can find more information about the projects in the ESMValTool
175138
[documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
176139
The ``rootpath`` specifies the directories where ESMValTool will look for input data.
177140
For each category, you can define either one path or several paths as a list.
141+
For example:
178142
179143
```yaml
180144
rootpath:
@@ -184,71 +148,172 @@ rootpath:
184148
default: ~/default_inputpath
185149
CORDEX: ~/default_inputpath
186150
```
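
Since each `rootpath` entry may be either a single path or a list of paths, code that reads the setting has to handle both forms. The Python sketch below shows one way to normalize them. The helper `as_path_list` is hypothetical and not part of ESMValTool, and the dictionary stands in for what a YAML parser would return for the `rootpath` section.

```python
import os
from pathlib import Path


def as_path_list(value):
    """Return a rootpath entry as a list of paths with ``~`` expanded.

    Accepts either a single path string or a list of path strings.
    """
    if isinstance(value, str):
        value = [value]
    return [Path(os.path.expanduser(p)) for p in value]


# Hypothetical rootpath section, as a parsed dictionary
rootpath = {
    "CMIP5": ["~/cmip5_inputpath1", "~/cmip5_inputpath2"],
    "default": "~/default_inputpath",
}
for project, value in rootpath.items():
    print(project, as_path_list(value))
```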
187-
Site-specific entries for Jasmin, DKRZ and ETHZ are listed at the end of the
188-
example configuration file.
189151
190-
In this lesson, we will work with data from
191-
[CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
192-
and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
193-
We add the root path of the folder where our/your data is available.
194-
195-
```yaml
196-
rootpath:
197-
...
198-
CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2, ~/esmvaltool_tutorial/data]
199-
obs4mips: ~/esmvaltool_tutorial/data
200-
```
152+
Site-specific entries for Jasmin and DKRZ are listed at the end of the
153+
example configuration file.
201154
202-
> ## Setting the correct rootpath
155+
> ## Set the correct rootpath
203156
>
204-
> - To get the data (or its correct rootpath), check instruction in
205-
> [Setup]({{ page.root }}{% link setup.md %}).
206-
> - For more information about setting the rootpath, see also the ESMValTool
207-
> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/esmvalcore/datafinder.html).
208-
{: .callout}
157+
> In this tutorial, we will work with data from
158+
> [CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
159+
> and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
160+
> How can we modify the `rootpath` to make sure the data path is set correctly
161+
> for both CMIP5 and obs4mips?
162+
>
163+
> Note:
164+
> to get the data, check the instructions in
165+
> [Setup]({{ page.root }}{% link setup.md %}).
166+
>
167+
>> ## Solution
168+
>>
169+
>> - Are you working on your own local machine?
170+
>> You need to add the root path of the folder where the data is available
171+
>> to the `config-user.yml` file as:
172+
>>```yaml
173+
>> rootpath:
174+
>> ...
175+
>> CMIP5: ~/esmvaltool_tutorial/data
176+
>> obs4mips: ~/esmvaltool_tutorial/data
177+
>>```
178+
>>
179+
>> - Are you working on a computer cluster like Jasmin or DKRZ?
180+
>> Site-specific paths to the data are already listed at the end of the
181+
>> `config-user.yml` file. You need to uncomment the related lines.
182+
>> For example, on Jasmin:
183+
>>```yaml
184+
>> # Site-specific entries: Jasmin
185+
>> # Uncomment the lines below to locate data on JASMIN
186+
>> rootpath:
187+
>> # CMIP6: /badc/cmip6/data/CMIP6
188+
>> CMIP5: /badc/cmip5/data/cmip5/output1
189+
>> # CMIP3: /badc/cmip3_drs/data/cmip3/output
190+
>> # OBS: /group_workspaces/jasmin4/esmeval/obsdata-v2
191+
>> # OBS6: /group_workspaces/jasmin4/esmeval/obsdata-v2
192+
>> obs4mips: /group_workspaces/jasmin4/esmeval/obsdata-v2
193+
>> # ana4mips: /group_workspaces/jasmin4/esmeval/obsdata-v2
194+
>> # CORDEX: /badc/cordex/data/CORDEX/output
195+
>>```
196+
>>
197+
>> - For more information about setting the rootpath, see also the ESMValTool
198+
>> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
199+
> {: .solution}
200+
{: .challenge}
209201
210202
## Directory structure for the data from different projects
211203
212204
Input data can be from various models, observations and reanalysis data that
213205
adhere to the [CF/CMOR standard](https://cmor.llnl.gov/). The ``drs`` setting
214-
describes the file structure. Let's use ``default`` for ``CMIP5`` in our example
215-
here:
206+
describes the file structure.
216207
217-
```yaml
218-
drs:
219-
CMIP5: default
220-
```
208+
The ``drs`` setting describes the file structure for several projects (e.g.
209+
CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines
210+
(e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC). For more
211+
information about ``drs``, you can visit the ESMValTool documentation on
212+
[Data Reference Syntax (DRS)](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
221213
222-
> ## Available drs
214+
> ## Set the correct drs
223215
>
224-
> The ``drs`` setting describes the file structure for several projects (e.g.
225-
> ``CMIP6``, ``CMIP5``, ``obs4mips``, ``OBS6``, ``OBS``) on several key machines
226-
> (e.g. ``BADC``, ``CP4CDS``, ``DKRZ``, ``ETHZ``, ``SMHI``, ``BSC``). For more
227-
> information about ``drs``, you can visit the ESMValTool
228-
> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
229-
{: .callout}
230-
231-
> ## Make your own configuration file
216+
> In this lesson, we will work with data from
217+
> [CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
218+
> and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
219+
> How can we set the correct `drs`?
232220
>
233-
> It is possible to have several configuration files with different purposes,
234-
> for example: config-user_formalised_runs.yml, config-user_debugging.yml
235-
{: .callout}
221+
>> ## Solution
222+
>>
223+
>> - Are you working on your own local machine?
224+
>> You need to set the `drs` of the data
225+
>> in the `config-user.yml` file as:
226+
>>```yaml
227+
>> drs:
228+
>> CMIP5: default
229+
>> obs4mips: default
230+
>>```
231+
>>
232+
>> - Are you working on a computer cluster like Jasmin or DKRZ?
233+
>> Site-specific `drs` settings are already listed at the end of the
234+
>> `config-user.yml` file. You need to uncomment the related lines.
235+
>> For example, on Jasmin:
236+
>>```yaml
237+
>> # Site-specific entries: Jasmin
238+
>> # Uncomment the lines below to locate data on JASMIN
239+
>> drs:
240+
>> # CMIP6: BADC
241+
>> CMIP5: BADC
242+
>> # CMIP3: BADC
243+
>> # CORDEX: BADC
244+
>> # OBS: BADC
245+
>> # OBS6: BADC
246+
>> obs4mips: BADC
247+
>> # ana4mips: BADC
248+
>>```
249+
>>
250+
> {: .solution}
251+
{: .challenge}
236252
237-
> ## Saving preprocessed data
253+
> ## Explain the default drs (if working on local machine)
254+
>
255+
> 1. In the previous exercise, we set the `drs` of CMIP5 data to `default`.
256+
> Can you explain why?
238257
>
239-
> In the configuration file, which settings are useful to make sure preprocessed
240-
> data is stored when ESMValTool is run?
258+
> 2. Have a look at the directory structure of the data.
259+
> There is the folder `Tier1`. What does it mean?
241260
>
242261
>> ## Solution
243262
>>
244-
> > If the option ``remove_preproc_dir`` is set to ``false``, then the
245-
> > ``preproc/`` directory contains all the pre-processed data and the
246-
> > metadata interface files.
247-
> > If the option ``save_intermediary_cubes`` is set to ``true``
248-
> > then data will also be saved after each preprocessor step in the folder
249-
> > ``preproc``. Note that saving all intermediate results to file will result
250-
> > in a considerable slowdown.
263+
>> 1. `drs: default` is one way to retrieve data from a ROOT directory that has no DRS-like structure.
264+
>> ``default`` indicates that all the files are in a folder without any structure.
265+
>>
266+
>> 2. Observational data are organized in Tiers depending on their level of public availability.
267+
>> Therefore the default directory must be structured accordingly with sub-directories
268+
>> `TierX` e.g. Tier1, Tier2 or Tier3, even when `drs: default`.
269+
>>
251270
> {: .solution}
252271
{: .challenge}
253272
273+
## Other settings
274+
275+
> ## Auxiliary data directory
276+
>
277+
> The ``auxiliary_data_dir`` setting is the path where any required additional
278+
auxiliary data files are stored. This location allows us to tell the diagnostic
279+
script where to find the files if they cannot be downloaded at runtime. This
280+
option should not be used for model or observational datasets, but for data
281+
files used in plotting (e.g. coastline descriptions) or for
282+
additional data (e.g. shape files) that you want to feed to your recipe.
283+
>
284+
>```yaml
285+
> auxiliary_data_dir: ~/auxiliary_data
286+
> ```
287+
> See more information in the ESMValTool
288+
[documentation](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html?highlight=auxiliary_data#user-configuration-file).
289+
{: .callout}
290+
291+
> ## Number of parallel tasks
292+
>
293+
> This option enables you to perform parallel processing.
294+
You can choose the number of tasks in parallel as
295+
1/2/3/4/... or you can set it to ``null``. That tells
296+
ESMValTool to use the maximum number of available CPUs:
297+
>
298+
>```yaml
299+
> max_parallel_tasks: null
300+
> ```
301+
>
302+
> If you run out of memory, try setting ``max_parallel_tasks`` to 1.
303+
Then, check the amount of memory you need for that by inspecting
304+
the file ``run/resource_usage.txt`` in the output directory.
305+
Using the number there you can increase the number of parallel tasks
306+
again to a reasonable number for the amount of memory available in your system.
307+
{: .callout}
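
The memory advice above comes down to simple arithmetic. The Python sketch below assumes you have read the peak memory of a single-task run from ``run/resource_usage.txt`` by hand. The function `suggest_max_parallel_tasks` and the numbers are hypothetical, and real memory use may vary between tasks, so treat the result as a starting point only.

```python
import os


def suggest_max_parallel_tasks(available_gb, per_task_gb, cpu_count=None):
    """Largest task count that fits in memory, capped at the CPU count."""
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    cpus = cpu_count or os.cpu_count() or 1
    fits_in_memory = int(available_gb // per_task_gb)
    # Always allow at least one task, even if memory looks too small
    return max(1, min(cpus, fits_in_memory))


# Hypothetical numbers: 16 GB available, 3.5 GB peak for one task, 8 CPUs
print(suggest_max_parallel_tasks(16, 3.5, cpu_count=8))
```

The returned number could then be written as the value of ``max_parallel_tasks`` in place of ``null``.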
308+
309+
> ## Make your own configuration file
310+
>
311+
> It is possible to have several configuration files with different purposes,
312+
> for example: config-user_formalised_runs.yml, config-user_debugging.yml.
313+
> In this case, you have to pass the path of your own configuration file
314+
> as a command-line option when running the ESMValTool.
315+
> We will learn how to do this in the
316+
> [next lesson]({{ page.root }}{% link _episodes/04-recipe.md %}).
317+
{: .callout}
318+
254319
{% include links.md %}
