Commit a674b4c

Merge pull request #12 from ESMValGroup/episode03_configuration
Update episode03_configuration
2 parents 203c592 + cfbc4f9 commit a674b4c

1 file changed: +209 -54 lines changed

_episodes/03-configuration.md

@@ -3,91 +3,246 @@ title: "Configuration"
teaching: 0
exercises: 0
questions:
- What is the user configuration file and how should I use it?

objectives:
- Understand the contents of the config-user.yml file
- Prepare a personalized config-user.yml file
- Configure ESMValTool to use specific settings

keypoints:
- The ``config-user.yml`` file tells ESMValTool where to find input data.
- "``rootpath`` defines the root directory for the input data."
- "``output_dir`` defines the destination directory."

---

## The configuration file

The ``config-user.yml`` configuration file contains all the global level information
needed by ESMValTool to run. It is a
[YAML file](https://yaml.org/spec/1.2/spec.html). An example configuration file
can be found in the root directory of the ESMValTool repository:
[config-user-example.yml](https://github.com/ESMValGroup/ESMValTool/blob/master/config-user-example.yml).

First, we make a working directory ``esmvaltool_tutorial``.
In a new terminal, run:

~~~bash
mkdir esmvaltool_tutorial
cd esmvaltool_tutorial
~~~

Now, we download the configuration file to our working directory.
To do that, open
[this link](https://raw.githubusercontent.com/ESMValGroup/ESMValTool/master/config-user-example.yml)
to see the raw version of the file, right-click, choose ``Save as``,
name the file ``config-user.yml`` and save it into the working directory
``esmvaltool_tutorial``.
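
You can also script this download if you prefer to stay in the terminal. Below is a minimal sketch that uses only Python's standard library. The helper name ``download_config`` is hypothetical and is not part of ESMValTool. It saves the file behind the raw link above as ``config-user.yml`` in the given directory.

```python
import shutil
import urllib.request
from pathlib import Path

# Raw URL of the example configuration file (the link above).
EXAMPLE_URL = (
    "https://raw.githubusercontent.com/ESMValGroup/ESMValTool/"
    "master/config-user-example.yml"
)


def download_config(url: str, workdir: str) -> Path:
    """Save the file at ``url`` as ``config-user.yml`` inside ``workdir``."""
    target_dir = Path(workdir).expanduser()
    # Create the working directory if needed (like ``mkdir -p``).
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "config-user.yml"
    # Stream the downloaded bytes straight into the target file.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

From inside ``esmvaltool_tutorial``, ``download_config(EXAMPLE_URL, ".")`` would place ``config-user.yml`` in the working directory. Like the browser download, it needs internet access.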

If your terminal is not already in ``esmvaltool_tutorial``, change to that directory.
Then, open the configuration file with a text editor called ``nano`` to have a look inside:

~~~bash
nano config-user.yml
~~~

This file contains the information for:

- Rootpath to input data
- Directory structure for the data from different projects
- Number of tasks that can be run in parallel
- Destination directory
- Auxiliary data directory
- Output settings

> ## Text editor side note
>
> No matter what editor you use, you will need to know where it searches
> for and saves files. If you start it from the shell, it will (probably)
> use your current working directory as its default location. We use ``nano``
> in examples here because it is one of the least complex text editors.
> Press <kbd>Ctrl</kbd> + <kbd>O</kbd> to save the file (confirm the file name
> with <kbd>Enter</kbd>), and then <kbd>Ctrl</kbd> + <kbd>X</kbd> to exit ``nano``.
{: .callout}

## Rootpath to input data

ESMValTool groups input data into categories, referred to as projects,
based on their source. The projects currently listed in the configuration
file are shown below. For example, ``CMIP5`` and ``CMIP6`` are used for datasets
from the Coupled Model Intercomparison Project, whereas ``OBS`` is used for
observational datasets. More information about the projects can be found in the ESMValTool
[documentation](https://docs.esmvaltool.org/en/latest/input.html).
The ``rootpath`` setting specifies the directories where ESMValTool will look for input data.
For each project, you can define either one path or several paths as a list.

~~~YAML
rootpath:
  CMIP3: [~/cmip3_inputpath1, ~/cmip3_inputpath2]
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  CMIP6: [~/cmip6_inputpath1, ~/cmip6_inputpath2]
  OBS: ~/obs_inputpath
  OBS6: ~/obs6_inputpath
  obs4mips: ~/obs4mips_inputpath
  ana4mips: ~/ana4mips_inputpath
  native6: ~/native6_inputpath
  RAWOBS: ~/rawobs_inputpath
  default: ~/default_inputpath
~~~
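
The Python sketch below makes the "one path or a list of paths" rule concrete. It shows one way to read such a ``rootpath`` mapping once it has been loaded into a dictionary, for example with a YAML library. The helper ``rootpaths_for`` is hypothetical and is not ESMValTool's API. The sketch also assumes, as the ``default`` entry above suggests, that a project without its own entry falls back to ``default``.

```python
from pathlib import Path
from typing import Dict, List, Union

PathSpec = Union[str, List[str]]


def rootpaths_for(rootpath: Dict[str, PathSpec], project: str) -> List[Path]:
    """Return the input directories for ``project`` as a list of paths.

    Assumption: a project that is not listed uses the ``default`` entry.
    """
    if project in rootpath:
        spec = rootpath[project]
    elif "default" in rootpath:
        spec = rootpath["default"]
    else:
        raise KeyError(f"No rootpath for {project!r} and no 'default' entry")
    # A single path is treated like a one-element list.
    paths = [spec] if isinstance(spec, str) else list(spec)
    # Expand ``~`` to the user's home directory, as used in the example config.
    return [Path(p).expanduser() for p in paths]


rootpath = {
    "CMIP5": ["~/cmip5_inputpath1", "~/cmip5_inputpath2"],
    "OBS": "~/obs_inputpath",
    "default": "~/default_inputpath",
}
print(rootpaths_for(rootpath, "OBS"))    # a single path, returned as a list
print(rootpaths_for(rootpath, "CMIP6"))  # not listed here, so default is used
```

Treating a single path as a one-element list means the rest of the code only has to handle one shape of data.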

In this lesson, we will work with data from
[CMIP5](https://esgf-node.llnl.gov/projects/cmip5/).
Add the path of the folder where your data are stored to the ``CMIP5`` entry:

~~~YAML
rootpath:
  ...
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2, ~/esmvaltool_tutorial/data]
~~~
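
A mistyped rootpath is an easy way to end up with ESMValTool not finding your data. The Python sketch below lists the configured directories that do not exist on your machine, so you can spot typos before running a recipe. The function name ``missing_rootpaths`` is hypothetical. The sketch only checks that each directory exists. It does not check that the files inside follow the expected directory structure.

```python
from pathlib import Path
from typing import Dict, List, Union


def missing_rootpaths(rootpath: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Return 'PROJECT: path' entries whose directory does not exist."""
    missing = []
    for project, spec in rootpath.items():
        # Accept both a single path and a list of paths.
        paths = [spec] if isinstance(spec, str) else spec
        for path in paths:
            if not Path(path).expanduser().is_dir():
                missing.append(f"{project}: {path}")
    return missing


# The CMIP5 entry from the example above.
rootpath = {
    "CMIP5": ["~/cmip5_inputpath1", "~/cmip5_inputpath2", "~/esmvaltool_tutorial/data"],
}
for entry in missing_rootpaths(rootpath):
    print("Directory not found:", entry)
```

The placeholder paths such as ``~/cmip5_inputpath1`` will be reported as missing unless you created them. This is a reminder to replace or remove entries you do not use.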

> ## Setting the correct rootpath
>
> - To get the data (or its correct rootpath), check the instructions in
>   [Setup]({{ page.root }}{% link setup.md %}).
> - For more information about setting the rootpath, see also the ESMValTool
>   [documentation](https://esmvaltool.readthedocs.io/projects/esmvalcore/en/latest/esmvalcore/datafinder.html).
{: .callout}

## Directory structure for the data from different projects

Input data can come from various models, observations and reanalyses that adhere
to the [CF/CMOR standard](https://cmor.llnl.gov/).
The ``drs`` (Data Reference Syntax) setting describes the directory structure
in which the files of each project are stored below its rootpath.
Let's use ``default`` for ``CMIP5`` in our example here:

~~~YAML
drs:
  CMIP5: default
~~~
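
The Python sketch below illustrates what choosing a ``drs`` entry means. It combines a rootpath, a directory template and a file-name template into a glob pattern. The templates are assumptions, modelled on the kind of patterns found in ESMValCore's ``config-developer.yml``. Check that file for the exact templates of each project and drs. In this sketch, ``default`` means the files sit directly in the rootpath.

```python
from pathlib import Path
from typing import Dict

# Assumed templates for illustration only; see config-developer.yml for the real ones.
INPUT_DIR = {
    "default": "",  # files directly below the rootpath
    "BADC": (
        "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/"
        "{mip}/{ensemble}/latest/{short_name}"
    ),
}
INPUT_FILE = "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"


def glob_pattern(rootpath: str, drs: str, facets: Dict[str, str]) -> Path:
    """Fill the (assumed) templates for ``drs`` with the given facets."""
    directory = INPUT_DIR[drs].format(**facets)
    return Path(rootpath).expanduser() / directory / INPUT_FILE.format(**facets)


facets = {
    "institute": "CCCma", "dataset": "CanESM2", "exp": "historical",
    "frequency": "mon", "modeling_realm": "atmos", "mip": "Amon",
    "ensemble": "r1i1p1", "short_name": "tas",
}
print(glob_pattern("/data/cmip5", "default", facets))
print(glob_pattern("/data/cmip5", "BADC", facets))
```

With ``default``, all CMIP5 files are expected side by side in ``/data/cmip5``. A machine-specific drs such as ``BADC`` expects the deep folder hierarchy used on that system.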

> ## Available drs
>
> The ``drs`` setting describes the file structure for several projects
> (e.g. ``CMIP6``, ``CMIP5``, ``obs4mips``, ``OBS6``, ``OBS``) on several key machines
> (e.g. ``BADC``, ``CP4CDS``, ``DKRZ``, ``ETHZ``, ``SMHI``, ``BSC``).
> For more information about ``drs``, you can visit the ESMValTool
> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
{: .callout}

## Number of parallel tasks

This option enables parallel processing.
You can set the number of tasks to run in parallel to a positive integer
(1, 2, 3, ...) or to ``null``, which tells
ESMValTool to use the maximum number of available CPUs:

~~~YAML
max_parallel_tasks: null
~~~
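
The meaning of ``null`` can be summarized in a few lines of Python. This is a sketch of the behaviour described above, not ESMValTool's own code, and ``effective_parallel_tasks`` is a hypothetical name. It uses ``os.cpu_count()`` as the assumed measure of "available CPUs". Note that this counts all CPUs of the machine, which on a shared cluster can be more than your job may use. The rejection of values below 1 is an extra safeguard of this sketch.

```python
import os
from typing import Optional


def effective_parallel_tasks(max_parallel_tasks: Optional[int]) -> int:
    """Translate the ``max_parallel_tasks`` setting into a number of tasks."""
    if max_parallel_tasks is None:  # ``null`` in the YAML file
        # os.cpu_count() returns None if the count is unknown; fall back to 1.
        return os.cpu_count() or 1
    if max_parallel_tasks < 1:
        raise ValueError("max_parallel_tasks must be a positive integer or null")
    return max_parallel_tasks


print(effective_parallel_tasks(None))  # number of CPUs on this machine
print(effective_parallel_tasks(1))     # 1: tasks run one after another
```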

> ## Set the number of tasks
>
> If you run out of memory, try setting ``max_parallel_tasks`` to 1.
> Then, check the amount of memory you need for that by inspecting
> the file ``run/resource_usage.txt`` in the output directory.
> Using the number there, you can increase the number of parallel tasks
> again to a reasonable number for the amount of memory available in your system.
{: .callout}

## Destination directory

The destination directory is the root directory where ESMValTool will store its output,
i.e. figures, data, logs, etc. With every run, ESMValTool automatically generates
a new output folder named after the recipe and the date and time of the run,
using the format ``YYYYMMDD_HHMMSS``.
This folder contains four subfolders: ``plots``, ``preproc``, ``run`` and ``work``.

Let's name our destination directory ``esmvaltool_output`` in the working directory:

~~~YAML
output_dir: ./esmvaltool_output
~~~
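
The naming scheme described above can be sketched in Python. The helper ``run_output_dir`` is hypothetical. It assumes the folder name is the recipe file name without its ``.yml`` extension, followed by an underscore and the timestamp. The sketch only computes paths, because ESMValTool creates the folders itself when it runs. ``recipe_python.yml`` is just an example recipe name.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict

SUBFOLDERS = ("plots", "preproc", "run", "work")


def run_output_dir(output_dir: str, recipe: str, when: datetime) -> Path:
    """Return the (assumed) output folder for one run of ``recipe``."""
    recipe_name = Path(recipe).stem  # "recipe_python.yml" -> "recipe_python"
    timestamp = when.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    return Path(output_dir) / f"{recipe_name}_{timestamp}"


def subfolders(run_dir: Path) -> Dict[str, Path]:
    """Map each of the four subfolder names to its path inside ``run_dir``."""
    return {name: run_dir / name for name in SUBFOLDERS}


run_dir = run_output_dir("./esmvaltool_output", "recipe_python.yml",
                         datetime(2020, 5, 4, 13, 2, 11))
print(run_dir)                       # esmvaltool_output/recipe_python_20200504_130211
print(subfolders(run_dir)["plots"])  # esmvaltool_output/recipe_python_20200504_130211/plots
```

The timestamp sorts chronologically, so ``max(Path("esmvaltool_output").glob("recipe_python_*"))`` is a quick way to find the most recent run of a recipe.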

> ## Content of subfolders
>
> - ``plots``: the location for all plots, split by individual diagnostics and fields.
> - ``preproc``: this folder contains all the preprocessed data and the metadata.yml
>   interface files. Note that by default this directory is deleted after
>   each run because most users only need the results from the diagnostic scripts.
> - ``run``: this folder includes all log files, a copy of the recipe,
>   the settings.yml interface files, a summary of the resource usage
>   (``resource_usage.txt``) and temporary files created by the diagnostic scripts.
> - ``work``: this folder is a place for any diagnostic script results that
>   are not plots, e.g. files in NetCDF format (depends on the diagnostic script).
>
> We explain more about the output in the next
> [episode]({{ page.root }}{% link _episodes/04-toy-example.md %}).
{: .callout}

## Auxiliary data directory

The ``auxiliary_data_dir`` setting is the path where any required
additional auxiliary data files are stored. Some Python toolkits, such as cartopy,
try to download data files (typically geographic data such as coastlines or land
surface maps) at run time, which fails if the machine does not have access to the
internet. This location allows us to tell the diagnostic scripts where to find such
files if they cannot be downloaded at run time. This option should not be used for
model or observational datasets, but for data files (e.g. shape files) used in
plotting, such as coastline descriptions.

~~~YAML
auxiliary_data_dir: ~/auxiliary_data
~~~

## Output settings

These settings tell ESMValTool your preferences for specific actions.
Boolean settings are turned on or off with ``true`` or ``false``; in the comments,
the default value of each setting is shown in square brackets.
Most of these settings are fairly self-explanatory, e.g.:

~~~YAML
# Diagnostics create plots? [true]/false
write_plots: true
# Diagnostics write NetCDF files? [true]/false
write_netcdf: true
# Set the console log level debug, [info], warning, error
log_level: info
# Exit on warning (only for NCL diagnostic scripts)? true/[false]
exit_on_warning: false
# Plot file format? [png]/pdf/ps/eps/epsi
output_file_type: png
# Use netCDF compression? true/[false]
compress_netcdf: false
# Save intermediary cubes in the preprocessor? true/[false]
save_intermediary_cubes: false
# Remove the preproc dir if all fine? [true]/false
remove_preproc_dir: true
# Path to custom config-developer file, to customise project configurations.
# See config-developer.yml for an example. Set to [null] to use the default
# config_developer_file: null
# Get profiling information for diagnostics
# Only available for Python diagnostics
profile_diagnostic: false
~~~
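
Each comment above lists the allowed values, so a quick sanity check of your own settings is easy to write. The Python sketch below is a hypothetical helper, not part of ESMValTool, which validates the file itself. It assumes the settings have already been loaded into a dictionary, and it covers only the options shown above. The allowed values are copied from the comments.

```python
from typing import Any, Dict, List

# Allowed values, copied from the comments in the example above.
BOOLEAN_KEYS = (
    "write_plots", "write_netcdf", "exit_on_warning", "compress_netcdf",
    "save_intermediary_cubes", "remove_preproc_dir", "profile_diagnostic",
)
CHOICES = {
    "log_level": ("debug", "info", "warning", "error"),
    "output_file_type": ("png", "pdf", "ps", "eps", "epsi"),
}


def check_output_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the output settings (empty if none)."""
    problems = []
    for key in BOOLEAN_KEYS:
        # A quoted 'false' in YAML is read as a string, not as a boolean.
        if key in settings and not isinstance(settings[key], bool):
            problems.append(f"{key} should be true or false, got {settings[key]!r}")
    for key, allowed in CHOICES.items():
        if key in settings and settings[key] not in allowed:
            problems.append(f"{key} should be one of {allowed}, got {settings[key]!r}")
    return problems


print(check_output_settings({"write_plots": True, "log_level": "info"}))  # []
print(check_output_settings({"write_netcdf": "false", "output_file_type": "jpg"}))
```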

> ## Make your own configuration file
>
> The configuration file is specified as an argument at run time, so it is
> possible to have several configuration files with different purposes,
> for example: ``config-user_formalised_runs.yml``, ``config-user_debugging.yml``.
{: .callout}

> ## Saving preprocessed data
>
> In the configuration file, which settings are useful to make sure preprocessed data
> is stored when ESMValTool is run?
>
>> ## Solution
>>
>> If the option ``save_intermediary_cubes`` is set to ``true`` in
>> the ``config-user.yml`` file, then the intermediary cubes will also be saved
>> in the folder ``preproc``. Also, if the option ``remove_preproc_dir``
>> is set to ``false``, then the ``preproc/`` directory contains all
>> the preprocessed data and the metadata interface files.
> {: .solution}
{: .challenge}
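
You can keep several configuration files by deriving a variant from your ``config-user.yml``. One example is a variant that keeps the preprocessed data, as in the solution above. The Python sketch below is a hypothetical helper, ``write_variant``. It only rewrites top-level ``key: value`` lines and appends keys that are missing. Everything else, such as comments and nested sections like ``rootpath``, is left untouched. For anything more complex, a YAML library is the more robust choice.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Union

Scalar = Optional[Union[bool, int, str]]


def to_yaml_scalar(value: Scalar) -> str:
    """Spell a Python value the way it is written in the configuration file."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def write_variant(base: Path, target: Path, overrides: Dict[str, Scalar]) -> None:
    """Copy ``base`` to ``target``, replacing the values of top-level keys."""
    remaining = dict(overrides)
    out = []
    for line in base.read_text().splitlines():
        # Top-level keys start at column 0; indented or commented lines do not match.
        match = re.match(r"([A-Za-z_]\w*):(\s|$)", line)
        key = match.group(1) if match else None
        if key in remaining:
            out.append(f"{key}: {to_yaml_scalar(remaining.pop(key))}")
        else:
            out.append(line)
    # Keys that were not in the base file are appended at the end.
    out.extend(f"{k}: {to_yaml_scalar(v)}" for k, v in remaining.items())
    target.write_text("\n".join(out) + "\n")


# Example (run inside esmvaltool_tutorial):
# write_variant(Path("config-user.yml"), Path("config-user_preproc.yml"),
#               {"save_intermediary_cubes": True, "remove_preproc_dir": False})
```

Note that any trailing comment on an overridden line is dropped. Comment lines of their own, such as the ones in the example file, are kept.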

{% include links.md %}
