@@ -12,8 +12,9 @@ objectives:
1212
1313keypoints :
1414- The ``config-user.yml`` tells ESMValTool where to find input data.
15- - " ``rootpath`` defines the root directory for the input data."
1615- " ``output_dir`` defines the destination directory."
16+ - " ``rootpath`` defines the root path of the data."
17+ - " ``drs`` defines the directory structure of the data."
1718
1819---
1920
@@ -22,10 +23,8 @@ keypoints:
2223The `` config-user.yml `` configuration file contains all the global level
2324information needed by ESMValTool to run.
2425This is a [ YAML file] ( https://yaml.org/spec/1.2/spec.html ) .
25- An example configuration file can be found in the ESMValCore repository:
26- [ config-user-example.yml] ( https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/config-user.yml ) .
2726
28- You can generate the default configuration file by running:
27+ You can get the default configuration file by running:
2928
3029~~~ bash
3130 esmvaltool config get_config_user
@@ -36,7 +35,8 @@ path to your home directory. Note that files and directories starting with a
3635period are "hidden", to see the ` .esmvaltool ` directory in the terminal use
3736` ls -la ~ ` .
3837
39- We run a text editor called `` nano `` to have a look inside the configuration file:
38+ We run a text editor called `` nano `` to have a look inside the configuration file
39+ and then modify it if needed:
4040
4141~~~ bash
4242 nano ~ /.esmvaltool/config-user.yml
@@ -63,42 +63,29 @@ This file contains the information for:
6363
6464## Output settings
6565
66- These settings are used to inform ESMValTool about your preference about
67- specific actions. You can turn on or off the setting by `` true `` or `` false ``
68- values. Most of these settings are fairly self-explanatory, ie:
66+ The configuration file starts with output settings that
67+ inform ESMValTool about your preference for output.
68+ You can turn each setting on or off with `` true `` or `` false ``
69+ values. Most of these settings are fairly self-explanatory.
70+ For example, ` write_plots: true ` means that diagnostics create plots.
6971
70- ``` yaml
71- # Diagnostics create plots? [true]/false
72- write_plots : true
73- # Diagnositcs write NetCDF files? [true]/false
74- write_netcdf : true
75- # Set the console log level debug, [info], warning, error
76- log_level : info
77- # Exit on warning (only for NCL diagnostic scripts)? true/[false]
78- exit_on_warning : false
79- # Plot file format? [png]/pdf/ps/eps/epsi
80- output_file_type : png
81-
82- ...
83-
84- # Use netCDF compression true/[false]
85- compress_netcdf : false
86- # Save intermediary cubes in the preprocessor true/[false]
87- save_intermediary_cubes : false
88- # Remove the preproc dir if all fine
89- remove_preproc_dir : true
90-
91- ...
92-
93- # Path to custom config-developer file, to customise project configurations.
94- # See config-developer.yml for an example. Set to [null] to use the default
95- config_developer_file : null
96- # Get profiling information for diagnostics
97- # Only available for Python diagnostics
98- profile_diagnostic : false
99- ` ` `
100-
101- In general there is no need to change the settings listed above.
72+ > ## Saving preprocessed data
73+ >
74+ > Later in this tutorial, we will want to look at the contents of the ` preproc ` folder.
75+ > This folder contains preprocessed data and is removed by default when ESMValTool is run.
76+ > In the configuration file, which settings can be modified to prevent this from happening?
77+ >
78+ >> ## Solution
79+ >>
80+ >> If the option `` remove_preproc_dir `` is set to `` false `` , then the
81+ >> `` preproc/ `` directory contains all the pre-processed data and the
82+ >> metadata interface files.
83+ >> If the option `` save_intermediary_cubes `` is set to `` true ``
84+ >> then data will also be saved after each preprocessor step in the folder
85+ >> `` preproc `` . Note that saving all intermediate results to file will result
86+ >> in a considerable slowdown, and can quickly fill your disk.
87+ > {: .solution}
88+ {: .challenge}
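
The two settings named in the solution above can be checked from the terminal before a run. The sketch below is only an illustration: it writes a small stand-in file with assumed values so that it runs anywhere. In practice you would point the (hypothetical) variable `config_file` at `~/.esmvaltool/config-user.yml` instead.

```bash
# Stand-in config file with assumed values, so the example is self-contained.
# In practice: config_file="${HOME}/.esmvaltool/config-user.yml"
config_file="$(mktemp)"
cat > "${config_file}" <<'EOF'
write_plots: true
remove_preproc_dir: false
save_intermediary_cubes: false
EOF

# Keep only the two settings that control what is stored in preproc/
preproc_settings="$(grep -E '^(remove_preproc_dir|save_intermediary_cubes):' "${config_file}")"
echo "${preproc_settings}"
```

If either line is missing from the output, the setting is not defined at the top level of the file you inspected.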
10289
10390## Destination directory
10491
@@ -108,12 +95,6 @@ a new output folder determined by recipe name, and date and time using
10895the format: YYYYMMDD_HHMMSS.
10996This folder contains four further subfolders: `` plots `` , `` preproc `` , `` run `` , `` work `` .
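
As a rough illustration of this naming scheme, the following sketch builds such a folder name by hand. The recipe name `recipe_example` and the underscore joining the name and the timestamp are assumptions made for illustration; ESMValTool creates the real folder itself.

```bash
# Hypothetical recipe name, used only for illustration
recipe_name="recipe_example"
# Current date and time in the YYYYMMDD_HHMMSS format
timestamp="$(date +%Y%m%d_%H%M%S)"
# Assumed layout: <output_dir>/<recipe name>_<timestamp>
run_dir="./esmvaltool_output/${recipe_name}_${timestamp}"

# Print the four subfolders described above (nothing is created on disk)
for subfolder in plots preproc run work; do
  echo "${run_dir}/${subfolder}"
done
```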
11097
111- Let's name our destination directory ` ` esmvaltool_output` ` in the working directory:
112-
113- ` ` ` yaml
114- output_dir : ./esmvaltool_output
115- ` ` `
116-
11798> ## Content of subfolders
11899>
119100> - `` plots `` : the location for all plots, split by individual diagnostics and fields.
@@ -130,40 +111,22 @@ are not plots, e.g. files in NetCDF format (depends on the diagnostic script).
130111[ lesson] ({{ page.root }}{% link _ episodes/04-recipe.md %})
131112{: .callout}
132113
133- ## Auxiliary data directory
134-
135- The ` ` auxiliary_data_dir` ` setting is the path where any required additional
136- auxiliary data files are stored. This location allows us to tell the diagnostic
137- script where to find the files if they can not be downloaded at runtime. This
138- option should not be used for model or observational datasets, but for data
139- files (e.g. shape files) used in plotting such as coastline descriptions and so
140- on.
141-
142- ` ` ` yaml
143- auxiliary_data_dir : ~/auxiliary_data
144- ` ` `
145-
146- ## Number of parallel tasks
147-
148- This option enables you to perform parallel processing.
149- You can choose the number of tasks in parallel as
150- 1/2/3/4/... or you can set it to ` ` null` ` . That tells
151- ESMValTool to use the maximum number of available CPUs:
152-
153- ` ` ` yaml
154-
155- max_parallel_tasks : null
156- ` ` `
157-
158- > ## Set the number of tasks
114+ > ## Set the destination directory
159115>
160- > If you run out of memory, try setting ` ` max_parallel_tasks` ` to 1.
161- Then, check the amount of memory you need for that by inspecting
162- the file ` ` run/resource_usage.txt` ` in the output directory.
163- Using the number there you can increase the number of parallel tasks
164- again to a reasonable number for the amount of memory available in your system.
165- {: .callout}
166-
116+ > Let's name our destination directory `` esmvaltool_output `` in the working directory.
117+ > ESMValTool should write the output to this path.
118+ > How would you modify `config-user.yml` to do this?
119+ >
120+ >> ## Solution
121+ >>
122+ >> We set the `output_dir` entry in the `config-user.yml` file as:
123+ >> ``` yaml
124+ >> output_dir: ./esmvaltool_output
125+ >>```
126+ >>
127+ >> If the `esmvaltool_output` directory does not exist, ESMValTool will create it for you.
128+ > {: .solution}
129+ {: .challenge}
167130
168131## Rootpath to input data
169132
@@ -175,6 +138,7 @@ We can find more information about the projects in the ESMValTool
175138[documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
176139The ``rootpath`` specifies the directories where ESMValTool will look for input data.
177140For each category, you can define either one path or several paths as a list.
141+ For example:
178142
179143```yaml
180144rootpath:
@@ -184,71 +148,172 @@ rootpath:
184148 default: ~/default_inputpath
185149 CORDEX: ~/default_inputpath
186150```
187- Site-specific entries for Jasmin, DKRZ and ETHZ are listed at the end of the
188- example configuration file.
189151
190- In this lesson, we will work with data from
191- [CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
192- and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
193- We add the root path of the folder where our/your data is available.
194-
195- ` ` ` yaml
196- rootpath :
197- ...
198- CMIP5 : [~/cmip5_inputpath1, ~/cmip5_inputpath2, ~/esmvaltool_tutorial/data]
199- obs4mips : ~/esmvaltool_tutorial/data
200- ` ` `
152+ Site-specific entries for Jasmin and DKRZ are listed at the end of the
153+ example configuration file.
201154
202- > ## Setting the correct rootpath
155+ > ## Set the correct rootpath
203156>
204- > - To get the data (or its correct rootpath), check instruction in
205- > [Setup]({{ page.root }}{% link setup.md %}).
206- > - For more information about setting the rootpath, see also the ESMValTool
207- > [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/esmvalcore/datafinder.html).
208- {: .callout}
157+ > In this tutorial, we will work with data from
158+ > [CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
159+ > and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
160+ > How can we modify the `rootpath` to make sure the data path is set correctly
161+ > for both CMIP5 and obs4mips?
162+ >
163+ > Note:
164+ > to get the data, check the instructions in
165+ > [Setup]({{ page.root }}{% link setup.md %}).
166+ >
167+ >> ## Solution
168+ >>
169+ >> - Are you working on your own local machine?
170+ >> You need to add the root path of the folder where the data is available
171+ >> to the `config-user.yml` file as:
172+ >>```yaml
173+ >> rootpath:
174+ >>   ...
175+ >>   CMIP5: ~/esmvaltool_tutorial/data
176+ >>   obs4mips: ~/esmvaltool_tutorial/data
177+ >>```
178+ >>
179+ >> - Are you working on a computer cluster like Jasmin or DKRZ?
180+ >> Site-specific paths to the data are already listed at the end of the
181+ >> `config-user.yml` file. You need to uncomment the related lines.
182+ >> For example, on Jasmin:
183+ >>```yaml
184+ >> # Site-specific entries: Jasmin
185+ >> # Uncomment the lines below to locate data on JASMIN
186+ >> rootpath:
187+ >>   # CMIP6: /badc/cmip6/data/CMIP6
188+ >>   CMIP5: /badc/cmip5/data/cmip5/output1
189+ >>   # CMIP3: /badc/cmip3_drs/data/cmip3/output
190+ >>   # OBS: /group_workspaces/jasmin4/esmeval/obsdata-v2
191+ >>   # OBS6: /group_workspaces/jasmin4/esmeval/obsdata-v2
192+ >>   obs4mips: /group_workspaces/jasmin4/esmeval/obsdata-v2
193+ >>   # ana4mips: /group_workspaces/jasmin4/esmeval/obsdata-v2
194+ >>   # CORDEX: /badc/cordex/data/CORDEX/output
195+ >>```
196+ >>
197+ >> - For more information about setting the rootpath, see also the ESMValTool
198+ >> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
199+ > {: .solution}
200+ {: .challenge}
209201
210202## Directory structure for the data from different projects
211203
212204Input data can be from various models, observations and reanalysis data that
213205adhere to the [CF/CMOR standard](https://cmor.llnl.gov/). The ``drs`` setting
214- describes the file structure. Let's use ` ` default` ` for ` ` CMIP5` ` in our example
215- here:
206+ describes the file structure.
216207
217- ` ` ` yaml
218- drs :
219- CMIP5 : default
220- ` ` `
208+ The ``drs`` setting describes the file structure for several projects (e.g.
209+ CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines
210+ (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC). For more
211+ information about ``drs``, you can visit the ESMValTool documentation on
212+ [Data Reference Syntax (DRS)](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
221213
222- > ## Available drs
214+ > ## Set the correct drs
223215>
224- > The ` ` drs` ` setting describes the file structure for several projects (e.g.
225- > ` ` CMIP6` ` , ` ` CMIP5` ` , ` ` obs4mips` ` , ` ` OBS6` ` , ` ` OBS` ` ) on several key machines
226- > (e.g. ` ` BADC` ` , ` ` CP4CDS` ` , ` ` DKRZ` ` , ` ` ETHZ` ` , ` ` SMHI` ` , ` ` BSC` ` ). For more
227- > information about ` ` drs` ` , you can visit the ESMValTool
228- > [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
229- {: .callout}
230-
231- > ## Make your own configuration file
216+ > In this lesson, we will work with data from
217+ > [CMIP5](https://esgf-node.llnl.gov/projects/cmip5/)
218+ > and [obs4mips](https://esgf-node.llnl.gov/projects/obs4mips/).
219+ > How can we set the correct `drs`?
232220>
233- > It is possible to have several configuration files with different purposes,
234- > for example: config-user_formalised_runs.yml, config-user_debugging.yml
235- {: .callout}
221+ >> ## Solution
222+ >>
223+ >> - Are you working on your own local machine?
224+ >> You need to set the `drs` of the data
225+ >> in the `config-user.yml` file as:
226+ >>```yaml
227+ >> drs:
228+ >>   CMIP5: default
229+ >>   obs4mips: default
230+ >>```
231+ >>
232+ >> - Are you working on a computer cluster like Jasmin or DKRZ?
233+ >> Site-specific `drs` entries for the data are already listed at the end of the
234+ >> `config-user.yml` file. You need to uncomment the related lines.
235+ >> For example, on Jasmin:
236+ >>```yaml
237+ >> # Site-specific entries: Jasmin
238+ >> # Uncomment the lines below to locate data on JASMIN
239+ >> drs:
240+ >>   # CMIP6: BADC
241+ >>   CMIP5: BADC
242+ >>   # CMIP3: BADC
243+ >>   # CORDEX: BADC
244+ >>   # OBS: BADC
245+ >>   # OBS6: BADC
246+ >>   obs4mips: BADC
247+ >>   # ana4mips: BADC
248+ >>```
249+ >>
250+ > {: .solution}
251+ {: .challenge}
236252
237- > ## Saving preprocessed data
253+ > ## Explain the default drs (if working on local machine)
254+ >
255+ > 1. In the previous exercise, we set the `drs` of CMIP5 data to `default`.
256+ > Can you explain why?
238257>
239- > In the configuration file, which settings are useful to make sure preprocessed
240- > data is stored when ESMValTool is run ?
258+ > 2. Have a look at the directory structure of the data.
259+ > There is the folder `Tier1`. What does it mean?
241260>
242261>> ## Solution
243262>>
244- > > If the option ` ` remove_preproc_dir ` ` is set to ` ` false ` ` , then the
245- > > ` ` preproc/ ` ` directory contains all the pre-processed data and the
246- > > metadata interface files.
247- > > If the option ` ` save_intermediary_cubes ` ` is set to ` ` true ` `
248- > > then data will also be saved after each preprocessor step in the folder
249- > > ` ` preproc ` ` . Note that saving all intermediate results to file will result
250- > > in a considerable slowdown.
263+ >> 1. `drs: default` is one way to retrieve data from a ROOT directory that has no DRS-like structure.
264+ >> ``default`` indicates that all the files are in a folder without any structure.
265+ >>
266+ >> 2. Observational data are organized in Tiers depending on their level of public availability.
267+ >> Therefore, the default directory must be structured accordingly, with sub-directories
268+ >> `TierX` (e.g. `Tier1`, `Tier2` or `Tier3`), even when `drs: default` is set.
269+ >>
251270> {: .solution}
252271{: .challenge}
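
To look at the directory structure from the terminal, something like the following can help. This sketch creates a temporary stand-in directory containing a `Tier1` folder so that it is self-contained; with the tutorial data you would set the (hypothetical) variable `data_root` to your own data path, for example `~/esmvaltool_tutorial/data`.

```bash
# Temporary stand-in for the real data directory (assumption for illustration)
data_root="$(mktemp -d)"
mkdir -p "${data_root}/Tier1"

# List the names of the top-level folders inside the data directory
top_level_dirs="$(find "${data_root}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \;)"
echo "${top_level_dirs}"
```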
253272
273+ ## Other settings
274+
275+ > ## Auxiliary data directory
276+ >
277+ > The ``auxiliary_data_dir`` setting is the path where any required additional
278+ auxiliary data files are stored. This location allows us to tell the diagnostic
279+ script where to find the files if they cannot be downloaded at runtime. This
280+ option should not be used for model or observational datasets, but for data
281+ files used in plotting, such as coastline descriptions, or for any additional
282+ data (e.g. shape files) that you want to feed to your recipe.
283+ >
284+ >```yaml
285+ > auxiliary_data_dir: ~/auxiliary_data
286+ > ```
287+ > See more information in the ESMValTool
288+ [documentation](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html?highlight=auxiliary_data#user-configuration-file).
289+ {: .callout}
290+
291+ > ## Number of parallel tasks
292+ >
293+ > This option enables you to perform parallel processing.
294+ You can choose the number of tasks in parallel as
295+ 1/2/3/4/... or you can set it to ``null``. That tells
296+ ESMValTool to use the maximum number of available CPUs:
297+ >
298+ >```yaml
299+ > max_parallel_tasks: null
300+ > ```
301+ >
302+ > If you run out of memory, try setting ``max_parallel_tasks`` to 1.
303+ Then, check the amount of memory you need for that by inspecting
304+ the file ``run/resource_usage.txt`` in the output directory.
305+ Using the number there you can increase the number of parallel tasks
306+ again to a reasonable number for the amount of memory available in your system.
307+ {: .callout}
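
Before choosing a value for `max_parallel_tasks`, it can help to know how many CPUs the machine reports. A minimal sketch, assuming a Linux or macOS system where `getconf _NPROCESSORS_ONLN` is available (the variable name `cpu_count` is only illustrative):

```bash
# Number of CPUs currently online, as reported by the operating system
cpu_count="$(getconf _NPROCESSORS_ONLN)"
echo "CPUs available: ${cpu_count}"
```

This number is a natural upper bound for the setting; the memory advice above may call for a smaller value.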
308+
309+ > ## Make your own configuration file
310+ >
311+ > It is possible to have several configuration files with different purposes,
312+ > for example: config-user_formalised_runs.yml, config-user_debugging.yml.
313+ > In this case, you have to pass the path of your own configuration file
314+ > as a command-line option when running the ESMValTool.
315+ > We will learn how to do this in the
316+ > [next lesson]({{ page.root }}{% link _episodes/04-recipe.md %}).
317+ {: .callout}
318+
254319{% include links.md %}