Similar to Dask, ESMValCore stores all of its configuration in one configuration object, which consists of a single nested dictionary.
Note
In v2.12.0, a redesign process of ESMValTool/Core's configuration started. Its main aim is to simplify the configuration by moving from many different configuration files for individual components to one configuration object that consists of a single nested dictionary (similar to Dask's configuration). This change will not be implemented in one large pull request but rather in a step-by-step procedure. Thus, the configuration might appear inconsistent until this redesign is finished. A detailed plan for this new configuration is outlined in :issue:`2371`.
When running recipes via the :ref:`command line <running>`, configuration options can be specified via YAML files and command line arguments. The options from all YAML files and command line arguments are merged together using :func:`dask.config.collect` to create a single configuration object, which properly considers nested objects (see :func:`dask.config.update` for details). Configuration options given via the command line will always be preferred over options given via YAML files.
:ref:`Configuration options <config_options>` can be specified via YAML files
(i.e., *.yaml and *.yml).
A file could look like this (for example, located at
~/.config/esmvaltool/config.yml):
output_dir: ~/esmvaltool_output
max_parallel_tasks: 1

ESMValCore searches for all YAML files in each of the following locations and merges them together:

- The directory specified via the --config_dir command line argument.
- The user configuration directory: by default ~/.config/esmvaltool, but this location can be changed with the ESMVALTOOL_CONFIG_DIR environment variable.
Preference follows the order in the list above (i.e., the directory specified
via command line argument is preferred over the user configuration directory).
Within a directory, files are sorted lexicographically, and later files (e.g.,
z.yml) will take precedence over earlier files (e.g., a.yml).
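The collect-and-merge behaviour described above can be sketched in plain Python (illustrative only; ESMValCore actually delegates this to :func:`dask.config.collect` and :func:`dask.config.update`):

```python
# Illustrative sketch of nested configuration merging; this is NOT
# ESMValCore's implementation (which uses dask.config internally).

def merge(base: dict, update: dict) -> dict:
    """Recursively merge ``update`` into ``base``; ``update`` wins."""
    result = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

# Files are processed in lexicographic order, so later files take precedence.
files = {
    "a.yml": {"output_dir": "~/esmvaltool_output", "dask": {"use": "local_threaded"}},
    "z.yml": {"dask": {"use": "local_distributed"}},
}
config: dict = {}
for name in sorted(files):
    config = merge(config, files[name])

print(config["dask"]["use"])  # the value from z.yml wins
```

Note that the merge is per-key, so `output_dir` from `a.yml` survives even though `z.yml` overrides the nested `dask.use` value.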
Warning
ESMValCore will read all YAML files in these configuration directories.
Thus, other YAML files in this directory which are not valid configuration
files (like the old config-developer.yml files) will lead to errors.
Make sure to move these files to a different directory.
The minimal required configuration is to tell the tool where it can find :ref:`input data <config-data-sources>`. In addition, you may copy the default configuration file with the :ref:`top level options <config_options>`.
To get a copy of the default configuration file, run the command:

esmvaltool config copy defaults/config-user.yml

This will copy the file to your configuration directory, where you can tailor it
for your system, e.g. set the output_dir to a path where ESMValTool can
store its output files.
All :ref:`configuration options <config_options>` can also be given as command
line arguments to the esmvaltool executable.
Example:
esmvaltool run --max_parallel_tasks=2 /path/to/recipe.yml

Options given via command line arguments will always take precedence over options specified via YAML files.
When running recipes with the :ref:`experimental Python API <experimental_api>`, configuration options can be specified and accessed via the :py:data:`~esmvalcore.config.CFG` object. For example:
>>> from esmvalcore.config import CFG
>>> CFG['output_dir'] = '~/esmvaltool_output'
>>> CFG['output_dir']
PosixPath('/home/user/esmvaltool_output')

Or, alternatively, via a context manager:
>>> with CFG.context(log_level="debug"):
... print(CFG["log_level"])
debug
>>> print(CFG["log_level"])
info

This will also consider YAML configuration files in the user configuration
directory (by default ~/.config/esmvaltool, but this can be changed with
the ESMVALTOOL_CONFIG_DIR environment variable).
More information about this can be found :ref:`here <api_configuration>`.
Note: the following entries use Python syntax.
For example, Python's None is YAML's null, Python's True is YAML's
true, and Python's False is YAML's false.
| Option | Description | Type | Default value |
|---|---|---|---|
| auxiliary_data_dir | Directory where auxiliary data is stored. [1] | :obj:`str` | ~/auxiliary_data |
| check_level | Sensitivity of the CMOR check (debug, strict, default, relaxed, ignore), see :ref:`cmor_check_strictness`. | :obj:`str` | default |
| compress_netcdf | Use netCDF compression. | :obj:`bool` | False |
| config_developer_file | Path to custom :ref:`config-developer`. | :obj:`str` | None (default file) |
| dask | :ref:`config-dask`. | :obj:`dict` | See :ref:`config-dask-defaults` |
| diagnostics | Only run the selected diagnostics from the recipe, see :ref:`running`. | :obj:`list` or :obj:`str` | None (all diagnostics) |
| download_dir | [deprecated] Directory where downloaded data will be stored. [2] | :obj:`str` | ~/climate_data |
| drs | [deprecated] Directory structure for input data. [2] | :obj:`dict` | {CMIP3: ESGF, CMIP5: ESGF, CMIP6: ESGF, CORDEX: ESGF, obs4MIPs: ESGF} |
| exit_on_warning | Exit on warning (only used in NCL diagnostic scripts). | :obj:`bool` | False |
| log_level | Log level of the console (debug, info, warning, error). | :obj:`str` | info |
| logging | :ref:`config-logging`. | :obj:`dict` | See :ref:`config-logging` |
| max_datasets | Maximum number of datasets to use, see :ref:`running`. | :obj:`int` | None (all datasets from recipe) |
| max_parallel_tasks | Maximum number of parallel processes, see :ref:`task_priority`. [4] | :obj:`int` | None (number of available CPUs) |
| max_years | Maximum number of years to use, see :ref:`running`. | :obj:`int` | None (all years from recipe) |
| output_dir | Directory where all output will be written, see :ref:`outputdata`. | :obj:`str` | ~/esmvaltool_output |
| output_file_type | Plot file type. | :obj:`str` | png |
| profile_diagnostic | Use a profiling tool for the diagnostic run. [3] | :obj:`bool` | False |
| projects | :ref:`config-projects`. | :obj:`dict` | See table in :ref:`config-projects` |
| remove_preproc_dir | Remove the preproc directory if the run was successful, see :ref:`preprocessed_datasets`. | :obj:`bool` | True |
| resume_from | Resume previous run(s) by using preprocessor output files from these output directories, see :ref:`running`. | :obj:`list` of :obj:`str` | [] |
| rootpath | [deprecated] Rootpaths to the data from different projects. [2] | :obj:`dict` | {default: ~/climate_data} |
| run_diagnostic | Run diagnostic scripts, see :ref:`running`. | :obj:`bool` | True |
| save_intermediary_cubes | Save intermediary cubes from the preprocessor, see also :ref:`preprocessed_datasets`. | :obj:`bool` | False |
| search_data | Perform a quick or complete search for input data. When set to quick, the search stops as soon as a result is found. :ref:`Data sources <config-data-sources>` with a lower value for priority are searched first. (quick, complete) | :obj:`str` | quick |
| search_esgf | [deprecated] Automatic data download from ESGF (never, when_missing, always). [2] | :obj:`str` | never |
| skip_nonexistent | Skip non-existent datasets, see :ref:`running`. | :obj:`bool` | False |
| [1] | This setting is not for model or observational datasets; rather, it is for extra data files such as shapefiles or other data sources needed by the diagnostics. |
| [2] | (1, 2, 3, 4) This option is scheduled for removal in v2.14.0. Please use :ref:`data sources <config-data-sources>` to configure data finding instead. |
| [3] | The profiling tool used is vprof. The resulting profile can be viewed with vprof --input-file esmvaltool_output/recipe_output/run/diagnostic/script/profile.json. Note that it is also possible to use vprof to understand other resources used while running the diagnostic, including execution time of different code blocks and memory usage. |
| [4] | When using max_parallel_tasks with a value larger than 1 with the Dask threaded scheduler, every task will start num_workers threads. To avoid running out of memory or slowing down computations due to competition for resources, it is recommended to set num_workers such that max_parallel_tasks * num_workers approximately equals the number of CPU cores. The number of available CPU cores can be found by running python -c 'import os; print(len(os.sched_getaffinity(0)))'. See :ref:`config-dask-threaded-scheduler` for information on how to configure num_workers. |
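The rule of thumb from footnote [4] can be written out as a small helper (illustrative only; the function name is not part of ESMValCore):

```python
import os

def recommended_num_workers(cores: int, max_parallel_tasks: int) -> int:
    """Illustrative helper (not part of ESMValCore): choose num_workers
    so that max_parallel_tasks * num_workers ~= number of CPU cores."""
    return max(1, cores // max_parallel_tasks)

# Number of available CPU cores, as suggested in the footnote
# (sched_getaffinity is Linux-only, so fall back to cpu_count elsewhere):
cores = (
    len(os.sched_getaffinity(0))
    if hasattr(os, "sched_getaffinity")
    else (os.cpu_count() or 1)
)
print(recommended_num_workers(cores, max_parallel_tasks=4))
```

For example, on a 16-core machine with max_parallel_tasks: 4, this suggests 4 threads per task.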
Configure Dask in the dask section.
The :ref:`preprocessor functions <preprocessor_functions>` and many of the :ref:`Python diagnostics in ESMValTool <esmvaltool:recipes>` make use of the :ref:`Iris <iris:iris_docs>` library to work with the data. In Iris, data can be either :ref:`real or lazy <iris:real_and_lazy_data>`. Lazy data is represented by Dask arrays. Dask arrays consist of many small numpy arrays (called chunks) and, if possible, computations are run on those small arrays in parallel. In order to figure out what needs to be computed when, Dask makes use of a 'scheduler'. The default (thread-based) scheduler in Dask is rather basic: it can only run on a single computer and it may not always find the optimal task scheduling solution, resulting in excessive memory use when using, e.g., the :func:`esmvalcore.preprocessor.multi_model_statistics` preprocessor function. Therefore, it is recommended that you take a moment to configure the Dask distributed scheduler. A Dask scheduler and the 'workers' running the actual computations are collectively called a 'Dask cluster'.
Because some recipes require more computational resources than others, ESMValCore provides the option to define "Dask profiles". These profiles can be used to update the Dask user configuration per recipe run. The Dask profile can be selected in a YAML configuration file via
dask:
  use: <NAME_OF_PROFILE>

or alternatively on the command line via:

esmvaltool run --dask='{"use": "<NAME_OF_PROFILE>"}' recipe_example.yml

Available predefined Dask profiles:
- local_threaded (selected by default): use the threaded scheduler without any further options.
- local_distributed: use a local distributed scheduler without any further options.
- debug: use the synchronous Dask scheduler for debugging purposes. Best used with max_parallel_tasks: 1.
To copy these predefined profiles to your configuration directory for further customization, run the command:
esmvaltool config copy defaults/dask.yml

Here, some examples are provided on how to use a custom Dask distributed scheduler. Extensive documentation on setting up Dask Clusters is available here.
Note
If not all preprocessor functions support lazy data, computational performance may be best with the :ref:`threaded scheduler <config-dask-threaded-scheduler>`. See :issue:`674` for progress on making all preprocessor functions lazy.
Personal computer
Create a :class:`distributed.LocalCluster` on the computer running ESMValCore using all available resources:
dask:
  use: local_cluster  # use "local_cluster" defined below
  profiles:
    local_cluster:
      cluster:
        type: distributed.LocalCluster

This should work well for most personal computers.
Note
If running this configuration on a shared node of an HPC cluster, Dask will try to use as many resources as it can find, which may lead to the node being overcrowded by a single user (you)!
Shared computer
Create a :class:`distributed.LocalCluster` on the computer running ESMValCore, with 2 workers with 2 threads/4 GiB of memory each (8 GiB in total):
dask:
  use: local_cluster  # use "local_cluster" defined below
  profiles:
    local_cluster:
      cluster:
        type: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB

This should work well for shared computers.
Computer cluster
Create a Dask distributed cluster on the Levante supercomputer using the Dask-Jobqueue package:
dask:
  use: slurm_cluster  # use "slurm_cluster" defined below
  profiles:
    slurm_cluster:
      cluster:
        type: dask_jobqueue.SLURMCluster
        queue: shared
        account: <YOUR_SLURM_ACCOUNT>
        cores: 8
        memory: 7680MiB
        processes: 2
        interface: ib0
        local_directory: "/scratch/b/<YOUR_DKRZ_ACCOUNT>/dask-tmp"
        n_workers: 24

This will start 24 workers with cores / processes = 4 threads each,
resulting in n_workers / processes = 12 Slurm jobs, where each Slurm job
will request 8 CPU cores and 7680 MiB of memory and start processes = 2
workers.
This example will use the fast infiniband network connection (called ib0
on Levante) for communication between workers running on different nodes.
It is important to set the right location for temporary storage, in this
case the /scratch space is used.
It is also possible to use environment variables to configure the temporary
storage location, if your cluster provides these.
A configuration like this should work well for larger computations where it is advantageous to use multiple nodes in a compute cluster. See Deploying Dask Clusters on High Performance Computers for more information.
Externally managed Dask cluster
To use an externally managed cluster, specify a scheduler_address for the
selected profile.
Such a cluster can e.g. be started using the Dask Jupyterlab extension:
dask:
  use: external  # use the `external` profile defined below
  profiles:
    external:
      scheduler_address: "tcp://127.0.0.1:43605"

See here for an example of how to configure this on a remote system.
For debugging purposes, it can be useful to start the cluster outside of ESMValCore, because then the Dask dashboard remains available after ESMValCore has finished running.
Advice on choosing performant configurations
The threads within a single worker can access the same memory locations, so they may freely pass around chunks, while communicating a chunk between workers is done by copying it, so this is (a bit) slower. Therefore it is beneficial for performance to have multiple threads per worker. However, due to limitations in the CPython implementation (known as the Global Interpreter Lock or GIL), only a single thread in a worker can execute Python code (this limitation does not apply to compiled code called by Python code, e.g. numpy), therefore the best performing configurations will typically not use much more than 10 threads per worker.
Due to limitations of the NetCDF library (it is not thread-safe), only one of the threads in a worker can read or write to a NetCDF file at a time. Therefore, it may be beneficial to use fewer threads per worker if the computation is very simple and the runtime is determined by the speed with which the data can be read from and/or written to disk.
The Dask threaded scheduler can be a good choice for recipes using a small amount of data or when running a recipe where not all preprocessor functions are lazy yet (see :issue:`674` for the current status).
To avoid running out of memory, it is important to set the number of workers (threads) used by Dask to run its computations to a reasonable number. By default, the number of CPU cores in the machine will be used, but this may be too many on shared machines or laptops with a large number of CPU cores compared to the amount of memory they have available.
Typically, Dask requires about 2 GiB of RAM per worker, but this may be more depending on the computation.
To set the number of workers used by the Dask threaded scheduler, use the following configuration:
dask:
  use: local_threaded  # this can be omitted
  profiles:
    local_threaded:
      num_workers: 4

By default, the following Dask configuration is used:
dask:
  use: local_threaded  # use the `local_threaded` profile defined below
  profiles:
    local_threaded:
      scheduler: threads
    local_distributed:
      cluster:
        type: distributed.LocalCluster
    debug:
      scheduler: synchronous

| Option | Description | Type | Default value |
|---|---|---|---|
| profiles | Different Dask profiles that can be selected via the use option. Each profile has a name (:obj:`dict` keys) and corresponding options (:obj:`dict` values). See :ref:`config-dask-profiles` for details. | :obj:`dict` | See :ref:`config-dask-defaults` |
| use | Dask profile that is used; must be defined in the option profiles. | :obj:`str` | local_threaded |
| Option | Description | Type | Default value |
|---|---|---|---|
| cluster | Keyword arguments to initialize a Dask distributed cluster. Needs the option type, which specifies the class of the cluster. The remaining options are passed as keyword arguments to initialize that class. Cannot be used in combination with scheduler_address. | :obj:`dict` | If omitted, use an externally managed cluster if scheduler_address is given or a :ref:`Dask threaded scheduler <config-dask-threaded-scheduler>` otherwise. |
| scheduler_address | Scheduler address of an externally managed cluster. Will be passed to :class:`distributed.Client`. Cannot be used in combination with cluster. | :obj:`str` | If omitted, use a Dask distributed cluster if cluster is given or a :ref:`Dask threaded scheduler <config-dask-threaded-scheduler>` otherwise. |
| All other options | Passed as keyword arguments to :func:`dask.config.set`. | Any | No defaults. |
Configure what information is logged and how it is presented in the logging
section.
Note
Not all logging configuration is available here yet, see :issue:`2596`.
Configuration file example:
logging:
  log_progress_interval: 10s

will log progress of Dask computations every 10 seconds instead of showing a progress bar.
Command line example:
esmvaltool run --logging='{"log_progress_interval": "1m"}' recipe_example.yml

will log progress of Dask computations every minute instead of showing a progress bar.
Available options:
| Option | Description | Type | Default value |
|---|---|---|---|
| log_progress_interval | When running computations with Dask, log progress every log_progress_interval instead of showing a progress bar. The value can be specified in the format accepted by :func:`dask.utils.parse_timedelta`. A negative value disables any progress reporting. A progress bar is only shown if max_parallel_tasks: 1. | :obj:`str` or :obj:`float` | 0 |
Configure project-specific settings in the projects section.
Top-level keys in this section are projects, e.g., CMIP6, CORDEX, or
obs4MIPs.
Example:
projects:
  CMIP6:
    ...  # project-specific options

The following project-specific options are available:

| Option | Description | Type | Default value |
|---|---|---|---|
| data | Data sources are used to find input data and have to be configured before running the tool. See :ref:`config-data-sources` for details. | :obj:`dict` | {} |
| extra_facets | Extra key-value pairs ("facets") added to datasets in addition to the facets defined in the recipe. See :ref:`config-extra-facets` for details. | :obj:`dict` | See :ref:`config-extra-facets-defaults` |
The data section defines sources of input data. The easiest way to get
started with these is to copy one of the example configuration files and tailor
it to your needs.
To list the available example configuration files, run the command:
esmvaltool config list

To use one of the example configuration files, copy it to your configuration directory by running the command:

esmvaltool config copy data-intake-esgf.yml

where data-intake-esgf.yml needs to be replaced by the name of the example
configuration you would like to use. The format of the configuration file
is described in :mod:`esmvalcore.io`.
There are three modules available as part of ESMValCore that provide data sources:
- :mod:`esmvalcore.io.intake_esgf`: Use the intake-esgf library to load data that is available from ESGF.
- :mod:`esmvalcore.local`: Use :mod:`glob` patterns to find files on a filesystem.
- :mod:`esmvalcore.esgf`: Use the legacy esgf-pyclient library to find and download data from ESGF.
Adding a custom data source is relatively easy and is explained in :mod:`esmvalcore.io.protocol`.
There are various use cases and we provide example configurations for each of them below.
On a personal computer, the recommended setup can be obtained by running the commands:
esmvaltool config copy data-intake-esgf.yml
esmvaltool config copy data-local-esmvaltool.yml

This will use the :mod:`esmvalcore.io.intake_esgf` module to access data
that is available through ESGF and use :mod:`esmvalcore.local` to find
observational and reanalysis datasets that have been
:ref:`CMORized with ESMValTool <esmvaltool:inputdata_observations>`
(OBS6 and OBS projects for CMIP6- and CMIP5-style CMORization
respectively) or are supported in their :ref:`native format <read_native_datasets>`
through the native6 project.
Warning
It is important to :doc:`configure intake-esgf <intake_esgf:configure>`
for your system before using it. Make sure to set local_cache to a path
where it can store downloaded files, and if (some) ESGF data is already
available on your system, point esg_dataroot to it. If you are
missing certain search results, you may want to choose a different
index node for searching the ESGF.
On HPC systems, data is often stored in large shared filesystems. We have several example configurations for popular HPC systems. To list the available example files, run the command:
esmvaltool config list data-hpc

If you are using one of the supported HPC systems, for example Jasmin, you can copy the example configuration file by running the command:

esmvaltool config copy data-hpc-badc.yml

and you should be good to go. If your HPC system is not supported yet, you can
copy one of the other example configuration files, e.g. data-hpc-dkrz.yml
and tailor it for your system.
Warning
It is important to :doc:`configure intake-esgf <intake_esgf:configure>`
for your system before using it. Make sure to set local_cache to a path
where it can store downloaded files, and if (some) ESGF data is already
available on your system, point esg_dataroot to it. If you are
missing certain search results, you may want to choose a different
index node for searching the ESGF.
Note
Deduplicating data found via :mod:`esmvalcore.io.intake_esgf` data sources
and the :mod:`esmvalcore.local` data sources has not yet been implemented.
Therefore it is recommended not to use the configuration option
search_data: complete when using both data sources for the same project.
The search_data: quick option can be safely used.
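The difference between search_data: quick and search_data: complete can be sketched as follows (illustrative only, not ESMValCore's implementation; the source objects and search callables are made up):

```python
# Illustrative sketch of search_data: "quick" stops at the first data
# source (lowest priority value first) that yields results, while
# "complete" queries all sources. Not ESMValCore's actual code.

def find_files(sources, query, search_data="quick"):
    results = []
    for source in sorted(sources, key=lambda s: s["priority"]):
        found = source["search"](query)
        results.extend(found)
        if found and search_data == "quick":
            break  # quick search: stop as soon as something is found
    return results

# Hypothetical sources: the local one has the lower priority value,
# so it is searched first.
sources = [
    {"priority": 2, "search": lambda q: [f"esgf/{q}.nc"]},
    {"priority": 1, "search": lambda q: [f"local/{q}.nc"]},
]
print(find_files(sources, "tas"))                          # quick: local only
print(find_files(sources, "tas", search_data="complete"))  # both sources
```

This also shows why complete can return duplicates when two sources serve the same project, which is the situation the note above warns about.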
For each of the climate models that are supported in their native format as described in :ref:`read_native_models`, an example configuration file is available. To list the available example files, run the command:
esmvaltool config list data-native

It is possible to ignore specific warnings when loading data with Iris.
This is particularly useful for native datasets which do not follow the CMOR
standard by default and consequently produce a lot of warnings when handled by
Iris.
This can be configured using the ignore_warnings argument to
:class:`esmvalcore.local.LocalDataSource`.
Here is an example on how to ignore specific warnings when loading data from
the EMAC model in its native format:
.. literalinclude:: ../configurations/data-native-emac.yml
:language: yaml
The keyword arguments specified in the list items are directly passed to
:func:`warnings.filterwarnings` in addition to action=ignore.
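As an illustration of how such entries behave (the warning message below is made up), each entry's keyword arguments are handed to :func:`warnings.filterwarnings` with action="ignore" added:

```python
import warnings

# Illustrative only: mimic how ignore_warnings entries are applied.
# The message pattern below is a made-up example, not a real EMAC warning.
ignore_warnings = [
    {"message": "Ignoring netCDF variable .* invalid units"},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for entry in ignore_warnings:
        # Equivalent to warnings.filterwarnings(action="ignore", **entry)
        warnings.filterwarnings("ignore", **entry)
    warnings.warn("Ignoring netCDF variable 'tas' invalid units 'x'")
    warnings.warn("some other warning")

print(len(caught))  # only the non-ignored warning is recorded
```

The message argument is a regular expression matched against the start of the warning text, so one entry can silence a whole family of warnings.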
It can be useful to automatically add extra key-value pairs to variables or datasets without explicitly specifying them in the recipe. These key-value pairs can be used for :ref:`finding data <extra-facets-data-finder>` or for providing extra information to the functions that :ref:`fix data <extra-facets-fixes>` before passing it on to the preprocessor.
To support this, we provide the extra facets facility. Facets are the key-value pairs described in :ref:`Datasets`. Extra facets allow for the addition of more details per project, dataset, MIP table, and variable name.
Extra facets are configured in the extra_facets section of the
project-specific configuration.
They are specified in nested dictionaries with the following levels:
- Dataset name
- MIP table
- Variable short name
Example:
projects:
  CMIP6:
    extra_facets:
      CanESM5:  # dataset name
        Amon:  # MIP table
          tas:  # variable short name
            a_new_key: a_new_value  # extra facets

The three top levels under extra_facets (dataset name, MIP table, and
variable short name) can contain Unix shell-style wildcards.
The special characters used in shell-style wildcards are:
| Pattern | Meaning |
|---|---|
| * | matches everything |
| ? | matches any single character |
| [seq] | matches any character in seq |
| [!seq] | matches any character not in seq |
where seq can be either a range of characters or an explicit set of
characters; for example, [A-C] matches the characters A, B, and
C, while [AC] matches only the characters A and C.
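These wildcards behave like the shell-style patterns of Python's :mod:`fnmatch` module, which can be used to check a pattern interactively:

```python
from fnmatch import fnmatchcase

# Shell-style wildcard matching, as in the table above
# (fnmatchcase avoids platform-dependent case folding).
assert fnmatchcase("CanESM5", "CanESM*")  # * matches everything
assert fnmatchcase("Amon", "?mon")        # ? matches any single character
assert fnmatchcase("B", "[A-C]")          # [seq] with a character range
assert not fnmatchcase("B", "[AC]")       # [seq] with explicit characters
assert fnmatchcase("D", "[!A-C]")         # [!seq] negates the set
print("all patterns behave as described")
```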
Examples:
projects:
  CMIP6:
    extra_facets:
      CanESM5:  # dataset name
        "*":  # MIP table
          "*":  # variable short name
            a_new_key: a_new_value  # extra facets

Here, the extra facet a_new_key: a_new_value will be added to any CMIP6
data from the model CanESM5.
If keys are duplicated, later keys will take precedence over earlier keys:
projects:
  CMIP6:
    extra_facets:
      CanESM5:
        "*":
          "*":
            shared_key: with_wildcard
            unique_key_1: test
        Amon:
          tas:
            shared_key: without_wildcard
            unique_key_2: test

Here, the following extra facets will be added to a dataset with project CMIP6, name CanESM5, MIP table Amon, and variable short name tas:

unique_key_1: test
shared_key: without_wildcard  # takes value from later entry
unique_key_2: test

Default extra facets are specified in extra_facets_*.yml files located in
this directory.
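The precedence rules above can be sketched as a small resolver (illustrative only, not ESMValCore's implementation; it assumes fnmatch-style wildcard matching):

```python
from fnmatch import fnmatchcase

# Illustrative sketch of extra-facet resolution: patterns are visited in
# the order they appear, and later matches override earlier ones.
# This is NOT ESMValCore's actual implementation.

extra_facets = {
    "CanESM5": {
        "*": {"*": {"shared_key": "with_wildcard", "unique_key_1": "test"}},
        "Amon": {"tas": {"shared_key": "without_wildcard", "unique_key_2": "test"}},
    },
}

def resolve(dataset, mip, short_name):
    facets = {}
    for ds_pattern, mips in extra_facets.items():
        if not fnmatchcase(dataset, ds_pattern):
            continue
        for mip_pattern, variables in mips.items():
            if not fnmatchcase(mip, mip_pattern):
                continue
            for var_pattern, values in variables.items():
                if fnmatchcase(short_name, var_pattern):
                    facets.update(values)  # later entries win
    return facets

print(resolve("CanESM5", "Amon", "tas"))
```

Running this reproduces the result stated above: the Amon/tas entry overrides the wildcard value of shared_key, while both unique keys are kept.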
The esmvaltool run command can automatically download the files required
to run a recipe from ESGF for the projects CMIP3, CMIP5, CMIP6, CORDEX, and obs4MIPs.
Refer to :ref:`config-data-sources` for instructions on how to set this up. This section describes additional configuration options for the :mod:`esmvalcore.esgf` module, which is based on the legacy esgf-pyclient library. Most users will not need this.
Note
When running a recipe that uses many or large datasets on a machine that does not have any data available locally, the amount of data that will be downloaded can range from a few hundred gigabytes to a few terabytes. See :ref:`esmvaltool:inputdata` for advice on getting access to machines with large datasets already available.
A log message will be displayed with the total amount of data that will
be downloaded before starting the download.
If you see that this is more than you would like to download, stop the
tool by pressing the Ctrl and C keys on your keyboard simultaneously
several times, edit the recipe so it contains fewer datasets and try again.
An optional configuration file can be created for configuring how the
:class:`esmvalcore.esgf.ESGFDataSource` uses esgf-pyclient
to find and download data.
The name of this file is ~/.esmvaltool/esgf-pyclient.yml.
Any arguments to :py:obj:`pyesgf.search.connection.SearchConnection` can
be provided in the section search_connection, for example:
search_connection:
  expire_after: 2592000  # the number of seconds in a month

to keep cached search results for a month.
The default settings are:
search_connection:
  urls:
    - 'https://esgf-node.ornl.gov/esgf-1-5-bridge'
    - 'https://esgf.ceda.ac.uk/esg-search'
    - 'https://esgf-data.dkrz.de/esg-search'
    - 'https://esgf-node.ipsl.upmc.fr/esg-search'
    - 'https://esg-dn1.nsc.liu.se/esg-search'
    - 'https://esgf.nci.org.au/esg-search'
    - 'https://esgf.nccs.nasa.gov/esg-search'
    - 'https://esgdata.gfdl.noaa.gov/esg-search'
  distrib: true
  timeout: 120  # seconds
  cache: '~/.esmvaltool/cache/pyesgf-search-results'
  expire_after: 86400  # cache expires after 1 day

Note that by default the tool will try searching the ESGF index nodes in the order provided in the configuration file and use the first one that is online. Some ESGF index nodes may return search results faster than others, so you may be able to speed up the search for files by experimenting with placing different index nodes at the top of the list.
Warning
ESGF is currently transitioning to new server technology and all of the above indices are expected to go offline except the first one.
Issues with https://esgf-node.ornl.gov/esgf-1-5-bridge can be reported here.
If you experience errors while searching, it sometimes helps to delete the cached results.
The tool will maintain statistics of how fast data can be downloaded from what host in the file ~/.esmvaltool/cache/esgf-hosts.yml and automatically select hosts that are faster. There is no need to manually edit this file, though it can be useful to delete it if you move your computer to a location that is very different from the place where you previously downloaded data. An entry in the file might look like this:
esgf2.dkrz.de:
  duration (s): 8
  error: false
  size (bytes): 69067460
  speed (MB/s): 7.9

The tool only uses the duration and size to determine the download speed;
the speed shown in the file is not used.
If error is set to true, the most recent download request to that
host failed and the tool will automatically try this host only as a last
resort.
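Since only duration and size are used, the effective download speed for an entry can be recomputed directly (illustrative arithmetic only, using the example entry above):

```python
# Illustrative: the tool derives download speed from "duration (s)" and
# "size (bytes)"; the "speed (MB/s)" stored in the file is informational.
entry = {"duration (s)": 8, "size (bytes)": 69067460}

speed_mb_per_s = entry["size (bytes)"] / entry["duration (s)"] / 10**6
print(f"{speed_mb_per_s:.1f} MB/s")
```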
Most users and diagnostic developers will not need to change this file,
but it may be useful to understand its content.
The settings from this file are being moved to the
:ref:`new configuration system <config_overview>`. In particular, the
input_dir, input_file, and ignore_warnings settings have already
been replaced by the :class:`esmvalcore.local.LocalDataSource` that can be
configured via :ref:`data sources <config-data-sources>`.
The developer configuration file will be installed along with ESMValCore and can
also be viewed on GitHub:
esmvalcore/config-developer.yml.
This configuration file describes the CMOR tables for several
key projects (CMIP6, CMIP5, obs4MIPs, OBS6, OBS), and for native output data for some
models (ICON, IPSL, ... see :ref:`configure_native_models`).
Users can get a copy of this file with default values by running
esmvaltool config get_config_developer --path=${TARGET_FOLDER}

If the option --path is omitted, the file will be created in
~/.esmvaltool.
Note
Remember to change the configuration option config_developer_file if you
want to use a custom config developer file.
Warning
For now, make sure that the custom config-developer.yml is not saved
in the ESMValTool/Core configuration directories (see
:ref:`config_yaml_files` for details).
This will change in the future due to the :ref:`redesign of ESMValTool/Core's
configuration <config_overview>`.
Example of the CMIP6 project configuration:
CMIP6:
  output_file: '{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_strict: true

The filename to use for preprocessed data is configured using output_file,
similar to the filename template in :class:`esmvalcore.local.LocalDataSource`.
Note that the extension .nc (and, if applicable, a start and end time) will
automatically be appended to the filename.
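The template behaves like an ordinary Python format string, so filling it in can be illustrated as follows (the facet values are made-up examples):

```python
# Illustrative: fill the CMIP6 output_file template with example facets.
# ESMValCore appends the ".nc" extension (and, if applicable, a time
# range) automatically; here we append ".nc" by hand for the example.
template = "{project}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}"
facets = {
    "project": "CMIP6",
    "dataset": "CanESM5",
    "mip": "Amon",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
    "short_name": "tas",
}
filename = template.format(**facets) + ".nc"
print(filename)
```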
ESMValCore comes bundled with several CMOR tables, which are stored in the directory esmvalcore/cmor/tables. These are copies of the tables available from PCMDI.
For every project that can be used in the recipe, there are four settings
related to CMOR table settings available:
- cmor_type: can be CMIP5 if the CMOR table is in the same format as the CMIP5 table or CMIP6 if the table is in the same format as the CMIP6 table.
- cmor_strict: if this is set to false, the CMOR table will be extended with variables from the :ref:`custom_cmor_tables` (by default loaded from the esmvalcore/cmor/tables/custom directory) and it is possible to use variables with a mip which is different from the MIP table in which they are defined. Note that this option is always enabled for :ref:`derived variables <Variable derivation>`.
- cmor_path: path to the CMOR table. Relative paths are with respect to esmvalcore/cmor/tables. Defaults to the value provided in cmor_type written in lower case.
- cmor_default_table_prefix: prefix that needs to be added to the mip to get the name of the file containing the mip table. Defaults to the value provided in cmor_type.
As mentioned in the previous section, the CMOR tables of projects that use
cmor_strict: false will be extended with custom CMOR tables.
For :ref:`derived variables <Variable derivation>` (the ones with derive:
true in the recipe), the custom CMOR tables will always be considered.
By default, these custom tables are loaded from esmvalcore/cmor/tables/custom.
However, by using the special project custom in the
config-developer.yml file with the option cmor_path, a custom location
for these custom CMOR tables can be specified.
In this case, the default custom tables are extended with those entries from
the custom location (in case of duplication, the custom location tables take
precedence).
Example:
custom:
  cmor_path: ~/my/own/custom_tables

This path can be given as a relative path (relative to esmvalcore/cmor/tables) or as an absolute path. Other options given for this special table will be ignored.
Custom tables in this directory need to follow the naming convention
CMOR_{short_name}.dat and need to be given in CMIP5 format.
Example for the file CMOR_asr.dat:
SOURCE: CMIP5
!============
variable_entry:    asr
!============
modeling_realm:    atmos
!----------------------------------
! Variable attributes:
!----------------------------------
standard_name:
units:             W m-2
cell_methods:      time: mean
cell_measures:     area: areacella
long_name:         Absorbed shortwave radiation
!----------------------------------
! Additional variable information:
!----------------------------------
dimensions:        longitude latitude time
type:              real
positive:          down
!----------------------------------
!
It is also possible to use a special coordinates file CMOR_coordinates.dat,
which will extend the entries from the default one
(esmvalcore/cmor/tables/custom/CMOR_coordinates.dat).
ESMValCore can be configured for handling native model output formats and
specific reanalysis/observation datasets without preliminary reformatting.
These datasets can be either hosted under the native6 project (mostly
native reanalysis/observational datasets) or under a dedicated project, e.g.,
ICON (mostly native models).
Example:
native6:
  cmor_strict: false
  output_file: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

ICON:
  cmor_strict: false
  output_file: '{project}_{dataset}_{exp}_{var_type}_{mip}_{short_name}'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

A detailed description on how to add support for further native datasets is given :ref:`here <add_new_fix_native_datasets>`.
Hint
When using native datasets, it might be helpful to specify a custom location
for the :ref:`custom_cmor_tables`.
This allows reading arbitrary variables from native datasets.
Note that this requires the option cmor_strict: false in the
:ref:`project configuration <configure_native_models>` used for the native
model output.
The esmvaltool/config-references.yml file contains the list of ESMValTool diagnostic and recipe authors, references and projects. Each author, project and reference referred to in the documentation section of a recipe needs to be in this file in the relevant section.
For instance, the recipe recipe_ocean_example.yml file contains the
following documentation section:
documentation:
  authors:
    - demo_le
  maintainer:
    - demo_le
  references:
    - demora2018gmd
  projects:
    - ukesm

The entries under these four items refer to authors, references, and projects
listed in the config-references.yml file.