Commit c055e70

Address small comments and suggestions
1 parent 730208b commit c055e70

1 file changed (+78, -100 lines changed)

_episodes/09-cmorization.md

Lines changed: 78 additions & 100 deletions
@@ -1,16 +1,17 @@
 ---
-title: "CMORization: Using observational datasets"
+title: "CMORization: adding new datasets to ESMValTool"
 teaching: 15
 exercises: 45
 
 questions:
-- "What is so challenging about observational data?"
-- "How do I use observational datasets in ESMValTool?"
+- "CMORization: what is it and why do we need it?"
+- "How to use the existing CMORizer scripts shipped with ESMValTool?"
 - "How add support for new (observational) datasets?"
 
 objectives:
 - "Understand what CMORization is and why it is necessary."
-- "Learn how to write a new CMORizer script."
+- "Use existing scripts to CMORize your data."
+- "Write a new CMORizer script to support additional data."
 
 keypoints:
 - "CMORizers are dataset-specific scripts that can be run once to generate CMOR-compliant data."
@@ -19,11 +20,14 @@ keypoints:
 
 ## Introduction
 
-This episode deals with "CMORization". ESMValTool was designed to work with data
+This episode deals with "CMORization". ESMValTool is designed to work with data
 that follow the CMOR standards. Unfortunately, not all datasets follow these
 standards. In order to use such datasets in ESMValTool we first need to reformat
 the data. This process is called "CMORization".
 
+In this episode we assume that you are using a development installation of
+ESMValTool as explained in the [previous episode](/08-development-setup).
+
 > ## What are the CMOR standards?
 >
 > The name "CMOR" originates from a tool: [the Climate Model Output
@@ -38,7 +42,7 @@ the data. This process is called "CMORization".
 coordinate information, how the data should be structured (e.g. 1 variable per
 file), additional metadata requirements, but also file naming conventions a.k.a.
 the data reference syntax (DRS). All this information is stored in so-called
-CMOR tables. As example, the CMOR tables for the CMIP6 project can be found
+CMOR tables. As an example, the CMOR tables for the CMIP6 project can be found
 [here](https://github.com/PCMDI/cmip6-cmor-tables).
 {: .callout}
 
@@ -52,12 +56,12 @@ CMOR-compliant copy of these datasets. CMORizer scripts for several popular
 datasets are included in ESMValTool, and ESMValTool also provides a convenient
 way to execute them.
 
-Occasionally it happens that there are still minor issue with CMIP datasets. In
+Occasionally it happens that there are still minor issues with CMIP datasets. In
 those cases, it is possible to fix those issues in ESMValCore before any further
 processing is done. The same can be done for non-CMIP data. The advantage is
 that you don't need to store an additional, reformatted copy of the data. The
 disadvantage is that these fixes should be implemented inside ESMValCore.
-Writing a CMORizer script is technically is simpler.
+Development of ESMValCore is beyond the scope of this tutorial.
 
 The concepts discussed so far are illustrated in the figure below.
 ![Data flow with ESMValTool](../fig/data_flow.png)
@@ -69,24 +73,22 @@ that is important for calculating components of the global carbon cycle. We will
 go through all the steps and explain relevant topics as we go. If you prefer to
 implement CMOR fixes, please read the documentation
 [here](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/develop/fixing_data.html#fixing-data).
-While fixes are implemented slightly differently, conceptually the process is
-the same and the concepts explained in this episode are still useful.
+While fixes are implemented in a different way, the objective is exactly the
+same, and the concepts explained in this episode are still useful.
 
 ## Obtaining the data
 
-> ## Get the data
-> The data for this episode is available via the [FluxCom Data
-> Portal](http://www.bgc-jena.mpg.de/geodb/BGI/Home). First you'll need to
-> register. After registration, in the dropdown boxes, select FLUXCOM as the data
-> choice and click download. Three files will be displayed. Click the download
-> button on the "FLUXCOM (RS+METEO) Global Land Carbon Fluxes using CRUNCEP
-> climate data". You'll be send an FTP address to access the server. Connect
-> to the server, follow the path in your email, and look for the file
-> `raw/monthly/GPP.ANN.CRUNCEPv6.monthly.2000.nc`. Download that file and save in
-> in a folder called `/RAWOBS/Tier3/`.
->
-> Note: you'll need a user-friendly ftp client. On Linux, `ncftp` works okay.
-{: .challenge}
+The data for this episode is available via the [FluxCom Data
+Portal](http://www.bgc-jena.mpg.de/geodb/BGI/Home). First you'll need to
+register. After registration, in the dropdown boxes, select FLUXCOM as the data
+choice and click download. Three files will be displayed. Click the download
+button on the "FLUXCOM (RS+METEO) Global Land Carbon Fluxes using CRUNCEP
+climate data". You'll receive an email with the FTP address to access the
+server. Connect to the server, follow the path in your email, and look for the
+file `raw/monthly/GPP.ANN.CRUNCEPv6.monthly.2000.nc`. Download that file and
+save it in a folder called `/RAWOBS/Tier3/`.
+
+Note: you'll need a user-friendly ftp client. On Linux, `ncftp` works okay.
 
 > ## What is the deal with those "tiers"?
 >
@@ -127,12 +129,12 @@ know we have completed our task.
 
 > ## Create a test recipe
 >
-> Create a simple recipe that loads the "FLUXCOM" data. It should include a
-> datasets section with a single entry for the "FLUXCOM" dataset with the correct
-> dataset keys, and a diagnostics section with two variables: gpp and gppStderr.
-> We won't need any preprocessors or scripts (set `scripts: null`), but you will
-> have to add a documentation section with a description, authors and
-> maintainer, otherwise the recipe will fail.
+> Create a simple recipe called `recipe_check_fluxnet.yml` that loads the
+> "FLUXCOM" data. It should include a datasets section with a single entry for
+> the "FLUXCOM" dataset with the correct dataset keys, and a diagnostics section
+> with two variables: gpp and gppStderr. We don't need any preprocessors or
+> scripts (set `scripts: null`), but we have to add a documentation section with
+> a description, authors and maintainer, otherwise the recipe will fail.
 >
 > Use the following dataset keys:
 >
@@ -176,9 +178,7 @@ know we have completed our task.
 > >
 > > ```
 > >
-> > Note: a recipe similar to this one is available under
-> > `~/path/to/ESMValTool/esmvaltool/recipes/examples/recipe_check_obs.yml`.
-> > That recipe includes checks all datasets for which CMORizers are available.
+> > To learn more about writing a recipe, please refer to [Writing your own recipe](/05-preprocessor).
 > >
 > {: .solution}
 {: .challenge}
@@ -218,17 +218,14 @@ data. Our data is located in the `RAWOBS` folder, but ESMValTool is looking in
 from one folder to the other. To do end, we need to tell ESMValTool where our
 data may be found.
 
-> ## Set the correct paths in your user configuration file:
->
-> This information is set in `config-user.yml`. Modify your
-> configuration file so that it has the correct paths
->
-> ```yaml
-> rootpath:
->   OBS6: /path/to/my/obs6/data
->   RAWOBS: /path/to/my/rawobs/data
-> ```
-{: .challenge}
+## Set the correct paths in your user configuration file:
+This information is set in `config-user.yml`. Modify your
+configuration file so that it has the correct paths:
+```yaml
+rootpath:
+  OBS6: /path/to/my/obs6/data
+  RAWOBS: /path/to/my/rawobs/data
+```
 
 > ## RAWOBS, OBS, OBS6!?
 >
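Conceptually, each `rootpath` entry is the base directory for one project, and the data for a dataset sits in sub-folders below it. The minimal sketch below illustrates that lookup with the example paths from the snippet above. The `Tier<N>/<dataset>` sub-folder layout and the helper `dataset_dir` are assumptions made for illustration only; ESMValTool's own path resolution is more involved and is not reproduced here.

```python
from pathlib import PurePosixPath

# Mirrors the example rootpath section of config-user.yml shown above.
ROOTPATH = {
    "OBS6": "/path/to/my/obs6/data",
    "RAWOBS": "/path/to/my/rawobs/data",
}


def dataset_dir(rootpath, project, tier, dataset):
    """Hypothetical helper: return <root of project>/Tier<tier>/<dataset>.

    A project without a rootpath entry raises KeyError.
    """
    root = PurePosixPath(rootpath[project])
    return root / f"Tier{tier}" / dataset


if __name__ == "__main__":
    # Where the raw FLUXCOM files live versus where CMORized output would go.
    print(dataset_dir(ROOTPATH, "RAWOBS", 3, "FLUXCOM"))
    print(dataset_dir(ROOTPATH, "OBS6", 3, "FLUXCOM"))
```

Run as a script, this prints `/path/to/my/rawobs/data/Tier3/FLUXCOM` and `/path/to/my/obs6/data/Tier3/FLUXCOM`, which makes the RAWOBS (input) versus OBS6 (output) split explicit.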
@@ -295,12 +292,11 @@ copy of the [PCMDI](https://github.com/PCMDI) guidelines.
 {: .challenge}
 
 
-If the variable you are interested in is not available in the standard CMOR tables,
-you need to write a custom CMOR table entry for the variable. Don't worry! It sounds
-more complicated than it is! Examples of custom CMOR table entries are for example
-the standard error of a specific variable.
-For our variable "gpp" there is indeed no CMOR definition for the standard error,
-therefore "gppStderr" was defined in the custom CMOR table
+If the variable you are interested in is not available in the standard CMOR
+tables, you need to write a custom CMOR table entry for the variable. Examples
+of custom CMOR table entries include the standard error of a specific
+variable. For our variable "gpp" there is indeed no CMOR definition for the
+standard error, therefore "gppStderr" was defined in the custom CMOR table
 [here](https://github.com/ESMValGroup/ESMValCore/tree/master/esmvalcore/cmor/tables/custom),
 as ``CMOR_gppStderr.dat``.
 
@@ -463,16 +459,14 @@ problems.
 The first step now is to create a file in the right folder that will contain
 the short python script. The home of all CMORizer scripts for observations
 and reanalysis datasets is
-[here](https://github.com/ESMValGroup/ESMValTool/tree/master/esmvaltool/cmorizers/obs).
+[esmvaltool/cmorizers/obs](https://github.com/ESMValGroup/ESMValTool/tree/master/esmvaltool/cmorizers/obs).
 Add a file with the name ``cmorize_obs_fluxcom.py`` to this folder.
 
 > ## Note
 >
 > Always, always, when modifying or creating new code for the ESMValTool
-> repositories, work on your *own, local* branch of the ESMValTool. Optimally,
-> you have forked that branch directly from the most up-to-date version of
-> the "master" branch to avoid conflicts later when you want to merge your
-> code with the "master" branch of the ESMValTool.
+> repositories, work on your *own, local* branch of the ESMValTool. For more
+> information see [Development and contribution](/08-development-setup).
 >
 {: .callout}
 
@@ -554,11 +548,8 @@ ultimately the ESMValTool knows how to look for the new dataset.
 
 Therefore it is necessary to create a configuration file for the new dataset.
 This configuration file needs to be stored in the
-following folder:
+following folder: ``ESMValTool/esmvaltool/cmorizers/obs/cmor_config/``.
 
-```bash
-ESMValTool/esmvaltool/cmorizers/obs/cmor_config/
-```
 It is important to note that the name of the configuration file has to be
 identical to the name of the dataset. For our example the configuration file,
 traditionally written in the ``yaml`` format, therefore must be called
@@ -627,19 +618,15 @@ contain)
 > > mip: Lmon
 > > ```
 > >
-> > The original configuration file for the "FLUXCOM" dataset can be
-> > found here:
-> > [FLUXCOM.yml]
-> > (https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml)
+> > The original configuration file for the "FLUXCOM" dataset can be found here:
+> > [FLUXCOM.yml](https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml)
 > >
-> > Note the attribute "reference" here: it should include a ``doi`` related
-> > to the dataset. For more information on how to add references to the
+> > Note the attribute "reference" here: it should include a ``doi`` related to
+> > the dataset. For more information on how to add references to the
 > > ``reference`` section of the configuration file, see the section in the
-> > documentation about this: [adding references]
-> > (https://docs.esmvaltool.org/en/latest/community/diagnostic.html#adding-references)
+> > documentation about this: [adding
+> > references](https://docs.esmvaltool.org/en/latest/community/diagnostic.html#adding-references)
 > >
-> > If a single dataset has more than one reference, it is possible to add
-> > tags as a list e.g. ``reference: ['tag1', 'tag2']``.
 > {: .solution}
 {: .challenge}
 
@@ -653,12 +640,12 @@ def cmorization(in_dir, out_dir, cfg, config_user):
 
 with this exact call signature. Here, ``in_dir`` corresponds to the input
 directory of the raw files, ``out_dir`` to the output directory of final
-reformatted data set and ``cfg`` to the configuration dictionary given by
-the ``.yml`` configuration file. The return value of this function is ignored.
-All the work, i.e. loading of the raw files, processing them and saving the
-final output, has to be performed inside its body. To simplify this process,
-ESMValTool provides a set of predefined utilities.py_, which can be imported
-into your CMORizer by
+reformatted data set and ``cfg`` to the configuration dictionary given by the
+``.yml`` configuration file. The return value of this function is ignored. All
+the work, i.e. loading of the raw files, processing them and saving the final
+output, has to be performed inside its body. To simplify this process,
+ESMValTool provides some convenience functions in ``utilities.py``, which
+can be imported into your CMORizer by
 
 ```python
 from . import utilities as utils
@@ -675,7 +662,7 @@ that code style). For example, the function ``_get_filepath`` converts the raw
 filepath to the correct one and the function ``_extract_variable`` extracts and
 saves a single variable from the raw data.
 
-After all that theory, let's have a look at the actualy python code of the
+After all that theory, let's have a look at the python code of the
 existing "FLUXCOM" CMORizer script. For now, we only want to read in the data
 and then store it in a new file.
 
@@ -762,7 +749,7 @@ def cmorization(in_dir, out_dir, cfg, _):
 ```
 
 Let's run this CMORizing script to see if the dataset is read correctly, and
-what kind of file is written out. Tere is a specific command available in the
+what kind of file is written out. There is a specific command available in the
 ESMValTool to run the CMORizing scripts:
 
 ```bash
@@ -782,15 +769,10 @@ NetCDF files are produced in your output directory.
 > ## Was the CMORization successful so far?!
 >
 > If you check the folders in your output path, you should see the following
-> folder structure:
-> ```bash
-> /Tier3/FLUXCOM/
-> ```
+> folder structure: ``/Tier3/FLUXCOM/``.
 >
-> Within the "FLUXCOM" folder there should be a NetCDF file with the name:
-> ```bash
-> OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-xxxx12.nc
-> ```
+> Within the "FLUXCOM" folder there should be a NetCDF file named
+> ``OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-yyyy12.nc``.
 >
 > The "xxxx" represents the start year of the data period you wanted to
 > CMORize, and the "yyyy" represents the end year.
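The expected output file name is a sequence of underscore-separated fields: project, dataset, type, version, MIP table, variable, and a time range of start and end year-month. The sketch below assembles such a name from those pieces. The helper name and its argument order are assumptions for illustration, not ESMValTool's own naming code, and monthly data is assumed so the range runs from January of the start year to December of the end year.

```python
def cmorized_filename(project, dataset, dtype, version, mip, short_name,
                      start_year, end_year):
    """Hypothetical helper: build a file name following the pattern above.

    Assumes monthly data, so the time range is <start>01-<end>12.
    """
    time_range = f"{start_year:04d}01-{end_year:04d}12"
    fields = [project, dataset, dtype, version, mip, short_name, time_range]
    return "_".join(fields) + ".nc"


if __name__ == "__main__":
    # The raw file downloaded in this episode is for the year 2000.
    print(cmorized_filename("OBS", "FLUXCOM", "reanaly", "ANN-v1", "Lmon",
                            "gpp", 2000, 2000))
```

For the single year 2000 this prints `OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc`.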
@@ -868,7 +850,7 @@ cube = cube / (1000 * 86400)
 cube.units = 'kg m-2 s-1'
 ```
 
-The whole section should not look then look like this:
+The whole section should then look like this:
 
 ```python
 def _extract_variable(cmor_info, attrs, filepath, out_dir):
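The division `cube / (1000 * 86400)` in this hunk combines two conversions: grams to kilograms (factor 1000) and "per day" to "per second" (86400 seconds per day). Assuming the raw GPP values are in g m-2 day-1, as that factor implies, a quick plain-Python check of the conversion:

```python
GRAMS_PER_KG = 1000
SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def g_per_m2_day_to_kg_per_m2_s(value):
    """Convert a flux from g m-2 day-1 to kg m-2 s-1."""
    return value / (GRAMS_PER_KG * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 1 g m-2 day-1 is roughly 1.157e-08 kg m-2 s-1.
    print(g_per_m2_day_to_kg_per_m2_s(1.0))
```

Naming the two factors separately makes it clearer which part of the unit each one handles than the bare product `86400000` would.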
@@ -914,11 +896,7 @@ the problem with the coordinates ``lat`` and ``lon`` yet. There is no "units"
 or "standard_name" given for either of these coordinates which will cause a
 problem for the ESMValTool. Such some smaller formatting problems can occur
 relatively often for coordinates like ``lat``, ``lon`` or ``time``. This means
-that these problems need fixing in many CMORizers. Therefore there are common
-functions available within the ESMValTool that one can import and use in the
-new CMORizer script. The functions, written in python, are stored in the folder
-[utilities.py](https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/cmorizers/obs/utilities.py)
-
+that these problems need fixing in many CMORizers.
 > ## Finalizing the "FLUXCOM" CMORizer
 >
 > The task is now to work with the functions in the file "utilities.py" to
@@ -956,7 +934,7 @@ new CMORizer script. The functions, written in python, are stored in the folder
 > double lon_bnds(lon, bnds) ;
 > ```
 >
-> For that to happen, you will have to fix/work on the following things:
+> For that to happen, you have to fix the following things:
 > - adding standard names to the dimensions ``lat`` and ``lon``
 > - fix the metadata for the variable "gpp"
 > - change the time units to start in the year 1950
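The first and last items of this list can be illustrated without iris. The sketch below is a minimal, hypothetical stand-in: coordinates are plain dictionaries rather than iris coordinate objects, the function names are invented for this illustration, and the raw time axis is assumed to be stored as "days since" a reference date in the standard Gregorian calendar (check the actual units in the raw file). In the exercise itself you would apply the functions in `utilities.py` to the iris cube instead. The CF values `latitude`/`degrees_north` and `longitude`/`degrees_east` are the standard ones.

```python
from datetime import date

# Standard CF metadata for the horizontal coordinates.
CF_COORDS = {
    "lat": {"standard_name": "latitude", "units": "degrees_north"},
    "lon": {"standard_name": "longitude", "units": "degrees_east"},
}


def add_lat_lon_metadata(coords):
    """Return a copy of `coords` with missing lat/lon metadata filled in.

    `coords` maps a coordinate name to a dict of its attributes. Attributes
    that are already present are kept as they are.
    """
    fixed = {name: dict(attrs) for name, attrs in coords.items()}
    for name, expected in CF_COORDS.items():
        if name in fixed:
            for key, value in expected.items():
                fixed[name].setdefault(key, value)
    return fixed


def rebase_days(values, old_ref, new_ref):
    """Re-express 'days since old_ref' values as 'days since new_ref'.

    Uses Python's proleptic Gregorian date arithmetic, so it does not apply
    to other CF calendars (e.g. 365_day or 360_day).
    """
    offset = (old_ref - new_ref).days
    return [value + offset for value in values]


if __name__ == "__main__":
    print(add_lat_lon_metadata({"lat": {}, "lon": {"long_name": "longitude"}}))
    # Example time steps relative to 2000-01-01, moved to a 1950 reference.
    print(rebase_days([0, 31], date(2000, 1, 1), date(1950, 1, 1)))
```

The second call prints `[18262, 18293]`: from 1950-01-01 to 2000-01-01 there are 50 years including 12 leap days, i.e. 50 * 365 + 12 = 18262 days, so every time value shifts by that amount while the dates it represents stay the same.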
@@ -1030,17 +1008,17 @@ Since you have gone through all the trouble to reformat the dataset so that
 the ESMValTool can work with it, it would be great if you could provide the
 CMORizer, and ultimately with that the dataset, to the rest of the community.
 To do that there are a few more steps you have to do:
-1. Open a pull request in the ESMValTool repository describing the dataset briefly
-2. Add the info of your dataset to the User Guide so that people know it is available for the ESMValTool [Obtaining input data](https://github.com/ESMValGroup/ESMValTool/blob/master/doc/sphinx/source/input.rst)
-3. Make sure that there is a reference file available for the dataset [BibTeX info file](https://github.com/ESMValGroup/ESMValTool/tree/master/esmvaltool/references)
-
-More information about working with pull requests are available in the ESMValTool
-documentation under [Contributing a review](https://esmvaltool--1920.org.readthedocs.build/en/1920/community/review.html)
-
+1. Check out the previous episode on [Contributing to ESMValTool](/08-development-setup)
+1. Make sure that you have added the info of your dataset to the User Guide so
+that people know it is available for the ESMValTool [Obtaining input
+data](https://github.com/ESMValGroup/ESMValTool/blob/master/doc/sphinx/source/input.rst)
+1. Make sure that there is a reference file available for the dataset [BibTeX
+info
+file](https://github.com/ESMValGroup/ESMValTool/tree/master/esmvaltool/references)
 ## Some final comments
 
 Adding a new CMORizer to the ESMValTool is definitely already an advanced task
-when working with the ESMValTool. You have to have a basic understanding of
+when working with the ESMValTool. You need to have a basic understanding of
 how the ESMValTool works and how it's internal structure looks like. In
 addition, you need to have a basic understanding of NetCDF files and a
 programming language. In our example we used python for the CMORizing script
@@ -1050,4 +1028,4 @@ compatibility of the code with possible fundamental changes to the structure
 of the ESMValTool and ESMValCore.
 
 More information about adding observations to the ESMValTool can be found in the
-[documentation](https://docs.esmvaltool.org/en/latest/input.html#observations)
+[documentation](https://docs.esmvaltool.org/en/latest/input.html#observations).
