Commit ca71bd7

Merge branch 'main' into fix_check_fails
2 parents 79944ca + 123ed7b commit ca71bd7

2 files changed

+147
-52
lines changed

_episodes/09-cmorization.md

Lines changed: 145 additions & 50 deletions
@@ -119,7 +119,7 @@ run the existing one. There is a specific command available in the ESMValTool to
119119
run the CMORizer scripts:
120120

121121
```bash
122-
cmorize_obs -c <config-user.yml> -o <dataset-name>
122+
esmvaltool data format --config_file <path to config-user.yml> <dataset-name>
123123
```
124124

125125
The ``config-user.yml`` is the file in which we define the different data
@@ -141,46 +141,43 @@ If everything is okay, the output should look something like this:
141141
142142
~~~
143143
...
144-
... Starting the CMORization Tool at time: 2021-02-26 14:02:16 UTC
144+
... Starting the CMORization Tool at time: 2022-07-26 14:02:16 UTC
145145
... ----------------------------------------------------------------------
146146
... input_dir = /home/peter/data/RAWOBS
147-
... output_dir = /home/peter/esmvaltool_output/cmorize_obs_20210226_140216
147+
... output_dir = /home/peter/esmvaltool_output/data_formatting_20220726_140216
148148
... ----------------------------------------------------------------------
149149
... Running the CMORization scripts.
150-
... Using cmorizer scripts repository: /home/peter/miniconda3/envs/esmvaltool/
151-
lib/python3.8/site-packages/esmvaltool/cmorizers/obs
152-
... Processing datasets {'Tier3': ['FLUXCOM']}
150+
... Processing datasets ['FLUXCOM']
153151
... Input data from: /home/peter/data/RAWOBS/Tier3/FLUXCOM
154152
... Output will be written to: /home/peter/esmvaltool_output/
155-
cmorize_obs_20210226_140216/Tier3/FLUXCOM
156-
... Reformat script: /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/
157-
site-packages/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom
158-
... CMORizing dataset FLUXCOM using Python script /home/peter/miniconda3/envs/
159-
esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs/
160-
cmorize_obs_fluxcom.py
153+
data_formatting_20220726_140216/Tier3/FLUXCOM
154+
... Reformat script: /home/peter/mambaforge/envs/esmvaltool/lib/python3.9/
155+
site-packages/esmvaltool/cmorizers/data/formatters/datasets/fluxcom
156+
... CMORizing dataset FLUXCOM using Python script /home/peter/mambaforge/envs/
157+
esmvaltool/lib/python3.9/site-packages/esmvaltool/cmorizers/data/formatters/
158+
datasets/fluxcom.py
161159
... Found input file '/home/peter/data/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.*.nc'
162160
... CMORizing variable 'gpp'
163161
... Lmon
164162
... Var is gpp
165163
... ... UserWarning: Ignoring netCDF variable 'GPP' invalid units 'gC m-2 day-1'
166-
warnings.warn(msg)
164+
167165
... Fixing time...
168166
... Fixing latitude...
169167
... Fixing longitude...
170168
... Flipping dimensional coordinate latitude...
171169
... Saving file
172-
... Converting data type of data from 'float64' to 'float32'
173-
... Saving: /home/peter/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/
174-
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
170+
... Saving: /home/peter/esmvaltool_output/data_formatting_20220726_140216/Tier3/
171+
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
175172
... Cube has lazy data [lazy is preferred]
176-
... Ending the CMORization Tool at time: 2021-02-26 14:02:16 UTC
177-
... Time for running the CMORization scripts was: 0:00:00.605970
173+
... CMORization of dataset FLUXCOM finished!
174+
... Formatting successful for dataset FLUXCOM
178175
~~~
179176
{: .output}
180177
181178
So you can see that several fixes are applied, and the CMORized file is written
182179
to the ESMValTool output directory, i.e.
183-
`~/esmvaltool_output/cmorize_obs_YYYYMMDD_HHMMSS/TierX/dataset-name/filename.nc`
180+
`~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/TierX/dataset-name/filename.nc`
184181
In order to use it, we'll have to copy it from the output directory to a folder
185182
called `~/data/OBS/Tier3/FLUXCOM` and make sure the path to ``OBS`` is set
186183
correctly in our config-user file:
@@ -191,9 +188,9 @@ rootpath:
191188
```
192189

193190
You can also see the path where ESMValTool stores the reformatting script:
194-
`~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py`. You may
191+
`~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py`. You may
195192
have a look at this file if you want. The script also uses a configuration file:
196-
`~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`.
193+
`~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml`.
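The copy step described above (from the timestamped `data_formatting_YYYYMMDD_HHMMSS` output directory into `~/data/OBS/Tier3/FLUXCOM`) could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the helper name `copy_latest_formatted` is hypothetical and not part of ESMValTool, and it assumes the newest run is the one you want.

```python
import shutil
from pathlib import Path


def copy_latest_formatted(output_root, obs_root, tier, dataset):
    """Copy the .nc files of the newest data_formatting_* run into the OBS tree."""
    output_root = Path(output_root).expanduser()
    # Timestamped names (YYYYMMDD_HHMMSS) sort chronologically as plain strings.
    runs = sorted(p for p in output_root.glob("data_formatting_*") if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no data_formatting_* directory in {output_root}")
    source = runs[-1] / tier / dataset
    target = Path(obs_root).expanduser() / tier / dataset
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for nc_file in sorted(source.glob("*.nc")):
        shutil.copy2(nc_file, target / nc_file.name)
        copied.append(target / nc_file.name)
    return copied


# Example call (paths taken from the tutorial text):
# copy_latest_formatted("~/esmvaltool_output", "~/data/OBS", "Tier3", "FLUXCOM")
```

Copying rather than moving keeps the original CMORizer output in place, so a later run can be compared against it.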
197194

198195
## Make a test recipe
199196

@@ -204,8 +201,8 @@ CMORized, ESMValTool will give a warning or error.
204201

205202
> ## Create a test recipe
206203
>
207-
> Create a simple recipe called `recipe_check_fluxcom.yml` that loads the
208-
> FLUXCOM data. It should include a datasets section with a single entry for
204+
> Create a simple recipe called [recipe_check_fluxcom.yml](../files/recipe_check_fluxcom.yml)
205+
> that loads the FLUXCOM data. It should include a datasets section with a single entry for
209206
> the "FLUXCOM" dataset with the correct dataset keys, and a diagnostics section
210207
> with a single variable: gpp. We don't need any preprocessors or
211208
> scripts (set `scripts: null`), but we have to add a documentation section with
@@ -233,7 +230,7 @@ CMORized, ESMValTool will give a warning or error.
233230
> > documentation:
234231
> >
235232
> > description: Test recipe for FLUXCOM data
236-
>> title: This is a test recipe for the FLUXCOM data.
233+
> > title: This is a test recipe for the FLUXCOM data.
237234
> >
238235
> > authors:
239236
> > - kalverla_peter
@@ -263,7 +260,7 @@ CMORized, ESMValTool will give a warning or error.
263260
Try to run the example recipe with
264261

265262
```bash
266-
esmvaltool run recipe_check_fluxcom.yml --log_level debug
263+
esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
267264
```
268265

269266
If everything is okay, the recipe should run without problems.
@@ -278,17 +275,17 @@ test recipe will not be able to use it anymore.
278275

279276
```bash
280277
rm ~/data/OBS/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
281-
rm ~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
282-
rm ~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml
278+
rm ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
279+
rm ~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml
283280
```
284281

285282
If you now run the test recipe again it should fail, and somewhere in the output
286283
you should find something like:
287284

288285
~~~
289286
No input files found for ...
290-
Looking for files matching ['OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc'] in
291-
['/home/peter/data/OBS/Tier3/FLUXCOM']
287+
Looked for files matching: /home/peter/data/OBS/Tier3/
288+
FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc
292289
~~~
293290
{: .error}
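To see why a given file is or is not picked up, the pattern from the error message can be tried against candidate filenames. The sketch below uses the standard library's shell-style matching as a stand-in; how ESMValCore matches files internally is not shown in this tutorial, so treat this only as an illustration of the pattern. The raw filename in the example is a hypothetical instance of `GPP.ANN.CRUNCEPv6.monthly.*.nc`.

```python
from fnmatch import fnmatchcase

# Pattern copied from the error message above.
PATTERN = "OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc"


def matches_cmor_pattern(filename, pattern=PATTERN):
    """Return True if the filename matches the shell-style pattern (case-sensitive)."""
    return fnmatchcase(filename, pattern)


candidates = [
    "OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc",  # CMORized file
    "GPP.ANN.CRUNCEPv6.monthly.2000.nc",  # hypothetical raw file name
]
matching = [name for name in candidates if matches_cmor_pattern(name)]
```

Only the first candidate matches, which is why the raw file cannot simply be dropped into the `OBS` folder under its original name.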
294291

@@ -298,10 +295,10 @@ the file so that it follows the CMOR filename conventions.
298295
## Create a new CMORizer script and a corresponding config file
299296

300297
The first step now is to create a new file in the right folder that will contain
301-
our new CMORizer instructions. Create a file called ``cmorize_obs_fluxcom.py``
298+
our new CMORizer instructions. Create a file called ``fluxcom.py``
302299

303300
```bash
304-
nano ~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
301+
nano ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
305302
```
306303

307304
and fill it with the following boilerplate code:
@@ -312,11 +309,11 @@ and fill it with the following boilerplate code:
312309
<We will add some useful info here later>
313310
"""
314311
import logging
315-
from . import utilities as utils
312+
from esmvaltool.cmorizers.data import utilities as utils
316313
317314
logger = logging.getLogger(__name__)
318315
319-
def cmorization(in_dir, out_dir, cfg, _):
316+
def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
320317
"""Cmorize the dataset."""
321318
322319
# This is where you'll add the cmorization code
@@ -328,11 +325,14 @@ def cmorization(in_dir, out_dir, cfg, _):
328325
Here, ``in_dir`` corresponds to the input directory of the raw files,
329326
``out_dir`` to the output directory of final reformatted data set and ``cfg`` to
330327
a configuration dictionary given by a configuration file that we will get to
331-
shortly. When you type the command ``cmorize_obs`` in the terminal, ESMValTool
332-
will call this function with the settings found in your configuration files.
328+
shortly. The last three arguments will not be considered in this script but
329+
can be used in other cases. ``cfg_user`` corresponds to the user configuration
330+
file, ``start_date`` to the start of the period to format, and ``end_date`` to
331+
the end of the period to format. When you type the command ``esmvaltool data format``
332+
in the terminal, ESMValTool will call this function with the settings found in your configuration files.
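As a rough illustration of the six-argument signature, the stand-in below only logs what it receives. It is a sketch, not the tutorial's final CMORizer: the logging of `out_dir` and the unused arguments is added here for illustration, and the types of `cfg_user`, `start_date` and `end_date` are not specified in this text, so they are left as opaque values.

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
    """Log the received arguments; a stand-in for real formatting code."""
    logger.info("in_dir: '%s'", in_dir)
    logger.info("out_dir: '%s'", out_dir)
    logger.info("cfg: '%s'", cfg)
    # The last three arguments are accepted but unused, as in the tutorial script.
    logger.debug("unused: %s, %s, %s", cfg_user, start_date, end_date)


# Parameter names, in the order the signature declares them.
EXPECTED_ARGS = list(inspect.signature(cmorization).parameters)
```

Calling it by hand with placeholder values, for example `cmorization("/home/peter/data/RAWOBS/Tier3/FLUXCOM", "/tmp/out", {}, None, None, None)`, is a quick way to check that the signature is accepted before running the full tool.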
333333

334334
The ESMValTool CMORizer also needs a dataset configuration file. Create a file
335-
called `~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`
335+
called `~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml`
336336
and fill it with the following boilerplate:
337337

338338
```yaml
@@ -382,7 +382,7 @@ You can try running the CMORizer at this point, and it should work without
382382
errors. However, it doesn't produce any output yet:
383383

384384
```bash
385-
cmorize_obs -c <config-user.yml> -o FLUXCOM
385+
esmvaltool data format --config_file <path to config-user.yml> FLUXCOM
386386
```
387387

388388
### 1. Find the input data
@@ -397,7 +397,14 @@ logger.info("in_dir: '%s'", in_dir)
397397
logger.info("cfg: '%s'", cfg)
398398
```
399399

400-
If you run the CMORizer again, it will print out the content of these variables.
400+
If you run the CMORizer again, it will print out the content of these variables
401+
and the output should contain something like this:
402+
403+
~~~
404+
... in_dir: '/home/peter/data/RAWOBS/Tier3/FLUXCOM'
405+
... cfg: '{'attributes': {'project_id': 'OBS6', 'comment': ''}, 'cmor_table': <esmvalcore.cmor.table.CMIP6Info object at 0x7fbd0a0f6bf0>}'
406+
~~~
407+
{: .output}
401408

402409
> ## Load the data
403410
>
@@ -450,7 +457,7 @@ attributes (`attrs`) are set through the configuration file. So we need to find
450457
out what the correct short name and attributes are.
451458

452459
The standard attributes for CMIP variables are defined in the [CMIP
453-
tables](https://github.com/ESMValGroup/ESMValCore/tree/master/esmvalcore/cmor/tables/cmip6/Tables).
460+
tables](https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables/cmip6/Tables).
454461
These tables are differentiated according to the "MIP" they belong to. The
455462
tables are a copy of the [PCMDI](https://github.com/PCMDI) guidelines.
456463

@@ -466,7 +473,7 @@ tables are a copy of the [PCMDI](https://github.com/PCMDI) guidelines.
466473
> > The variable "gpp" belongs to the land variables. The temporal resolution that we are looking
467474
> > for is "monthly". This information points to the "Lmon" CMIP table. And indeed, the variable
468475
> > "gpp" can be found in the file
469-
> > [here](https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/
476+
> > [here](https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/
470477
cmor/tables/cmip6/Tables/CMIP6_Lmon.json).
471478
> >
472479
> {: .solution}
@@ -532,8 +539,9 @@ However, this makes it possible to add more variables later on.
532539
> ## Was the CMORization successful so far?
533540
>
534541
> If you run the CMORizer again, you should see that it creates an output file
535-
> named ``OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-xxxx12.nc``. The "xxxx" and
536-
> "yyyy" represent the start and end year of the data.
542+
> named ``OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-yyyy12.nc`` stored in your
543+
> ESMValTool output directory `~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/Tier3/FLUXCOM/`.
544+
> The "xxxx" and "yyyy" represent the start and end year of the data.
537545
>
538546
{: .callout}
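The output filename packs several pieces of metadata into underscore-separated fields. The sketch below splits such a name apart. The field labels (`project`, `dataset`, `type`, `version`, `mip`, `short_name`) are assumptions chosen for this example, and the function is hypothetical; it would also break for dataset names that themselves contain underscores.

```python
def parse_cmor_filename(filename):
    """Split a CMOR-style filename into its underscore-separated fields."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    project, dataset, data_type, version, mip, short_name, period = parts
    start, end = period.split("-")
    return {
        "project": project,
        "dataset": dataset,
        "type": data_type,
        "version": version,
        "mip": mip,
        "short_name": short_name,
        "start": start,  # YYYYMM of the first time step
        "end": end,      # YYYYMM of the last time step
    }


fields = parse_cmor_filename("OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc")
```

Comparing the `project` field of your output file with the project requested in the test recipe is a quick sanity check before copying the file into `OBS` or `OBS6`.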
539547

@@ -590,7 +598,7 @@ address in the next section.
590598

591599
### 3. Implementing additional fixes
592600

593-
Copy the output of the CMORizer to your folder `~/data/OBS6/Tier3/`
601+
Copy the output of the CMORizer to your folder `~/data/OBS6/Tier3/FLUXCOM/`
594602
and change the test recipe to look for OBS6 data instead of OBS (note: we're
595603
upgrading the CMORizer to newer standards here!). Make sure the path to ``OBS6``
596604
is set correctly in our config-user file:
@@ -603,7 +611,7 @@ rootpath:
603611
If we now run the test recipe on our newly 'CMORized' data,
604612

605613
```bash
606-
esmvaltool run recipe_check_fluxcom.yml --log_level debug
614+
esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
607615
```
608616

609617
it should be able to find the correct file, but it does not succeed yet. The first
@@ -624,7 +632,8 @@ we can use it, we'll also need to make sure the coordinates have the correct
624632
standard name. Add the following code to your cmorizer:
625633

626634
```python
627-
# Fix/add coordinate information and metadata
635+
# 2. Apply the necessary fixes
636+
# 2a. Fix/add coordinate information and metadata
628637
cube.coord('lat').standard_name = 'latitude'
629638
cube.coord('lon').standard_name = 'longitude'
630639
utils.fix_coords(cube)
@@ -634,7 +643,7 @@ With some additional refactoring, our cmorization function might then look
634643
something like this:
635644

636645
```python
637-
def cmorization(in_dir, out_dir, cfg, _):
646+
def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
638647
"""Cmorize the dataset."""
639648
640649
# Get general information from the config file
@@ -665,9 +674,10 @@ def cmorization(in_dir, out_dir, cfg, _):
665674
utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)
666675
```
667676

668-
Have a look at the netCDF file, and confirm that the coordinates now have much
669-
more metadata added to them. Then, run the test recipe again with the latest
670-
CMORizer output. The next error is:
677+
Run the CMORizer script once more. Have a look at the netCDF file,
678+
and confirm that the coordinates now have much more metadata added to them.
679+
Then, run the test recipe again with the latest CMORizer output.
680+
The next error is:
671681

672682
~~~
673683
esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
@@ -771,6 +781,91 @@ Once everything works as expected, there's a couple of things that we can still
771781
{: .challenge}
772782

773783

784+
- **Fill the dataset information list**. The file
785+
[datasets.yml](https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/datasets.yml)
786+
contains the ESMValTool "tier", the data source, the last access time and
787+
download instructions for all supported datasets in ESMValTool. You can
788+
simply reuse the information written in the header of the CMORizer.
789+
790+
> ## Fill out the FLUXCOM entry in ``datasets.yml``
791+
>
792+
> Fill out the FLUXCOM entry in ``datasets.yml``. The different parts that need to be
793+
> present in the entry are the following:
794+
>
795+
> - Dataset-name
796+
> - Tier
797+
> - Source
798+
> - Last access
799+
> - Download and processing instructions
800+
>
801+
> > ## Answers
802+
> >
803+
> > The entry for the "FLUXCOM" dataset should look like:
804+
> >
805+
> > ```yaml
806+
> > FLUXCOM:
807+
> > tier: 3
808+
> > source: http://www.bgc-jena.mpg.de/geodb/BGI/Home
809+
> > last_access: 2019-07-27
810+
> > info: |
811+
> > From the website, select FLUXCOM as the data choice and click download.
812+
> > Two files will be displayed. One for Land Carbon Fluxes and one for
813+
> > Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
814+
> > CRUNCEP data file has several data files for different variables.
815+
> > The data for GPP generated using the
816+
> > Artificial Neural Network Method will be in files with name:
817+
> > GPP.ANN.CRUNCEPv6.monthly.*.nc
818+
> > A registration is required for downloading the data.
819+
> > Users in the UK with a CEDA-JASMIN account may request access to the jules
820+
> > workspace and access the data.
821+
> > Note : This data may require rechunking of the netcdf files.
822+
> > This constraint will not exist once iris is updated to
823+
> > version 2.3.0 Aug 2019
824+
> > ```
825+
> {: .solution}
826+
{: .challenge}
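Before committing the new entry, it can help to check that none of the fields used in the answer above is missing. The sketch below works on a plain dictionary, with the keys taken from the FLUXCOM entry shown in the answer; whether ESMValTool itself enforces these keys is not stated here, and loading the YAML file is left out to keep the example free of third-party dependencies.

```python
# Keys taken from the FLUXCOM entry shown in the answer above.
REQUIRED_KEYS = ("tier", "source", "last_access", "info")


def missing_dataset_fields(entry):
    """Return the required keys that are absent or empty in a dataset entry."""
    return [key for key in REQUIRED_KEYS if entry.get(key) in (None, "")]


fluxcom_entry = {
    "tier": 3,
    "source": "http://www.bgc-jena.mpg.de/geodb/BGI/Home",
    "last_access": "2019-07-27",
    "info": "From the website, select FLUXCOM as the data choice and click download.",
}
problems = missing_dataset_fields(fluxcom_entry)
```

An empty list means the entry has every field the answer uses; any names in the list point to fields that still need filling in.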
827+
828+
Once the ``datasets.yml`` file is filled, you can check that ESMValTool can
829+
display information about the added dataset with:
830+
831+
```bash
832+
esmvaltool data info FLUXCOM
833+
```
834+
835+
If everything is okay, the output should look something like this:
836+
837+
~~~
838+
$ esmvaltool data info FLUXCOM
839+
FLUXCOM
840+
841+
Tier: 3
842+
Source: http://www.bgc-jena.mpg.de/geodb/BGI/Home
843+
Automatic download: No
844+
845+
From the website, select FLUXCOM as the data choice and click download.
846+
Two files will be displayed. One for Land Carbon Fluxes and one for
847+
Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
848+
CRUNCEP data file has several data files for different variables.
849+
The data for GPP generated using the
850+
Artificial Neural Network Method will be in files with name:
851+
GPP.ANN.CRUNCEPv6.monthly.*.nc
852+
A registration is required for downloading the data.
853+
Users in the UK with a CEDA-JASMIN account may request access to the jules
854+
workspace and access the data.
855+
Note : This data may require rechunking of the netcdf files.
856+
This constraint will not exist once iris is updated to
857+
version 2.3.0 Aug 2019
858+
~~~
859+
{: .output}
860+
861+
Note that ``Automatic download: No`` means that no automatic downloading script
862+
is available in ESMValTool for this dataset. The implementation of such a
863+
script is beyond the scope of this tutorial. To find out which datasets come
864+
with an automatic download script, you can run: ``esmvaltool data list`` to
865+
list all datasets supported in ESMValTool. More information about the usage
866+
of automatic downloading scripts can be found in the
867+
[User Guide](https://docs.esmvaltool.org/en/latest/develop/dataset.html#downloader-script-optional).
868+
774869
- **Complete the metadata in the config file**. We have left a few fields empty
775870
in the configuration file, such as 'source'. By filling out these fields we can
776871
make sure the relevant metadata is passed on as attributes in the CMORized
@@ -794,7 +889,7 @@ utils.set_global_atts(cube, attributes)
794889
- **Add documentation**. Make sure that you have added the info of your dataset
795890
to the User Guide so that people know it is available for the ESMValTool
796891
[Obtaining input
797-
data](https://github.com/ESMValGroup/ESMValTool/blob/master/doc/sphinx/source/input.rst).
892+
data](https://github.com/ESMValGroup/ESMValTool/blob/main/doc/sphinx/source/input.rst).
798893

799894

800895
## Some final comments
Lines changed: 2 additions & 2 deletions
@@ -2,11 +2,11 @@ documentation:
22

33
description: Test recipe for FLUXCOM data
44

5+
title: This is a test recipe for the FLUXCOM data.
6+
57
authors:
68
- kalverla_peter
79

8-
title: Test recipe fluxcom.
9-
1010
maintainer:
1111
- kalverla_peter
1212
