Commit 040d3a7

More restructuring (still WIP)
1 parent 02438e8 commit 040d3a7

1 file changed: +44 -157 lines changed

_episodes/09-cmorization.md

Lines changed: 44 additions & 157 deletions
Original file line number | Diff line number | Diff line change
@@ -116,6 +116,11 @@ run the CMORizer scripts:
116116
cmorize_obs -c <config-user.yml> -o <dataset-name>
117117
```
118118

119+
The ``config-user.yml`` is the file in which we define the different data
120+
paths, e.g. where the ESMValTool would find the "RAWOBS" folder. The
121+
``dataset-name`` needs to be identical to the folder name that was created
122+
to store the raw observation data files; in our case this would be "FLUXCOM".
123+
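The naming requirement above can be checked before running the command. The following is a minimal sketch, not part of ESMValTool: the helper name `find_raw_dir` is hypothetical, and the layout `<RAWOBS>/Tier*/<dataset-name>` is an assumption made for illustration.

```python
from pathlib import Path
import tempfile


def find_raw_dir(rawobs_root, dataset_name):
    """Return the raw-data folder whose name is exactly ``dataset_name``.

    Assumes (hypothetically) a layout of <rawobs_root>/Tier*/<dataset_name>.
    """
    root = Path(rawobs_root)
    for tier_dir in sorted(root.glob("Tier*")):
        if not tier_dir.is_dir():
            continue
        for entry in tier_dir.iterdir():
            # Compare names exactly: "fluxcom" must not match "FLUXCOM".
            if entry.is_dir() and entry.name == dataset_name:
                return entry
    raise FileNotFoundError(
        f"No folder named '{dataset_name}' found under '{root}'")


# Demo with a throwaway directory tree standing in for RAWOBS.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Tier3" / "FLUXCOM").mkdir(parents=True)
    print(find_raw_dir(tmp, "FLUXCOM").name)
```

If the folder name and the value passed to `-o` differ, even only in capitalization, a check like this fails early instead of during the CMORization run.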
119124
If everything is okay, the output should look something like this:
120125

121126
~~~
@@ -286,6 +291,13 @@ def cmorization(in_dir, out_dir, cfg, _):
286291
# 3. store the data with the correct filename
287292
```
288293
294+
Here, ``in_dir`` corresponds to the input directory of the raw files,
295+
``out_dir`` to the output directory of the final reformatted dataset, and ``cfg`` to
296+
a configuration dictionary read from a configuration file that we will get to shortly.
297+
298+
When you type the command ``cmorize_obs`` in the terminal, ESMValTool will call
299+
this function with the settings found in your configuration files.
300+
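To illustrate the calling convention described above, here is a minimal sketch with a stand-in CMORizer. The dispatcher `run_cmorizer` is hypothetical and is not how ESMValTool is actually implemented; the configuration values are placeholders, and the variable name `gpp` is assumed from the dataset description.

```python
import logging

logger = logging.getLogger(__name__)

# Records what the stand-in CMORizer was asked to process (demo only).
processed = []


def cmorization(in_dir, out_dir, cfg, _):
    """Stand-in CMORizer with the four-argument signature used in this lesson."""
    for var, var_info in cfg['variables'].items():
        logger.info("Would CMORize '%s' from '%s' into '%s'",
                    var, in_dir, out_dir)
        processed.append((var, var_info['mip']))


def run_cmorizer(cmorize_func, in_dir, out_dir, cfg, config_user):
    """Hypothetical dispatcher: call the CMORizer and ignore its return value."""
    cmorize_func(in_dir, out_dir, cfg, config_user)


# Example configuration dictionary; all values are placeholders.
example_cfg = {
    'filename': 'GPP.ANN.CRUNCEPv6.monthly.*.nc',
    'attributes': {'dataset_id': 'FLUXCOM'},
    'variables': {'gpp': {'mip': 'Lmon'}},
}

run_cmorizer(cmorization, 'RAWOBS/Tier3/FLUXCOM', 'output/Tier3/FLUXCOM',
             example_cfg, {})
print(processed)
```

The fourth argument is the user configuration; the stand-in names it `_` to signal that it is not used, matching the signature shown in the lesson.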
289301
> ## Note
290302
>
291303
> Always, always, when modifying or creating new code for the ESMValTool
@@ -294,20 +306,17 @@ def cmorization(in_dir, out_dir, cfg, _):
294306
>
295307
{: .callout}
296308
297-
### 1. Finding the input data
309+
### 1. Find the input data and store it under the right name
298310
299311
Since the original data does not follow CMOR filename conventions, we need to
300-
tell ESMValTool what the filename for this new dataset looks like. We supply
301-
this information via a dataset configuration file. It is important to note that
302-
the name of the configuration file has to be identical to the name of the
303-
dataset. Thus, we will create a file called
312+
tell ESMValTool what the filename for this new dataset looks like. Also, we need
313+
to provide the relevant information so ESMValTool can set the correct filename
314+
for the cmorized data. We supply this information via a dataset configuration
315+
file. It is important to note that the name of the configuration file has to be
316+
identical to the name of the dataset. Thus, we will create a file called
304317
`<path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`.
305318
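The rule that the configuration file is named after the dataset can be expressed as a small helper. This is a sketch only: the function name `cmor_config_path` is hypothetical, and the root directory passed in the demo is a placeholder for wherever ESMValTool is checked out.

```python
from pathlib import Path


def cmor_config_path(esmvaltool_root, dataset_name):
    """Build the expected location of a dataset's CMORizer configuration file.

    The file name is the dataset name plus the ``.yml`` extension.
    """
    return (Path(esmvaltool_root) / "esmvaltool" / "cmorizers" / "obs"
            / "cmor_config" / f"{dataset_name}.yml")


# "path_to_esmvaltool" is a placeholder, not a real directory.
config_file = cmor_config_path("path_to_esmvaltool", "FLUXCOM")
print(config_file.as_posix())
```

Because the file stem and the dataset name are the same string, the name given to `-o` on the command line is enough to locate the matching configuration file.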
306-
In addition to the filename information, the configuration file also contains
307-
information about "global attributes" for the netCDF file that will be
308-
created and information about the variables that need to be CMORized.
309-
310-
> ## Let's create the configuration file for the "FLUXCOM" dataset
319+
> ## Create the configuration file for the "FLUXCOM" dataset
311320
>
312321
> Here is the skeleton of the "FLUXCOM" configuration file as it exists in
313322
> the ESMValTool framework. Try to fill in all missing pieces of information
@@ -362,9 +371,7 @@ created and information about the variables that need to be CMORized.
362371
> > mip: Lmon
363372
> > ```
364373
> >
365-
> > The original configuration file for the "FLUXCOM" dataset can be found here:
366-
> > [FLUXCOM.yml](https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml)
367-
> >
374+
> > *Suggestion: maybe add the reference under step 3 (additional but not strictly necessary steps)*
368375
> > Note the attribute "reference" here: it should include a ``doi`` related to
369376
> > the dataset. For more information on how to add references to the
370377
> > ``reference`` section of the configuration file, see the section in the
@@ -374,18 +381,14 @@ created and information about the variables that need to be CMORized.
374381
> {: .solution}
375382
{: .challenge}
376383
384+
###### Here we need to add python code to the cmorizer script
377385
378-
### 2. Implementing additional fixes
386+
so that we can run it and see whether it was able to find the correct input and create the right output.
379387
380388
381389
382-
### 3. Finalizing the CMORizer
383-
384-
Once everything works as expected, there's a couple of things that we can still do.
390+
### 2. Implementing additional fixes
385391
386-
- Add header info
387-
- Make sure the metadata are added to the config file
388-
- Maybe go through a checklist????
389392
390393
> ## Run the test recipe again
391394
>
@@ -447,10 +450,27 @@ problems. So let's start writing a short python script that will fix these
447450
problems.
448451
449452
450-
*PK: I'd suggest doing the header last, as it's not needed or relevant in the beginning*.
451-
But the very first part of the CMORizing script is a header. The header
452-
contains information about where to obtain the data, when it was accessed
453-
the last time, which ESMValTool "tier" it is associated with, and more
453+
To simplify this process, ESMValTool provides some convenience functions in
454+
``utilities.py``, which we already included in the boilerplate code above.
455+
456+
Apart from a function to easily save data, this module contains different
457+
kinds of small fixes to the data attributes, coordinates, and metadata which
458+
are necessary for the data field to be CMOR-compliant. We will come back to
459+
these functionalities in a bit.
460+
461+
462+
### 3. Finalizing the CMORizer
463+
464+
Once everything works as expected, there are a couple of things that we can still do.
465+
466+
- Add header info
467+
- Make sure the metadata are added to the config file
468+
- Maybe go through a checklist????
469+
- add an entry to config-references?
470+
471+
472+
The header contains information about where to obtain the data, when it was
473+
last accessed, which ESMValTool "tier" it is associated with, and more
454474
detailed information about the necessary downloading and processing steps.
455475
456476
> ## Fill out the header for the "FLUXCOM" dataset
@@ -508,141 +528,8 @@ detailed information about the necessary downloading and processing steps.
508528
509529
510530
511-
Now that we have defined the configuration file for our "FLUXCOM" data, we can
512-
finally start writing the actual code for the CMORizer script. The main body
513-
of the CMORizer script must contain a function called
514-
515-
```python
516-
def cmorization(in_dir, out_dir, cfg, config_user):
517-
```
518-
519-
with this exact call signature. Here, ``in_dir`` corresponds to the input
520-
directory of the raw files, ``out_dir`` to the output directory of final
521-
reformatted data set and ``cfg`` to the configuration dictionary given by the
522-
``.yml`` configuration file. The return value of this function is ignored. All
523-
the work, i.e. loading of the raw files, processing them and saving the final
524-
output, has to be performed inside its body. To simplify this process,
525-
ESMValTool provides some convenience functions in ``utilities.py`` , which
526-
can be imported into your CMORizer by
527-
528-
```python
529-
from . import utilities as utils
530-
```
531-
532-
Apart from a function to easily save data, this module contains different
533-
kinds of small fixes to the data attributes, coordinates, and metadata which
534-
are necessary for the data field to be CMOR-compliant. We will come back to
535-
these functionalities in a bit.
536-
537-
Note that this specific CMORizer script contains several subroutines in order
538-
to make the code clearer and more readable (we strongly recommend to follow
539-
that code style). For example, the function ``_get_filepath`` converts the raw
540-
filepath to the correct one and the function ``_extract_variable`` extracts and
541-
saves a single variable from the raw data.
542-
543-
After all that theory, let's have a look at the python code of the
544-
existing "FLUXCOM" CMORizer script. For now, we only want to read in the data
545-
and then store it in a new file.
546-
547-
```python
548-
"""ESMValTool CMORizer for FLUXCOM GPP data.
549-
550-
Tier
551-
Tier 3: restricted dataset.
552-
553-
Source
554-
http://www.bgc-jena.mpg.de/geodb/BGI/Home
555-
556-
Last access
557-
20190727
558-
559-
Download and processing instructions
560-
From the website, select FLUXCOM as the data choice and click download.
561-
Two files will be displayed. One for Land Carbon Fluxes and one for
562-
Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
563-
CRUNCEP data file has several data files for different variables.
564-
The data for GPP generated using the
565-
Artificial Neural Network Method will be in files with name:
566-
GPP.ANN.CRUNCEPv6.monthly.*.nc
567-
A registration is required for downloading the data.
568-
Users in the UK with a CEDA-JASMIN account may request access to the jules
569-
workspace and access the data.
570-
Note : This data may require rechunking of the netcdf files.
571-
This constraint will not exist once iris is updated to
572-
version 2.3.0 Aug 2019
573-
"""
574-
import logging
575-
import os
576-
import re
577-
import numpy as np
578-
import iris
579-
from . import utilities as utils
580-
581-
logger = logging.getLogger(__name__)
582-
583-
584-
def _get_filepath(in_dir, basename):
585-
    """Find correct name of file (extend basename with timestamp)."""
586-
    regex = re.compile(basename)
587-
588-
    all_files = [
589-
        f for f in os.listdir(in_dir)
590-
        if os.path.isfile(os.path.join(in_dir, f))
591-
    ]
592-
    for filename in all_files:
593-
        if regex.match(filename):
594-
            return os.path.join(in_dir, filename)
595-
    raise OSError(
596-
        f"Cannot find input file matching pattern '{basename}' in '{in_dir}'")
597-
598-
599-
def _extract_variable(cmor_info, attrs, filepath, out_dir):
600-
    """Extract variable."""
601-
    var = cmor_info.short_name
602-
    logger.info("Var is %s", var)
603-
    cubes = iris.load(filepath)
604-
    for cube in cubes:
605-
        logger.info("Saving file")
606-
        utils.save_variable(cube,
607-
                            var,
608-
                            out_dir,
609-
                            attrs,
610-
                            unlimited_dimensions=['time'])
611-
612-
613-
def cmorization(in_dir, out_dir, cfg, _):
614-
    """Cmorization func call."""
615-
    glob_attrs = cfg['attributes']
616-
    cmor_table = cfg['cmor_table']
617-
    filepath = _get_filepath(in_dir, cfg['filename'])
618-
    logger.info("Found input file '%s'", filepath)
619-
620-
    # Run the cmorization
621-
    for (var, var_info) in cfg['variables'].items():
622-
        logger.info("CMORizing variable '%s'", var)
623-
        glob_attrs['mip'] = var_info['mip']
624-
        logger.info(var_info['mip'])
625-
        cmor_info = cmor_table.get_variable(var_info['mip'], var)
626-
        _extract_variable(cmor_info, glob_attrs, filepath, out_dir)
627-
```
628531
629-
Let's run this CMORizing script to see if the dataset is read correctly, and
630-
what kind of file is written out. There is a specific command available in the
631-
ESMValTool to run the CMORizing scripts:
632-
633-
```bash
634-
cmorize_obs -c <config-user.yml> -o <dataset-name>
635-
```
636532
637-
The ``config-user.yml`` is the file in which we define the different data
638-
paths, e.g. where the ESMValTool would find the "RAWOBS" folder. The
639-
``dataset-name`` needs to be identical to the folder name that was created
640-
to store the raw observation data files, in our case this would be "FLUXCOM".
641-
The ESMValTool will create a folder with the correct tier information in your
642-
defined output directory if that tier folder is not already available, and
643-
then a folder named after the data set. In this folder the cmorized data set
644-
will be stored as a netCDF file. If your run was successful, one or more
645-
NetCDF files are produced in your output directory.
646533
647534
> ## Was the CMORization successful so far?!
648535
>
