@@ -116,6 +116,11 @@ run the CMORizer scripts:
116116cmorize_obs -c <config-user.yml> -o <dataset-name>
117117```
118118
119+ The ``config-user.yml`` is the file in which we define the different data
120+ paths, e.g. where ESMValTool finds the "RAWOBS" folder. The
121+ ``dataset-name`` needs to be identical to the name of the folder that was created
122+ to store the raw observation data files; in our case this is "FLUXCOM".
123+
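The folder-name requirement above can be checked before running the CMORizer. The following is a minimal sketch, not part of ESMValTool: the helper name `find_raw_dataset_dir` is hypothetical, and the layout `RAWOBS/<tier folder>/<dataset-name>` is an assumption made for illustration.

```python
from pathlib import Path


def find_raw_dataset_dir(rawobs_root, dataset_name):
    """Return the raw-data folder whose name equals ``dataset_name``.

    Hypothetical helper (not an ESMValTool function). It assumes the raw
    data live one level below ``rawobs_root``, inside a tier folder.
    """
    for tier_dir in sorted(Path(rawobs_root).iterdir()):
        if not tier_dir.is_dir():
            continue
        for candidate in tier_dir.iterdir():
            # The comparison is exact: "fluxcom" does not match "FLUXCOM".
            if candidate.is_dir() and candidate.name == dataset_name:
                return candidate
    raise FileNotFoundError(
        f"No folder named '{dataset_name}' found below '{rawobs_root}'")
```

Calling `find_raw_dataset_dir("<path_to_RAWOBS>", "FLUXCOM")` would then either return the matching folder or fail early with a readable message.
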
119124If everything is okay, the output should look something like this:
120125
121126~~~
@@ -286,6 +291,13 @@ def cmorization(in_dir, out_dir, cfg, _):
286291 # 3. store the data with the correct filename
287292```
288293
294+ Here, ``in_dir`` corresponds to the input directory of the raw files,
295+ ``out_dir`` to the output directory of the final reformatted dataset, and ``cfg`` to
296+ a configuration dictionary read from a configuration file that we will get to shortly.
297+
298+ When you type the command ``cmorize_obs`` in the terminal, ESMValTool will call
299+ this function with the settings found in your configuration files.
300+
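To make the calling convention concrete, here is a toy sketch of how a function with this signature could be invoked. The driver `run_cmorizer`, the example paths and the contents of `example_cfg` are assumptions for illustration only; the real call is made inside ESMValTool when you run ``cmorize_obs``.

```python
import logging

logger = logging.getLogger(__name__)


def cmorization(in_dir, out_dir, cfg, _):
    """Toy CMORizer body that only logs what a real one would process."""
    for var, var_info in cfg["variables"].items():
        logger.info("Would CMORize '%s' (mip %s): %s -> %s",
                    var, var_info["mip"], in_dir, out_dir)


def run_cmorizer(func, in_dir, out_dir, cfg, config_user=None):
    """Hypothetical stand-in for the call that ``cmorize_obs`` makes."""
    # All work happens inside ``func``; its return value is not used.
    func(in_dir, out_dir, cfg, config_user)


# Illustrative values only; real ones come from your configuration files.
example_cfg = {"variables": {"gpp": {"mip": "Lmon"}}}
run_cmorizer(cmorization, "RAWOBS/Tier3/FLUXCOM", "OBS/Tier3/FLUXCOM",
             example_cfg)
```

The fourth argument is named `_` in the function because this particular CMORizer does not use it.
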
289301> ## Note
290302>
291303> Always, always, when modifying or creating new code for the ESMValTool
@@ -294,20 +306,17 @@ def cmorization(in_dir, out_dir, cfg, _):
294306>
295307{: .callout}
296308
297- ### 1. Finding the input data
309+ ### 1. Find the input data and store it under the right name
298310
299311Since the original data does not follow CMOR filename conventions, we need to
300- tell ESMValTool what the filename for this new dataset looks like. We supply
301- this information via a dataset configuration file. It is important to note that
302- the name of the configuration file has to be identical to the name of the
303- dataset. Thus, we will create a file called
312+ tell ESMValTool what the filename for this new dataset looks like. Also, we need
313+ to provide the relevant information so ESMValTool can set the correct filename
314+ for the cmorized data. We supply this information via a dataset configuration
315+ file. It is important to note that the name of the configuration file has to be
316+ identical to the name of the dataset. Thus, we will create a file called
304317`<path_to_esmvaltool>/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`.
305318
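The naming rule can be written down as a one-line path construction. This is only a sketch: `dataset_config_path` is a hypothetical helper, and the root folder passed to it stands in for `<path_to_esmvaltool>`.

```python
from pathlib import Path


def dataset_config_path(esmvaltool_root, dataset_name):
    """Build the expected location of a dataset configuration file.

    Hypothetical helper: the file name is the dataset name plus ``.yml``.
    """
    return (Path(esmvaltool_root) / "esmvaltool" / "cmorizers" / "obs"
            / "cmor_config" / f"{dataset_name}.yml")


# "src" is a placeholder for <path_to_esmvaltool>.
print(dataset_config_path("src", "FLUXCOM").as_posix())
```

Because the file name is derived from the dataset name, a mismatch such as `fluxcom.yml` for the dataset "FLUXCOM" would point to a different file.
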
306- In addition to the filename information, the configuration file also contains
307- information about "global attributes" for the netCDF file that will be
308- created and information about the variables that need to be CMORized.
309-
310- > ## Let's create the configuration file for the "FLUXCOM" dataset
319+ > ## Create the configuration file for the "FLUXCOM" dataset
311320>
312321> Here is the skeleton of the "FLUXCOM" configuration file as it exists in
313322> the ESMValTool framework. Try to fill in all missing pieces of information
@@ -362,9 +371,7 @@ created and information about the variables that need to be CMORized.
362371> > mip: Lmon
363372> > ```
364373> >
365- > > The original configuration file for the "FLUXCOM" dataset can be found here:
366- > > [FLUXCOM.yml](https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml)
367- > >
374+ > > *Suggestion: maybe add the reference under step 3 (additional but not strictly necessary steps)*
368375> > Note the attribute "reference" here: it should include a ``doi`` related to
369376> > the dataset. For more information on how to add references to the
370377> > ``reference`` section of the configuration file, see the section in the
@@ -374,18 +381,14 @@ created and information about the variables that need to be CMORized.
374381> {: .solution}
375382{: .challenge}
376383
384+ ###### Here we need to add Python code to the CMORizer script
377385
378- ### 2. Implementing additional fixes
386+ so that we can run it and see whether it was able to find the correct input and create the right output.
379387
380388
381389
382- ### 3. Finalizing the CMORizer
383-
384- Once everything works as expected, there's a couple of things that we can still do.
390+ ### 2. Implementing additional fixes
385391
386- - Add header info
387- - Make sure the metadata are added to the config file
388- - Maybe go through a checklist????
389392
390393> ## Run the test recipe again
391394>
@@ -447,10 +450,27 @@ problems. So let's start writing a short python script that will fix these
447450problems.
448451
449452
450- *PK: I'd suggest doing the header last, as it's not needed or relevant in the beginning*.
451- But the very first part of the CMORizing script is a header. The header
452- contains information about where to obtain the data, when it was accessed
453- the last time, which ESMValTool "tier" it is associated with, and more
453+ To simplify this process, ESMValTool provides some convenience functions in
454+ ``utilities.py``, which we already included in the boilerplate code above.
455+
456+ Apart from a function to easily save data, this module contains different
457+ kinds of small fixes to the data attributes, coordinates, and metadata which
458+ are necessary for the data field to be CMOR-compliant. We will come back to
459+ these functionalities in a bit.
460+
461+
462+ ### 3. Finalizing the CMORizer
463+
464+ Once everything works as expected, there are a couple of things that we can still do.
465+
466+ - Add header info
467+ - Make sure the metadata are added to the config file
468+ - Maybe go through a checklist?
469+ - Add an entry to config-references?
470+
471+
472+ The header contains information about where to obtain the data, when it was
473+ accessed the last time, which ESMValTool "tier" it is associated with, and more
454474detailed information about the necessary downloading and processing steps.
455475
456476> ## Fill out the header for the "FLUXCOM" dataset
@@ -508,141 +528,8 @@ detailed information about the necessary downloading and processing steps.
508528
509529
510530
511- Now that we have defined the configuration file for our "FLUXCOM" data, we can
512- finally start writing the actual code for the CMORizer script. The main body
513- of the CMORizer script must contain a function called
514-
515- ```python
516- def cmorization(in_dir, out_dir, cfg, config_user):
517- ```
518-
519- with this exact call signature. Here, ``in_dir`` corresponds to the input
520- directory of the raw files, ``out_dir`` to the output directory of final
521- reformatted data set and ``cfg`` to the configuration dictionary given by the
522- ``.yml`` configuration file. The return value of this function is ignored. All
523- the work, i.e. loading of the raw files, processing them and saving the final
524- output, has to be performed inside its body. To simplify this process,
525- ESMValTool provides some convenience functions in ``utilities.py``, which
526- can be imported into your CMORizer by
527-
528- ```python
529- from . import utilities as utils
530- ```
531-
532- Apart from a function to easily save data, this module contains different
533- kinds of small fixes to the data attributes, coordinates, and metadata which
534- are necessary for the data field to be CMOR-compliant. We will come back to
535- these functionalities in a bit.
536-
537- Note that this specific CMORizer script contains several subroutines in order
538- to make the code clearer and more readable (we strongly recommend following
539- that code style). For example, the function ``_get_filepath`` converts the raw
540- filepath to the correct one and the function ``_extract_variable`` extracts and
541- saves a single variable from the raw data.
542-
543- After all that theory, let's have a look at the python code of the
544- existing "FLUXCOM" CMORizer script. For now, we only want to read in the data
545- and then store it in a new file.
546-
547- ```python
548- """ESMValTool CMORizer for FLUXCOM GPP data.
549-
550- Tier
551-     Tier 3: restricted dataset.
552-
553- Source
554-     http://www.bgc-jena.mpg.de/geodb/BGI/Home
555-
556- Last access
557-     20190727
558-
559- Download and processing instructions
560-     From the website, select FLUXCOM as the data choice and click download.
561-     Two files will be displayed. One for Land Carbon Fluxes and one for
562-     Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
563-     CRUNCEP data file has several data files for different variables.
564-     The data for GPP generated using the
565-     Artificial Neural Network Method will be in files with name:
566-     GPP.ANN.CRUNCEPv6.monthly.*.nc
567-     A registration is required for downloading the data.
568-     Users in the UK with a CEDA-JASMIN account may request access to the jules
569-     workspace and access the data.
570-     Note : This data may require rechunking of the netcdf files.
571-     This constraint will not exist once iris is updated to
572-     version 2.3.0 Aug 2019
573- """
574- import logging
575- import os
576- import re
577- import numpy as np
578- import iris
579- from . import utilities as utils
580-
581- logger = logging.getLogger(__name__)
582-
583-
584- def _get_filepath(in_dir, basename):
585-     """Find correct name of file (extend basename with timestamp)."""
586-     regex = re.compile(basename)
587-
588-     all_files = [
589-         f for f in os.listdir(in_dir)
590-         if os.path.isfile(os.path.join(in_dir, f))
591-     ]
592-     for filename in all_files:
593-         if regex.match(filename):
594-             return os.path.join(in_dir, basename)
595-     raise OSError(
596-         f"Cannot find input file matching pattern '{basename}' in '{in_dir}'")
597-
598-
599- def _extract_variable(cmor_info, attrs, filepath, out_dir):
600-     """Extract variable."""
601-     var = cmor_info.short_name
602-     logger.info("Var is %s", var)
603-     cubes = iris.load(filepath)
604-     for cube in cubes:
605-         logger.info("Saving file")
606-         utils.save_variable(cube,
607-                             var,
608-                             out_dir,
609-                             attrs,
610-                             unlimited_dimensions=['time'])
611-
612-
613- def cmorization(in_dir, out_dir, cfg, _):
614-     """Cmorization func call."""
615-     glob_attrs = cfg['attributes']
616-     cmor_table = cfg['cmor_table']
617-     filepath = _get_filepath(in_dir, cfg['filename'])
618-     logger.info("Found input file '%s'", filepath)
619-
620-     # Run the cmorization
621-     for (var, var_info) in cfg['variables'].items():
622-         logger.info("CMORizing variable '%s'", var)
623-         glob_attrs['mip'] = var_info['mip']
624-         logger.info(var_info['mip'])
625-         cmor_info = cmor_table.get_variable(var_info['mip'], var)
626-         _extract_variable(cmor_info, glob_attrs, filepath, out_dir)
627- ```
628531
629- Let's run this CMORizing script to see if the dataset is read correctly, and
630- what kind of file is written out. There is a specific command available in the
631- ESMValTool to run the CMORizing scripts:
632-
633- ```bash
634- cmorize_obs -c <config-user.yml> -o <dataset-name>
635- ```
636532
637- The ``config-user.yml`` is the file in which we define the different data
638- paths, e.g. where the ESMValTool would find the "RAWOBS" folder. The
639- ``dataset-name`` needs to be identical to the folder name that was created
640- to store the raw observation data files, in our case this would be "FLUXCOM".
641- The ESMValTool will create a folder with the correct tier information in your
642- defined output directory if that tier folder is not already available, and
643- then a folder named after the data set. In this folder the cmorized data set
644- will be stored as a netCDF file. If your run was successful, one or more
645- NetCDF files are produced in your output directory.
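The output layout described above (a tier folder, then a folder named after the dataset) can be sketched as follows. This is an illustration under assumptions, not ESMValTool's own code: the helper `prepare_output_dir` is hypothetical, and the folder name pattern `Tier<number>` is assumed.

```python
from pathlib import Path


def prepare_output_dir(out_root, tier, dataset_name):
    """Create ``<out_root>/Tier<tier>/<dataset_name>`` if it is missing.

    Hypothetical helper; the ``Tier<tier>`` naming is an assumption.
    """
    target = Path(out_root) / f"Tier{tier}" / dataset_name
    # exist_ok=True leaves folders that are already there untouched.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

For the dataset in this lesson, `prepare_output_dir("<output_dir>", 3, "FLUXCOM")` would give the folder in which the CMORized netCDF files are expected.
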
646533
647534> ## Was the CMORization successful so far?!
648535>