@@ -115,7 +115,7 @@ run the existing one. There is a specific command available in the ESMValTool to
115115run the CMORizer scripts:
116116
117117``` bash
118- cmorize_obs -c <config-user.yml> -o <dataset-name>
118+ esmvaltool data format --config_file <path to config-user.yml> <dataset-name>
119119```
120120
121121The ``config-user.yml`` is the file in which we define the different data
@@ -137,40 +137,38 @@ If everything is okay, the output should look something like this:
137137
138138~~~
139139...
140- ... Starting the CMORization Tool at time: 2021-02-26 14:02:16 UTC
140+ ... Starting the CMORization Tool at time: 2022-07-26 14:02:16 UTC
141141... ----------------------------------------------------------------------
142142... input_dir = /home/peter/data/RAWOBS
143- ... output_dir = /home/peter/esmvaltool_output/cmorize_obs_20210226_140216
143+ ... output_dir = /home/peter/esmvaltool_output/data_formatting_20220726_140216
144144... ----------------------------------------------------------------------
145145... Running the CMORization scripts.
146- ... Using cmorizer scripts repository: /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs
147- ... Processing datasets {'Tier3': ['FLUXCOM']}
146+ ... Processing datasets ['FLUXCOM']
148147... Input data from: /home/peter/data/RAWOBS/Tier3/FLUXCOM
149- ... Output will be written to: /home/peter/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/FLUXCOM
150- ... Reformat script: /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom
151- ... CMORizing dataset FLUXCOM using Python script /home/peter/miniconda3/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
148+ ... Output will be written to: /home/peter/esmvaltool_output/data_formatting_20220726_140216/Tier3/FLUXCOM
149+ ... Reformat script: /home/peter/mambaforge/envs/esmvaltool/lib/python3.9/site-packages/esmvaltool/cmorizers/data/formatters/datasets/fluxcom
150+ ... CMORizing dataset FLUXCOM using Python script /home/peter/mambaforge/envs/esmvaltool/lib/python3.9/site-packages/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
152151... Found input file '/home/peter/data/RAWOBS/Tier3/FLUXCOM/GPP.ANN.CRUNCEPv6.monthly.*.nc'
153152... CMORizing variable 'gpp'
154153... Lmon
155154... Var is gpp
156155... ... UserWarning: Ignoring netCDF variable 'GPP' invalid units 'gC m-2 day-1'
157- warnings.warn(msg)
156+
158157... Fixing time...
159158... Fixing latitude...
160159... Fixing longitude...
161160... Flipping dimensional coordinate latitude...
162161... Saving file
163- ... Converting data type of data from 'float64' to 'float32'
164- ... Saving: /home/peter/esmvaltool_output/cmorize_obs_20210226_140216/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
162+ ... Saving: /home/peter/esmvaltool_output/data_formatting_20220726_140216/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
165163... Cube has lazy data [lazy is preferred]
166- ... Ending the CMORization Tool at time: 2021-02-26 14:02:16 UTC
167- ... Time for running the CMORization scripts was: 0:00:00.605970
164+ ... CMORization of dataset FLUXCOM finished!
165+ ... Formatting successful for dataset FLUXCOM
168166~~~
169167{: .output}
170168
171169So you can see that several fixes are applied, and the CMORized file is written
172170to the ESMValTool output directory, i.e.
173- `~/esmvaltool_output/cmorize_obs_YYYYMMDD_HHMMSS/TierX/dataset-name/filename.nc`
171+ `~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/TierX/dataset-name/filename.nc`
174172In order to use it, we'll have to copy it from the output directory to a folder
175173called `~/data/OBS/Tier3/FLUXCOM` and make sure the path to ``OBS`` is set
176174correctly in our config-user file:
@@ -181,9 +179,9 @@ rootpath:
181179```
182180
183181You can also see the path where ESMValTool stores the reformatting script:
184- `~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py`. You may
182+ `~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py`. You may
185183have a look at this file if you want. The script also uses a configuration file:
186- `~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`.
184+ `~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml`.
187185
188186## Make a test recipe
189187
@@ -194,8 +192,8 @@ CMORized, ESMValTool will give a warning or error.
194192
195193> ## Create a test recipe
196194>
197- > Create a simple recipe called `recipe_check_fluxcom.yml` that loads the
198- > FLUXCOM data. It should include a datasets section with a single entry for
195+ > Create a simple recipe called [recipe_check_fluxcom.yml](../files/recipe_check_fluxcom.yml)
196+ > that loads the FLUXCOM data. It should include a datasets section with a single entry for
199197> the "FLUXCOM" dataset with the correct dataset keys, and a diagnostics section
200198> with a single variable: gpp. We don't need any preprocessors or
201199> scripts (set `scripts: null`), but we have to add a documentation section with
@@ -223,7 +221,7 @@ CMORized, ESMValTool will give a warning or error.
223221> > documentation:
224222> >
225223> > description: Test recipe for FLUXCOM data
226- >> title: This is a test recipe for the FLUXCOM data.
224+ > > title: This is a test recipe for the FLUXCOM data.
227225> >
228226> > authors:
229227> > - kalverla_peter
@@ -251,7 +249,7 @@ CMORized, ESMValTool will give a warning or error.
251249Try to run the example recipe with
252250
253251```bash
254- esmvaltool run recipe_check_fluxcom.yml --log_level debug
252+ esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
255253```
256254
257255If everything is okay, the recipe should run without problems.
@@ -266,16 +264,16 @@ test recipe will not be able to use it anymore.
266264
267265```bash
268266rm ~/data/OBS/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc
269- rm ~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
270- rm ~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml
267+ rm ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
268+ rm ~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml
271269```
272270
273271If you now run the test recipe again it should fail, and somewhere in the output
274272you should find something like:
275273
276274~~~
277275No input files found for ...
278- Looking for files matching ['OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc'] in [' /home/peter/data/OBS/Tier3/FLUXCOM']
276+ Looked for files matching: /home/peter/data/OBS/Tier3/FLUXCOM/OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc
279277~~~
280278{: .error}
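
To see why the raw file is not picked up, the file name pattern from the error message can be tried out by hand. The following is a minimal sketch using Python's standard `fnmatch` module; it is not ESMValCore's own lookup code, and the raw file name used here is a hypothetical example of the `GPP.ANN.CRUNCEPv6.monthly.*.nc` pattern.

```python
from fnmatch import fnmatchcase

# Pattern copied from the error message above (directory part removed).
PATTERN = "OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp[_.]*nc"


def matches_obs_pattern(filename: str) -> bool:
    """Return True if `filename` matches the pattern from the error message."""
    return fnmatchcase(filename, PATTERN)


# Name of the CMORized file that we deleted earlier.
cmorized_name = "OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_200001-200012.nc"
# Hypothetical raw file name, for illustration only.
raw_name = "GPP.ANN.CRUNCEPv6.monthly.2000.nc"

print(matches_obs_pattern(cmorized_name))
print(matches_obs_pattern(raw_name))
```

Only the CMORized name matches, which is why the raw data has to be reformatted and renamed before the recipe can find it.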
281279
@@ -285,10 +283,10 @@ the file so that it follows the CMOR filename conventions.
285283## Create a new CMORizer script and a corresponding config file
286284
287285The first step now is to create a new file in the right folder that will contain
288- our new CMORizer instructions. Create a file called ``cmorize_obs_fluxcom.py``
286+ our new CMORizer instructions. Create a file called ``fluxcom.py``
289287
290288```bash
291- nano ~/ESMValTool/esmvaltool/cmorizers/obs/cmorize_obs_fluxcom.py
289+ nano ~/ESMValTool/esmvaltool/cmorizers/data/formatters/datasets/fluxcom.py
292290```
293291
294292and fill it with the following boilerplate code:
@@ -299,11 +297,11 @@ and fill it with the following boilerplate code:
299297<We will add some useful info here later>
300298"""
301299import logging
302- from . import utilities as utils
300+ from esmvaltool.cmorizers.data import utilities as utils
303301
304302logger = logging.getLogger(__name__)
305303
306- def cmorization(in_dir, out_dir, cfg, _):
304+ def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
307305 """Cmorize the dataset."""
308306
309307 # This is where you'll add the cmorization code
@@ -315,11 +313,14 @@ def cmorization(in_dir, out_dir, cfg, _):
315313Here, ``in_dir`` corresponds to the input directory of the raw files,
316314``out_dir`` to the output directory of final reformatted data set and ``cfg`` to
317315a configuration dictionary given by a configuration file that we will get to
318- shortly. When you type the command ``cmorize_obs`` in the terminal, ESMValTool
319- will call this function with the settings found in your configuration files.
316+ shortly. The last three arguments will not be considered in this script but
317+ can be used in other cases. ``cfg_user`` corresponds to the user configuration
318+ file, ``start_date`` to the start of the period to format, and ``end_date`` to
319+ the end of the period to format. When you type the command ``esmvaltool data format``
320+ in the terminal, ESMValTool will call this function with the settings found in your configuration files.
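
To make the role of the six arguments more concrete, here is a small stand-alone sketch that calls a stand-in function with the same signature. It does not import ESMValTool; the output path is hypothetical, passing `None` for the last three arguments is an assumption made for illustration, and the returned dictionary exists only so the call can be inspected (the real function does not need to return anything).

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
    """Stand-in with the same signature as the boilerplate above."""
    logger.info("in_dir: '%s'", in_dir)
    logger.info("out_dir: '%s'", out_dir)
    # cfg_user, start_date and end_date are accepted but not used here,
    # just like in the FLUXCOM script of this tutorial.
    # Returning a summary is for illustration only.
    return {"in_dir": in_dir, "out_dir": out_dir, "cfg_keys": sorted(cfg)}


result = cmorization(
    in_dir="/home/peter/data/RAWOBS/Tier3/FLUXCOM",
    out_dir="/tmp/example_output/Tier3/FLUXCOM",  # hypothetical path
    cfg={"attributes": {"project_id": "OBS6"}},
    cfg_user=None,    # assumption: not needed for this dataset
    start_date=None,  # assumption: no period restriction
    end_date=None,
)
print(result["cfg_keys"])
```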
320321
321322The ESMValTool CMORizer also needs a dataset configuration file. Create a file
322- called `~/ESMValTool/esmvaltool/cmorizers/obs/cmor_config/FLUXCOM.yml`
323+ called `~/ESMValTool/esmvaltool/cmorizers/data/cmor_config/FLUXCOM.yml`
323324and fill it with the following boilerplate:
324325
325326```yaml
@@ -369,7 +370,7 @@ You can try running the CMORizer at this point, and it should work without
369370errors. However, it doesn't produce any output yet:
370371
371372```bash
372- cmorize_obs -c <config-user.yml> -o FLUXCOM
373+ esmvaltool data format --config_file <path to config-user.yml> FLUXCOM
373374```
374375
375376### 1. Find the input data
@@ -384,7 +385,14 @@ logger.info("in_dir: '%s'", in_dir)
384385logger.info("cfg: '%s'", cfg)
385386```
386387
387- If you run the CMORizer again, it will print out the content of these variables.
388+ If you run the CMORizer again, it will print out the content of these variables
389+ and the output should contain something like this:
390+
391+ ~~~
392+ ... in_dir: '/home/peter/data/RAWOBS/Tier3/FLUXCOM'
393+ ... cfg: '{'attributes': {'project_id': 'OBS6', 'comment': ''}, 'cmor_table': <esmvalcore.cmor.table.CMIP6Info object at 0x7fbd0a0f6bf0>}'
394+ ~~~
395+ {: .output}
388396
389397> ## Load the data
390398>
@@ -436,7 +444,7 @@ We already have the `cube` and the `outdir`. The variable short name (`var`) and
436444attributes (`attrs`) are set through the configuration file. So we need to find out what the correct short name and attributes are.
437445
438446The standard attributes for CMIP variables are defined in the [CMIP
439- tables](https://github.com/ESMValGroup/ESMValCore/tree/master/esmvalcore/cmor/tables/cmip6/Tables).
447+ tables](https://github.com/ESMValGroup/ESMValCore/tree/main/esmvalcore/cmor/tables/cmip6/Tables).
440448These tables are differentiated according to the "MIP" they belong to. The
441449tables are a copy of the [PCMDI](https://github.com/PCMDI) guidelines.
442450
@@ -452,7 +460,7 @@ tables are a copy of the [PCMDI](https://github.com/PCMDI) guidelines.
452460> > The variable "gpp" belongs to the land variables. The temporal resolution that we are looking
453461> > for is "monthly". This information points to the "Lmon" CMIP table. And indeed, the variable
454462> > "gpp" can be found in the file
455- > > [here](https://github.com/ESMValGroup/ESMValCore/blob/master/esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Lmon.json).
463+ > > [here](https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/cmor/tables/cmip6/Tables/CMIP6_Lmon.json).
456464> >
457465> {: .solution}
458466{: .challenge}
@@ -517,8 +525,9 @@ However, this makes it possible to add more variables later on.
517525> ## Was the CMORization successful so far?
518526>
519527> If you run the CMORizer again, you should see that it creates an output file
520- > named ``OBS_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-xxxx12.nc``. The "xxxx" and
521- > "yyyy" represent the start and end year of the data.
528+ > named ``OBS6_FLUXCOM_reanaly_ANN-v1_Lmon_gpp_xxxx01-yyyy12.nc`` stored in your
529+ > ESMValTool output directory `~/esmvaltool_output/data_formatting_YYYYMMDD_HHMMSS/Tier3/FLUXCOM/`.
530+ > The "xxxx" and "yyyy" represent the start and end year of the data.
522531>
523532{: .callout}
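
The structure of the output file name can be illustrated with a few lines of Python. This is only a sketch: `obs_filename` is a hypothetical helper, not part of ESMValTool, and the argument names are our own reading of the parts of the file name shown in the callout. The year 2000 is taken from the example log earlier in this episode.

```python
def obs_filename(project, dataset, data_type, version, mip, short_name,
                 start_year, end_year):
    """Assemble a file name of the shape shown in the callout.

    Hypothetical helper for illustration; ESMValTool builds the name itself.
    """
    return (
        f"{project}_{dataset}_{data_type}_{version}_{mip}_{short_name}_"
        f"{start_year:04d}01-{end_year:04d}12.nc"
    )


name = obs_filename("OBS6", "FLUXCOM", "reanaly", "ANN-v1", "Lmon", "gpp",
                    2000, 2000)
print(name)
```

Changing the first argument from `"OBS6"` to `"OBS"` reproduces the name of the file that was deleted at the start of this episode.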
524533
@@ -574,7 +583,7 @@ address in the next section.
574583
575584### 3. Implementing additional fixes
576585
577- Copy the output of the CMORizer to your folder `~/data/OBS6/Tier3/`
586+ Copy the output of the CMORizer to your folder `~/data/OBS6/Tier3/FLUXCOM/`
578587and change the test recipe to look for OBS6 data instead of OBS (note: we're
579588upgrading the CMORizer to newer standards here!). Make sure the path to ``OBS6``
580589is set correctly in our config-user file:
@@ -587,7 +596,7 @@ rootpath:
587596If we now run the test recipe on our newly 'CMORized' data,
588597
589598```bash
590- esmvaltool run recipe_check_fluxcom.yml --log_level debug
599+ esmvaltool run recipe_check_fluxcom.yml --config_file <path to config-user.yml> --log_level debug
591600```
592601
593602it should be able to find the correct file, but it does not succeed yet. The first
@@ -608,7 +617,8 @@ we can use it, we'll also need to make sure the coordinates have the correct
608617standard name. Add the following code to your cmorizer:
609618
610619```python
611- # Fix/add coordinate information and metadata
620+ # 2. Apply the necessary fixes
621+ # 2a. Fix/add coordinate information and metadata
612622cube.coord('lat').standard_name = 'latitude'
613623cube.coord('lon').standard_name = 'longitude'
614624utils.fix_coords(cube)
@@ -618,7 +628,7 @@ With some additional refactoring, our cmorization function might then look
618628something like this:
619629
620630```python
621- def cmorization(in_dir, out_dir, cfg, _):
631+ def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):
622632 """Cmorize the dataset."""
623633
624634 # Get general information from the config file
@@ -649,9 +659,10 @@ def cmorization(in_dir, out_dir, cfg, _):
649659 utils.save_variable(cube=cube, var=short_name, outdir=out_dir, attrs=all_attributes)
650660```
651661
652- Have a look at the netCDF file, and confirm that the coordinates now have much
653- more metadata added to them. Then, run the test recipe again with the latest
654- CMORizer output. The next error is :
662+ Run the CMORizer script once more. Have a look at the netCDF file,
663+ and confirm that the coordinates now have much more metadata added to them.
664+ Then, run the test recipe again with the latest CMORizer output.
665+ The next error is:
655666
656667~~~
657668esmvalcore.cmor.check.CMORCheckError: There were errors in variable GPP:
@@ -754,6 +765,91 @@ Once everything works as expected, there's a couple of things that we can still
754765{: .challenge}
755766
756767
768+ - **Fill the dataset information list**. The file
769+ [datasets.yml](https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/datasets.yml)
770+ contains the ESMValTool "tier", the data source, the last access time and
771+ download instructions for all supported datasets in ESMValTool. You can
772+ simply reuse the information written in the header of the CMORizer.
773+
774+ > ## Fill out the FLUXCOM entry in ``datasets.yml``
775+ >
776+ > Fill out the FLUXCOM entry in ``datasets.yml``. The different parts that need to be
777+ > present in the entry are the following:
778+ >
779+ > - Dataset-name
780+ > - Tier
781+ > - Source
782+ > - Last access
783+ > - Download and processing instructions
784+ >
785+ > > ## Answers
786+ > >
787+ > > The entry for the "FLUXCOM" dataset should look like:
788+ > >
789+ > > ```yaml
790+ > > FLUXCOM:
791+ > > tier: 3
792+ > > source: http://www.bgc-jena.mpg.de/geodb/BGI/Home
793+ > > last_access: 2019-07-27
794+ > > info: |
795+ > > From the website, select FLUXCOM as the data choice and click download.
796+ > > Two files will be displayed. One for Land Carbon Fluxes and one for
797+ > > Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
798+ > > CRUNCEP data file has several data files for different variables.
799+ > > The data for GPP generated using the
800+ > > Artificial Neural Network Method will be in files with name:
801+ > > GPP.ANN.CRUNCEPv6.monthly.*.nc
802+ > > A registration is required for downloading the data.
803+ > > Users in the UK with a CEDA-JASMIN account may request access to the jules
804+ > > workspace and access the data.
805+ > > Note: This data may require rechunking of the netcdf files.
806+ > > This constraint will not exist once iris is updated to
807+ > > version 2.3.0 Aug 2019
808+ > > ```
809+ > {: .solution}
810+ {: .challenge}
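
Before moving on, it can help to check that the new entry contains all the expected parts. The snippet below is a minimal sketch of such a check on a plain Python dictionary; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here, not an ESMValTool API, and the `info` text is shortened for brevity.

```python
# Keys taken from the answer above; the dataset name is the top-level key
# of the entry in datasets.yml.
REQUIRED_KEYS = ("tier", "source", "last_access", "info")


def missing_keys(entry: dict) -> list:
    """Return the required keys that are absent from a dataset entry."""
    return [key for key in REQUIRED_KEYS if key not in entry]


fluxcom_entry = {
    "tier": 3,
    "source": "http://www.bgc-jena.mpg.de/geodb/BGI/Home",
    "last_access": "2019-07-27",
    "info": "From the website, select FLUXCOM as the data choice and click download.",
}

print(missing_keys(fluxcom_entry))  # complete entry
print(missing_keys({"tier": 3}))    # incomplete entry
```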
811+
812+ Once the ``datasets.yml`` file is filled, you can check that ESMValTool can
813+ display information about the added dataset with:
814+
815+ ```bash
816+ esmvaltool data info FLUXCOM
817+ ```
818+
819+ If everything is okay, the output should look something like this:
820+
821+ ~~~
822+ $ esmvaltool data info FLUXCOM
823+ FLUXCOM
824+
825+ Tier : 3
826+ Source : http://www.bgc-jena.mpg.de/geodb/BGI/Home
827+ Automatic download : No
828+
829+ From the website, select FLUXCOM as the data choice and click download.
830+ Two files will be displayed. One for Land Carbon Fluxes and one for
831+ Land Energy fluxes. The Land Carbon Flux file (RS + METEO) using
832+ CRUNCEP data file has several data files for different variables.
833+ The data for GPP generated using the
834+ Artificial Neural Network Method will be in files with name:
835+ GPP.ANN.CRUNCEPv6.monthly.*.nc
836+ A registration is required for downloading the data.
837+ Users in the UK with a CEDA-JASMIN account may request access to the jules
838+ workspace and access the data.
839+ Note: This data may require rechunking of the netcdf files.
840+ This constraint will not exist once iris is updated to
841+ version 2.3.0 Aug 2019
842+ ~~~
843+ {: .output}
844+
845+ Note that ``Automatic download : No`` means that no automatic downloading script
846+ is available in ESMValTool for this dataset. The implementation of such a
847+ script is beyond the scope of this tutorial. To find out which datasets come
848+ with an automatic download script, you can run ``esmvaltool data list`` to
849+ list all datasets supported in ESMValTool. More information about the usage
850+ of automatic downloading scripts can be found in the
851+ [User Guide](https://docs.esmvaltool.org/en/latest/develop/dataset.html#downloader-script-optional).
852+
757853- **Complete the metadata in the config file**. We have left a few fields empty
758854 in the configuration file, such as 'source'. By filling out these fields we can
759855 make sure the relevant metadata is passed on as attributes in the CMORized
@@ -776,7 +872,7 @@ utils.set_global_atts(cube, attributes)
776872- **Add documentation**. Make sure that you have added the info of your dataset
777873 to the User Guide so that people know it is available for the ESMValTool
778874 [Obtaining input
779- data](https://github.com/ESMValGroup/ESMValTool/blob/master/doc/sphinx/source/input.rst).
875+ data](https://github.com/ESMValGroup/ESMValTool/blob/main/doc/sphinx/source/input.rst).
780876
781877
782878## Some final comments