11---
22title : " Writing your own diagnostic script"
3- teaching : TBD
4- exercises : TBD
3+ teaching : 20
4+ exercises : 30
55
66questions :
77- " How do I write a new diagnostic in ESMValTool?"
@@ -16,48 +16,48 @@ keypoints:
1616 with preprocessor output."
1717- " Existing diagnostics can be used as templates and modified to write new
1818 diagnostics."
19- - " Many functions can be imported from ``esmvaltool.diag_scripts.shared`` and
19+ - " Helper functions can be imported from ``esmvaltool.diag_scripts.shared`` and
2020 used in your own diagnostic script."
2121---
2222
2323## Introduction
2424
25- The diagnostic script is an important component of ESMValTool where the
25+ The diagnostic script is an important component of ESMValTool and it is where the
2626scientific analysis or performance metric is implemented. With ESMValTool, you
2727can adapt an existing diagnostic or write a new script from scratch.
2828Diagnostics can be written in a number of open source
2929languages such as Python, R, Julia and NCL but we will focus on understanding
3030and writing Python diagnostics in this lesson.
3131
3232In this lesson, we will explain how to find an existing diagnostic and run it
33- using ESMValTool installed in a editable/development mode. For a development
33+ using ESMValTool installed in editable/development mode. For a development
3434installation, see the instructions in the lesson [ Development and
3535contribution] ({{ page.root }}{% link _ episodes/08-development-setup.md %}).
3636Also, we will work with the recipe [ recipe_python.yml] [ recipe ] and the
3737diagnostic script [ diagnostic.py] [ diagnostic ] called by this recipe that we have
3838seen in the lesson [ Running your first recipe] ({{ page.root }}{% link
3939_ episodes/04-recipe.md %}).
4040
41- Let's get started.
41+ Let's get started!
4242
4343## Understanding an existing Python diagnostic
4444
45- After development installation, a folder called `` ESMValTool `` has been created
45+ After a development mode installation, a folder called `` ESMValTool `` is created
4646in your working directory. This folder contains the source code of the tool. We
4747can find the recipe `` recipe_python.yml `` and the Python script
4848`` diagnostic.py `` in these directories:
4949
5050- * path_to_ESMValTool/esmvaltool/recipes/examples/recipe_python.yml*
5151- * path_to_ESMValTool/esmvaltool/diag_scripts/examples/diagnostic.py*
5252
53- Let's have look into the codes of the `` diagnostic.py `` .
53+ Let's have a look at the code in `` diagnostic.py `` .
5454For reference, we show the diagnostic code in the dropdown box below.
5555There are four main sections in the script:
5656
5757- A description i.e. the `` docstring `` (line 1).
5858- Import statements (lines 2-16).
5959- Functions that implement our analysis (lines 21-101).
60- - A typical python top-level script i.e. `` if __name__ == '__main__' `` (line
60+ - A typical Python top-level script i.e. `` if __name__ == '__main__' `` (lines
6161 102-107).
6262
6363> ## diagnostic.py
@@ -175,7 +175,7 @@ There are four main sections in the script:
175175>
176176{:.solution}
177177
178- > # # What is the starting point of the diagnostic?
178+ > # # What is the starting point of a diagnostic?
179179>
180180> 1 . Can you spot a function called `` main`` in the code above?
181181> 2 . What are its input arguments?
@@ -184,11 +184,15 @@ There are four main sections in the script:
184184>> # # Answer
185185>>
186186>> 1 . The `` main`` function is defined in line 66 as `` main(cfg)`` .
187- >> 2 . The variable `` cfg`` is a Python dictionary holding all the necessary
188- >> information needed to run the diagnostic script like the location of input
189- >> data and various settings. In the `` main`` function, we will next parse this
190- >> `` cfg`` variable and extract information as needed to do our analyses (e.g. in line 69 ).
191- >> 3 . The `` main`` function called near the very end on line 107 .
187+ >> 2. The input argument to this function is the variable ``cfg``, a Python
188+ >> dictionary that holds all the
189+ >> information needed to run the diagnostic script, such as the location of input
190+ >> data and various settings. We will next parse this ``cfg`` variable
191+ >> in the ``main`` function and extract information as needed
192+ >> to do our analyses (e.g. in line 69).
193+ >> 3. The ``main`` function is called near the very end on line 107. So, it appears
194+ >> twice in our code: once where it is defined and once where it is called by the
195+ >> top-level Python script.
192196> {: .solution}
193197{: .challenge}
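
To make the role of ``cfg`` more concrete, here is a minimal sketch that uses a hand-written dictionary in place of the one ESMValTool builds at run time. The file paths, the metadata values and the helper body are illustrative assumptions, not actual tool output; only the idea that ``cfg`` is a dictionary with an ``input_data`` entry comes from ``diagnostic.py``.

```python
# Hypothetical stand-in for the cfg dictionary passed to main(cfg).
# All paths and metadata values below are made up for illustration.
cfg = {
    "input_data": {
        "/path/to/preproc/tas_model_a.nc": {
            "dataset": "MODEL-A", "short_name": "tas", "project": "CMIP6",
        },
        "/path/to/preproc/tas_model_b.nc": {
            "dataset": "MODEL-B", "short_name": "tas", "project": "CMIP5",
        },
    },
    "quickplot": {"plot_type": "pcolormesh"},
}


def main(cfg):
    """Collect (dataset, filename) pairs for every input file in cfg."""
    datasets = []
    for filename, attributes in cfg["input_data"].items():
        datasets.append((attributes["dataset"], filename))
    return sorted(datasets)


if __name__ == "__main__":
    for dataset, filename in main(cfg):
        print(dataset, filename)
```

In a real diagnostic you would not build ``cfg`` yourself; the sketch only shows the kind of lookups that ``main`` performs on it.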
194198
@@ -211,8 +215,8 @@ The ESMValTool documentation page provides an overview of what is in this file,
211215> # # What information do I need when writing a diagnostic script?
212216>
213217> From the lesson [Configuration]({{ page.root }}{% link _episodes/ 03 - configuration.md % }),
214- > we saw how to change the configurations before running a recipe.
215- > Let ' s first set the option ``remove_preproc_dir`` to ``false`` in the configuration file,
218+ > we saw how to change the configuration settings before running a recipe.
219+ > First we set the option `` remove_preproc_dir`` to `` false`` in the configuration file ,
216220> then run the recipe `` recipe_python.yml`` :
217221>
218222> ~~~ bash
@@ -232,37 +236,37 @@ The ESMValTool documentation page provides an overview of what is in this file,
232236>> detailed information on your data including project (e.g., CMIP6 , CMIP5 ),
233237>> dataset names (e.g., BCC - ESM1 , CanESM2), variable attributes (e.g.,
234238>> standard_name, units), preprocessor applied and time range of the data. You
235- >> can use all of these information in your own diagnostics .
239+ >> can use all of this information in your own diagnostic .
236240> >
237241> >
238242> {: .solution}
239243{: .challenge}
240244
241245# # Diagnostic shared functions
242246
243- Looking at the codes of the `` diagnostic.py`` , we see that `` input_data`` is
247+ Looking at the code in `` diagnostic.py`` , we see that `` input_data`` is
244248read from the `` cfg`` dictionary (line 69 ). Now we can group the `` input_data``
245249according to some criteria such as the model or experiment. To do so,
246- ESMValTool provides many functions like `` select_metadata`` (line 72 ),
250+ ESMValTool provides many functions such as `` select_metadata`` (line 72 ),
247251`` sorted_metadata`` (line 76 ), and `` group_metadata`` (line 80 ). As you can see
248252in line 8 , these functions are imported from `` esmvaltool.diag_scripts.shared``
253+ which means they are shared across several diagnostic scripts. A list of
253+ that means these are shared across several diagnostics scripts. A list of
250254available functions and their description can be found in [Shared diagnostic
251255script code][shared].
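
As a rough mental model of selecting, sorting and grouping, the sketch below re-implements the three ideas in plain Python on a small, made-up list of metadata dictionaries. The helper names ``select_by``, ``sort_by`` and ``group_by`` are hypothetical stand-ins written for this illustration; they are not the ESMValTool functions, whose exact behaviour is described in the shared diagnostic script documentation.

```python
from collections import defaultdict

# Illustrative metadata: one dictionary per input file (values are made up).
input_data = [
    {"dataset": "MODEL-B", "short_name": "tas", "filename": "b_tas.nc"},
    {"dataset": "MODEL-A", "short_name": "tas", "filename": "a_tas.nc"},
    {"dataset": "MODEL-A", "short_name": "pr", "filename": "a_pr.nc"},
]


def select_by(metadata, **attributes):
    """Keep only the entries whose attributes match all given values."""
    return [m for m in metadata
            if all(m.get(key) == value for key, value in attributes.items())]


def sort_by(metadata, key):
    """Return the entries ordered by a single attribute."""
    return sorted(metadata, key=lambda m: m[key])


def group_by(metadata, key):
    """Collect the entries into a dictionary keyed by one attribute."""
    groups = defaultdict(list)
    for m in metadata:
        groups[m[key]].append(m)
    return dict(groups)


selected = select_by(input_data, dataset="MODEL-A")
ordered = sort_by(input_data, "dataset")
grouped = group_by(input_data, "short_name")

if __name__ == "__main__":
    print(len(selected), ordered[0]["dataset"], sorted(grouped))
```

Grouping by ``short_name`` here is analogous to grouping the input data by variable before looping over the groups.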
252256
253257> # # Extracting information needed for analyses
254258>
255259> We have seen the functions used for selecting, sorting and grouping data in the
256- > script. What these functions do?
260+ > script. What do these functions do?
257261>
258262>> # # Answer
259263>>
260264>> There is a statement after use of `` select_metadata`` , `` sorted_metadata``
261265>> and `` group_metadata`` that starts with `` logger.info`` (lines 73 , 77 and
262266>> 83 ). These lines print output to the log files. In the previous exercise, we
263- >> ran the recipe `` recipe_python.yml`` . If you looked at the content of the log
264- >> file `` path_to_recipe_output/ run/ map / script1/ log.txt`` , you can see the some
265- >> information on how each function works , for example:
267+ >> ran the recipe `` recipe_python.yml`` . If you look at the log
268+ >> file `` path_to_recipe_output/ run/ map / script1/ log.txt`` , you can see the output
269+ >> from each of these functions , for example:
266270>>
267271>> ```
268272>> 2021 - 03 - 05 13 :19 :38 ,184 [34706 ] INFO diagnostic,83 Example of how to group and
@@ -317,16 +321,18 @@ script code][shared].
317321
318322# # Diagnostic computation
319323
320- After grouping and selecting data, we can read individual attributes such as the
321- filename by looping over variables (line 89 - 93 ). Following this, we see the use
322- of the function `` compute_diagnostic`` (line 94 ). Let' s have a look at the
323- definition of this function in line 43 where the analyses on the data is done.
324+ After grouping and selecting data, we can read individual attributes (such as the filename)
325+ of each item. Here we have grouped the input data by ``variables``,
326+ so we loop over the variables (lines 89-93). Following this is a call to the
327+ function ``compute_diagnostic`` (line 94). Let's have a look at the
328+ definition of this function in line 43, where the actual analysis of the data is done.
324329
330+ Note that output from the ESMValCore preprocessor is in the form of NetCDF files.
325331Here, `` compute_diagnostic`` uses
326332[Iris](https:// scitools- iris.readthedocs.io/ en/ latest/ index.html) to read data
327- from a netCDF file and performs an operation `` squeeze`` to remove any dimension
333+ from a netCDF file and performs an operation `` squeeze`` to remove any dimensions
328334of length one. We can adapt this function to add our own analysis. As an
329- example, here we want to calculate the bias toward the averege of the data as :
335+ example, here we use Iris cubes to calculate the bias relative to the average of the data:
330336
331337~~~ python
332338def compute_diagnostic (filename ):
@@ -344,16 +350,16 @@ def compute_diagnostic(filename):
344350
345351> ## iris cubes
346352>
347- > Iris reads data into data structures called
353+ > Iris reads data from NetCDF files into data structures called
348354> [ cubes] ( https://scitools-iris.readthedocs.io/en/latest/userguide/iris_cubes.html ) .
349355> The data in these cubes can be modified, combined with other cubes' data or
350356> plotted.
351357 {: .callout}
352358
353359> ## Reading data using xarray
354360>
355- > Use the [ xarrays] ( http://xarray.pydata.org/en/stable/ ) to read the data
356- > instead of iris .
361+ > Alternatively, you can use [xarray](http://xarray.pydata.org/en/stable/) to read the data
362+ > instead of Iris .
357363>
358364>> ## Answer
359365>>
@@ -381,17 +387,18 @@ def compute_diagnostic(filename):
381387
382388> # # Reading data using Scipy's netCDF library
383389>
384- > Use the [SciPy' s netCDF library][netCDF] to read the data instead of iris.
390+ > Yet another option for reading the NetCDF data is to use
391+ > [SciPy's netCDF library][netCDF].
385392>
386393>> # # Answer
387394>>
388- >> First, import `` netcdf`` package at the top of the script as :
395+ >> First, import the ``netcdf`` package at the top of the script:
389396>>
390397>>~~~ python
391398>> from scipy.io import netcdf
392399>>~~~
393400>>
394- >> Then, change the `` compute_diagnostic`` as :
401+ >> Then, change ``compute_diagnostic`` as follows:
395402>>
396403>>~~~ python
397404>> def compute_diagnostic(filename):
@@ -414,7 +421,7 @@ def compute_diagnostic(filename):
414421Often, the end product of a diagnostic script is a plot or figure. The Iris cube
415422returned from the `` compute_diagnostic`` function (line 94 ) is passed to the
416423`` plot_diagnostic`` function (line 101 ). Let' s have a look at the definition of
417- this function in line 53 where we would plug in our plotting routine in the
424+ this function in line 53 . This is where we would plug in our plotting routine in the
418425diagnostic script.
419426
420427More specifically, the `` quickplot`` function (line 61 ) can be replaced with the
@@ -431,17 +438,17 @@ there:
431438 cmap: Reds
432439~~~
433440
434- In this way, we can further pass arguments for `` quickplot `` such as the type of
435- plot `` pcolormesh`` and the colormap `` cmap:Reds`` from the recipe to the
436- diagnostic.
441+ This way, we can pass arguments such as the type of
442+ plot `` pcolormesh`` and the colormap `` cmap:Reds`` from the recipe to the
443+ `` quickplot `` function in the diagnostic.
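
The mechanism behind this is ordinary Python keyword-argument unpacking. The sketch below uses a hand-written ``cfg`` dictionary and a hypothetical ``fake_quickplot`` function as a stand-in for the real plotting routine, so it only illustrates how recipe settings can end up as function arguments.

```python
# Hypothetical settings, mirroring the plot options set in the recipe.
cfg = {"quickplot": {"plot_type": "pcolormesh", "cmap": "Reds"}}


def fake_quickplot(data, plot_type, **kwargs):
    """Stand-in plotting routine: describe what it would draw."""
    return f"{plot_type} of {len(data)} points with options {kwargs}"


# The ** operator turns each dictionary entry into a keyword argument.
message = fake_quickplot([1.0, 2.0, 3.0], **cfg["quickplot"])

if __name__ == "__main__":
    print(message)
```

Because the settings travel as a dictionary, adding a new option in the recipe does not require changing the call in the diagnostic, as long as the plotting function accepts it.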
437444
438445> # # Passing arguments from the recipe to the diagnostic
439446>
440447> Change the type of the plot and its colormap and inspect the output figure.
441448>
442449>> # # Answer
443450>>
444- >> In the recipe `` recipe_python.yml`` , you should change `` plot_type`` and `` cmap`` .
451+ >> In the recipe `` recipe_python.yml`` , you could change `` plot_type`` and `` cmap`` .
445452>> As an example, we choose `` plot_type: pcolor`` and `` cmap: BuGn`` :
446453>>
447454>> ~~~ yaml
@@ -458,33 +465,35 @@ diagnostic.
458465
459466> # # ESMValTool gallery
460467>
461- > ESMValTool makes it possible to produce a wide array of such figures as seen
468+ > ESMValTool makes it possible to produce a wide array of plots and figures as seen
462469> in the [gallery](https:// docs.esmvaltool.org/ en/ latest/ gallery.html).
463470{: .callout}
464471
465472# ## Saving the output
466473
467474In our example, the function `` save_data`` in line 57 is used to save the Iris
468- cube. The saved files can be found under `` work`` directory in a `` .nc`` format .
469- There is also the function `` save_figure`` in line 63 to save the plots under
470- `` plot`` directory in a `` .png`` format . Again, you may choose your own method
471- of saving the output.
475+ cube. The saved files can be found under the ``work`` directory in ``.nc`` format.
476+ There is also the function ``save_figure`` in line 63 to save the plots under the
477+ ``plot`` directory in ``.png`` format (or another format specified in your
478+ configuration settings). Again, you may choose your own method
479+ of saving the output.
472480
473481# ## Recording the provenance
474482
475- When developing a diagnostic script, we should make sure that it records the
483+ When developing a diagnostic script, it is good practice to record
476484provenance. To do so, we use the function `` get_provenance_record`` (line 99 ).
477- Let' s have a look at the definition of this function in line 21 where we
485+ Let us have a look at the definition of this function in line 21 where we
478486describe the diagnostic data and plot. Using the dictionary `` record`` , it is
479- possible to add custom provenance. Provenance is stored in the * W3C PROV XML *
487+ possible to add custom provenance to our diagnostic's output.
488+ Provenance is stored in the *W3C PROV XML*
480489format and also in an *SVG* file under the ``work`` and ``plot`` directories. For
481490more information, see [recording provenance][provenance].
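
As a rough illustration of what such a ``record`` dictionary might look like, the sketch below builds one from a few made-up attributes. The helper name ``build_record``, the attribute values and the choice of the ``caption`` and ``ancestors`` keys are assumptions made for this example; check the provenance documentation for the keys that ESMValTool actually accepts.

```python
def build_record(attributes, ancestor_files):
    """Hypothetical helper: assemble a provenance record dictionary."""
    caption = "Average {long_name} according to {dataset}.".format(**attributes)
    return {
        "caption": caption,           # human-readable description of the output
        "ancestors": ancestor_files,  # input files the output was derived from
    }


# Made-up attributes and file path, for illustration only.
record = build_record(
    {"long_name": "Near-Surface Air Temperature", "dataset": "MODEL-A"},
    ["/path/to/preproc/tas_model_a.nc"],
)

if __name__ == "__main__":
    print(record["caption"])
```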
482491
483492# # Congratulations!
484493
485- Now you know the diagnostic script structure and a few functions. There are many
486- more functions to be used, but these should be enough to get you started! Look
487- at other recipes and diagnostics for more examples.
494+ You now know the basic diagnostic script structure and some available tools for putting
495+ together your own diagnostics. Have a look at existing recipes and diagnostics in the
496+ repository for more examples of functions you can use in your diagnostics!
488497{% include links.md % }
489498
490499[recipe]: https:// github.com/ ESMValGroup/ ESMValTool/ blob/ master/ esmvaltool/ recipes/ examples/ recipe_python.yml