|
5 | 5 | "id": "c042f571-90e9-4160-ae53-bdbc5a165525", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "# ACCESS-MOPPeR Getting Started" |
| 8 | + "# ACCESS-MOPPeR Getting Started\n", |
| 9 | + "\n", |
| 10 | + "Welcome to the ACCESS-MOPPeR Getting Started guide!\n", |
| 11 | + "\n", |
| 12 | + "This notebook will walk you through the initial setup and basic usage of ACCESS-MOPPeR, a tool designed to post-process ACCESS model output and produce CMIP-compliant datasets. You’ll learn how to configure your environment, prepare your data, and run the CMORisation workflow using both the Python API and Dask for scalable processing.\n", |
| 13 | + "\n", |
| 14 | + "By following this guide, you’ll be able to:\n", |
| 15 | + "- Set up your user configuration\n", |
| 16 | + "- Select input data files\n", |
| 17 | + "- Run the CMORisation process for selected variables\n", |
| 18 | + "- Inspect and save the processed output\n" |
9 | 19 | ] |
10 | 20 | }, |
11 | 21 | { |
|
28 | 38 | }, |
29 | 39 | { |
30 | 40 | "cell_type": "code", |
31 | | - "execution_count": 1, |
| 41 | + "execution_count": null, |
32 | 42 | "id": "80dbbe95-35ea-43d1-a1a0-cea79082b2eb", |
33 | 43 | "metadata": {}, |
34 | 44 | "outputs": [ |
|
46 | 56 | } |
47 | 57 | ], |
48 | 58 | "source": [ |
49 | | - "from access_mopper import ACCESS_ESM_CMORiser\n", |
50 | | - "import dask.distributed as dask" |
| 59 | + "from access_mopper import ACCESS_ESM_CMORiser" |
| 60 | + ] |
| 61 | + }, |
| 62 | + { |
| 63 | + "cell_type": "markdown", |
| 64 | + "id": "eae38f8c", |
| 65 | + "metadata": {}, |
| 66 | + "source": [ |
| 67 | + "## Dask support\n", |
| 68 | + "\n", |
| 69 | + "ACCESS-MOPPeR supports Dask for parallel processing, which can significantly speed up the CMORisation workflow, especially when working with large datasets. To use Dask with ACCESS-MOPPeR, create a Dask client, which manages the distributed computation. This lets you take advantage of multiple CPU cores or even a cluster of machines, depending on your setup.\n", |
| 70 | + "You can configure the Dask client to use a specific number of threads per worker, which can help optimize performance based on your hardware and the size of the datasets you are processing.\n", |
| 71 | + "\n", |
| 72 | + "Here's an example of how to set up a Dask client:\n", |
| 73 | + "\n", |
| 74 | + "```python\n", |
| 75 | + "import dask.distributed as dask\n", |
| 76 | + "\n", |
| 77 | + "client = dask.Client(threads_per_worker=1)\n", |
| 78 | + "client\n", |
| 79 | + "```" |
51 | 80 | ] |
52 | 81 | }, |
53 | 82 | { |
54 | 83 | "cell_type": "code", |
55 | | - "execution_count": 2, |
| 84 | + "execution_count": null, |
56 | 85 | "id": "9000d152-d67c-49ad-a648-025a0808cfe8", |
57 | 86 | "metadata": {}, |
58 | 87 | "outputs": [ |
|
734 | 763 | } |
735 | 764 | ], |
736 | 765 | "source": [ |
| 766 | + "import dask.distributed as dask\n", |
| 767 | + "\n", |
737 | 768 | "client = dask.Client(threads_per_worker = 1)\n", |
738 | 769 | "client" |
739 | 770 | ] |
740 | 771 | }, |
| 772 | + { |
| 773 | + "cell_type": "markdown", |
| 774 | + "id": "d14ad618", |
| 775 | + "metadata": {}, |
| 776 | + "source": [ |
| 777 | + "## Data selection\n", |
| 778 | + "\n", |
| 779 | + "The `ACCESS_ESM_CMORiser` class (described in detail below) takes as input a list of paths to NetCDF files containing the raw model output variables to be CMORised. The CMORiser does **not** assume any specific folder structure, DRS (Data Reference Syntax), or file naming convention. It is up to the user to ensure that the provided files contain the original variables required for CMORisation.\n", |
| 780 | + "\n", |
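| | + "Because file selection is left to the user, a quick sanity check before constructing the CMORiser can catch an empty or mistyped pattern early. The sketch below is only an illustration: `select_files` is a hypothetical helper, not part of ACCESS-MOPPeR, and it uses only the Python standard library to expand a glob pattern, sort the matches, and fail if nothing is found. The returned list could then be passed to the CMORiser as its input files.\n", |
| | + "\n", |
| | + "```python\n", |
| | + "import glob\n", |
| | + "\n", |
| | + "\n", |
| | + "def select_files(pattern):\n", |
| | + "    # Hypothetical helper (not part of ACCESS-MOPPeR): expand a glob\n", |
| | + "    # pattern into a sorted list of paths and fail early if it is empty.\n", |
| | + "    files = sorted(glob.glob(pattern))\n", |
| | + "    if not files:\n", |
| | + "        raise FileNotFoundError(f'No files match pattern: {pattern}')\n", |
| | + "    return files\n", |
| | + "```\n", |
| | + "\n", |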
| 781 | + "This design is intentional: ACCESS-NRI plans to integrate ACCESS-MOPPeR into extended workflows that leverage the [ACCESS-NRI Intake Catalog](https://github.com/ACCESS-NRI/access-nri-intake-catalog) or evaluation frameworks such as [ESMValTool](https://www.esmvaltool.org/) and [ILAMB](https://www.ilamb.org/). By decoupling file selection from the CMORiser, ACCESS-MOPPeR can be flexibly used in a variety of data processing and evaluation pipelines." |
| 782 | + ] |
| 783 | + }, |
741 | 784 | { |
742 | 785 | "cell_type": "code", |
743 | | - "execution_count": 3, |
| 786 | + "execution_count": null, |
744 | 787 | "id": "f49fd1d4-dcb6-47a8-9d4a-731a7ca1ea0d", |
745 | 788 | "metadata": {}, |
746 | 789 | "outputs": [], |
747 | 790 | "source": [ |
| 791 | + "# Here we use NetCDF files from a raw ACCESS-ESM run.\n", |
748 | 792 | "import glob\n", |
749 | 793 | "files = glob.glob(\"../../Test_data/esm1-6/atmosphere/aiihca.pa-0961*_mon.nc\")" |
750 | 794 | ] |
751 | 795 | }, |
| 796 | + { |
| 797 | + "cell_type": "markdown", |
| 798 | + "id": "d458b955", |
| 799 | + "metadata": {}, |
| 800 | + "source": [ |
| 801 | + "### Parent experiment information\n", |
| 802 | + "\n", |
| 803 | + "In CMIP workflows, providing parent experiment information is required for proper data provenance and traceability. This metadata describes the relationship between your experiment and its parent (for example, a historical run branching from a piControl simulation), and is essential for CMIP data publication and compliance.\n", |
| 804 | + "\n", |
| 805 | + "However, for some applications—such as when using ACCESS-MOPPeR to interact with evaluation frameworks like [ESMValTool](https://www.esmvaltool.org/) or [ILAMB](https://www.ilamb.org/)—strict CMIP compliance is not always necessary. In these cases, you may choose to skip providing parent experiment information to simplify the workflow.\n", |
| 806 | + "\n", |
| 807 | + "If you choose to skip this step, ACCESS-MOPPeR will issue a warning to let you know that, if you write the output to disk, the resulting file may not be compatible with CMIP requirements for publication. This flexibility allows you to use ACCESS-MOPPeR for rapid evaluation and prototyping, while still supporting full CMIP compliance when needed." |
| 808 | + ] |
| 809 | + }, |
752 | 810 | { |
753 | 811 | "cell_type": "code", |
754 | 812 | "execution_count": 4, |
|
769 | 827 | "}" |
770 | 828 | ] |
771 | 829 | }, |
| 830 | + { |
| 831 | + "cell_type": "markdown", |
| 832 | + "id": "68b05b80", |
| 833 | + "metadata": { |
| 834 | + "vscode": { |
| 835 | + "languageId": "markdown" |
| 836 | + } |
| 837 | + }, |
| 838 | + "source": [ |
| 839 | + "## Set up the CMORiser for CMORisation\n", |
| 840 | + "\n", |
| 841 | + "To begin the CMORisation process, you need to create an instance of the `ACCESS_ESM_CMORiser` class. This class requires several key parameters, including the list of input NetCDF files and metadata describing your experiment.\n", |
| 842 | + "\n", |
| 843 | + "A crucial parameter is the `compound_name`, which should be specified using the full CMIP convention: `table.variable` (for example, `Amon.rsds`). This format uniquely identifies the variable, its frequency (e.g., monthly, daily), and the associated CMIP table, ensuring that all requirements for grids and metadata are correctly handled. Using the full compound name helps avoid ambiguity and guarantees that the CMORiser applies the correct standards for each variable.\n", |
| 844 | + "\n", |
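| | + "To make the `table.variable` convention concrete, the sketch below splits a compound name into its two parts and rejects strings that do not follow the pattern. `split_compound_name` is a hypothetical helper written for this guide; it does not describe how ACCESS-MOPPeR parses the name internally.\n", |
| | + "\n", |
| | + "```python\n", |
| | + "def split_compound_name(compound_name):\n", |
| | + "    # Hypothetical helper: split 'table.variable' at the first dot.\n", |
| | + "    table, sep, variable = compound_name.partition('.')\n", |
| | + "    if not sep or not table or not variable:\n", |
| | + "        raise ValueError(f'Expected table.variable, got: {compound_name!r}')\n", |
| | + "    return table, variable\n", |
| | + "\n", |
| | + "\n", |
| | + "table, variable = split_compound_name('Amon.rsds')\n", |
| | + "print(table, variable)  # prints: Amon rsds\n", |
| | + "```\n", |
| | + "\n", |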
| 845 | + "You can also provide additional metadata such as `experiment_id`, `source_id`, `variant_label`, and `grid_label` to ensure your output is CMIP-compliant. Optionally, you may include parent experiment information for full provenance tracking." |
| 846 | + ] |
| 847 | + }, |
772 | 848 | { |
773 | 849 | "cell_type": "code", |
774 | | - "execution_count": 5, |
| 850 | + "execution_count": null, |
775 | 851 | "id": "0e54cf4e-b707-4128-aa93-23bb9cf684d3", |
776 | 852 | "metadata": {}, |
777 | 853 | "outputs": [], |
|
784 | 860 | " variant_label=\"r1i1p1f1\",\n", |
785 | 861 | " grid_label=\"gn\",\n", |
786 | 862 | " activity_id=\"CMIP\",\n", |
787 | | - " parent_info=parent_experiment_config)" |
| 863 | + " parent_info=parent_experiment_config # <-- This is optional, can be skipped if not needed\n", |
| 864 | + " )" |
| 865 | + ] |
| 866 | + }, |
| 867 | + { |
| 868 | + "cell_type": "markdown", |
| 869 | + "id": "de6be45d", |
| 870 | + "metadata": { |
| 871 | + "vscode": { |
| 872 | + "languageId": "markdown" |
| 873 | + } |
| 874 | + }, |
| 875 | + "source": [ |
| 876 | + "## Running the CMORiser\n", |
| 877 | + "\n", |
| 878 | + "To start the CMORisation process, simply call the `run()` method on your `cmoriser` instance as shown below. This step may take some time, especially if you are processing a large number of files.\n", |
| 879 | + "\n", |
| 880 | + "We recommend using the [dask-labextension](https://github.com/dask/dask-labextension) with JupyterLab to monitor the progress of your computation. The extension provides a convenient dashboard to track task progress and resource usage directly within your notebook interface.\n" |
788 | 881 | ] |
789 | 882 | }, |
790 | 883 | { |
791 | 884 | "cell_type": "code", |
792 | | - "execution_count": 6, |
| 885 | + "execution_count": null, |
793 | 886 | "id": "5e6c9e48-9dc0-42ab-a396-6bcf7b57cb42", |
794 | 887 | "metadata": {}, |
795 | 888 | "outputs": [], |
796 | 889 | "source": [ |
797 | 890 | "cmoriser.run()" |
798 | 891 | ] |
799 | 892 | }, |
| 893 | + { |
| 894 | + "cell_type": "markdown", |
| 895 | + "id": "c1fade88", |
| 896 | + "metadata": { |
| 897 | + "vscode": { |
| 898 | + "languageId": "markdown" |
| 899 | + } |
| 900 | + }, |
| 901 | + "source": [ |
| 902 | + "### In-memory processing with xarray and Dask\n", |
| 903 | + "\n", |
| 904 | + "The CMORisation workflow processes data entirely in memory using `xarray` and Dask. This approach enables efficient parallel computation and flexible data manipulation, but requires that your system has enough memory to handle the size of your dataset.\n", |
| 905 | + "\n", |
| 906 | + "Once the CMORisation is complete, you can access the resulting dataset by calling the `to_dataset()` method on your `cmoriser` instance (see below). The returned object is a standard xarray dataset, which means you can slice, analyze, or further process the data using familiar xarray operations." |
| 907 | + ] |
| 908 | + }, |
800 | 909 | { |
801 | 910 | "cell_type": "code", |
802 | 911 | "execution_count": 7, |
|
1677 | 1786 | "ds" |
1678 | 1787 | ] |
1679 | 1788 | }, |
| 1789 | + { |
| 1790 | + "cell_type": "markdown", |
| 1791 | + "id": "f2a97420", |
| 1792 | + "metadata": { |
| 1793 | + "vscode": { |
| 1794 | + "languageId": "markdown" |
| 1795 | + } |
| 1796 | + }, |
| 1797 | + "source": [ |
| 1798 | + "### Writing the output to a NetCDF file\n", |
| 1799 | + "\n", |
| 1800 | + "To save your CMORised data to disk, use the `write()` method of the `cmoriser` instance. This will create a NetCDF file with all attributes set according to the CMIP Controlled Vocabulary, ensuring compliance with CMIP metadata standards.\n", |
| 1801 | + "\n", |
| 1802 | + "After writing the file, we recommend validating it using [PrePARE](https://github.com/PCMDI/cmor/tree/master/PrePARE), a tool provided by PCMDI to check the conformity of CMIP files. PrePARE will help you identify any issues with metadata or file structure before publication or further analysis." |
| 1803 | + ] |
| 1804 | + }, |
1680 | 1805 | { |
1681 | 1806 | "cell_type": "code", |
1682 | 1807 | "execution_count": 9, |
|