logger.warning("The module intakebuilder is not installed. Do you have intakebuilder in your sys.path or have you activated the conda environment with the intakebuilder package in it? ")
20
-
logger.warning("Attempting again with adjusted sys.path ")
logger.error("The module 'intakebuilder' is still not installed. Do you have intakebuilder in your sys.path or have you activated the conda environment with the intakebuilder package in it?")
33
-
raise ImportError("The module 'intakebuilder' is still not installed. Do you have intakebuilder in your sys.path or have you activated the conda environment with the intakebuilder package in it?")
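The log messages above suggest a two-step import: try the import, adjust ``sys.path``, then try once more before raising. A minimal sketch of that pattern follows. The helper name ``import_with_fallback`` and its ``extra_path`` parameter are hypothetical and are not taken from the Catalog Builder source.

```python
import importlib
import logging
import sys

logger = logging.getLogger(__name__)


def import_with_fallback(module_name, extra_path):
    """Import a module, retrying once after adding extra_path to sys.path.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning("The module %s is not installed.", module_name)

    logger.warning("Attempting again with adjusted sys.path")
    if extra_path not in sys.path:
        sys.path.append(extra_path)
    # Make sure the import system notices the newly added location.
    importlib.invalidate_caches()

    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"The module '{module_name}' is still not installed."
        logger.error(message)
        raise ImportError(message) from exc
```

Returning the module object, rather than relying on a bare ``import`` statement, keeps the retry logic reusable for any module name.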
The Catalog Builder team welcomes all contributions. If you would like to help develop the package, please follow the steps outlined below.
6
+
7
+
8
+
How to contribute
9
+
=================
10
+
11
+
Set up a clean environment
12
+
--------------------------
13
+
14
+
First, create a new environment for your Catalog Builder development work. The recommended approach is to use a `python virtual environment (venv) <https://docs.python.org/3/library/venv.html>`_. A conda environment will also work if you prefer.
15
+
16
+
.. code-block:: console
17
+
18
+
python3 -m venv /path/to/new/virtual/environment
19
+
20
+
Then, activate the environment by sourcing the activation script. The command varies by operating system and shell:
It is recommended that developers install an `editable <https://setuptools.pypa.io/en/latest/userguide/development_mode.html>`_ Catalog Builder package. This makes development simple as any local changes will immediately be testable. From the root of the repository, run:
doc/generation.rst
+68 -74 (68 additions, 74 deletions)
@@ -1,90 +1,63 @@
1
+
========================
1
2
Generating data catalogs
2
3
========================
3
4
4
-
There are a few ways to use the catalog builder.
5
+
There are a few ways to use the catalog builder. This page contains instructions to help you start using the tool.
5
6
6
7
Installation
7
-
------------
8
+
============
8
9
9
-
Recommended approach: Install as a `conda package <https://anaconda.org/NOAA-GFDL/catalogbuilder>`_
10
+
You will need to install the Catalog Builder package to begin.
10
11
11
-
.. code-block:: console
12
+
Cloning the repository
13
+
----------------------
12
14
13
-
conda install catalogbuilder -c noaa-gfdl
15
+
The currently recommended approach for installing the Catalog Builder is to install it as a pip package. First, clone the `GitHub repository <https://github.com/NOAA-GFDL/CatalogBuilder>`_:
14
16
15
-
Alternatively, you may clone the `git repository <https://github.com/NOAA-GFDL/CatalogBuilder>`_
16
-
and create your conda environment using the `environment.yml <https://github.com/NOAA-GFDL/CatalogBuilder/blob/main/environment.yml>`_ in the git repository.
This would create a catalog.csv and catalog.json in the user's home directory.
47
+
Configuration
48
+
=============
77
49
78
-
.. image:: _static/ezgif-4-786144c287.gif
79
-
:width: 1000px
80
-
:alt: Catalog generation demonstration
50
+
A template/configuration file is used for all catalog generation.
81
51
82
-
See `Flags`_ here.
52
+
What is a catalog template?
53
+
---------------------------
54
+
55
+
A catalog template is a YAML file defining headerlist, output path template, output file template, and input/output paths.
83
56
84
-
Using a configuration file
85
-
--------------------------
57
+
Using a custom template
58
+
-----------------------
86
59
87
-
We recommend the use of a configuration file to provide input to the catalog builder. This is necessary and useful if you want to work with datasets and directories that are *not quite* GFDL post-processed directory oriented.
60
+
A default configuration is used for catalog generation unless a custom configuration is provided. We recommend using a custom configuration file if you want to work with datasets and directories that do *not quite* follow the GFDL post-processed directory structure. Configs must be passed to the builder using the ``--config`` flag. See `Flags`_ here.
88
61
89
62
`Here <https://github.com/NOAA-GFDL/CatalogBuilder/blob/main/catalogbuilder/tests/config-cfname.yaml>`_ is an example configuration file.
90
63
@@ -106,9 +79,12 @@ with the ESM collection specification standards and the appropriate workflows.
For a directory structure like /archive/am5/am5/am5f3b1r0/c96L65_am5f3b1r0_pdclim1850F/gfdl.ncrc5-deploy-prod-openmp/pp
110
-
the output_path_template is set as above. We have NA in those values that do not match up with any of the expected headerlist (CSV columns), otherwise we
111
-
simply specify the associated header name in the appropriate place. E.g. The third directory in the PP path example above is the model (source_id), so the third list value in output_path_template is set to 'source_id'. We make sure this is a valid value in headerlist as well. The fourth directory is am5f3b1r0 which does not map to an existing header value. So we simply NA in output_path_template for the fourth value. We have NA in values that do not match up with any of the expected headerlist (CSV columns), otherwise we simply specify the associated header name in the appropriate place. E.g. The third directory in the PP path example above is the model (source_id), so the third list value in output_path_template is set to 'source_id'. We make sure this is a valid value in headerlist as well. #The fourth directory is am5f3b1r0 which does not map to an existing header value. So we simply set NA in output_path_template for the fourth value.
82
+
For a directory structure like /archive/am5/am5/am5f3b1r0/c96L65_am5f3b1r0_pdclim1850F/gfdl.ncrc5-deploy-prod-openmp/pp the output_path_template is set as above.
83
+
84
+
We have NA in those values that do not match up with any of the expected headerlist (CSV columns), otherwise we
85
+
simply specify the associated header name in the appropriate place. E.g. The third directory in the PP path example above is the model (source_id), so the third list value in output_path_template is set to 'source_id'. We make sure this is a valid value in headerlist as well. The fourth directory is am5f3b1r0 which does not map to an existing header value. So we simply add NA in output_path_template for the fourth value.
86
+
87
+
We have NA in values that do not match up with any of the expected headerlist (CSV columns), otherwise we simply specify the associated header name in the appropriate place. E.g. The third directory in the PP path example above is the model (source_id), so the third list value in output_path_template is set to 'source_id'. We make sure this is a valid value in headerlist as well.
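The mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Catalog Builder's implementation: the function name ``map_path_to_headers`` is hypothetical, and the template below covers only the first four directories mentioned in the text (the third mapped to ``source_id``, the fourth set to ``NA``).

```python
from pathlib import PurePosixPath


def map_path_to_headers(path, output_path_template, headerlist):
    """Pair each directory in path with the template entry at the same position.

    Entries set to "NA" are skipped; any other entry must be a valid header.
    Hypothetical helper for illustration only.
    """
    # Drop the leading "/" so that index 0 is the first directory name.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    row = {}
    for directory, header in zip(parts, output_path_template):
        if header == "NA":
            continue
        if header not in headerlist:
            raise ValueError(f"{header!r} is not in headerlist")
        row[header] = directory
    return row


pp_path = (
    "/archive/am5/am5/am5f3b1r0/"
    "c96L65_am5f3b1r0_pdclim1850F/gfdl.ncrc5-deploy-prod-openmp/pp"
)
# Assumed, shortened template: only the entries discussed in the text.
template = ["NA", "NA", "source_id", "NA"]
headers = ["source_id"]

row = map_path_to_headers(pp_path, template, headers)
print(row)
```

Because ``zip`` stops at the shorter sequence, directories beyond the end of the template are simply ignored in this sketch.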
112
88
113
89
.. code-block:: yaml
114
90
@@ -121,14 +97,32 @@ simply specify the associated header name in the appropriate place. E.g. The thi
output_path: "/home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip" # ENTER NAME OF THE CSV AND JSON, THE SUFFIX ALONE. This can be an absolute or a relative path
123
99
124
-
Template
125
-
--------
100
+
Creating a data catalog
101
+
=======================
102
+
103
+
Using the installed package
104
+
---------------------------
105
+
106
+
Catalogs are generated by the following command: *gen_intake_gfdl.py <INPUT_PATH> <OUTPUT_PATH>*
107
+
108
+
The output path argument should end with the desired output filename WITHOUT a file extension. See example below.
This would create a catalog.csv and catalog.json in the user's home directory.
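As a rough sketch of the naming rule, the snippet below derives the two expected output files from an extension-less output path and assembles the argument list for the command. The helper names are hypothetical. Only the argument order (input path first, output path second) and the ``--config`` flag come from this page, and the paths in the usage lines are placeholders.

```python
from pathlib import PurePosixPath


def expected_catalog_files(output_path):
    """Return the (CSV, JSON) file paths implied by an extension-less output path."""
    if PurePosixPath(output_path).suffix in {".csv", ".json"}:
        raise ValueError("output path should not include a file extension")
    return output_path + ".csv", output_path + ".json"


def build_command(input_path, output_path, config=None):
    """Assemble the argument list: input path first, output path second."""
    command = ["gen_intake_gfdl.py", input_path, output_path]
    if config is not None:
        command += ["--config", config]
    return command


# Placeholder paths, for illustration only.
csv_file, json_file = expected_catalog_files("/home/user/catalog")
command = build_command("/path/to/pp", "/home/user/catalog", config="custom_config.yaml")
print(csv_file, json_file)
print(" ".join(command))
```

The resulting list could be handed to ``subprocess.run`` in an environment where the package is installed.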
115
+
116
+
.. image:: _static/ezgif-4-786144c287.gif
117
+
:width: 1000px
118
+
:alt: Catalog generation demonstration
126
119
127
-
All data catalogs are generated using a template file. This file defines headerlist, output path template, output file template, and input/output paths.
120
+
See `Flags`_ here.
128
121
129
122
From a Python script
130
123
---------------------
131
124
Do you have a python script or a notebook where you could also include steps to generate a data catalog?
125
+
132
126
See example `here <https://github.com/NOAA-GFDL/CatalogBuilder/blob/main/catalogbuilder/scripts/gen_intake_gfdl_runner_config.py>`_
133
127
134
128
Here is another example *with a custom configuration*:
@@ -211,17 +205,10 @@ Refer to this `notebook <https://github.com/aradhakrishnanGFDL/canopy-cats/blob/
211
205
.. image:: _static/catalog_generation.png
212
206
:alt: Screenshot of a notebook showing catalog generation
213
207
214
-
215
208
Using FRE-CLI (GFDL only)
216
209
-------------------------
217
210
218
-
**1. Activate conda environment**
219
-
220
-
.. code-block:: console
221
-
222
-
conda activate /nbhome/fms/conda/envs/fre-cli
223
-
224
-
**2. Call the builder**
211
+
Follow the `fre-cli setup documentation <https://noaa-gfdl.readthedocs.io/projects/fre-cli/en/latest/setup.html>`_ to gain access to fre-cli.
225
212
226
213
Catalogs are generated by the following command: *fre catalog buildcatalog <INPUT_PATH> <OUTPUT_PATH>*
227
214
@@ -234,13 +221,18 @@ Catalogs are generated by the following command: *fre catalog buildcatalog <INPU
234
221
235
222
See `Flags`_ here.
236
223
237
-
See `Fre-CLI Documentation here <https://noaa-gfdl.github.io/fre-cli/>`_
224
+
See `Fre-CLI Documentation here <https://noaa-gfdl.readthedocs.io/projects/fre-cli/en/latest/>`_
225
+
226
+
Expected output
227
+
---------------
238
228
229
+
The catalog builder tool generates a JSON catalog specification file and a CSV catalog in the specified output directory with the specified name.
239
230
240
-
Arguments/Options
241
-
_____
231
+
Arguments and Options
232
+
=====================
242
233
243
-
**Input/Output paths can be passed directly to catalog builder tool through calling command**
234
+
Arguments
235
+
---------
244
236
245
237
All methods of catalog builder generation support direct input/output path passing.
246
238
@@ -249,6 +241,8 @@ Input path must be the 1st argument. Output path must be the 2nd.
- --config - Allows for catalogs to be generated with a custom configuration. Requires path to YAML configuration file. (Ex. "--config custom_config.yaml")