diff --git a/book_source/03_topical_pages/02_pecan_standards.Rmd b/book_source/03_topical_pages/02_pecan_standards.Rmd
index d71e3b3ddd..424da74478 100644
--- a/book_source/03_topical_pages/02_pecan_standards.Rmd
+++ b/book_source/03_topical_pages/02_pecan_standards.Rmd
@@ -1,5 +1,6 @@
# PEcAn standard formats {#pecan-standards}
+## Overview of PEcAn standards
PEcAn relies on standardized data formats to ensure reproducibility, interoperability, and consistency across models, sites, and workflows. These standards define how inputs, internal representations, and outputs are structured throughout the PEcAn workflows.
**In this chapter**
@@ -11,29 +12,36 @@ PEcAn relies on standardized data formats to ensure reproducibility, interoperab
## Defining new input formats
-* New formats can be defined on the ['formats' page of BETYdb](http://betydb.org/formats)
-* After creating a new format, the contents should be defined by specifying the BETYdb variable name and the name used in the file/
+This chapter describes the standard conventions used by PEcAn for input data, output data, and metadata to ensure consistency across models and workflows.
-## Time Standard
-Internal PEcAn standard time follows ISO_8601 format for dates and time (https://en.wikipedia.org/wiki/ISO_8601). For example ordinal dates go from 1 365/366 (https://en.wikipedia.org/wiki/ISO_8601#Ordinal_dates). However, time used in met drivers or model outputs follows CF convention with julian dates following the 0 to 364/365 format
+## Core data standards
-To aid in the conversion between PEcAn internal ISO_8601 standard and CF convention used in all met drivers and PEcAn standard output you can utilize the functions: "cf2datetime","datetime2doy",and "cf2doy"
+### Defining new input formats
-## Input Standards
+- New formats can be defined on the ['formats' page of BETYdb](http://betydb.org/formats)
+- After creating a new format, the contents should be defined by specifying the BETYdb variable name and the name used in the file.
-### Meteorology Standards
+### Time standards
-#### Dimensions
+Internal PEcAn standard time follows the ISO_8601 format for dates and times (<https://en.wikipedia.org/wiki/ISO_8601>). For example, ordinal dates run from 1 to 365/366 (<https://en.wikipedia.org/wiki/ISO_8601#Ordinal_dates>). Time used in met drivers or model outputs, however, follows the CF convention, in which Julian dates run from 0 to 364/365.
+To convert between the PEcAn internal ISO_8601 standard and the CF convention used in all met drivers and PEcAn standard output, you can use the functions `cf2datetime()`, `datetime2doy()`, and `cf2doy()`.
-|CF standard-name | units |
-|:------------------------------------------|:------|
-| time | days since 1700-01-01 00:00:00 UTC|
-| longitude | degrees_east|
-| latitude |degrees_north|
+### Input Standards
-General Note: dates in the database should be date-time (preferably with timezone), and datetime passed around in PEcAn should be of type POSIXct.
### Meteorology Data
#### Dimensions
+
+| CF standard-name | units |
|:-----------------|:-----------------------------------|
| time | days since 1700-01-01 00:00:00 UTC |
| longitude | degrees_east |
| latitude | degrees_north |
+
+::: callout-note
+Dates in the database should be date-time (preferably with a timezone), and datetimes passed around in PEcAn should be of type POSIXct.
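As a quick illustration of the time conversions described in the Time standards section above, here is a minimal sketch, assuming the `cf2datetime()`, `datetime2doy()`, and `cf2doy()` helpers are exported by `PEcAn.utils` and accept `(value, unit)` arguments (the dates shown are illustrative):

```
# Hedged sketch: moving between CF time offsets, POSIXct, and day of year.
# Assumes these helpers live in PEcAn.utils and accept (value, unit) arguments.
library(PEcAn.utils)

cf_unit <- "days since 2010-01-01 00:00:00"  # CF time unit, as found in a met driver

dt  <- cf2datetime(181.5, cf_unit)           # CF offset -> POSIXct datetime
doy <- datetime2doy(dt)                      # POSIXct   -> ordinal day of year
cf2doy(181.5, cf_unit)                       # CF offset -> day of year in one step
```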
+::: #### Variable names should be `standard_name` @@ -71,33 +79,30 @@ datatable(in_tab, extensions = c('FixedColumns',"Buttons"), ``` -* preferred variables indicated in bold -* wind_direction has no CF equivalent and should not be converted, instead the met2CF functions should convert wind_direction and wind_speed to eastward_wind and northward_wind -* standard_name is CF-convention standard names -* units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs) -* soil moisture for the full column, rather than a layer, is soil_moisture_content -* The list of PEcAn standard variable names, units and dimensions are provided in a table in the [Output Standards]{#OutputStandards} section and maintained in the file: [base/utils/data/standard_vars.csv](https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv). +- preferred variables indicated in bold +- wind_direction has no CF equivalent and should not be converted, instead the met2CF functions should convert wind_direction and wind_speed to eastward_wind and northward_wind +- standard_name is CF-convention standard names +- units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs) +- soil moisture for the full column, rather than a layer, is soil_moisture_content +- The list of PEcAn standard variable names, units and dimensions are provided in a table in the [Output Standards]{#OutputStandards} section and maintained in the file: [base/utils/data/standard_vars.csv](https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv). -For example, in the [MsTMIP-CRUNCEP](https://www.betydb.org/inputs/280) data, the variable `rain` should be `precipitation_rate`. -We want to standardize the units as well as part of the `met2CF.` step. I believe we want to use the CF "canonical" units but retain the MsTMIP units any time CF is ambiguous about the units. +**Example:** in the [MsTMIP-CRUNCEP](https://www.betydb.org/inputs/280) data, the variable `rain` should be `precipitation_rate`. We want to standardize the units as well as part of the `met2CF.` step. I believe we want to use the CF "canonical" units but retain the MsTMIP units any time CF is ambiguous about the units. The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function. -### Soils and Vegetation Inputs - -#### Soil Data +### Soil Data See the [Soil Data] section on more into on creating a standard soil data file. -#### Vegetation Data +### Vegetation Data See the [Vegetation Data] section on more info on creating a standard vegetation data file -## Output Standards {#OutputStandards} +## Output Standards -* created by `model2netcdf` functions -* based on format used by [MsTMIP](http://nacp.ornl.gov/MsTMIP_variables.shtml) -* Can be seen at HERE +- created by `model2netcdf` functions +- based on format used by [MsTMIP](http://nacp.ornl.gov/MsTMIP_variables.shtml) +- Can be seen at HERE We originally used the [MsTMIP](http://nacp.ornl.gov/MsTMIP_variables.shtml) conventions. Since then, we've added the PaLEON variable conventions to our standard as well. 
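These files are ordinary NetCDF (typically one file per year of output), so they can be inspected with any NetCDF tool. A minimal sketch using the `ncdf4` package, where the path `out/99000000001/2004.nc` and the variable `GPP` are illustrative assumptions:

```
# Hedged sketch: inspect one year of PEcAn standard output with the ncdf4 package.
# The run directory and variable name below are hypothetical placeholders.
library(ncdf4)

nc <- nc_open("out/99000000001/2004.nc")  # one standard output file (one year)
names(nc$var)                             # list the standard variable names present
gpp <- ncvar_get(nc, "GPP")               # read one variable
ncatt_get(nc, "GPP", "units")             # check the units attribute stored with it
nc_close(nc)
```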
If a variable isn't in one of those two, we stick to the CF conventions. diff --git a/book_source/03_topical_pages/11_adding_to_pecan.Rmd b/book_source/03_topical_pages/11_adding_to_pecan.Rmd index ffd5daf0d3..85234d2163 100644 --- a/book_source/03_topical_pages/11_adding_to_pecan.Rmd +++ b/book_source/03_topical_pages/11_adding_to_pecan.Rmd @@ -1,34 +1,34 @@ # Adding to PEcAn {#adding-to-pecan} -- Case studies - - [Adding a model](#adding-model) - - [Adding input data](#NewInput) - - [Adding data through the web interface](#adding-data-web) - - Adding new species, PFTs, and traits from a new site - - Add a site - - Add some species - - Add PFT - - Add trait data - - Adding a benchmark - - Adding a met driver -- [Reference](#editing-records) (How to edit records in bety) - - Models - - Species - - PFTs - - Traits - - Inputs - - DB files - - Variables - - Formats - - (Link each section to relevant Bety tables) +- Case studies + - [Adding a model](#adding-model) + - [Adding input data](#NewInput) + - [Adding data through the web interface](#adding-data-web) + - Adding new species, PFTs, and traits from a new site + - Add a site + - Add some species + - Add PFT + - Add trait data + - Adding a benchmark + - Adding a met driver +- [Reference](#editing-records) (How to edit records in bety) + - Models + - Species + - PFTs + - Traits + - Inputs + - DB files + - Variables + - Formats + - (Link each section to relevant Bety tables) ## Adding An Ecosystem Model {#adding-model} **Adding a model to PEcAn involves two activities:** -1. Writing the interface modules between the model and PEcAn -2. Integrate the interface into the rest of the PEcAn system -3. Updating the PEcAn database to register the model (optional) +1. Writing the interface modules between the model and PEcAn +2. Integrate the interface into the rest of the PEcAn system +3. Updating the PEcAn database to register the model (optional) **Note that coupling a model to PEcAn should not require any changes to the model code itself**. A key aspect of our design philosophy is that we want it to be easy to add models to the system and we want to using the working version of the code that is used by all other model users, not a special branch (which would rapidly end up out-of-date). @@ -36,32 +36,35 @@ #### Setting up the module directory (required) -PEcAn assumes that the interface modules are available as an R package in the models directory named after the model in question. The simplest way to get started on that R package is to make a copy the [_template_](https://github.com/PecanProject/pecan/tree/main/models/template) directory in the pecan/models folder and re-name it to the name of your model. In the code, filenames, and examples below you will want to substitute the word **MODEL** for the name of your model (note: R is case-sensitive). +PEcAn assumes that the interface modules are available as an R package in the models directory named after the model in question. The simplest way to get started on that R package is to make a copy the [*template*](https://github.com/PecanProject/pecan/tree/main/models/template) directory in the pecan/models folder and re-name it to the name of your model. In the code, filenames, and examples below you will want to substitute the word **MODEL** for the name of your model (note: R is case-sensitive). -If you do not want to write the interface modules in R then it is fairly simple to set up the R functions describe below to just call the script you want to run using R's _system_ command. 
Scripts that are not R functions should be placed in the _inst_ folder and R can look up the location of these files using the function _system.file_ which takes as arguments the _local_ path of the file within the package folder and the name of the package (typically PEcAn.MODEL). For example
+If you do not want to write the interface modules in R then it is fairly simple to set up the R functions described below to just call the script you want to run using R's `system` command. Scripts that are not R functions should be placed in the `inst` folder; R can look up the installed location of these files using `system.file`, which takes the path of the file within the installed package and the package name (typically PEcAn.MODEL, passed as the `package` argument). Because the contents of `inst/` are installed at the top level of the package, the `inst/` prefix is dropped in the call. For example

-    ## Example met conversion wrapper function
-    met2model.MODEL <- function(in.path, in.prefix, outfolder, start_date, end_date){
-      myMetScript <- system.file("inst/met2model.MODEL.sh", "PEcAn.MODEL")
-      system(paste(myMetScript, file.path(in.path, in.prefix), outfolder, start_date, end_date))
-    }

+```
+## Example met conversion wrapper function
+met2model.MODEL <- function(in.path, in.prefix, outfolder, start_date, end_date){
+  myMetScript <- system.file("met2model.MODEL.sh", package = "PEcAn.MODEL")
+  system(paste(myMetScript, file.path(in.path, in.prefix), outfolder, start_date, end_date))
+}
+```

would execute the following at the Linux command line (with the script path expanded to its installed location)

-    inst/met2model.MODEL.sh in.path/in.prefix outfolder start_date end_date `

+```
+met2model.MODEL.sh in.path/in.prefix outfolder start_date end_date
+```

-#### DESCRIPTION
+#### DESCRIPTION File

Within the module folder open the *DESCRIPTION* file and change the package name to PEcAn.MODEL. Fill out other fields such as Title, Author, Maintainer, and Date.

-#### NAMESPACE
+#### NAMESPACE File

This file is managed by Roxygen and will update automatically when you build the model package. Do not edit it by hand.

#### Building the package

-if you have a favorite tool for building R packages from local directories, you can use it as normal during development. If you do not yet have a favorite, use PEcAn's Make system: Add your package to the Makefile by adding its name to the line starting `MODELS := ` near the top of the file. Then you can update Roxygen output with `make document`, build and install the package with `make install`, and run package checks and tests with `make check` / `make check`. Since these run on every PEcAn package by default, you may want to limit these to the MODEL package with `make .check/models/MODEL`.
-
+If you have a favorite tool for building R packages from local directories, you can use it as normal during development. If you do not yet have a favorite, use PEcAn's Make system: add your package to the Makefile by adding its name to the line starting `MODELS :=` near the top of the file. Then you can update Roxygen output with `make document`, build and install the package with `make install`, and run package checks and tests with `make check` / `make test`. Since these run on every PEcAn package by default, you may want to limit them to the MODEL package with `make .check/models/MODEL`.

#### write.config.MODEL (required)

@@ -86,17 +89,17 @@ Converts meteorology input files from the PEcAn standard (netCDF, CF metadata) t

#### Additional Input Converters

In addition to met2model.MODEL, PEcAn also supports the following additional input conversions:

-* veg2model.MODEL - handles vegetation initial conditions. 
Supports both pool-based and cohort-based models -* write.events.MODEL - handles management events (still in early development) -* soil physical parameters (e.g., texture, hydraulics, thermodynamics) - there is not yet a convention for stand-alone converters, existing models currently read the soil.nc file from settings$run$inputs$soil_physics as part of their write.configs. This file is generated by PEcAn.data.land::soil2netcdf -* vegetation phenology - there is not yet a convention for stand-alone converters, existing models read the phenology file from settings$run$inputs$leaf_phenology within their write.configs. This is currently a csv file with the columns: "year", "site_id", "lat", "lon", "leafonday","leafoffday","leafon_qa","leafoff_qa" -Because not all models accept these inputs, the `pecan.MODEL` package template does not include converter skeletons for them. See the code of existing model coupler packages to find examples, and ask freely in Slack for suggestions specific to your project. +- `veg2model.MODEL` - handles vegetation initial conditions. Supports both pool-based and cohort-based models +- `write.events.MODEL` - handles management events (still in early development) +- soil physical parameters (e.g., texture, hydraulics, thermodynamics) - there is not yet a convention for stand-alone converters, existing models currently read the `soil.nc` file from `settings$run$inputs$soil_physics` as part of their `write.configs`. This file is generated by `PEcAn.data.land::soil2netcdf` +- vegetation phenology - there is not yet a convention for stand-alone converters, existing models read the phenology file from `settings$run$inputs$leaf_phenology` within their write.configs. This is currently a csv file with the columns: `year`, `site_id`, `lat`, `lon`, `leafonday`,`leafoffday`,`leafon_qa`,`leafoff_qa` -See also [Adding a new input converter](#InputConversions) for more about PEcAn's approach to input handling and what information is passed in each input type. +Because not all models accept these inputs, the `pecan.MODEL` package template does not include converter skeletons for them. See the code of existing model coupler packages to find examples, and ask freely in Slack for suggestions specific to your project. +See also [Adding a new input converter](#InputConversions) for more about PEcAn's approach to input handling and what information is passed in each input type. -#### Commit changes +#### Commit Changes Once the MODEL modules are written, you should follow the [Using-Git](Using-Git.md) instructions on how to commit your changes to your local git repository, verify that PEcAn compiles using *scripts/build.sh*, push these changes to Github, and submit a pull request so that your model module is added to the PEcAn system. It is important to note that while we encourage users to make their models open, adding the PEcAn interface module to the Github repository in no way requires that the model code itself be made public. It does, however, allow anyone who already has a copy of the model code to use PEcAn so we strongly encourage that any new model modules be committed to Github. @@ -104,20 +107,19 @@ Once the MODEL modules are written, you should follow the [Using-Git](Using-Git. Once the package is defined, you will also want to make the rest of PEcAn notice it: -* Add it to the PEcAn Makefile, as part of the line near the top that starts `MODELS := ` (if you didn't do this earlier to build the package). 
-* Run `scripts/generate_dependencies.R` to add the list of your model's dependencies to the packages pre-installed on the PEcAn docker images.
-* Add it to the list of packages in `base/all/data/pecan_version_history.csv`
-* Optionally, add it as a `Suggests:` dependency in base/all/DESCRIPTION (not all models choose to do this).
-* If your model package has a Dockerfile, add it to the `modelsbinary` section of the Docker build in ``.github/workflows/docker.yml`
-* Once the package is working and merged into the develop branch of PEcAn, open a pull request to add it to the `packages.json` file of [https://github.com/PecanProject/pecanproject.r-universe.dev]. This will cue R-Universe to start building and distributing the package as part of the PEcAn collection.
-* Announce it in the CHANGELOG!
-
+- Add it to the PEcAn Makefile, as part of the line near the top that starts `MODELS :=` (if you didn't do this earlier to build the package).
+- Run `scripts/generate_dependencies.R` to add the list of your model's dependencies to the packages pre-installed on the PEcAn docker images.
+- Add it to the list of packages in `base/all/data/pecan_version_history.csv`.
+- Optionally, add it as a `Suggests:` dependency in base/all/DESCRIPTION (not all models choose to do this).
+- If your model package has a Dockerfile, add it to the `modelsbinary` section of the Docker build in `.github/workflows/docker.yml`.
+- Once the package is working and merged into the develop branch of PEcAn, open a pull request to add it to the `packages.json` file of <https://github.com/PecanProject/pecanproject.r-universe.dev>. This will cue R-Universe to start building and distributing the package as part of the PEcAn collection.
+- Announce it in the CHANGELOG!

### Add model info to PEcAn Database

-Note: As support for running PEcAn with no database expands, this step has become less important, but keep reading -- this section is still the best summary of what information needs to be _available to_ the model, whether looked up from the database on the fly or passed in by hand.
+Note: As support for running PEcAn with no database expands, this step has become less important, but keep reading -- this section is still the best summary of what information needs to be *available to* the model, whether looked up from the database on the fly or passed in by hand.

-To run a model within PEcAn requires that the PEcAn database has sufficient information about the model. This includes a MODEL_TYPE designation, the types of inputs the model requires, the location of the model executable, and the plant functional types used by the model.
+Running a model within PEcAn requires that the PEcAn database has sufficient information about the model. This includes a MODEL_TYPE designation, the types of inputs the model requires, the location of the model executable, and the plant functional types used by the model.

The instructions in this section assume that you will be specifying this information using the BETYdb web-based interface. This can be done either on your local VM (localhost:3280/bety or localhost:6480/bety) or on a server installation of BETYdb. However you interact with BETYdb, we encourage you to set up your PEcAn instance to support [database syncs](#database-sync) so that these changes can be shared and backed-up across the PEcAn network.

@@ -129,57 +131,57 @@ The figure below summarizes the relevant database tables that need to be updated

### Define MODEL_TYPE

-The first step to adding a model is to create a new MODEL_TYPE, which defines the abstract model class. 
This MODEL_TYPE is used to specify input requirements, define plant functional types, and keep track of different model versions. +The first step to adding a model is to create a new MODEL_TYPE, which defines the abstract model class. This MODEL_TYPE is used to specify input requirements, define plant functional types, and keep track of different model versions. -The MODEL_TYPE is created by selecting Runs > Model Type and then clicking on _New Model Type_. The MODEL_TYPE name should be identical to the MODEL package name (see Interface Module below) and is case sensitive. +The MODEL_TYPE is created by selecting Runs \> Model Type and then clicking on *New Model Type*. The MODEL_TYPE name should be identical to the MODEL package name (see Interface Module below) and is case sensitive. -![](03_topical_pages/11_images/bety_modeltype_1.png) -![](03_topical_pages/11_images/bety_modeltype_2.png) +![](03_topical_pages/11_images/bety_modeltype_1.png) ![](03_topical_pages/11_images/bety_modeltype_2.png) -### MACHINE +### Define MACHINE -The PEcAn design acknowledges that the same model executables and input files may exist on multiple computers. Therefore, we need to define the machine that that we are using. If you are running on the VM then the local machine is already defined as _pecan_. Otherwise, you will need to select Runs > Machines, click _New Machine_, and enter the URL of your server (e.g. pecan2.bu.edu). +The PEcAn design acknowledges that the same model executables and input files may exist on multiple computers. Therefore, we need to define the machine that that we are using. If you are running on the VM then the local machine is already defined as *pecan*. Otherwise, you will need to select Runs \> Machines, click *New Machine*, and enter the URL of your server (e.g. pecan2.bu.edu). -### MODEL +### Register MODEL executable -Next we are going to tell PEcAn where the model executable is. Select Runs > Files, and click ADD. Use the pull down menu to specify the machine you just defined above and fill in the path and name for the executable. For example, if SIPNET is installed at /usr/local/bin/sipnet then the path is /usr/local/bin/ and the file (executable) is sipnet. +Next we are going to tell PEcAn where the model executable is. Select Runs \> Files, and click ADD. Use the pull down menu to specify the machine you just defined above and fill in the path and name for the executable. For example, if SIPNET is installed at /usr/local/bin/sipnet then the path is /usr/local/bin/ and the file (executable) is sipnet. -Now we will create the model record and associate this with the File we just registered. The first time you do this select Runs > Models and click _New Model_. Specify a descriptive name of the model (which doesn't have to be the same as MODEL_TYPE), select the MODEL_TYPE from the pull down, and provide a revision identifier for the model (e.g. v3.2.1). Once the record is created select it from the Models table and click EDIT RECORD. Click on "View Related Files" and when the search window appears search for the model executable you just added (if you are unsure which file to choose you can go back to the Files menu and look up the unique ID number). You can then associate this Model record with the File by clicking on the +/- symbol. By contrast, clicking on the name itself will take you to the File record. +Now we will create the model record and associate this with the File we just registered. The first time you do this select Runs \> Models and click *New Model*. 
Specify a descriptive name of the model (which doesn't have to be the same as MODEL_TYPE), select the MODEL_TYPE from the pull down, and provide a revision identifier for the model (e.g. v3.2.1). Once the record is created select it from the Models table and click EDIT RECORD. Click on "View Related Files" and when the search window appears search for the model executable you just added (if you are unsure which file to choose you can go back to the Files menu and look up the unique ID number). You can then associate this Model record with the File by clicking on the +/- symbol. By contrast, clicking on the name itself will take you to the File record. In the future, if you set up the SAME MODEL VERSION on a different computer you can add that Machine and File to PEcAn and then associate this new File with this same Model record. A single version of a model should only be entered into PEcAn **once**. If a new version of the model is developed that is derived from the current version you should add this as a new Model record but with the same MODEL_TYPE as the original. Furthermore, you should set the previous version of the model as Parent of this new version. -### FORMATS +### Define Input FORMATS The PEcAn database keep track of all the input files passed to models, as well as any data used in model validation or data assimilation. Before we start to register these files with PEcAn we need to define the format these files will be in. To create a new format see [Formats Documentation](#NewFormat). -### MODEL_TYPE -> Formats +### Associate MODEL_TYPE With Formats -For each of the input formats you specify for your model, you will need to edit your MODEL_TYPE record to add an association between the format and the MODEL_TYPE. Go to Runs > Model Type, select your record and click on the Edit button. Next, click on "Edit Associated Formats" and choose the Format you just defined from the pull down menu. If the *Input* box is checked then all matching Input records will be displayed in the PEcAn site run selection page when you are defining a model run. In other words, the set of model inputs available through the PEcAn web interface is model-specific and dynamically generated from the associations between MODEL_TYPEs and Formats. If you also check the *Required* box, then the Input will be treated as required and PEcAn will not run the model if that input is not available. Furthermore, on the site selection webpage, PEcAn will filter the available sites and only display pins on the Google Map for sites that have a full set of required inputs (or where those inputs could be generated using PEcAn's workflows). Similarly, to make a site appear on the Google Map, all you need to do is specify Inputs, as described in the next section, and the point should automatically appear on the map. +For each of the input formats you specify for your model, you will need to edit your MODEL_TYPE record to add an association between the format and the MODEL_TYPE. Go to Runs \> Model Type, select your record and click on the Edit button. Next, click on "Edit Associated Formats" and choose the Format you just defined from the pull down menu. If the *Input* box is checked then all matching Input records will be displayed in the PEcAn site run selection page when you are defining a model run. In other words, the set of model inputs available through the PEcAn web interface is model-specific and dynamically generated from the associations between MODEL_TYPEs and Formats. 
If you also check the *Required* box, then the Input will be treated as required and PEcAn will not run the model if that input is not available. Furthermore, on the site selection webpage, PEcAn will filter the available sites and only display pins on the Google Map for sites that have a full set of required inputs (or where those inputs could be generated using PEcAn's workflows). Similarly, to make a site appear on the Google Map, all you need to do is specify Inputs, as described in the next section, and the point should automatically appear on the map. -### INPUTS +### Register Input Files After a file Format has been created then input files can be registered with the database. Creating Inputs can be found under [How to insert new Input data](#NewInput). ### Add Plant Functional Types (PFTs) -Since many of the PEcAn tools are designed to keep track of parameter uncertainties and assimilate data into models, to use PEcAn with a model it is important to define Plant Functional Types for the sites or regions that you will be running the model. +Since many of the PEcAn tools are designed to keep track of parameter uncertainties and assimilate data into models, to use PEcAn with a model it is important to define Plant Functional Types for the sites or regions that you will be running the model. + +Create a new PFT entry by selecting Data \> PFTs and then clicking on *New PFT*. -Create a new PFT entry by selecting Data > PFTs and then clicking on _New PFT_. +![](03_topical_pages/11_images/bety_pft_1.png) -![](03_topical_pages/11_images/bety_pft_1.png) ![](03_topical_pages/11_images/bety_pft_2.png) -Give the PFT a descriptive name (e.g., temperate deciduous). PFTs are MODEL_TYPE specific, so choose your MODEL_TYPE from the pull down menu. +Give the PFT a descriptive name (e.g., temperate deciduous). PFTs are MODEL_TYPE specific, so choose your MODEL_TYPE from the pull down menu. ![](03_topical_pages/11_images/bety_pft_3.png) -#### Species +**Species** Within PEcAn there are no predefined PFTs and user can create new PFTs very easily at whatever taxonomic level is most appropriate, from PFTs for individual species up to one PFT for all plants globally. To allow PEcAn to query its trait database for information about a PFT, you will want to associate species with the PFT record by choosing Edit and then "View Related Species". Species can be searched for by common or scientific name and then added to a PFT using the +/- button. -#### Cultivars +**Cultivars** You can also define PFTs whose members are *cultivars* instead of species. This is designed for analyses where you want to want to perform meta-analysis on within-species comparisons (e.g. cultivar evaluation in an agricultural model) but may be useful for other cases when you want to specify different priors for some member of a species. You cannot associate both species and cultivars with the same PFT, but the cultivars in a cultivar PFT may come from different species, potentially including all known cultivars from some of the species, if you wish to and have thought about how to interpret the results. @@ -191,103 +193,100 @@ In addition to adding species, a PFT is defined in PEcAn by the list of variable There are a wide variety of priors already defined in the PEcAn database that often range from very diffuse and generic to very informative priors for specific PFTs. -These pre-existing prior distributions can be added to a PFT. 
Navigate to the PFT from Data > PFTs and selecting the edit button in the Actions column for the chosen PFT. +These pre-existing prior distributions can be added to a PFT. Navigate to the PFT from Data \> PFTs and selecting the edit button in the Actions column for the chosen PFT. ![](03_topical_pages/11_images/bety_priors_1.png) -Click on "View Related Priors" button and search through the list for desired prior distributions. The list can be filtered by adding terms into the search box. Add a prior to the PFT by clicking on the far left button for the desired prior, changing it to an X. +Click on "View Related Priors" button and search through the list for desired prior distributions. The list can be filtered by adding terms into the search box. Add a prior to the PFT by clicking on the far left button for the desired prior, changing it to an X. ![](03_topical_pages/11_images/bety_priors_2.png) -Save this by scrolling to the bottom of the PFT page and hitting the Update button. +Save this by scrolling to the bottom of the PFT page and hitting the Update button. ![](03_topical_pages/11_images/bety_priors_3.png) -#### Creating new prior distributions +### Creating new prior distributions -A new prior distribution can be created for a pre-existing variable, if a more constrained or specific one is known. +A new prior distribution can be created for a pre-existing variable, if a more constrained or specific one is known. -* Select Data > Priors then “New Prior” -* In the _Citation_ box, type in or select an existing reference that indicates how the prior was defined. There are a number of unpublished citations in current use that simply state the expert opinion of an individual -* Fill the _Variable_ box by typing in part or all of a pre-existing variable's name and selecting it -* The _Phylogeny_ box allows one to specify what taxonomic grouping the prior is defined for, at it is important to note that this is just for reference and doesn’t have to be specified in any standard way nor does it have to be monophyletic (i.e. it can be a functional grouping) -* The prior distribution is defined by choosing an option from the drop-down _Distribution_ box, and then specifying values for both _Parameter a_ and _Parameter b_. The exact meaning of the two parameters depends on the distribution chosen. For example, for the Normal distribution a and b are the mean and standard deviation while for the Uniform they are the minimum and maximum. All parameters are defined based on their standard parameterization in the R language -* Specify the prior sample size in _N_ if the prior is based on observed data (independent of data in the PEcAn database) -* When this is done, scroll down and hit the Create button +- Select Data \> Priors then “New Prior” +- In the *Citation* box, type in or select an existing reference that indicates how the prior was defined. There are a number of unpublished citations in current use that simply state the expert opinion of an individual +- Fill the *Variable* box by typing in part or all of a pre-existing variable's name and selecting it +- The *Phylogeny* box allows one to specify what taxonomic grouping the prior is defined for, at it is important to note that this is just for reference and doesn’t have to be specified in any standard way nor does it have to be monophyletic (i.e. 
it can be a functional grouping) +- The prior distribution is defined by choosing an option from the drop-down *Distribution* box, and then specifying values for both *Parameter a* and *Parameter b*. The exact meaning of the two parameters depends on the distribution chosen. For example, for the Normal distribution a and b are the mean and standard deviation while for the Uniform they are the minimum and maximum. All parameters are defined based on their standard parameterization in the R language +- Specify the prior sample size in *N* if the prior is based on observed data (independent of data in the PEcAn database) +- When this is done, scroll down and hit the Create button ![](03_topical_pages/11_images/bety_priors_4.png) -The new prior distribution can then be added a PFT as described in the "Adding Priors for Each Variable" section. +The new prior distribution can then be added a PFT as described in the "Adding Priors for Each Variable" section. -#### Creating new variables +### Creating new variables It is important to note that the priors are defined for the variable name and units as specified in the Variables table. **If the variable name or units is different within the model it is the responsibility of write.configs.MODEL function to handle name and unit conversions** (see Interface Modules below). This can also include common but nonlinear transformations, such as converting SLA to LMA or changing the reference temperature for respiration rates. -To add a new variable, select Data > Variables and click the New Variable button. Fill in the _Name_ field with the desired name for the variable and the units in the _Units_ field. There are additional fields, such as _Standard Units_, _Notes_, and _Description_, that can be filled out if desired. When done, hit the Create button. +To add a new variable, select Data \> Variables and click the New Variable button. Fill in the *Name* field with the desired name for the variable and the units in the *Units* field. There are additional fields, such as *Standard Units*, *Notes*, and *Description*, that can be filled out if desired. When done, hit the Create button. ![](03_topical_pages/11_images/bety_priors_5.png) -The new variable can be used to create a prior distribution for it as in the "Creating new prior distributions" section. - - - +The new variable can be used to create a prior distribution for it as in the "Creating new prior distributions" section. ## Adding input data {#NewInput} ### Input records in BETY -All model input data or data used for model calibration/validation must be registered in the BETY database. +All model input data or data used for model calibration/validation must be registered in the BETY database. Before creating a new Input record, you must make sure that the format type of your data is registered in the database. If you need to make a new format record, see [Creating a new format record in BETY](#NewFormat). ### Create a database file record for the input data -An input record contains all the metadata required to identify the data, however, this record does not include the location of the data file. Since the same data may be stored in multiple places, every file has its own dbfile record. +An input record contains all the metadata required to identify the data, however, this record does not include the location of the data file. Since the same data may be stored in multiple places, every file has its own dbfile record. 
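If you prefer to script this step rather than click through the BETY interface, the same record can be created from R. A minimal sketch, assuming an open BETY connection `con` (from `PEcAn.DB::db.open()`) and that `PEcAn.DB::dbfile.insert()` keeps its usual `(in.path, in.prefix, type, id, con)` arguments; the path, file name, and Input id below are hypothetical:

```
# Hedged sketch: register the on-disk location (dbfile) of an existing Input record.
# All values below are placeholders; `con` is an open BETY connection.
PEcAn.DB::dbfile.insert(
  in.path   = "/data/dbfiles/mysite_met",  # directory containing the file (no file name)
  in.prefix = "mysite_met_2004.nc",        # the file name itself
  type      = "Input",                     # record type this dbfile is attached to
  id        = 1000000001,                  # id of the corresponding Input record
  con       = con
)
```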
From your BETY interface: -* Create a DBFILES entry for the path to the file - + From the menu click RUNS then FILES - + Click “New File” - + Select the machine your file is located at - + Fill in the File Path where your file is located (aka folder or directory) NOT including the name of the file itself - + Fill in the File Name with the name of the file itself. Note that some types of input records will refer to be ALL the files in a directory and thus File Name can be blank - + Click Update - +- Create a DBFILES entry for the path to the file + - From the menu click RUNS then FILES + - Click “New File” + - Select the machine your file is located at + - Fill in the File Path where your file is located (aka folder or directory) NOT including the name of the file itself + - Fill in the File Name with the name of the file itself. Note that some types of input records will refer to be ALL the files in a directory and thus File Name can be blank + - Click Update + ### Creating a new Input record in BETY From your BETY interface: -* Create an INPUT entry for your data - + From the menu click RUNS then INPUTS - + Click “New Input” - + Select the SITE that this data is associated with the input data set - + Other required fields are a unique name for the input, the start and end dates of the data set, and the format of the data. If the data is not in a currently known format you will need to create a NEW FORMAT and possibly a new input converter. Instructions on how to do add a converter can be found here [Input conversion](#InputConversions). Instructions on how to add a format record can be found [here](#NewFormat) - + Parent ID is an optional variable to indicated that one dataset was derived from another. - + Click “Create” -* Associate the DBFILE with the INPUT - + In the RUNS -> INPUTS table, search and find the input record you just created - + Click on the EDIT icon - + Select “View related Files” - + In the Search window, search for the DBFILE you just created - * Once you have found the DBFILE, click on the “+” icon to add the file - * Click on “Update” at the bottom when you are done. - +- Create an INPUT entry for your data + - From the menu click RUNS then INPUTS + - Click “New Input” + - Select the SITE that this data is associated with the input data set + - Other required fields are a unique name for the input, the start and end dates of the data set, and the format of the data. If the data is not in a currently known format you will need to create a NEW FORMAT and possibly a new input converter. Instructions on how to do add a converter can be found here [Input conversion](#InputConversions). Instructions on how to add a format record can be found [here](#NewFormat) + - Parent ID is an optional variable to indicated that one dataset was derived from another. + - Click “Create” +- Associate the DBFILE with the INPUT + - In the RUNS -\> INPUTS table, search and find the input record you just created + - Click on the EDIT icon + - Select “View related Files” + - In the Search window, search for the DBFILE you just created +- Once you have found the DBFILE, click on the “+” icon to add the file +- Click on “Update” at the bottom when you are done. + ### Adding a new input converter {#InputConversions} Three Types of data conversions are discussed below: Meteorological data, Vegetation data, and Soil data. Each section provides instructions on how to convert data from their raw formats into a PEcAn standard format, whether it be from a database or if you have raw data in hand. 
Also, see [PEcAn standard formats].

-#### Meterological Data
+#### Meteorological Data

-##### Adding a function to PEcAn to convert a met data source
+**Adding a conversion function**

In general, you will need to write a function to download the raw met data and one to convert it to the PEcAn standard.

-Downloading raw data function are named `download..R`. These functions are stored within the PEcAn directory: [`/modules/data.atmosphere/R`](https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R).
+Functions that download raw data are named `download.<source>.R`. These functions are stored within the PEcAn directory: [`/modules/data.atmosphere/R`](https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R).

-Conversion function from raw to standard are named `met2CF..R`. These functions are stored within the PEcAn directory: [`/modules/data.atmosphere/R`](https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R).
+Functions that convert raw data to the PEcAn standard are named `met2CF.<source>.R`. These functions are stored within the PEcAn directory: [`/modules/data.atmosphere/R`](https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R).

Current meteorological products that are coupled to PEcAn can be found in our [Available Meteorological Drivers](#met-drivers) page.

@@ -295,21 +294,22 @@ Note: Unless you are also adding a new model, you will not need to write a scrip

*Standard dimensions, names, and units can be found here:* [Input Standards]

-##### Adding Single-Site Specific Meteorological Data
+**Single-site meteorological data**

Perhaps you have meteorological data specific to one site, with a unique format that you would like to add to PEcAn. Your steps would be to:

- 1. write a script or function to convert your files into the netcdf PEcAn standard
- 2. insert that file as an input record for your site following these [instructions](#NewInput)
-
-##### Processing Met data outside of the workflow using PEcAn functions
-Perhaps you would like to obtain data from one of the sources coupled to PEcAn on its own. To do so you can run PEcAn functions on their own.
+1. Write a script or function to convert your files into the netCDF PEcAn standard
+2. Insert that file as an input record for your site following these [instructions](#NewInput)
+
+**Processing meteorological data outside the workflow**

Perhaps you would like to obtain data from one of the sources coupled to PEcAn without running the full workflow. To do so, you can run the relevant PEcAn functions on their own.

-###### Example 1: Processing data from a database
+**Example 1: Processing data from a database**

Download AmerifluxLBL data from Niwot Ridge for the year 2004:

-```
+```
raw.file <- PEcAn.data.atmosphere::download.AmerifluxLBL(sitename = "US-NR1",
                       outfolder = ".",
                       start_date = "2004-01-01",
@@ -320,7 +320,7 @@ Using the information returned as the object `raw.file` you will then convert th

Open a connection with BETY. You may need to change the host name depending on what machine you are hosting BETY. You can find the hostname listed in the machines table of BETY.

-```
+```
con <- PEcAn.DB::db.open(
   params = list(
@@ -330,12 +330,11 @@ con <- PEcAn.DB::db.open(
     user = "bety",
     password = "bety")
)
-
```

Next you will set up the arguments for the function

-```
+```
in.path <- '.'
in.prefix <- raw.file$dbfile.name
outfolder <- '.'
@@ -345,11 +344,12 @@ lon <- -105.54
lat <- 40.03
format$time_zone <- "America/Chicago"
```
+
Note: The format.id can be pulled from the BETY database if you know the format of the raw data.

Once these arguments are defined you can execute the `met2CF.csv` function

-```
+```
PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                  in.prefix = in.prefix,
                  outfolder = ".",
@@ -360,19 +360,17 @@ PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                  format = format)
```
-
-
-###### Example 2: Processing data from data already in hand
+**Example 2: Processing data already in hand**

If you have met data already in hand and would like to convert it into the PEcAn standard, follow these instructions.

Update BETY with a file record, format record, and input record according to this page: [How to Insert new Input Data](#NewInput)
-
+
If your data is in a csv format you can use the `met2CF.csv` function to convert your data into a PEcAn standard file.

Open a connection with BETY. You may need to change the host name depending on what machine you are hosting BETY. You can find the hostname listed in the machines table of BETY.

-```
+```
con <- PEcAn.DB::db.open(
   params = list(
     driver = RPostgres::Postgres(),
@@ -383,9 +381,9 @@ con <- PEcAn.DB::db.open(
)
```

-Prepare the arguments you need to execute the met2CF.csv function
+Prepare the arguments you need to execute the `met2CF.csv` function

-```
+```
in.path <- 'path/where/the/raw/file/lives'
in.prefix <- 'prefix_of_the_raw_file'
outfolder <- 'path/to/where/you/want/to/output/thecsv/'
@@ -399,7 +397,8 @@ end_date <- End date of your data in "y-m-d"
```

Next you can execute the function:

-```
+
+```
PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                  in.prefix = in.prefix,
                  outfolder = ".",
@@ -410,23 +409,23 @@ PEcAn.data.atmosphere::met2CF.csv(in.path = in.path,
                  format = format)
```
-
#### Vegetation Data

Vegetation data will be required to parameterize your model. In these examples we will go over how to produce a standard initial condition file. The main function to process cohort data is the `ic_process.R` function. As of now, however, if you require pool data you will run a separate function, `pool_ic_list2netcdf.R`.

-###### Example 1: Processing Veg data from data in hand.
+**Example: Processing vegetation data from local files**

In the following example we will process vegetation data that you have in hand using PEcAn. First, you'll need to create an input record in BETY that will have a file record and format record reflecting the location and format of your file. Instructions can be found in our [How to Insert new Input Data](#NewInput) page.

-Once you have created an input record you must take note of the input id of your record. An easy way to take note of this is in the URL of the BETY webpage that shows your input record. In this example we use an input record with the id `1000013064` which can be found at this url: https://psql-pecan.bu.edu/bety/inputs/1000013064# . Note that this is the Boston University BETY database. If you are on a different machine, your url will be different.
+Once you have created an input record you must take note of the input id of your record. An easy way to take note of this is in the URL of the BETY webpage that shows your input record. In this example we use an input record with the id `1000013064`, which can be found at this url: <https://psql-pecan.bu.edu/bety/inputs/1000013064#>. Note that this is the Boston University BETY database. If you are on a different machine, your url will be different.
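If you would rather not read the id off the URL, you can also query it from BETY directly. A minimal sketch, assuming an open connection `con` created with `PEcAn.DB::db.open()` as shown earlier, and a hypothetical input name:

```
# Hedged sketch: look up the id of an existing Input record by its name.
# "my_veg_survey" is a placeholder; `con` is an open BETY connection.
input_id <- PEcAn.DB::db.query(
  "SELECT id, name FROM inputs WHERE name LIKE '%my_veg_survey%'",
  con = con
)
input_id
```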
-With the input id in hand you can now edit a pecan XML so that the PEcAn function `ic_process` will know where to look in order to process your data. The `inputs` section of your pecan XML will look like this. As of now ic_process is set up to work with the ED2 model so we will use ED2 settings and then grab the intermediary Rds data file that is created as the standard PEcAn file. For your Inputs section you will need to input your input id wherever you see the `useic` flag. -``` +With the input id in hand you can now edit a pecan XML so that the PEcAn function `ic_process` will know where to look in order to process your data. The `inputs` section of your pecan XML will look like this. As of now ic_process is set up to work with the ED2 model so we will use ED2 settings and then grab the intermediary Rds data file that is created as the standard PEcAn file. For your Inputs section you will need to input your input id wherever you see the `useic` flag. + +``` FFT @@ -474,7 +473,8 @@ With the input id in hand you can now edit a pecan XML so that the PEcAn functi ``` This IC workflow also supports generating ensembles of initial conditions from posterior estimates of DBH. To do this the tags below can be inserted to the pecan.xml: -``` + +``` PalEON css @@ -487,8 +487,10 @@ This IC workflow also supports generating ensembles of initial conditions from p ``` + Here the `id` should point to a file that has MCMC samples to generate the ensemble from. The number between the `` tag defines the number of ensembles requested. The workflow will populate the settings list `run$inputs` tag with ensemble member information. E.g.: -``` + +``` ... @@ -525,41 +527,39 @@ Here the `id` should point to a file that has MCMC samples to generate the ensem Once you edit your PEcAn.xml you can than create a settings object using PEcAn functions. Your `pecan.xml` must be in your working directory. -``` +``` settings <- PEcAn.settings::read.settings("pecan.xml") settings <- PEcAn.settings::prepare.settings(settings, force=FALSE) ``` + You can then execute the `ic_process` function to convert data into a standard Rds file: -``` +``` input <- settings$run$inputs dir <- "." ic_process(settings, input, dir, overwrite = FALSE) ``` -Note that the argument `dir` is set to the current directory. You will find the final ED2 file there. More importantly though you will find the `.Rds ` file within the same directory. - +Note that the argument `dir` is set to the current directory. You will find the final ED2 file there. More importantly though you will find the `.Rds` file within the same directory. +**Example 3 Pool Initial Condition files** - -###### Example 3 Pool Initial Condition files -If you have pool vegetation data, you'll need the [`pool_ic_list2netcdf.R`](https://github.com/PecanProject/pecan/blob/develop/modules/data.land/R/pool_ic_list2netcdf.R) function to convert the pool data into PEcAn -standard. +If you have pool vegetation data, you'll need the [`pool_ic_list2netcdf.R`](https://github.com/PecanProject/pecan/blob/develop/modules/data.land/R/pool_ic_list2netcdf.R) function to convert the pool data into PEcAn standard. The function stands alone and requires that you provide a named list of netcdf dimensions and values, and a named list of variables and values. Names and units need to match the standard_vars.csv table found [here](https://github.com/PecanProject/pecan/blob/develop/base/utils/data/standard_vars.csv). 
-``` +``` #Create a list object with necessary dimensions for your site input<-list() dims<- list(lat=-115,lon=45, time= 1) variables<- list(SoilResp=8,TotLivBiom=295) input$dims <- dims input$vals <- variables -``` +``` Once this is done, set `outdir` to where you'd like the file to write out to and a siteid. Siteid in this can be used as an file name identifier. Once part of the automated workflow siteid will reflect the site id within the BET db. -``` +``` outdir <- "." siteid <- 772 pool_ic_list2netcdf(input = input, outdir = outdir, siteid = siteid) @@ -569,11 +569,11 @@ You should now have a netcdf file with initial conditions. #### Soil Data -###### Example 1: Converting Data in hand +**Example 1: Converting Data in hand** Local data that has the correct names and units can easily be written out in PEcAn standard using the function soil2netcdf. -``` +``` soil.data <- list(volume_fraction_of_sand_in_soil = c(0.3,0.4,0.5), volume_fraction_of_clay_in_soil = c(0.3,0.3,0.3), soil_depth = c(0.2,0.5,1.0)) @@ -583,12 +583,11 @@ soil2netcdf(soil.data,"soil.nc") At the moment this file would need to be inserted into Inputs manually. By default, this function also calls soil_params, which will estimate a number of hydraulic and thermal parameters from texture. Be aware that at the moment not all model couplers are yet set up to read this file and/or convert it to model-specific formats. +**Example 2: Converting PalEON data** -###### Example 2: Converting PalEON data - -In addition to location-specific soil data, PEcAn can extract soil texture information from the PalEON regional soil product, which itself is a subset of the MsTMIP Unified North American Soil Map. If this product is installed on your machine, the appropriate step in the do_conversions workflow is enabled by adding the following tag under `````` in your pecan.xml +In addition to location-specific soil data, PEcAn can extract soil texture information from the PalEON regional soil product, which itself is a subset of the MsTMIP Unified North American Soil Map. If this product is installed on your machine, the appropriate step in the do_conversions workflow is enabled by adding the following tag under `` in your pecan.xml -```xml +``` xml 1000012896 @@ -596,12 +595,11 @@ In addition to location-specific soil data, PEcAn can extract soil texture infor In the future we aim to extend this extraction to a wider range of soil products. - -###### Example 3: Extracting soil properties from gSSURGO database +**Example 3: Extracting soil properties from gSSURGO database** In addition to location-specific soil data, PEcAn can extract soil texture information from the gSSURGO data product. This product needs no installation and it extract soil proeprties for the lower 48 states in U.S. In order to let the pecan know that you're planning to use gSSURGO, you can the following XML tag under input in your pecan xml file. -```xml +``` xml gSSURGO @@ -609,90 +607,87 @@ In addition to location-specific soil data, PEcAn can extract soil texture infor ``` - - - -## Pecan Data Ingest via Web Interface {#adding-data-web} +## Pecan Data Ingest via Web Interface {#adding-data-web} This tutorial explains the process of ingesting data into PEcAn via our Data-Ingest Application. In order to ingest data, the users must first select data that they wish to upload. Then, they enter metadata to help PEcAn parse and load the data into the main PEcAn workflow. 
### Loading Data

#### Selecting Ingest Method

The Data-Ingest application is capable of loading data from the DataONE data federation and from the user's local machine. The first step in the workflow is therefore to select an upload method. The application defaults to uploading from DataONE. To upload data from a local device, simply select the radio button titled `Local Files`.

#### DataONE Upload Example

```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/D1Ingest-1.gif")
```

The DataONE download feature allows the user to download data at a given DOI or DataONE-specific package ID. To do so, enter the DOI or identifier in the `Import From DataONE` field and select `download`. The download process may take a couple of minutes depending on the number of files in the DataONE package. This may be a convenient option if the user does not wish to download files directly to their local machine. Once the files have been successfully downloaded from DataONE, they are displayed in a table. Before proceeding to the next step, the user can select a file to ingest by clicking on the corresponding row in the data table.
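The Data-Ingest app performs this retrieval for you, but the same identifiers can also be fetched from a script. The sketch below is only illustrative and rests on assumptions: it uses the rOpenSci `dataone` package, and the member node (`urn:node:KNB`) and the DOI are placeholders; check the current `dataone` documentation for the exact interface.

```
# Rough sketch: fetch one DataONE object by identifier outside the app.
# The node and identifier below are placeholders, not real data references.
library(dataone)

cn  <- CNode("PROD")                 # DataONE production environment
mn  <- getMNode(cn, "urn:node:KNB")  # member node that hosts the data
pid <- "doi:10.XXXX/XXXXX"           # placeholder DOI / package identifier

bytes <- getObject(mn, pid)          # raw bytes of the requested object
writeBin(bytes, "dataone_download.dat")  # write it out for inspection or ingest
```

Nothing in this sketch is required for the web workflow; it is simply the programmatic counterpart of the `download` button.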
### Local Upload Example

To upload local files, the user should first select the `Local Files` button. From there, the user can upload files from their local machine by selecting `Browse` or by dragging and dropping files into the text box. The files will begin uploading automatically. The user should then select a file to ingest and click the `Next Step` button.

After this step, the workflow is identical for both methods. However, please note that if it becomes necessary to switch from loading data via `DataONE` to uploading local files after the first step, the application should be restarted.

```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/Local_loader_sm.gif")
```
```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/local_browse.gif")
```

### 2. Creating an Input Record

Creating an input record requires some basic metadata about the file that is being ingested. Each entry field is briefly explained below.

- Site: To link the selected file with a site, the user can scroll or type to search all the sites in PEcAn. See example:

```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/Selectize_Input_sm.gif")
```

- Parent: To link the selected file with another dataset, type to search existing datasets in the `Parent` field.
- Name: This field should be autofilled by selecting a file in step 1.
- Format: If the selected file has an existing format name, the user can search for and select it in the `Format` field. If the selected file's format is not already in PEcAn, the user can create a new format by selecting `Create New Format`. Once this new format is created, it will automatically populate the `Format` box and the `Current Mimetype` box (see Section 3).
- Mimetype: If the format already exists, select an existing mimetype.
- Start and End Date and Time: Inputs can be entered manually or by using the user interface. See example:
```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/DateTime.gif")
```

- Notes: Describe the data that is being uploaded. Please include any citations or references.

### 3. Creating a format record

If it is necessary to add a new format to PEcAn, the user should fill out the form attached to the `Create New Format` button. The inputs to this form are described below:

- Mimetype: Type to search existing mimetypes. If the mimetype is not in that list, please click on the link `Create New Mimetype` and create a new mimetype via the BETY website.
- New Format Name: Add the name of the new format. Please exclude spaces from the name; use underscores "\_" instead.
- Header: If there is space before the first line of data in the dataset, please select `Yes`.
- Skip: The number of lines in the header that should be skipped before the data.
- Notes: Please enter notes that describe the format.

Example:

```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/new_format_record.gif")
```

### 4. Formats_Variables Record

The final step in the ingest process is to register a formats-variables record. This record links PEcAn variables with variables from the selected data.

- Variable: The PEcAn variable that is equivalent to the variable in the selected file.

- Storage Type: Storage type need only be specified if the variable is stored in a format other than would be expected (e.g. if numeric values are stored as quoted character strings). Additionally, storage_type stores the POSIX codes that are used to parse any time variables (e.g. a column with a 4-digit year would be `%Y`); see the short sketch after this list.

- Column Number: Vector of integers listing the column numbers associated with variables in a dataset. Required for text files that lack headers.
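To make the Storage Type field concrete, here is a minimal R sketch of how a POSIX format code tells PEcAn how to parse a time column. The column values are invented purely for illustration; only the format strings would carry over to a real record, and a table of commonly used codes appears in the Storage Type section later in this chapter.

```
# A date column stored as month/day/2-digit year would use the storage type "%m/%d/%y"
dates <- c("01/15/04", "02/01/04")                # invented example values
strptime(dates, format = "%m/%d/%y", tz = "UTC")  # 2-digit years 00-68 are read as 20xx

# A full timestamp column matching the default format "%Y-%m-%d %H:%M:%S"
stamps <- c("2017-01-01 00:00:00", "2017-01-01 00:30:00")
as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```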
```{r, echo=FALSE,out.height= "50%", out.width="50%", fig.align='center'}
knitr::include_graphics("02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/data-ingest/D1Ingest-9_sm.gif")
```

Finally, the path to the ingested data is displayed in the `Select Files` box.

## Creating a new format {#NewFormat}

### Formats in BETY

The PEcAn database keeps track of all the input files passed to models, as well as any data used in model validation or data assimilation. Before we start to register these files with PEcAn we need to define the format these files will be in.

The main goal is to take all the metadata we have about a data file and create a record of it that PEcAn can use as a guide when parsing the data file.

This information is stored in a Format record in the BETY database. Make sure to read through the current Formats before deciding to make a new one.

### Creating a new format in BETY

If the Format you are looking for is not available, you will need to create a new record. Before entering information into the database, you need to be able to answer the following questions about your data:

- What is the file MIME type?
  - We have a suite of functions for loading in data in open formats such as CSV, txt, netCDF, etc.
- What variables does the file contain?
  - What are the variables named?
  - What are the variable units?
  - How do the variable names and units in the data map to PEcAn variables in the BETY database? See the Name and Unit discussion below for an example. It is most likely that you will NOT need to add variables to BETY. However, identifying the appropriate variable matches in the database may require some work. We are always available to help answer your questions.
- Is there a timestamp on the data?
  - What are the units of time?

Here is an example using a fake dataset:

![example_data](02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/example_data.png)

This data started out as an Excel document, but was saved as a CSV file.

To create a Formats record for this data, in the web interface of BETY, select Runs > Formats and click *New Format*.
You will need to fill out the following fields:

- MIME type: File type (you can search for other formats in the text field)
- Name: The name of your format (this can be whatever you want)
- Header: Boolean that denotes whether or not your data contains a header as the first line of the data (1 = TRUE, 0 = FALSE)
- Skip: The number of lines above the data that should be skipped, for example metadata that should not be included when reading in the data, or blank lines.
- Notes: Any additional information about the data, such as sources and citations.

Here is the Formats record for the example data:

![format_record_1](02_demos_tutorials_workflows/02_user_demos/07_advanced_user_guide/images/format_record_1.png)

When you have finished this section, hit Create. The final record will be displayed on the screen.

#### Formats -> Variables

After a Format entry has been created, you are encouraged to edit the entry to add relationships between the file's variables and the Variables table in PEcAn. Not only do these relationships provide metadata describing the file format, but they also allow PEcAn to search and (for some MIME types) read files.

To enter this data, select Edit Record and on the edit screen select View Related Variable.

Here is the record for the example data after adding related variables:

For each variable in the file you will want, at a minimum, to specify the NAME of the variable within your file and match that to the equivalent Variable in the pulldown.

Make sure to search for your variables under Data > Variables before suggesting that we create a new variable record. This may not always be a straightforward process.

For example, BETY contains a record for Net Primary Productivity:

![var_record](03_topical_pages/11_images/var_record.png)

This record does not have the same variable name or the same units as NPP in the example data. You may have to do some reading to confirm that they are the same variable. In this case:

- Both the data and the record are for Net Primary Productivity (the notes section provides additional resources for interpreting the variable).
- The units of the data can be converted to those of the variable record (this can be checked by running `udunits2::ud.are.convertible("g C m-2 yr-1", "Mg C ha-1 yr-1")`).

Differences between the data and the variable record can be accounted for in the data's Formats record:

- Under Variable, select the variable as it is recorded in BETY.
- Under Name, write the name the variable has in your data file.
- Under Unit, write the units the variable has in your data file.

NOTE: All units must be written in a udunits-compliant format. To check that your units can be read by udunits, in R, load the `udunits2` package and run `udunits2::is.parseable("g C m-2 yr-1")`.

**If the name or the units are the same**, you can leave the Name and Unit fields blank. This can be seen with the variable LAI.

##### Storage Type

*Storage Type* only needs to be specified if the variable is stored in a format other than what would be expected (e.g. if numeric values are stored as quoted character strings).

One such example is *time variables*.

PEcAn converts all dates into POSIX format using R functions such as `strptime`. These functions require that the user specify the format in which the date is written.

The default is `"%Y-%m-%d %H:%M:%S"`, which would look like `"2017-01-01 00:00:00"`.

A list of date formats can be found in the R documentation for the function `strptime`.

Below are some commonly used codes:

| Code | Meaning |
|:-----|:--------|
| %d | Day of the month as decimal number (01–31). |
| %D | Date format such as %m/%d/%y. |
| %H | Hours as decimal number (00–23). |
| %m | Month as decimal number (01–12). |
| %M | Minute as decimal number (00–59). |
| %S | Second as integer (00–61), allowing for up to two leap seconds (but POSIX-compliant implementations will ignore leap seconds). |
| %T | Equivalent to %H:%M:%S. |
| %y | Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’. |
| %Y | Year with century. |

##### Column Number

A vector of integers listing the column numbers associated with variables in a dataset. Only required for text files that lack headers.

##### Inputs

To acquire Format information from a Format record, use the R function `query.format.vars` with the following inputs:

- `bety`: connection to BETY
- `input.id=NA` and/or `format.id=NA`: Input or Format record ID from BETY
  - At least one must be specified. Defaults to `format.id` if both are provided.
- `var.ids=NA`: optional vector of variable IDs. If provided, limits the results to these variables.

##### Output

- An R list object containing the format metadata. (TODO: document the elements of this list.)

## Creating a new benchmark reference run {#NewBenchmark}

The purpose of the reference run record in BETY is to store all the settings from a run that are necessary to exactly recreate it.

The pecan.xml file is the home of all the settings for a particular run in PEcAn. However, much of the information in the pecan.xml file is server- and user-specific and, more importantly, pecan.xml files are stored on individual servers and may not be available to the public.

When a run performed using PEcAn is registered as a reference run, the settings that were used to make that run are made available to all users through the database.

Completed runs are not automatically registered as reference runs. To register a run, navigate to the benchmarking section of the workflow visualizations Shiny app.

## Editing records {#editing-records}