ag-informatics/vt-dsa-processing


Variety Trial and Crop Model Data Suitability Assessment Processing

This repository stores the code used to process variety trial and crop model variable inventories and to carry out a data suitability assessment. It was last updated on October 4, 2025.

File Locations

All variable inventories and model-data scoring sheets are stored in the Purdue University Research Repository dataset "Data Suitability Assessment Materials for Commercial Horticultural Variety Trials and Crop Models". The folder layout of this repository mirrors that of the dataset, so users can copy this repository's files into a separate folder and then add the variable inventories (and the scoring sheets, if they prefer not to run the processing code).

Repository Structure

project_folders

  • model_parameters - Holds the data-variable, model-variable, and universal-variable inventory csv files used to conduct the data suitability assessment.
  • prevalence - Holds the preliminary, inspected, and final prevalence calculation csv files for variables in the dataset.
  • recommendations - Holds the preliminary, inspected, and final recommendation csv files for increasing prevalence of rare and uncommon variables.
  • scoring_sheets - Holds the preliminary, inspected, and final suitability score sheet csv files for each model and dataset combination.
    • final_datasets - Holds the final suitability score sheet csv files for dataset-level suitability.
    • final_parameters - Holds the final suitability score sheet csv files for variable-level suitability.
    • reviewed - Holds the inspected suitability score sheet csv files for variable-level suitability.
  • variety_trial_codes - Holds the raw and processed code book csv files produced from qualitative coding of variety trial reports and raw data.
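
Because the notebooks rely on this layout, it can help to confirm that the folders are in place after copying the files. The following is a minimal sketch, not part of the repository: the folder names come from the list above, while the helper name `missing_folders` and the assumption that all folders sit under a single root directory are illustrative.

```python
from pathlib import Path

# Folder names are taken from the list above. Placing them all under one
# root directory is an assumption made for this sketch.
EXPECTED_FOLDERS = [
    "model_parameters",
    "prevalence",
    "recommendations",
    "scoring_sheets",
    "scoring_sheets/final_datasets",
    "scoring_sheets/final_parameters",
    "scoring_sheets/reviewed",
    "variety_trial_codes",
]


def missing_folders(root):
    """Return the expected folders that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_folders(".")` from the folder that holds the copied files returns an empty list when every expected folder is present.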

Environment

The environment used for all processing and analysis is described in the environment.yml file in the repository's main directory.

Important Scripts

  1. raw_data_processing.ipynb (in variety_trial_codes directory) - notebook that reads the variety trial code book csv files and produces dataset-level code book csv files.
  2. additional_code_processing.ipynb (in variety_trial_codes directory) - notebook that reads the dataset-level code book csv files and refines their variables into higher-level, model-compatible variables to produce the final dataset-level code book csv files.
  3. trial_report_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the variety trial reports and produces a prevalence csv file.
  4. raw_data_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the variety trial raw data and produces a prevalence csv file.
  5. composite_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the composite of variety trial reports and raw data and produces a prevalence csv file.
  6. param_scoring.ipynb - notebook that reads the data and model variable inventory csv files and produces variable-level suitability score sheet csv files.
  7. dataset_scoring.ipynb - notebook that reads the variable-level score sheet csv files and produces dataset-level suitability score sheet csv files.
  8. scoring.py - script that contains the scoring functions used by the variable-level and dataset-level scoring notebooks.
  9. param_recommendations.ipynb - notebook that reads the variable prevalence csv files and produces recommendation csv files for rare and uncommon variables.
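
To illustrate the kind of calculation the prevalence notebooks describe, here is a minimal sketch, not the notebooks' actual code. It assumes prevalence means the share of datasets in which a variable appears at least once, and that the code book is in long form with one row per dataset and variable. The column names `dataset` and `variable`, the function name, and the example rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def variable_prevalence(rows, dataset_col="dataset", variable_col="variable"):
    """Share of datasets in which each variable appears at least once.

    Assumed definition for illustration; the notebooks may define it differently.
    """
    datasets = set()
    seen_in = defaultdict(set)  # variable -> set of datasets that report it
    for row in rows:
        datasets.add(row[dataset_col])
        seen_in[row[variable_col]].add(row[dataset_col])
    total = len(datasets)
    return {var: len(found) / total for var, found in seen_in.items()}


# Made-up rows for demonstration only; not taken from the real code books.
example = io.StringIO(
    "dataset,variable\n"
    "trial_a,yield\n"
    "trial_a,planting_date\n"
    "trial_b,yield\n"
    "trial_c,yield\n"
    "trial_c,planting_date\n"
    "trial_c,soil_type\n"
)
prevalence = variable_prevalence(csv.DictReader(example))
for variable, share in sorted(prevalence.items()):
    print(f"{variable}: {share:.2f}")
```

In this made-up example, `yield` appears in all three datasets, `planting_date` in two, and `soil_type` in one, so their shares are 1.00, 0.67, and 0.33.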

About

Variety trial and crop model data suitability assessment processing notebooks and code, to be used in conjunction with variety trial and crop model variable inventories.
