This repository stores the code used to process variety trial and crop model variable inventories and to carry out a data suitability assessment. It was last updated on October 4, 2025.
All variable inventories and model-data scoring sheets are stored in the Purdue University Research Repository dataset "Data Suitability Assessment Materials for Commercial Horticultural Variety Trials and Crop Models". The file layout of this repository mirrors that of the dataset, so users can copy this repository's files into a separate folder and add the variable inventories (and the scoring sheets, if they prefer not to run the processing code).
project_folders
- model_parameters - Holds the data-variable, model-variable, and universal-variable inventory csv files used to conduct the data suitability assessment.
- prevalence - Holds the preliminary, inspected, and final prevalence calculation csv files for variables in the dataset.
- recommendations - Holds the preliminary, inspected, and final recommendation csv files for increasing prevalence of rare and uncommon variables.
- scoring_sheets - Holds the preliminary, inspected, and final suitability score sheet csv files for each model and dataset combination.
- final_datasets - Holds the final suitability score sheet csv files for dataset-level suitability.
- final_parameters - Holds the final suitability score sheet csv files for variable-level suitability.
- reviewed - Holds the inspected suitability score sheet csv files for variable-level suitability.
- variety_trial_codes - Holds the raw and processed code book csv files produced from qualitative coding of variety trial reports and raw data.
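After copying the repository files and adding the inventories, it may help to confirm that the folders listed above are present. The following is a minimal sketch, not part of the repository: the helper name `missing_folders` is hypothetical, and because the exact nesting of the folders is not assumed, it searches for each name anywhere under the project root.

```python
from pathlib import Path

# Folder names as listed above; where each sits in the tree is not assumed.
EXPECTED_FOLDERS = [
    "model_parameters",
    "prevalence",
    "recommendations",
    "scoring_sheets",
    "final_datasets",
    "final_parameters",
    "reviewed",
    "variety_trial_codes",
]


def missing_folders(root, expected=EXPECTED_FOLDERS):
    """Return the expected folder names not found anywhere under root."""
    root = Path(root)
    # Collect the names of all directories below root, at any depth.
    found = {p.name for p in root.rglob("*") if p.is_dir()}
    return [name for name in expected if name not in found]
```

Calling `missing_folders(".")` from the main directory returns an empty list when every folder is in place.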
A description of the environment used for all processing and analysis is found in the environment.yml file in the main directory.
- raw_data_processing.ipynb (in variety_trial_codes directory) - notebook that reads the variety trial code book csv files and produces dataset-level code book csv files.
- additional_code_processing.ipynb (in variety_trial_codes directory) - notebook that reads the dataset-level code book csv files and refines their variables into higher-level, model-compatible variables to produce the final dataset-level code book csv files.
- trial_report_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the variety trial reports and produces a prevalence csv file.
- raw_data_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the variety trial raw data and produces a prevalence csv file.
- composite_param_prevalence.ipynb - notebook that calculates the prevalence of each variable in the composite of variety trial reports and raw data and produces a prevalence csv file.
- param_scoring.ipynb - notebook that reads the data and model variable inventory csv files and produces variable-level suitability score sheet csv files.
- dataset_scoring.ipynb - notebook that reads the data and model variable-level score sheet csv files and produces dataset-level suitability score sheet csv files.
- scoring.py - script that contains the scoring functions used by the variable-level and dataset-level scoring notebooks.
- param_recommendations.ipynb - notebook that reads the variable prevalence csv files and produces recommendation csv files for rare and uncommon variables.
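To illustrate what a prevalence calculation can look like, here is a minimal sketch. It is not taken from the notebooks: it assumes prevalence means the share of sources (trial reports or raw data sets) in which a variable appears. The function name `variable_prevalence`, the (source, variable) record layout, and the example values are all hypothetical.

```python
from collections import defaultdict


def variable_prevalence(records):
    """Return {variable: share of sources that report it}.

    `records` is an iterable of (source_id, variable) pairs, a hypothetical
    long-format stand-in for a code book; the real csv columns may differ.
    """
    sources = set()
    sources_by_variable = defaultdict(set)
    for source_id, variable in records:
        sources.add(source_id)
        sources_by_variable[variable].add(source_id)
    total = len(sources)
    return {
        variable: len(ids) / total
        for variable, ids in sources_by_variable.items()
    }


# Made-up example: four sources, two variables.
example = [
    ("trial_A", "planting_date"),
    ("trial_A", "yield"),
    ("trial_B", "yield"),
    ("trial_C", "yield"),
    ("trial_D", "yield"),
]
prevalence = variable_prevalence(example)
```

In this example `planting_date` appears in one of four sources and `yield` in all four. Any cutoffs for labeling a variable rare or uncommon would be applied to these shares; the sketch does not guess at them.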