Updating readme and reducing verbose script explanations

mridulk97 · mridulk97 · commit 9b89c64394da · 2025-12-12T14:46:38.000-05:00
diff --git a/CITATION.cff b/CITATION.cff
@@ -16,15 +16,15 @@ authors:
     given-names: "Alyson"
     email: "sydne.record@maine.edu"
     affiliation: "The University of Maine"
-  - family-names: "Fluck"
-    given-names: "Isadora E."
-    affiliation: "University of Florida"
   - family-names: "Campolongo"
     given-names: "Elizabeth G."
     affiliation: "The Ohio State University"
   - family-names: "Stevens"
     given-names: "Samuel"
     affiliation: "The Ohio State University"
+  - family-names: "Wu"
+    given-names: "Jiaman"
+    affiliation: "The Ohio State University"
   - family-names: "Taylor"
     given-names: "Graham W."
     affiliation: "University of Guelph"
diff --git a/README.md b/README.md
@@ -84,35 +84,18 @@ CVAT (Computer Vision Annotation Tool) annotations containing:
 
 **Script:** `2018_neon_beetles_get_individual_images.py`
 
-Extracts individual beetle specimens from annotated group images using CVAT XML annotations.
-
-**Features:**
-- Parses CVAT XML format
-- Extracts bounding box coordinates
-- Crops individual specimens with optional padding
-- Saves as separate PNG files with specimen numbering
-- Progress tracking with tqdm
-
-**Key Functions:**
-- `parse_cvat_annotations(xml_path)`: Parse CVAT XML and extract image metadata
-- `crop_and_save_images(images_data, images_dir, output_dir, padding)`: Crop and save specimens
+Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. The script parses the box coordinates, crops each specimen with optional padding, and saves the crops as numbered PNG files, reporting progress with tqdm.
 
 ### 3. **Image Resizing with Uniform Scaling**
 
 **Script:** `resizing_individual_beetle_images.py`
 
-Resizes individual beetle specimen images to match BeetlePalooza's resized group images using uniform scaling factors. <<<< Is this for the 2018 NEON beetles to get measurements based on Zooniverse size???
-
-**Purpose:**
-- Aligns individual specimen images with the resolution of BeetlePalooza's processed group images
-- Ensures morphometric measurements made on resized images can be accurately applied to individual specimens
-- Uses uniform scaling (average of x and y scale factors) for consistency
+Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
 
 **Workflow:**
-1. Calculate uniform scaling factors between original and BeetlePalooza resized group images
-2. Save scaling factors to JSON for reference and reproducibility
-3. Apply uniform scaling to all individual specimen images
-4. Generate processing summary with statistics
+1. Calculate a uniform scaling factor (the average of the x and y scale factors) between the original and resized group images
+2. Apply scaling to all individual specimen images
+3. Save scaling metadata and processing statistics to JSON
 
 ### 4. **Dataset Upload to Hugging Face**
 
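The resizing workflow in the hunk above describes a uniform scaling factor taken as the average of the x and y scale factors. A minimal sketch of that arithmetic follows. It is not the code in `resizing_individual_beetle_images.py`: the function names, the `(width, height)` tuple layout, and the rounding rule are all assumptions made for illustration.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels; layout is an assumption


def uniform_scale_factor(original: Size, resized: Size) -> float:
    """Average the x and y scale factors between two versions of one group image."""
    scale_x = resized[0] / original[0]
    scale_y = resized[1] / original[1]
    return (scale_x + scale_y) / 2.0


def scaled_size(crop: Size, factor: float) -> Size:
    """Target size for an individual crop, rounded, with at least 1 px per side."""
    return (max(1, round(crop[0] * factor)), max(1, round(crop[1] * factor)))


# Hypothetical sizes: a 6000x4000 group image that was resized to 1500x1000.
factor = uniform_scale_factor(original=(6000, 4000), resized=(1500, 1000))
print(factor)                            # 0.25
print(scaled_size((800, 600), factor))   # (200, 150)
```

When the group image was resized without preserving its aspect ratio, the x and y factors differ, and averaging them keeps the individual crop's proportions intact at the cost of a small mismatch along each axis.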
@@ -140,9 +123,11 @@ python upload_dataset_to_hf.py \
 
 ### 5. **Zero-Shot Object Detection**
 
-**Notebook:** `grounding_dino.ipynb`
+**Script:** `beetle_detection.py` | **Notebook:** `grounding_dino.ipynb`
 
-Advanced pipeline using **Grounding DINO** for automated beetle detection and segmentation. `scripts/beetle_detection.py` is this notebook converted to a runnable script. An example minimal run (passing only required parameters) is provided below:
+Automated beetle detection pipeline using **Grounding DINO** zero-shot object detection. The script version provides a command-line interface for the notebook workflow.
+
+**Basic Usage:**
 
 ```console
 python scripts/beetle_detection.py \
@@ -152,94 +137,38 @@ python scripts/beetle_detection.py \
   --output_csv data/processed.csv
 ```
 
-Additional optional parameters that can be passed are as follows:
-- `model_id`: Model ID for Grounding-DINO, default is `IDEA-Research/grounding-dino-base`.
-- `text`: Text prompt for detection, default is `"a beetle."`.
-- `box_threshold`: Box threshold for detection, default is `0.2`.
-- `text_threshold`: Text threshold for detection,default is `0.2`.
-- `padding`: Padding factor for cropping, default is `0.1`.
-- `iou_threshold`: IoU threshold for Non-Maximum Suppression (NMS), default is `0.6`.
+Optional parameters: `--model_id` (default: `IDEA-Research/grounding-dino-base`), `--text` (prompt, default: `"a beetle."`), `--box_threshold` (0.2), `--text_threshold` (0.2), `--padding` (0.1), `--iou_threshold` (0.6).
 
-**Workflow:**
-1. Load beetle measurements from the [2018 NEON Ethanol-preserved Ground Beetles dataset](https://huggingface.co/datasets/imageomics/2018-NEON-beetles)
-2. Initialize Grounding DINO model
-3. For each image:
-   - Detect beetles using text prompt ("a beetle")
-   - Filter detections based on adaptive area thresholds
-   - Verify detections contain elytra measurement points
-   - Apply Non-Maximum Suppression (NMS) to remove duplicates
-   - Select best bounding box (largest area with highest confidence)
-4. Save individual beetle images and CSV metadata
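+The pipeline detects beetles using the text prompt, filters detections by adaptive area thresholds, verifies that each detection contains the elytra measurement points, applies Non-Maximum Suppression (NMS) to remove duplicates, and selects the best bounding box (largest area with highest confidence) before saving the crops and CSV metadata.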
 
 ### 6. **Inter-Annotator Agreement**
 
 **Script:** `inter_annotator.py`
 
-Quantifies measurement consistency between multiple human annotators for continuous morphometric traits.
-
-**Analysis:**
-- Compares three annotator pairs:
-  - Annotator A vs. Annotator B
-  - Annotator B vs. Annotator C
-  - Annotator C vs. Annotator A
-
-**Metrics Computed:**
-- **RMSE** (Root Mean Square Error): Overall measurement disagreement
-- **R² Score**: Correlation strength between annotators
-- **Average Bias**: Systematic over/under-measurement tendencies
-
-**Output:**
-- `InterAnnotatorAgreement.pdf`: Three-panel scatter plot
-- Console report with detailed metrics
+Quantifies measurement consistency between human annotators using three pairwise comparisons (A vs. B, B vs. C, and C vs. A). Computes RMSE (overall measurement disagreement), R² (correlation strength), and average bias (systematic over- or under-measurement). Generates `InterAnnotatorAgreement.pdf` with a three-panel scatter plot and prints a metrics report to the console.
 
 ### 7. **Human vs. Automated System Validation**
 
 **Script:** `calipers_vs_toras.py`
 
-Evaluates TORAS measurement annotations performance against human expert measurements using calipers (gold standard).
-
-**Comparisons:**
-- Annotator A vs. Automated System
-- Annotator B vs. Automated System
-- Annotator C vs. Automated System
-- Average Human vs. Automated System
-
-**Metrics:**
-- RMSE, R², Average Bias (same as inter-annotator analysis)
-
-**Output:**
-- `CalipersVsToras.pdf`: Comparison plots
-- Quantitative performance metrics
+Validates automated TORAS measurements against human caliper measurements (the gold standard). Compares each of the three annotators, and their average, against the automated system using RMSE, R², and average bias. Generates `CalipersVsToras.pdf` with comparison plots.
 
 
 ### 8. **NEON Data Analysis and Visualization**
 
 **Script:** `Figure6and10.R`
 
-Comprehensive analysis of NEON beetle data from PUUM site (Hawaii) with BeetlePalooza integration.
-
-**Data Sources:**
-- **NEON API**: DP1.10022.001 (Ground beetle sequences DNA barcode)
-- **BeetlePalooza**: Citizen science measurement data
-- Site: PUUM (Pu'u Maka'ala Natural Area Reserve, Hawaii)
-
-**Outputs:**
-- `BeetlePUUM_abundance.png`: Species abundance with imaging status (Not Imaged vs. Imaged)
-- Merged dataset combining NEON taxonomic data with BeetlePalooza measurements
+Analyzes NEON beetle data from the PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii), integrated with BeetlePalooza citizen science measurements. Retrieves data via the NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png`, which shows species abundance by imaging status, and a merged analysis dataset.
 
-**R Libraries:**
-- `ggplot2`: Data visualization
-- `dplyr`: Data manipulation
-- `ggpubr`: Publication-ready themes
-- `neonUtilities`: NEON API interface
+**Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`
 
 ---
 
 ## 🛠️ Installation
 
 ### Prerequisites
 
-- **Python 3.8+** (for Python scripts and notebooks)
+- **Python 3.10+** (for Python scripts and notebooks)
 - **R 4.0+** (for R scripts)
 - **Git** (for version control)
 - **CUDA-capable GPU** (recommended for Grounding DINO, but not required)
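The detection summary in this hunk mentions NMS with an `--iou_threshold` default of 0.6. As an illustration of what that threshold controls, here is a minimal pure-Python sketch of greedy NMS. It is not taken from `beetle_detection.py`, whose implementation may differ (for example, it may call a library routine); the box layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.6) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the second box overlaps the first with IoU 0.81.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Lowering the threshold suppresses more aggressively, which can merge beetles that sit close together in a group image; raising it lets more near-duplicate boxes through.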
@@ -290,139 +219,56 @@ For R script (`Figure6and10.R`):
 Extract individual beetles from group images using CVAT annotations:
 
 ```bash
-python 2018_neon_beetles_get_individual_images.py \
-    --xml_file 2018_neon_beetles_bbox.xml \
+python scripts/2018_neon_beetles_get_individual_images.py \
+    --xml_file annotations/2018_neon_beetles_bbox.xml \
     --images_dir /path/to/group_images/ \
     --output_dir /path/to/individual_beetles/ \
+    --padding 0
 ```
 
-**Parameters:**
-- `--xml_file`: Path to CVAT XML annotation file
-- `--images_dir`: Directory containing original group images
-- `--output_dir`: Output directory for cropped beetle images
-- `--padding`: (OPTIONAL) Additional pixels around bounding box (default: 0)
-
-**Output:**
-- Individual beetle images named: `{original_name}_specimen_{N}.png`
+Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
 
 ### 2. Zero-Shot Object Detection
 
-Run `scripts/beetle_detection.py` (or `notebook grounding_dino.ipynb`) for automated beetle detection.
+Run automated beetle detection:
 
-```console
+```bash
 python scripts/beetle_detection.py \
   --csv_path data/metadata.csv \
   --image_dir data/group_images \
   --save_folder data/individual_images \
   --output_csv data/processed.csv
 ```
 
-**Key Configuration Variables** (as in notebook):
-
-```python
-# Data paths
-df_bm = pd.read_csv("BeetleMeasurements_updated_merged_uniqueBeetles.csv")
-image_dir = "/path/to/resized_images/"
-outdir = "/path/to/individual_images/"
-
-# Model parameters
-model_id = "IDEA-Research/grounding-dino-base"
-text = "a beetle."
-box_threshold = 0.2
-text_threshold = 0.2
-iou_threshold = 0.6
-padding = 0.1
-```
+Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See the Pipeline Components section for parameter details.
 
 ### 3. Quality Control and Validation
 
 #### Inter-Annotator Agreement
 
 ```bash
-python inter_annotator.py
-```
-
-**Configuration** (edit in script):
-```python
-DATA_PATH = "data/traits.csv"
-OUTPUT_FIG = "InterAnnotatorAgreement.pdf"
-
-ANNOTATOR_PAIRS = [
-    ('AnnotatorA_length', 'AnnotatorB_length', 'Title', 'Label A', 'Label B'),
-    # ... add more pairs
-]
-
-LIM_MIN, LIM_MAX = 0.15, 0.65  # Axis limits for consistency
+python scripts/inter_annotator.py
 ```
 
-**Output:**
-```
-📊 === Inter-Annotator Agreement Metrics ===
-Annotator A vs Annotator B:
-   RMSE       = 0.0234
-   R² Score   = 0.9567
-   Avg. Bias  = -0.0012
-
-📈 === Average Across All Annotator Pairs ===
-   RMSE (mean)  = 0.0245
-   R² (mean)    = 0.9523
-   Bias (mean)  = -0.0008
-```
+Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure the input data and comparisons. The script writes `InterAnnotatorAgreement.pdf` and prints the metrics to the console.
 
 #### Human vs. Automated System
 
 ```bash
-python calipers_vs_toras.py
-```
-
-**Configuration** (edit in script):
-```python
-DATA_PATH = "data/traits.csv"
-OUTPUT_FIG = "CalipersVsToras.pdf"
-
-ANNOTATOR_PAIRS = [
-    ('AnnotatorA_length', 'System_length', 'Title', 'Annotator A'),
-    # ... add more pairs
-]
+python scripts/calipers_vs_toras.py
 ```
 
-**Output:**
-- PDF figure with scatter plots
-- Metrics comparing each annotator to automated system
-- Average human vs. system metrics
+Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
 
 ### 4. Data Visualization
 
 Run R script for NEON data analysis:
 
 ```bash
-Rscript Figure6and10.R
-```
-
-**Configuration** (edit in script):
-```r
-# Set working directory
-setwd("/path/to/project/")
-
-# NEON configuration
-Beetle_dpID <- "DP1.10022.001"
-NEON_TOKEN <- read.delim("NEON_Token.txt", header = FALSE)[1, 1]
-
-# BeetlePalooza data
-meta_Plooza <- read.csv("./BeetlePalooza_Data/individual_metadata.csv")
+Rscript scripts/Figure6and10.R
 ```
 
-**Workflow:**
-1. Load NEON data via API for PUUM site
-2. Filter and merge parataxonomist/expert identifications
-3. Load BeetlePalooza metadata
-4. Merge datasets by specimen ID
-5. Create species abundance plots with imaging status
-6. Save publication-ready figures
-
-**Output:**
-- `BeetlePUUM_abundance.png`: Species distribution bar chart
-- Merged dataset with taxonomic and measurement data
+Requires a NEON API token saved in `NEON_Token.txt` and the BeetlePalooza metadata. Edit the paths in the script as needed. Produces `BeetlePUUM_abundance.png`, a bar chart of species distributions.
 
 ---
 
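The validation scripts in this hunk report RMSE, R², and average bias for each pair of measurement series. A minimal sketch of how such metrics can be computed follows. It is not the code in `inter_annotator.py` or `calipers_vs_toras.py`: treating R² as the coefficient of determination, taking bias as the mean of `other - reference`, and the function name are all assumptions.

```python
import math
from typing import Dict, Sequence


def agreement_metrics(reference: Sequence[float],
                      other: Sequence[float]) -> Dict[str, float]:
    """RMSE, R² and mean signed difference between two measurement series."""
    if len(reference) != len(other) or len(reference) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(reference)
    diffs = [o - r for r, o in zip(reference, other)]
    ss_res = sum(d * d for d in diffs)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return {
        "rmse": math.sqrt(ss_res / n),
        # Coefficient of determination; undefined when the reference is constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        # Positive bias means `other` measures larger than `reference` on average.
        "bias": sum(diffs) / n,
    }


# Hypothetical measurements: the second annotator reads 0.5 units high throughout.
metrics = agreement_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print({name: round(value, 4) for name, value in metrics.items()})
```

In this example the constant offset gives an RMSE of 0.5 and a bias of 0.5, while R² is 0.8. Reporting bias alongside RMSE separates a systematic offset from random disagreement.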
@@ -435,24 +281,18 @@ The processed datasets from this pipeline are available on Hugging Face:
 #### 1. Hawaii Beetles Dataset
 **Repository:** [imageomics/Hawaii-beetles](https://huggingface.co/datasets/imageomics/Hawaii-beetles)
 
-- Group beetle images from PUUM site
-- CVAT bounding box annotations
-- Individual beetle crops
-- Taxonomic identifications
-- Collection metadata
+PUUM site beetle specimens including group images, individual crops, taxonomic identifications, and collection metadata.
 
 #### 2. 2018 NEON Ethanol-preserved Ground Beetles Dataset
 **Repository:** [imageomics/2018-NEON-beetles](https://huggingface.co/datasets/imageomics/2018-NEON-beetles)
 
-Contains NEON beetle data from 2018 including:
-
-Contains BeetlePalooza citizen science data including:
-- Individual beetle images (cropped and processed)
+Contains 2018 NEON beetle specimens with BeetlePalooza citizen science annotations:
+- Individual beetle images (cropped from group images)
 - Morphometric measurements (elytra length and width)
 - Measurement coordinates with scale bar calibration
-- Specimen metadata (genus, species, collection information)
-- Site environmental data
-- User annotations from multiple annotators
+- Specimen metadata (genus, species, collection site)
+- User annotations from multiple citizen scientists
+- Quality-controlled measurement data
 
 
 ### CVAT Annotations
@@ -467,11 +307,11 @@ Manual annotations created using CVAT (Computer Vision Annotation Tool) for 577
 
 ### Citing This Software
 
-If you use this code or methodology, please both this repo and our paper:
+If you use this code or methodology, please cite both this repository and our paper:
 
 ```bibtex
 @software{Rayeed_Carabidae_Beetle_Processing_2025,
-   author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Taylor, Graham W.},
+   author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Campolongo, Elizabeth G. and Stevens, Samuel and Wu, Jiaman and Taylor, Graham W.},
    license = {MIT},
    month = nov,
    title = {{Carabidae Beetle Processing Pipeline}},
@@ -483,16 +323,6 @@ If you use this code or methodology, please both this repo and our paper:
 
 **Paper:** Coming Soon!
 
-<!--
-```bibtex
-@article{Rayeed_Ground_Beetles_2025,
-   author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Zarubiieva, Iuliia and Lowe, Scott C. and Denslow, Michael W. and Donoso, Evan D. and Wu, Jiaman and Ramirez, Michelle and Baiser, Benjamin and Stewart, Charles V. and Mabee, Paula and Berger-Wolf, Tanya and Karpatne, Anuj and Lapp, Hilmar and Guralnick, Robert P. and Taylor, Graham W. and Record, Sydne},
-   title = {A continental-scale dataset of ground beetles with high-resolution images and validated morphological trait measurements},
-   year = {2025}
-}
-```
--->
-
 ---
 
 ## Acknowledgments