
Commit 9b89c64

Updating readme and reducing verbose script explanations
1 parent bb995ee commit 9b89c64

2 files changed (+40 additions, -210 deletions)


CITATION.cff

Lines changed: 3 additions & 3 deletions
@@ -16,15 +16,15 @@ authors:
     given-names: "Alyson"
 
     affiliation: "The University of Maine"
-  - family-names: "Fluck"
-    given-names: "Isadora E."
-    affiliation: "University of Florida"
   - family-names: "Campolongo"
     given-names: "Elizabeth G."
     affiliation: "The Ohio State University"
   - family-names: "Stevens"
     given-names: "Samuel"
     affiliation: "The Ohio State University"
+  - family-names: "Wu"
+    given-names: "Jiaman"
+    affiliation: "The Ohio State University"
   - family-names: "Taylor"
     given-names: "Graham W."
     affiliation: "University of Guelph"

README.md

Lines changed: 37 additions & 207 deletions
@@ -84,35 +84,18 @@ CVAT (Computer Vision Annotation Tool) annotations containing:
 
 **Script:** `2018_neon_beetles_get_individual_images.py`
 
-Extracts individual beetle specimens from annotated group images using CVAT XML annotations.
-
-**Features:**
-- Parses CVAT XML format
-- Extracts bounding box coordinates
-- Crops individual specimens with optional padding
-- Saves as separate PNG files with specimen numbering
-- Progress tracking with tqdm
-
-**Key Functions:**
-- `parse_cvat_annotations(xml_path)`: Parse CVAT XML and extract image metadata
-- `crop_and_save_images(images_data, images_dir, output_dir, padding)`: Crop and save specimens
+Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. Parses coordinates, crops specimens with optional padding, and saves as numbered PNG files with progress tracking.
 
 ### 3. **Image Resizing with Uniform Scaling**
 
 **Script:** `resizing_individual_beetle_images.py`
 
-Resizes individual beetle specimen images to match BeetlePalooza's resized group images using uniform scaling factors. <<<< Is this for the 2018 NEON beetles to get measurements based on Zooniverse size???
-
-**Purpose:**
-- Aligns individual specimen images with the resolution of BeetlePalooza's processed group images
-- Ensures morphometric measurements made on resized images can be accurately applied to individual specimens
-- Uses uniform scaling (average of x and y scale factors) for consistency
+Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
 
 **Workflow:**
-1. Calculate uniform scaling factors between original and BeetlePalooza resized group images
-2. Save scaling factors to JSON for reference and reproducibility
-3. Apply uniform scaling to all individual specimen images
-4. Generate processing summary with statistics
+1. Calculate uniform scaling factors (average of x and y) between original and resized group images
+2. Apply scaling to all individual specimen images
+3. Save scaling metadata and processing statistics to JSON
 
 ### 4. **Dataset Upload to Hugging Face**
 
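The extraction and resizing steps summarized in this hunk can be illustrated with a standard-library sketch. It is not the repository's `parse_cvat_annotations` or `crop_and_save_images`. It makes several assumptions: it uses the "CVAT for images" XML layout, with `<image name width height>` elements that contain `<box xtl ytl xbr ybr>` children. Padding is treated as extra pixels per side, clamped to the image bounds. Specimens are numbered from 1. The uniform factor is the mean of the x and y ratios between the original and the resized group image. All function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def parse_cvat_boxes(xml_text):
    """Return one dict per <image> with its size and (xtl, ytl, xbr, ybr) boxes.

    Hypothetical helper; assumes the "CVAT for images" XML layout.
    """
    root = ET.fromstring(xml_text)
    images = []
    for image in root.iter("image"):
        boxes = [
            tuple(float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr"))
            for box in image.iter("box")
        ]
        images.append({
            "name": image.get("name"),
            "width": int(image.get("width")),
            "height": int(image.get("height")),
            "boxes": boxes,
        })
    return images


def padded_crop_box(box, padding, width, height):
    """Grow a box by `padding` pixels per side (assumed meaning) and clamp to the image."""
    xtl, ytl, xbr, ybr = (round(v) for v in box)
    return (
        max(0, xtl - padding),
        max(0, ytl - padding),
        min(width, xbr + padding),
        min(height, ybr + padding),
    )


def specimen_filename(image_name, n):
    """Build `{original_name}_specimen_{N}.png`; 1-based N is an assumption."""
    stem = os.path.splitext(os.path.basename(image_name))[0]
    return f"{stem}_specimen_{n}.png"


def uniform_scale(original_size, resized_size):
    """Average the x and y ratios so both axes share a single scale factor."""
    (ow, oh), (rw, rh) = original_size, resized_size
    return (rw / ow + rh / oh) / 2


def scaled_size(size, scale):
    """Apply one uniform factor to a crop's (width, height)."""
    w, h = size
    return max(1, round(w * scale)), max(1, round(h * scale))


if __name__ == "__main__":
    xml_text = (
        '<annotations><image id="0" name="A1.jpg" width="100" height="80">'
        '<box label="beetle" xtl="10.2" ytl="5.0" xbr="40.7" ybr="30.0"/>'
        "</image></annotations>"
    )
    for image in parse_cvat_boxes(xml_text):
        for n, box in enumerate(image["boxes"], start=1):
            crop = padded_crop_box(box, 5, image["width"], image["height"])
            print(specimen_filename(image["name"], n), crop)
    print(uniform_scale((4000, 3000), (1000, 750)))
```

If Pillow is installed, the tuple returned by `padded_crop_box` matches the `(left, upper, right, lower)` box that `Image.crop` expects. Applying one factor to both axes keeps each crop's aspect ratio, up to rounding.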
@@ -140,9 +123,11 @@ python upload_dataset_to_hf.py \
 
 ### 5. **Zero-Shot Object Detection**
 
-**Notebook:** `grounding_dino.ipynb`
+**Script:** `beetle_detection.py` | **Notebook:** `grounding_dino.ipynb`
 
-Advanced pipeline using **Grounding DINO** for automated beetle detection and segmentation. `scripts/beetle_detection.py` is this notebook converted to a runnable script. An example minimal run (passing only required parameters) is provided below:
+Automated beetle detection pipeline using **Grounding DINO** zero-shot object detection. The script version provides a command-line interface for the notebook workflow.
+
+**Basic Usage:**
 
 ```console
 python scripts/beetle_detection.py \
@@ -152,94 +137,38 @@ python scripts/beetle_detection.py \
   --output_csv data/processed.csv
 ```
 
-Additional optional parameters that can be passed are as follows:
-- `model_id`: Model ID for Grounding-DINO, default is `IDEA-Research/grounding-dino-base`.
-- `text`: Text prompt for detection, default is `"a beetle."`.
-- `box_threshold`: Box threshold for detection, default is `0.2`.
-- `text_threshold`: Text threshold for detection,default is `0.2`.
-- `padding`: Padding factor for cropping, default is `0.1`.
-- `iou_threshold`: IoU threshold for Non-Maximum Suppression (NMS), default is `0.6`.
+Optional parameters: `--model_id` (default: `IDEA-Research/grounding-dino-base`), `--text` (prompt, default: `"a beetle."`), `--box_threshold` (0.2), `--text_threshold` (0.2), `--padding` (0.1), `--iou_threshold` (0.6).
 
-**Workflow:**
-1. Load beetle measurements from the [2018 NEON Ethanol-preserved Ground Beetles dataset](https://huggingface.co/datasets/imageomics/2018-NEON-beetles)
-2. Initialize Grounding DINO model
-3. For each image:
-   - Detect beetles using text prompt ("a beetle")
-   - Filter detections based on adaptive area thresholds
-   - Verify detections contain elytra measurement points
-   - Apply Non-Maximum Suppression (NMS) to remove duplicates
-   - Select best bounding box (largest area with highest confidence)
-4. Save individual beetle images and CSV metadata
+The pipeline detects beetles using text prompts, filters by adaptive area thresholds, validates measurement points, applies NMS to remove duplicates, and selects optimal bounding boxes before saving crops and metadata.
 
 ### 6. **Inter-Annotator Agreement**
 
 **Script:** `inter_annotator.py`
 
-Quantifies measurement consistency between multiple human annotators for continuous morphometric traits.
-
-**Analysis:**
-- Compares three annotator pairs:
-  - Annotator A vs. Annotator B
-  - Annotator B vs. Annotator C
-  - Annotator C vs. Annotator A
-
-**Metrics Computed:**
-- **RMSE** (Root Mean Square Error): Overall measurement disagreement
-- **R² Score**: Correlation strength between annotators
-- **Average Bias**: Systematic over/under-measurement tendencies
-
-**Output:**
-- `InterAnnotatorAgreement.pdf`: Three-panel scatter plot
-- Console report with detailed metrics
+Quantifies measurement consistency between human annotators using three pairwise comparisons. Computes RMSE (measurement disagreement), R² (correlation strength), and average bias (systematic tendencies). Generates `InterAnnotatorAgreement.pdf` with scatter plots and console metrics report.
 
 ### 7. **Human vs. Automated System Validation**
 
 **Script:** `calipers_vs_toras.py`
 
-Evaluates TORAS measurement annotations performance against human expert measurements using calipers (gold standard).
-
-**Comparisons:**
-- Annotator A vs. Automated System
-- Annotator B vs. Automated System
-- Annotator C vs. Automated System
-- Average Human vs. Automated System
-
-**Metrics:**
-- RMSE, R², Average Bias (same as inter-annotator analysis)
-
-**Output:**
-- `CalipersVsToras.pdf`: Comparison plots
-- Quantitative performance metrics
+Validates automated TORAS measurements against human caliper measurements (gold standard). Compares three annotators individually and averaged against the automated system using RMSE, R², and bias metrics. Generates `CalipersVsToras.pdf` with comparison plots.
 
 
 ### 8. **NEON Data Analysis and Visualization**
 
 **Script:** `Figure6and10.R`
 
-Comprehensive analysis of NEON beetle data from PUUM site (Hawaii) with BeetlePalooza integration.
-
-**Data Sources:**
-- **NEON API**: DP1.10022.001 (Ground beetle sequences DNA barcode)
-- **BeetlePalooza**: Citizen science measurement data
-- Site: PUUM (Pu'u Maka'ala Natural Area Reserve, Hawaii)
-
-**Outputs:**
-- `BeetlePUUM_abundance.png`: Species abundance with imaging status (Not Imaged vs. Imaged)
-- Merged dataset combining NEON taxonomic data with BeetlePalooza measurements
+Analyzes NEON beetle data from PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii) integrated with BeetlePalooza citizen science measurements. Retrieves data via NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png` showing imaging status and merged analysis dataset.
 
-**R Libraries:**
-- `ggplot2`: Data visualization
-- `dplyr`: Data manipulation
-- `ggpubr`: Publication-ready themes
-- `neonUtilities`: NEON API interface
+**Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`
 
 ---
 
 ## 🛠️ Installation
 
 ### Prerequisites
 
-- **Python 3.8+** (for Python scripts and notebooks)
+- **Python 3.10+** (for Python scripts and notebooks)
 - **R 4.0+** (for R scripts)
 - **Git** (for version control)
 - **CUDA-capable GPU** (recommended for Grounding DINO, but not required)
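This hunk lists two detection steps that can be sketched in plain Python: Non-Maximum Suppression with `--iou_threshold` (default 0.6), and cropping with a `--padding` factor (default 0.1). This is an illustrative reimplementation, not code from `beetle_detection.py`. Boxes are assumed to be `(x1, y1, x2, y2)` pixel tuples. The padding factor is assumed to be a fraction of the box's width and height, added on each side. The adaptive area filter, the elytra-point check, and the final box selection are left out. Function names are hypothetical.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate or inverted boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: visit boxes by descending score, keep one only if its IoU
    with every already-kept box is at most `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def pad_box(box, padding, width, height):
    """Expand a box by `padding` times its width/height on each side
    (assumed meaning of the padding factor), clamped to the image."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * padding, (y2 - y1) * padding
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(width, x2 + dx), min(height, y2 + dy))


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    # Boxes 0 and 1 overlap with IoU 81/119 (about 0.68), so box 1 is dropped.
    print(nms(boxes, scores))  # → [0, 2]
```

Lowering the threshold suppresses more aggressively. If PyTorch is available, `torchvision.ops.nms(boxes, scores, iou_threshold)` implements the same greedy idea on tensors.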
@@ -290,139 +219,56 @@ For R script (`Figure6and10.R`):
 Extract individual beetles from group images using CVAT annotations:
 
 ```bash
-python 2018_neon_beetles_get_individual_images.py \
-  --xml_file 2018_neon_beetles_bbox.xml \
+python scripts/2018_neon_beetles_get_individual_images.py \
+  --xml_file annotations/2018_neon_beetles_bbox.xml \
   --images_dir /path/to/group_images/ \
   --output_dir /path/to/individual_beetles/ \
+  --padding 0
 ```
 
-**Parameters:**
-- `--xml_file`: Path to CVAT XML annotation file
-- `--images_dir`: Directory containing original group images
-- `--output_dir`: Output directory for cropped beetle images
-- `--padding`: (OPTIONAL) Additional pixels around bounding box (default: 0)
-
-**Output:**
-- Individual beetle images named: `{original_name}_specimen_{N}.png`
+Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
 
 ### 2. Zero-Shot Object Detection
 
-Run `scripts/beetle_detection.py` (or `notebook grounding_dino.ipynb`) for automated beetle detection.
+Run automated beetle detection:
 
-```console
+```bash
 python scripts/beetle_detection.py \
   --csv_path data/metadata.csv \
   --image_dir data/group_images \
   --save_folder data/individual_images \
   --output_csv data/processed.csv
 ```
 
-**Key Configuration Variables** (as in notebook):
-
-```python
-# Data paths
-df_bm = pd.read_csv("BeetleMeasurements_updated_merged_uniqueBeetles.csv")
-image_dir = "/path/to/resized_images/"
-outdir = "/path/to/individual_images/"
-
-# Model parameters
-model_id = "IDEA-Research/grounding-dino-base"
-text = "a beetle."
-box_threshold = 0.2
-text_threshold = 0.2
-iou_threshold = 0.6
-padding = 0.1
-```
+Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See Pipeline Components section for parameter details.
 
 ### 3. Quality Control and Validation
 
 #### Inter-Annotator Agreement
 
 ```bash
-python inter_annotator.py
-```
-
-**Configuration** (edit in script):
-```python
-DATA_PATH = "data/traits.csv"
-OUTPUT_FIG = "InterAnnotatorAgreement.pdf"
-
-ANNOTATOR_PAIRS = [
-    ('AnnotatorA_length', 'AnnotatorB_length', 'Title', 'Label A', 'Label B'),
-    # ... add more pairs
-]
-
-LIM_MIN, LIM_MAX = 0.15, 0.65 # Axis limits for consistency
+python scripts/inter_annotator.py
 ```
 
-**Output:**
-```
-📊 === Inter-Annotator Agreement Metrics ===
-Annotator A vs Annotator B:
-RMSE = 0.0234
-R² Score = 0.9567
-Avg. Bias = -0.0012
-
-📈 === Average Across All Annotator Pairs ===
-RMSE (mean) = 0.0245
-R² (mean) = 0.9523
-Bias (mean) = -0.0008
-```
+Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
 
 #### Human vs. Automated System
 
 ```bash
-python calipers_vs_toras.py
-```
-
-**Configuration** (edit in script):
-```python
-DATA_PATH = "data/traits.csv"
-OUTPUT_FIG = "CalipersVsToras.pdf"
-
-ANNOTATOR_PAIRS = [
-    ('AnnotatorA_length', 'System_length', 'Title', 'Annotator A'),
-    # ... add more pairs
-]
+python scripts/calipers_vs_toras.py
 ```
 
-**Output:**
-- PDF figure with scatter plots
-- Metrics comparing each annotator to automated system
-- Average human vs. system metrics
+Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
 
 ### 4. Data Visualization
 
 Run R script for NEON data analysis:
 
 ```bash
-Rscript Figure6and10.R
-```
-
-**Configuration** (edit in script):
-```r
-# Set working directory
-setwd("/path/to/project/")
-
-# NEON configuration
-Beetle_dpID <- "DP1.10022.001"
-NEON_TOKEN <- read.delim("NEON_Token.txt", header = FALSE)[1, 1]
-
-# BeetlePalooza data
-meta_Plooza <- read.csv("./BeetlePalooza_Data/individual_metadata.csv")
+Rscript scripts/Figure6and10.R
 ```
 
-**Workflow:**
-1. Load NEON data via API for PUUM site
-2. Filter and merge parataxonomist/expert identifications
-3. Load BeetlePalooza metadata
-4. Merge datasets by specimen ID
-5. Create species abundance plots with imaging status
-6. Save publication-ready figures
-
-**Output:**
-- `BeetlePUUM_abundance.png`: Species distribution bar chart
-- Merged dataset with taxonomic and measurement data
+Requires NEON API token saved in `NEON_Token.txt` and BeetlePalooza metadata. Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
 
 ---
 
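The two QC scripts in this hunk report RMSE, R² and average bias. The sketch below shows those three metrics for one pair of measurement columns. The definitions are assumptions, not taken from `inter_annotator.py` or `calipers_vs_toras.py`:

- R² is the coefficient of determination, with the first series as the reference. This follows the convention of scikit-learn's `r2_score(y_true, y_pred)`.
- Bias is the mean of `other - reference`, so a positive value means the second annotator (or the system) measures larger on average.

`mean_of_annotators` illustrates the "average human" comparison. All names are hypothetical.

```python
import math


def agreement_metrics(reference, other):
    """RMSE, R² (reference treated as ground truth), and mean bias (other - reference)."""
    if len(reference) != len(other) or len(reference) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(reference)
    diffs = [o - r for r, o in zip(reference, other)]
    ss_res = sum(d * d for d in diffs)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return {
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when the reference series has zero variance.
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "bias": sum(diffs) / n,
    }


def mean_of_annotators(*series):
    """Per-specimen average across annotators (the 'Average Human' series)."""
    return [sum(values) / len(values) for values in zip(*series)]


if __name__ == "__main__":
    annotator_a = [1.0, 2.0, 3.0, 4.0]
    annotator_b = [1.1, 2.1, 2.9, 4.1]
    print(agreement_metrics(annotator_a, annotator_b))
```

Under these definitions RMSE and bias share the units of the measurements. R² is not symmetric: swapping `reference` and `other` changes the denominator.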
@@ -435,24 +281,18 @@ The processed datasets from this pipeline are available on Hugging Face:
 #### 1. Hawaii Beetles Dataset
 **Repository:** [imageomics/Hawaii-beetles](https://huggingface.co/datasets/imageomics/Hawaii-beetles)
 
-- Group beetle images from PUUM site
-- CVAT bounding box annotations
-- Individual beetle crops
-- Taxonomic identifications
-- Collection metadata
+PUUM site beetle specimens including group images, individual crops, taxonomic identifications, and collection metadata.
 
 #### 2. 2018 NEON Ethanol-preserved Ground Beetles Dataset
 **Repository:** [imageomics/2018-NEON-beetles](https://huggingface.co/datasets/imageomics/2018-NEON-beetles)
 
-Contains NEON beetle data from 2018 including:
-
-Contains BeetlePalooza citizen science data including:
-- Individual beetle images (cropped and processed)
+Contains 2018 NEON beetle specimens with BeetlePalooza citizen science annotations:
+- Individual beetle images (cropped from group images)
 - Morphometric measurements (elytra length and width)
 - Measurement coordinates with scale bar calibration
-- Specimen metadata (genus, species, collection information)
-- Site environmental data
-- User annotations from multiple annotators
+- Specimen metadata (genus, species, collection site)
+- User annotations from multiple citizen scientists
+- Quality-controlled measurement data
 
 
 ### CVAT Annotations
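The 2018 NEON dataset entry in this hunk lists measurement coordinates with scale bar calibration. The sketch below shows that conversion under a few assumptions:

- Each measurement is stored as two pixel endpoints.
- Each image has a scale bar with two pixel endpoints and a known physical length. The 1 cm default is hypothetical; the dataset card gives the actual columns and units.

The example also shows why a single uniform resize factor is convenient. Scaling every coordinate by the same factor leaves the calibrated length unchanged. A non-uniform resize would stretch a vertical elytra segment and a horizontal scale bar by different amounts. Function names are hypothetical.

```python
import math

Point = tuple[float, float]


def calibrated_length(p: Point, q: Point, bar_start: Point, bar_end: Point,
                      bar_length_cm: float = 1.0) -> float:
    """Convert the pixel distance p-q to cm using a scale bar on the same image.

    bar_length_cm is a hypothetical default, not taken from the dataset.
    """
    bar_px = math.dist(bar_start, bar_end)
    if bar_px == 0:
        raise ValueError("scale bar endpoints coincide")
    return math.dist(p, q) * bar_length_cm / bar_px


def scale_point(p: Point, factor: float) -> Point:
    """Apply one uniform factor to both coordinates."""
    return (p[0] * factor, p[1] * factor)


if __name__ == "__main__":
    elytra = ((10.0, 10.0), (10.0, 90.0))  # 80 px long
    bar = ((0.0, 0.0), (200.0, 0.0))       # 200 px, assumed to span 1 cm
    print(calibrated_length(*elytra, *bar))
    # Uniform resizing shrinks both segments equally, so the cm value is unchanged.
    small = [scale_point(pt, 0.25) for pt in (*elytra, *bar)]
    print(calibrated_length(*small))
```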
@@ -467,11 +307,11 @@ Manual annotations created using CVAT (Computer Vision Annotation Tool) for 577
 
 ### Citing This Software
 
-If you use this code or methodology, please both this repo and our paper:
+If you use this code or methodology, please cite both this repository and our paper:
 
 ```bibtex
 @software{Rayeed_Carabidae_Beetle_Processing_2025,
-  author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Taylor, Graham W.},
+  author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Campolongo, Elizabeth G. and Stevens, Samuel and Wu, Jiaman and Taylor, Graham W.},
   license = {MIT},
   month = nov,
   title = {{Carabidae Beetle Processing Pipeline}},
@@ -483,16 +323,6 @@ If you use this code or methodology, please both this repo and our paper:
 
 **Paper:** Coming Soon!
 
-<!--
-```bibtex
-@article{Rayeed_Ground_Beetles_2025,
-  author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Zarubiieva, Iuliia and Lowe, Scott C. and Denslow, Michael W. and Donoso, Evan D. and Wu, Jiaman and Ramirez, Michelle and Baiser, Benjamin and Stewart, Charles V. and Mabee, Paula and Berger-Wolf, Tanya and Karpatne, Anuj and Lapp, Hilmar and Guralnick, Robert P. and Taylor, Graham W. and Record, Sydne},
-  title = {A continental-scale dataset of ground beetles with high-resolution images and validated morphological trait measurements},
-  year = {2025}
-}
-```
--->
-
 ---
 
 ## Acknowledgments
