Extracts individual beetle specimens from annotated group images using CVAT XML annotations.
88
-
89
-
**Features:**
90
-
- Parses CVAT XML format
91
-
- Extracts bounding box coordinates
92
-
- Crops individual specimens with optional padding
93
-
- Saves as separate PNG files with specimen numbering
94
-
- Progress tracking with tqdm
95
-
96
-
**Key Functions:**
97
-
- `parse_cvat_annotations(xml_path)`: Parse CVAT XML and extract image metadata
98
-
- `crop_and_save_images(images_data, images_dir, output_dir, padding)`: Crop and save specimens
87
+
Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. Parses coordinates, crops specimens with optional padding, and saves as numbered PNG files with progress tracking.
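
As a rough illustration of the parse-and-crop step, the sketch below reads bounding boxes from a small inline XML sample and computes padded, clamped crop rectangles. It assumes the "CVAT for images" layout (`<image>` elements containing `<box>` elements with `xtl`, `ytl`, `xbr`, `ybr` attributes). The names `parse_boxes`, `padded_crop_box`, and `SAMPLE_XML` are hypothetical and are not the script's own functions.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample in the "CVAT for images" XML layout (assumed, not from the repo).
SAMPLE_XML = """<annotations>
  <image id="0" name="group_001.jpg" width="1000" height="800">
    <box label="beetle" xtl="10.5" ytl="20.0" xbr="110.5" ybr="220.0"/>
    <box label="beetle" xtl="900.0" ytl="700.0" xbr="995.0" ybr="790.0"/>
  </image>
</annotations>"""


def parse_boxes(xml_text):
    """Map each image name to its size and list of (xtl, ytl, xbr, ybr) boxes."""
    root = ET.fromstring(xml_text)
    images = {}
    for image in root.iter("image"):
        boxes = [
            tuple(float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr"))
            for box in image.iter("box")
        ]
        images[image.get("name")] = {
            "size": (int(image.get("width")), int(image.get("height"))),
            "boxes": boxes,
        }
    return images


def padded_crop_box(box, size, padding=0):
    """Grow a box by `padding` pixels on every side, clamped to the image bounds."""
    xtl, ytl, xbr, ybr = box
    width, height = size
    return (
        max(0, math.floor(xtl) - padding),
        max(0, math.floor(ytl) - padding),
        min(width, math.ceil(xbr) + padding),
        min(height, math.ceil(ybr) + padding),
    )


if __name__ == "__main__":
    for name, info in parse_boxes(SAMPLE_XML).items():
        for number, box in enumerate(info["boxes"], start=1):
            print(name, number, padded_crop_box(box, info["size"], padding=20))
```

Each resulting tuple is in (left, upper, right, lower) order, so it could be passed to an image library's crop call. Clamping keeps specimens near the image edge from producing out-of-bounds crops.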
99
88
100
89
### 3. **Image Resizing with Uniform Scaling**
101
90
102
91
**Script:** `resizing_individual_beetle_images.py`
103
92
104
-
Resizes individual beetle specimen images to match BeetlePalooza's resized group images using uniform scaling factors. <<<< Is this for the 2018 NEON beetles to get measurements based on Zooniverse size???
105
-
106
-
**Purpose:**
107
-
- Aligns individual specimen images with the resolution of BeetlePalooza's processed group images
108
-
- Ensures morphometric measurements made on resized images can be accurately applied to individual specimens
109
-
- Uses uniform scaling (average of x and y scale factors) for consistency
93
+
Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
110
94
111
95
**Workflow:**
112
-
1. Calculate uniform scaling factors between original and BeetlePalooza resized group images
113
-
2. Save scaling factors to JSON for reference and reproducibility
114
-
3. Apply uniform scaling to all individual specimen images
115
-
4. Generate processing summary with statistics
96
+
1. Calculate uniform scaling factors (average of x and y) between original and resized group images
97
+
2. Apply scaling to all individual specimen images
98
+
3. Save scaling metadata and processing statistics to JSON
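
The workflow above can be sketched in a few lines. This is a minimal illustration under assumed image sizes, not the script's implementation. The function names and the JSON keys are hypothetical.

```python
import json


def uniform_scale(original_size, resized_size):
    """Average the x and y scale factors between two (width, height) sizes."""
    orig_w, orig_h = original_size
    new_w, new_h = resized_size
    scale_x = new_w / orig_w
    scale_y = new_h / orig_h
    return (scale_x + scale_y) / 2


def scaled_size(size, scale):
    """Apply one scale factor to both dimensions, keeping at least 1 pixel."""
    width, height = size
    return (max(1, round(width * scale)), max(1, round(height * scale)))


if __name__ == "__main__":
    # Assumed example sizes: an original group image and its resized counterpart.
    scale = uniform_scale((4000, 3000), (1000, 752))
    target = scaled_size((400, 200), scale)  # size for one individual crop
    # Hypothetical metadata layout for the JSON record.
    print(json.dumps({"uniform_scale": scale, "target_size": target}, indent=2))
```

Using a single averaged factor preserves each specimen's aspect ratio even when the group image was resized by slightly different amounts along x and y.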
Advanced pipeline using **Grounding DINO** for automated beetle detection and segmentation. `scripts/beetle_detection.py` is this notebook converted to a runnable script. An example minimal run (passing only required parameters) is provided below:
128
+
Automated beetle detection pipeline using **Grounding DINO** zero-shot object detection. The script version provides a command-line interface for the notebook workflow.
- Apply Non-Maximum Suppression (NMS) to remove duplicates
171
-
- Select best bounding box (largest area with highest confidence)
172
-
4. Save individual beetle images and CSV metadata
142
+
The pipeline detects beetles using text prompts, filters by adaptive area thresholds, validates measurement points, applies NMS to remove duplicates, and selects optimal bounding boxes before saving crops and metadata.
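
For readers unfamiliar with the NMS step, here is a minimal pure-Python sketch of greedy Non-Maximum Suppression over `(x1, y1, x2, y2)` boxes. It illustrates the general technique only. The pipeline's actual implementation, and the default threshold of 0.5 used here, are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already-kept box by more than
    `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

A lower threshold removes more near-duplicates but risks merging beetles that sit close together in a group image.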
173
143
174
144
### 6. **Inter-Annotator Agreement**
175
145
176
146
**Script:** `inter_annotator.py`
177
147
178
-
Quantifies measurement consistency between multiple human annotators for continuous morphometric traits.
179
-
180
-
**Analysis:**
181
-
- Compares three annotator pairs:
182
-
- Annotator A vs. Annotator B
183
-
- Annotator B vs. Annotator C
184
-
- Annotator C vs. Annotator A
185
-
186
-
**Metrics Computed:**
187
-
- **RMSE** (Root Mean Square Error): Overall measurement disagreement
188
-
- **R² Score**: Correlation strength between annotators
Quantifies measurement consistency between human annotators using three pairwise comparisons. Computes RMSE (measurement disagreement), R² (correlation strength), and average bias (systematic tendencies). Generates `InterAnnotatorAgreement.pdf` with scatter plots and prints a metrics report to the console.
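
The three metrics can be illustrated with a short, dependency-free sketch. It assumes R² is the coefficient of determination with the first annotator treated as the reference, and that bias is the mean signed difference (other minus reference). The script may define these differently, and `agreement_metrics` is a hypothetical name.

```python
import math


def agreement_metrics(reference, other):
    """Compute RMSE, R^2, and mean signed bias between two measurement lists."""
    if len(reference) != len(other) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    diffs = [o - r for r, o in zip(reference, other)]
    ss_res = sum(d * d for d in diffs)
    mean_ref = sum(reference) / n
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference measurements have zero variance")
    return {
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "bias": sum(diffs) / n,
    }


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    annotator_a = [0.20, 0.30, 0.40, 0.50]
    annotator_b = [0.21, 0.29, 0.42, 0.50]
    for name, value in agreement_metrics(annotator_a, annotator_b).items():
        print(f"{name} = {value:.4f}")
```

Note that under this definition R² is not symmetric: swapping the two annotators can change its value, whereas RMSE stays the same and bias only flips sign.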
194
149
195
150
### 7. **Human vs. Automated System Validation**
196
151
197
152
**Script:** `calipers_vs_toras.py`
198
153
199
-
Evaluates the performance of TORAS measurement annotations against human expert measurements using calipers (gold standard).
200
-
201
-
**Comparisons:**
202
-
- Annotator A vs. Automated System
203
-
- Annotator B vs. Automated System
204
-
- Annotator C vs. Automated System
205
-
- Average Human vs. Automated System
206
-
207
-
**Metrics:**
208
-
- RMSE, R², Average Bias (same as inter-annotator analysis)
209
-
210
-
**Output:**
211
-
- `CalipersVsToras.pdf`: Comparison plots
212
-
- Quantitative performance metrics
154
+
Validates automated TORAS measurements against human caliper measurements (gold standard). Compares three annotators individually and averaged against the automated system using RMSE, R², and bias metrics. Generates `CalipersVsToras.pdf` with comparison plots.
213
155
214
156
215
157
### 8. **NEON Data Analysis and Visualization**
216
158
217
159
**Script:** `Figure6and10.R`
218
160
219
-
Comprehensive analysis of NEON beetle data from PUUM site (Hawaii) with BeetlePalooza integration.
220
-
221
-
**Data Sources:**
222
-
- **NEON API**: DP1.10022.001 (Ground beetle sequences DNA barcode)
223
-
- **BeetlePalooza**: Citizen science measurement data
224
-
- Site: PUUM (Pu'u Maka'ala Natural Area Reserve, Hawaii)
225
-
226
-
**Outputs:**
227
-
- `BeetlePUUM_abundance.png`: Species abundance with imaging status (Not Imaged vs. Imaged)
228
-
- Merged dataset combining NEON taxonomic data with BeetlePalooza measurements
161
+
Analyzes NEON beetle data from the PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii) integrated with BeetlePalooza citizen science measurements. Retrieves data via the NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png`, which shows imaging status, and a merged analysis dataset.
229
162
230
-
**R Libraries:**
231
-
- `ggplot2`: Data visualization
232
-
- `dplyr`: Data manipulation
233
-
- `ggpubr`: Publication-ready themes
234
-
- `neonUtilities`: NEON API interface
163
+
**Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`
235
164
236
165
---
237
166
238
167
## 🛠️ Installation
239
168
240
169
### Prerequisites
241
170
242
-
- **Python 3.8+** (for Python scripts and notebooks)
171
+
- **Python 3.10+** (for Python scripts and notebooks)
243
172
- **R 4.0+** (for R scripts)
244
173
- **Git** (for version control)
245
174
- **CUDA-capable GPU** (recommended for Grounding DINO, but not required)
@@ -290,139 +219,56 @@ For R script (`Figure6and10.R`):
290
219
Extract individual beetles from group images using CVAT annotations:
Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See Pipeline Components section for parameter details.
LIM_MIN, LIM_MAX = 0.15, 0.65  # Axis limits for consistency
250
+
python scripts/inter_annotator.py
356
251
```
357
252
358
-
**Output:**
359
-
```
360
-
📊 === Inter-Annotator Agreement Metrics ===
361
-
Annotator A vs Annotator B:
362
-
RMSE = 0.0234
363
-
R² Score = 0.9567
364
-
Avg. Bias = -0.0012
365
-
366
-
📈 === Average Across All Annotator Pairs ===
367
-
RMSE (mean) = 0.0245
368
-
R² (mean) = 0.9523
369
-
Bias (mean) = -0.0008
370
-
```
253
+
Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
2. Filter and merge parataxonomist/expert identifications
418
-
3. Load BeetlePalooza metadata
419
-
4. Merge datasets by specimen ID
420
-
5. Create species abundance plots with imaging status
421
-
6. Save publication-ready figures
422
-
423
-
**Output:**
424
-
- `BeetlePUUM_abundance.png`: Species distribution bar chart
425
-
- Merged dataset with taxonomic and measurement data
271
+
Requires a NEON API token saved in `NEON_Token.txt` and the BeetlePalooza metadata. Edit paths in the script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
426
272
427
273
---
428
274
@@ -435,24 +281,18 @@ The processed datasets from this pipeline are available on Hugging Face:
author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Taylor, Graham W.},
314
+
author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Campolongo, Elizabeth G. and Stevens, Samuel and Wu, Jiaman and Taylor, Graham W.},
475
315
license = {MIT},
476
316
month = nov,
477
317
title = {{Carabidae Beetle Processing Pipeline}},
@@ -483,16 +323,6 @@ If you use this code or methodology, please cite both this repo and our paper:
483
323
484
324
**Paper:** Coming Soon!
485
325
486
-
<!--
487
-
```bibtex
488
-
@article{Rayeed_Ground_Beetles_2025,
489
-
author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Fluck, Isadora E. and Campolongo, Elizabeth G. and Stevens, Samuel and Zarubiieva, Iuliia and Lowe, Scott C. and Denslow, Michael W. and Donoso, Evan D. and Wu, Jiaman and Ramirez, Michelle and Baiser, Benjamin and Stewart, Charles V. and Mabee, Paula and Berger-Wolf, Tanya and Karpatne, Anuj and Lapp, Hilmar and Guralnick, Robert P. and Taylor, Graham W. and Record, Sydne},
490
-
title = {A continental-scale dataset of ground beetles with high-resolution images and validated morphological trait measurements},