"description": "Pipeline for processing, analyzing, and validating beetle specimen images and morphometric measurements from NEON (National Ecological Observatory Network) beetle specimens (specifically for the <a href=\"https://huggingface.co/datasets/imageomics/2018-NEON-beetles\">2018 NEON Beetles</a> and <a href=\"https://huggingface.co/datasets/imageomics/Hawaii-beetles\">Hawaii Beetles</a> datasets). The project focuses on Carabidae (ground beetles) and implements automated beetle detection and cropping, morphometric trait extraction, inter-annotator agreement analysis, human vs. automated system validation, and species distribution visualization.",
+- [5. Quality Control and Validation](#5-quality-control-and-validation)
+- [6. NEON Data Analysis and Visualization](#6-neon-data-analysis-and-visualization)
+- [7. Dataset Upload to Hugging Face](#7-dataset-upload-to-hugging-face)
+- [Installation](#%EF%B8%8F-installation)
 - [Data Sources](#-data-sources)
 - [Citation](#-citation)
 - [Acknowledgements](#acknowledgments)
@@ -62,6 +64,8 @@ carabidae_beetle_processing/

 ## 🔬 Pipeline Components

+The pipeline and usage instructions are provided below. Please be sure to set up your coding environments appropriately for the needed portion of the pipeline (see [Installation](#%EF%B8%8F-installation) for detailed guidance).
Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. Parses coordinates, crops specimens with optional padding, and saves as numbered PNG files with progress tracking.

+#### Usage Instructions
+
+Extract individual beetles from group images using CVAT annotations:
+Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
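For illustration only (this is not part of the commit), the cropping step described above might be sketched as follows. The sketch assumes the "CVAT for images" XML layout, in which each `<image>` element holds `<box>` children with `xtl`, `ytl`, `xbr`, and `ybr` attributes. The function name `crop_boxes`, the sample file name, and numbering specimens from 1 are hypothetical choices, not taken from the project's script.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def crop_boxes(cvat_xml: str, padding: int = 0):
    """Yield (output_name, (left, top, right, bottom)) for every annotated box.

    Boxes are grown by `padding` pixels and clamped to the image bounds.
    """
    root = ET.fromstring(cvat_xml)
    for image in root.iter("image"):
        width, height = int(image.get("width")), int(image.get("height"))
        stem = Path(image.get("name")).stem
        # Assumption: specimens are numbered from 1 in annotation order.
        for n, box in enumerate(image.iter("box"), start=1):
            left = max(0, int(float(box.get("xtl"))) - padding)
            top = max(0, int(float(box.get("ytl"))) - padding)
            right = min(width, int(float(box.get("xbr"))) + padding)
            bottom = min(height, int(float(box.get("ybr"))) + padding)
            yield f"{stem}_specimen_{n}.png", (left, top, right, bottom)


# Hypothetical annotation file with two beetles in one group image.
SAMPLE = """<annotations>
  <image id="0" name="group_image_001.jpg" width="1000" height="800">
    <box label="beetle" xtl="10.5" ytl="20.0" xbr="110.5" ybr="220.0"/>
    <box label="beetle" xtl="900.0" ytl="700.0" xbr="995.0" ybr="790.0"/>
  </image>
</annotations>"""

crops = list(crop_boxes(SAMPLE, padding=15))
for name, box in crops:
    print(name, box)
```

Each `(left, top, right, bottom)` tuple is in the order an image library such as Pillow expects for `Image.crop`, so the actual pixel extraction can be added on top of this without changing the box logic.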
+
 ### 3. **Image Resizing with Uniform Scaling**

 **Script:** `resizing_individual_beetle_images.py`

-Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
+Aligns individual beetle crops with the 2018-NEON-Beetles Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens. Set proper base directories at the top of the script before use.

 **Workflow:**
 1. Calculate uniform scaling factors (average of x and y) between original and resized group images
 2. Apply scaling to all individual specimen images
 3. Save scaling metadata and processing statistics to JSON
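For illustration only (not part of the commit), the three workflow steps might look like the sketch below. It works on image sizes rather than pixel data, and the function names, example dimensions, and JSON keys are all hypothetical; the actual script may structure its metadata differently.

```python
import json


def uniform_scale(original_size, resized_size):
    """Average the x and y scale factors between two (width, height) sizes."""
    orig_w, orig_h = original_size
    new_w, new_h = resized_size
    scale_x = new_w / orig_w
    scale_y = new_h / orig_h
    return (scale_x + scale_y) / 2


def scaled_size(size, scale):
    """Apply one scale factor to both dimensions, keeping at least 1 pixel."""
    w, h = size
    return max(1, round(w * scale)), max(1, round(h * scale))


# Step 1: scale factor between an original and a resized group image.
scale = uniform_scale((6000, 4000), (1500, 1002))

# Step 2: target size for one individual specimen crop.
new_size = scaled_size((400, 300), scale)

# Step 3: metadata serialized to JSON (keys are illustrative).
metadata = json.dumps({"uniform_scale": scale, "specimen_size": new_size})
print(metadata)
```

Using a single averaged factor keeps each specimen's aspect ratio intact even when the resized group image was not scaled by exactly the same amount on both axes.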

-### 4. **Dataset Upload to Hugging Face**
-
-**Script:** `upload_dataset_to_hf.py`
-
-Utility script for uploading processed beetle datasets to Hugging Face Hub for public access and reproducibility.
-
-**Usage:**
-```bash
-export HF_TOKEN="your_hugging_face_token"
-
-python upload_dataset_to_hf.py \
-  --folder_path /path/to/local/images \
-  --repo_id imageomics/dataset-name \
-  --path_in_repo images \
-  --branch main
-```
-
-**Parameters:**
-- `--folder_path`: Local directory containing files to upload
-- `--repo_id`: Hugging Face repository identifier (org/repo-name)
-- `--path_in_repo`: Subdirectory within the repository (default: "images")
-- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
The pipeline detects beetles using text prompts, filters by adaptive area thresholds, validates measurement points, applies NMS to remove duplicates, and selects optimal bounding boxes before saving crops and metadata.
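For illustration only (not part of the commit), the duplicate-removal step can be sketched as greedy non-maximum suppression (NMS). This is a plain-Python version under the assumption that boxes are `(x1, y1, x2, y2)` tuples with one confidence score each; the detection script itself may rely on a library implementation, and the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one beetle plus a separate beetle.
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

Here the second box overlaps the first with an IoU of about 0.82, so it is suppressed while the distant third box survives.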

-### 6. **Inter-Annotator Agreement**
+### 5. Quality Control and Validation
+
+#### Inter-Annotator Agreement

 **Script:** `inter_annotator.py`

 Quantifies measurement consistency between human annotators using three pairwise comparisons. Computes RMSE (measurement disagreement), R² (correlation strength), and average bias (systematic tendencies). Generates `InterAnnotatorAgreement.pdf` with scatter plots and console metrics report.
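For illustration only (not part of the commit), the three metrics for one annotator pair might be computed as below. The sketch assumes R² means the squared Pearson correlation and bias means the mean signed difference (first annotator minus second); the script may define either differently. The function name and the measurements are hypothetical.

```python
import math


def agreement_metrics(a, b):
    """Return RMSE, squared Pearson correlation, and mean signed difference."""
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r2 = (cov * cov) / (var_a * var_b)
    return {"rmse": rmse, "r2": r2, "bias": bias}


# Made-up lengths from two annotators; the second reads 0.5 units higher.
annotator_1 = [1.0, 2.0, 3.0, 4.0]
annotator_2 = [1.5, 2.5, 3.5, 4.5]
metrics = agreement_metrics(annotator_1, annotator_2)
print(metrics)
```

The example shows why the three numbers are reported together: a constant offset leaves the correlation perfect while RMSE and bias both expose the 0.5-unit disagreement.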

-### 7. **Human vs. Automated System Validation**
+```bash
+python scripts/inter_annotator.py
+```
+
+Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
+
+#### Human vs. Automated System

 **Script:** `calipers_vs_toras.py`

 Validates automated TORAS measurements against human caliper measurements (gold standard). Compares three annotators individually and averaged against the automated system using RMSE, R², and bias metrics. Generates `CalipersVsToras.pdf` with comparison plots.

+```bash
+python scripts/calipers_vs_toras.py
+```
+
+Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.

-### 8. **NEON Data Analysis and Visualization**
+### 6. **NEON Data Analysis and Visualization**

 **Script:** `Figure6and10.R`

 Analyzes NEON beetle data from PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii) integrated with BeetlePalooza citizen science measurements. Retrieves data via NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png` showing imaging status and merged analysis dataset.

+Run R script for NEON data analysis:
+
+```bash
+Rscript scripts/Figure6and10.R
+```
+
+Requires NEON API token saved in `NEON_Token.txt` (see [NEON token instructions](#neon-api-token)) and BeetlePalooza metadata (2018-NEON-Beetles `individual_metadata.csv`). Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
+
 **Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`

+### 7. **Dataset Upload to Hugging Face**
+
+**Script:** `upload_dataset_to_hf.py`
+
+Utility script used to upload the processed beetle datasets to Hugging Face Hub for public access and reproducibility.
+
+**Usage:**
+```bash
+export HF_TOKEN="your_hugging_face_token"
+
+python upload_dataset_to_hf.py \
+  --folder_path /path/to/local/images \
+  --repo_id imageomics/dataset-name \
+  --path_in_repo images \
+  --branch main
+```
+
+**Parameters:**
+- `--folder_path`: Local directory containing files to upload
+- `--repo_id`: Hugging Face repository identifier (org/repo-name)
+- `--path_in_repo`: Subdirectory within the repository (default: "images")
+- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
+- `--branch`: Target branch name (default: "main")
+
 ---

 ## 🛠️ Installation

 ### Prerequisites

-- **Python 3.10+** (for Python scripts and notebooks)
-- **R 4.0+** (for R scripts)
-- **Git** (for version control)
+- **Python 3.10+**
+- **R 4.0+**
 - **CUDA-capable GPU** (recommended for Grounding DINO, but not required)
Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
-
-### 2. Zero-Shot Object Detection
-
-Run automated beetle detection:
-
-```bash
-python scripts/beetle_detection.py \
-  --csv_path data/metadata.csv \
-  --image_dir data/group_images \
-  --save_folder data/individual_images \
-  --output_csv data/processed.csv
-```
-
-Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See Pipeline Components section for parameter details.
-
-### 3. Quality Control and Validation
-
-#### Inter-Annotator Agreement
-
-```bash
-python scripts/inter_annotator.py
-```
-
-Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
-
-#### Human vs. Automated System
-
-```bash
-python scripts/calipers_vs_toras.py
-```
-
-Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
-
-### 4. Data Visualization
-
-Run R script for NEON data analysis:
-
-```bash
-Rscript scripts/Figure6and10.R
-```
-
-Requires NEON API token saved in `NEON_Token.txt` and BeetlePalooza metadata. Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
-
----
-
 ## 📊 Data Sources

 ### Hugging Face Datasets (Primary Access Point)

-The processed datasets from this pipeline are available on Hugging Face:
+The processed datasets from this pipeline are available on Hugging Face along with the original data: