Commit 2044a01

Merge pull request #3 from Imageomics/pre-release
Re-organize README and update citation metadata
2 parents 1700f57 + 530dc85 commit 2044a01

5 files changed, +189 -120 lines changed
.github/workflows/validate-zenodo.yaml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+name: Check zenodo metadata
+
+on:
+  push:
+    paths:
+      - '.zenodo.json'
+      - '.github/workflows/validate-zenodo.yaml'
+
+jobs:
+  check-zenodo-metadata:
+
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '22'
+      - name: Install dependencies
+        run: npm install [email protected]
+      - name: Check .zenodo.json file
+        run: |
+          npx zenodraft metadata validate .zenodo.json

.zenodo.json

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+{
+  "creators": [
+    {
+      "name": "Rayeed, S M",
+      "affiliation": "Rensselaer Polytechnic Institute"
+    },
+    {
+      "name": "Khurana, Mridul",
+      "affiliation": "Virginia Tech"
+    },
+    {
+      "name": "East, Alyson",
+      "affiliation": "The University of Maine"
+    },
+    {
+      "name": "Campolongo, Elizabeth G.",
+      "affiliation": "The Ohio State University"
+    },
+    {
+      "name": "Stevens, Samuel",
+      "affiliation": "The Ohio State University"
+    },
+    {
+      "name": "Wu, Jiaman",
+      "affiliation": "The Ohio State University"
+    },
+    {
+      "name": "Taylor, Graham W.",
+      "affiliation": "University of Guelph"
+    }
+  ],
+  "description": "Pipeline for processing, analyzing, and validating beetle specimen images and morphometric measurements from NEON (National Ecological Observatory Network) beetle specimens (specifically for the <a href=\"https://huggingface.co/datasets/imageomics/2018-NEON-beetles\">2018 NEON Beetles</a> and <a href=\"https://huggingface.co/datasets/imageomics/Hawaii-beetles\">Hawaii Beetles</a> datasets). The project focuses on Carabidae (ground beetles) and implements automated beetle detection and cropping, morphometric trait extraction, inter-annotator agreement analysis, human vs. automated system validation, and species distribution visualization.",
+  "keywords": [
+    "imageomics",
+    "computer-vision",
+    "beetles",
+    "carabidae",
+    "morphometrics",
+    "neon",
+    "grounding-dino",
+    "zero-shot-detection",
+    "quality-control",
+    "biodiversity",
+    "ecology",
+    "animals",
+    "image",
+    "segmentation",
+    "species",
+    "elytra",
+    "basal pronotum",
+    "traits",
+    "annotation",
+    "measurements",
+    "pinned specimens",
+    "Hawaii",
+    "ground-beetles"
+  ],
+  "title": "Carabidae Beetle Processing Pipeline",
+  "version": "1.0.0",
+  "license": "MIT",
+  "publication_date": "2025-12-18",
+  "grants": [
+    {
+      "id": "021nxhr62::2118240"
+    },
+    {
+      "id": "021nxhr62::2330423"
+    }
+  ]
+}
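
For a quick local sanity check before pushing, a minimal Python sketch follows. It is not a substitute for the `zenodraft metadata validate` step in the workflow. The helper name `check_zenodo_metadata` and the `REQUIRED_KEYS` tuple are assumptions made for illustration, chosen from the fields this commit populates rather than from Zenodo's schema.

```python
import json

# Assumed for illustration: keys that this commit's .zenodo.json populates.
# This is NOT Zenodo's official schema.
REQUIRED_KEYS = ("title", "creators", "version", "license", "publication_date")


def check_zenodo_metadata(text: str) -> list:
    """Return a list of human-readable problems found in .zenodo.json text."""
    try:
        meta = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(meta, dict):
        return ["top-level value must be a JSON object"]

    # Report every expected key that is absent.
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]

    # Every creator entry should be an object with a non-empty name.
    creators = meta.get("creators", [])
    if not isinstance(creators, list):
        return problems + ["creators must be a list"]
    for index, creator in enumerate(creators):
        if not isinstance(creator, dict) or not creator.get("name"):
            problems.append(f"creator {index} has no name")
    return problems


# Hypothetical sample with two deliberate defects.
sample = (
    '{"title": "T", "version": "1.0.0", "license": "MIT", '
    '"creators": [{"name": "A"}, {"affiliation": "B"}]}'
)
sample_problems = check_zenodo_metadata(sample)
print(sample_problems)
```

To check the real file, pass the contents of `.zenodo.json` (for example via `pathlib.Path(".zenodo.json").read_text()`) instead of `sample`.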

CITATION.cff

Lines changed: 1 addition & 2 deletions
@@ -14,7 +14,6 @@ authors:
     affiliation: "Virginia Tech"
   - family-names: "East"
     given-names: "Alyson"
-
     affiliation: "The University of Maine"
   - family-names: "Campolongo"
     given-names: "Elizabeth G."
@@ -59,6 +58,6 @@ keywords:
   - ground-beetles
 license: MIT
 version: "1.0.0"
-date-released: "2025-11-XX" # Update before release!
+date-released: "2025-12-18"
 #doi: Add version agnostic DOI on release
 type: software
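
Because this commit sets the release date in both `CITATION.cff` and `.zenodo.json`, a small consistency check can catch the two files drifting apart. The sketch below is an illustration only: the helper names are hypothetical, the pairing of `date-released` with `publication_date` is an assumption, and the line-based reader is a naive stand-in for a real YAML parser that only handles top-level scalar keys.

```python
import json
import re
from typing import Optional


def read_cff_scalar(cff_text: str, key: str) -> Optional[str]:
    """Naively read a top-level `key: value` scalar from CITATION.cff text.

    Strips optional double quotes and a trailing `# comment`.
    """
    pattern = rf'^{re.escape(key)}:[ \t]*"?([^"#\n]*?)"?[ \t]*(?:#.*)?$'
    match = re.search(pattern, cff_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def find_mismatches(cff_text: str, zenodo_text: str) -> list:
    """Compare fields that are assumed to carry the same value in both files."""
    zenodo = json.loads(zenodo_text)
    # Assumed pairing: CITATION.cff key -> .zenodo.json key.
    pairs = {"version": "version", "date-released": "publication_date"}
    mismatches = []
    for cff_key, zenodo_key in pairs.items():
        cff_value = read_cff_scalar(cff_text, cff_key)
        zenodo_value = zenodo.get(zenodo_key)
        if cff_value != zenodo_value:
            mismatches.append(
                f"{cff_key}={cff_value!r} vs {zenodo_key}={zenodo_value!r}"
            )
    return mismatches


zenodo_text = '{"version": "1.0.0", "publication_date": "2025-12-18"}'
cff_before = 'version: "1.0.0"\ndate-released: "2025-11-XX" # Update before release!\n'
cff_after = 'version: "1.0.0"\ndate-released: "2025-12-18"\n'

print(find_mismatches(cff_before, zenodo_text))
print(find_mismatches(cff_after, zenodo_text))
```

With the placeholder date from before this commit the check reports one mismatch; with the updated date it reports none.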

README.md

Lines changed: 79 additions & 102 deletions
@@ -9,12 +9,14 @@
 - [Overview](#-overview)
 - [Project Structure](#-project-structure)
 - [Pipeline Components](#-pipeline-components)
-- [Installation](#-installation)
-- [Usage](#-usage)
-  - [1. Individual Beetle Extraction](#1-individual-beetle-extraction)
-  - [2. Zero-Shot Object Detection](#2-zero-shot-object-detection)
-  - [3. Quality Control and Validation](#3-quality-control-and-validation)
-  - [4. Data Visualization](#4-data-visualization)
+  - [1. Image Annotation and Extraction](#1-image-annotation-and-extraction)
+  - [2. Traditional Bounding Box Cropping (Individual Beetle Extraction)](#2-traditional-bounding-box-cropping)
+  - [3. Image Resizing with Uniform Scaling](#3-image-resizing-with-uniform-scaling)
+  - [4. Zero-Shot Object Detection](#4-zero-shot-object-detection)
+  - [5. Quality Control and Validation](#5-quality-control-and-validation)
+  - [6. NEON Data Analysis and Visualization](#6-neon-data-analysis-and-visualization)
+  - [7. Dataset Upload to Hugging Face](#7-dataset-upload-to-hugging-face)
+- [Installation](#%EF%B8%8F-installation)
 - [Data Sources](#-data-sources)
 - [Citation](#-citation)
 - [Acknowledgements](#acknowledgments)
@@ -62,6 +64,8 @@ carabidae_beetle_processing/

 ## 🔬 Pipeline Components

+The pipeline and usage instructions are provided below. Please be sure to set up your coding environments appropriately for the needed portion of the pipeline (see [Installation](#%EF%B8%8F-installation) for detailed guidance).
+
 ### 1. **Image Annotation and Extraction**

 **File:** `2018_neon_beetles_bbox.xml`
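
The README describes the extraction step as parsing CVAT bounding boxes and cropping each specimen with optional padding. A minimal sketch of that idea is shown here, assuming the "CVAT for images" XML layout (`<image>` elements containing `<box>` elements with `xtl`, `ytl`, `xbr`, `ybr` attributes). The function name `crop_boxes`, the sample file name, the `beetle` label, the truncation of coordinates to integers, and numbering specimens from 1 are all assumptions for illustration, not the behavior of the repository's script.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-box annotation in the assumed "CVAT for images" layout.
SAMPLE_XML = """<annotations>
  <image id="0" name="group_001.jpg" width="5568" height="3712">
    <box label="beetle" xtl="10.4" ytl="20.6" xbr="110.5" ybr="220.2"/>
    <box label="beetle" xtl="5500.0" ytl="3600.0" xbr="5568.0" ybr="3712.0"/>
  </image>
</annotations>"""


def crop_boxes(xml_text, padding=0):
    """Yield (image_name, specimen_number, (left, top, right, bottom)).

    Each box is expanded by `padding` pixels and clamped to the image bounds.
    """
    root = ET.fromstring(xml_text)
    for image in root.iter("image"):
        width = int(image.get("width"))
        height = int(image.get("height"))
        for number, box in enumerate(image.iter("box"), start=1):
            left = max(0, int(float(box.get("xtl"))) - padding)
            top = max(0, int(float(box.get("ytl"))) - padding)
            right = min(width, int(float(box.get("xbr"))) + padding)
            bottom = min(height, int(float(box.get("ybr"))) + padding)
            yield image.get("name"), number, (left, top, right, bottom)


crops = list(crop_boxes(SAMPLE_XML, padding=10))
for name, number, box in crops:
    print(name, number, box)
```

Each `(left, top, right, bottom)` tuple uses the same ordering that Pillow's `Image.crop` expects, so it could be passed to that method directly.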
@@ -70,7 +74,6 @@ CVAT (Computer Vision Annotation Tool) annotations containing:
 - 577 annotated images
 - Bounding box coordinates for individual beetles in group images
 - Image dimensions (5568 × 3712 pixels)
-- Created: April 2025

 **Format:**
 ```xml
@@ -86,42 +89,32 @@ CVAT (Computer Vision Annotation Tool) annotations containing:

 Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. Parses coordinates, crops specimens with optional padding, and saves as numbered PNG files with progress tracking.

+#### Usage Instructions
+
+Extract individual beetles from group images using CVAT annotations:
+
+```bash
+python scripts/2018_neon_beetles_get_individual_images.py \
+    --xml_file annotations/2018_neon_beetles_bbox.xml \
+    --images_dir /path/to/group_images/ \
+    --output_dir /path/to/individual_beetles/ \
+    --padding 0
+```
+
+Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
+
 ### 3. **Image Resizing with Uniform Scaling**

 **Script:** `resizing_individual_beetle_images.py`

-Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
+Aligns individual beetle crops with the 2018-NEON-Beetles Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens. Set proper base directories at the top of the script before use.

 **Workflow:**
 1. Calculate uniform scaling factors (average of x and y) between original and resized group images
 2. Apply scaling to all individual specimen images
 3. Save scaling metadata and processing statistics to JSON

-### 4. **Dataset Upload to Hugging Face**
-
-**Script:** `upload_dataset_to_hf.py`
-
-Utility script for uploading processed beetle datasets to Hugging Face Hub for public access and reproducibility.
-
-**Usage:**
-```bash
-export HF_TOKEN="your_hugging_face_token"
-
-python upload_dataset_to_hf.py \
-    --folder_path /path/to/local/images \
-    --repo_id imageomics/dataset-name \
-    --path_in_repo images \
-    --branch main
-```
-
-**Parameters:**
-- `--folder_path`: Local directory containing files to upload
-- `--repo_id`: Hugging Face repository identifier (org/repo-name)
-- `--path_in_repo`: Subdirectory within the repository (default: "images")
-- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
-- `--branch`: Target branch name (default: "main")
-
-### 5. **Zero-Shot Object Detection**
+### 4. **Zero-Shot Object Detection**

 **Script:** `beetle_detection.py` | **Notebook:** `grounding_dino.ipynb`
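
The resizing workflow in this hunk (a uniform scale taken as the average of the x and y factors, then applied to each individual crop) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `resizing_individual_beetle_images.py`: the function names are hypothetical, and the resized group-image size of 1392 × 928 is an invented example (only the 5568 × 3712 original size comes from the README).

```python
def uniform_scale(original_size, resized_size):
    """Average the x and y scale factors between two (width, height) sizes."""
    orig_w, orig_h = original_size
    new_w, new_h = resized_size
    scale_x = new_w / orig_w
    scale_y = new_h / orig_h
    return (scale_x + scale_y) / 2


def scaled_size(size, scale):
    """Apply one scale factor to a (width, height) size, keeping at least 1 px."""
    width, height = size
    return max(1, round(width * scale)), max(1, round(height * scale))


# Original group image size from the README; the resized size is hypothetical.
scale = uniform_scale((5568, 3712), (1392, 928))

# Target size for a hypothetical 400 x 300 individual crop.
target = scaled_size((400, 300), scale)
print(scale, target)
```

Using a single averaged factor keeps each specimen's aspect ratio intact even when the group image was resized slightly differently along the two axes.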

@@ -141,43 +134,87 @@ Optional parameters: `--model_id` (default: `IDEA-Research/grounding-dino-base`)

 The pipeline detects beetles using text prompts, filters by adaptive area thresholds, validates measurement points, applies NMS to remove duplicates, and selects optimal bounding boxes before saving crops and metadata.

-### 6. **Inter-Annotator Agreement**
+### 5. Quality Control and Validation
+
+#### Inter-Annotator Agreement

 **Script:** `inter_annotator.py`

 Quantifies measurement consistency between human annotators using three pairwise comparisons. Computes RMSE (measurement disagreement), R² (correlation strength), and average bias (systematic tendencies). Generates `InterAnnotatorAgreement.pdf` with scatter plots and console metrics report.

-### 7. **Human vs. Automated System Validation**
+```bash
+python scripts/inter_annotator.py
+```
+
+Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
+
+#### Human vs. Automated System

 **Script:** `calipers_vs_toras.py`

 Validates automated TORAS measurements against human caliper measurements (gold standard). Compares three annotators individually and averaged against the automated system using RMSE, R², and bias metrics. Generates `CalipersVsToras.pdf` with comparison plots.

+```bash
+python scripts/calipers_vs_toras.py
+```
+
+Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.

-### 8. **NEON Data Analysis and Visualization**
+### 6. **NEON Data Analysis and Visualization**

 **Script:** `Figure6and10.R`

 Analyzes NEON beetle data from PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii) integrated with BeetlePalooza citizen science measurements. Retrieves data via NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png` showing imaging status and merged analysis dataset.

+Run R script for NEON data analysis:
+
+```bash
+Rscript scripts/Figure6and10.R
+```
+
+Requires NEON API token saved in `NEON_Token.txt` (see [NEON token instructions](#neon-api-token)) and BeetlePalooza metadata (2018-NEON-Beetles `individual_metadata.csv`). Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
+
 **Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`

+### 7. **Dataset Upload to Hugging Face**
+
+**Script:** `upload_dataset_to_hf.py`
+
+Utility script used to upload the processed beetle datasets to Hugging Face Hub for public access and reproducibility.
+
+**Usage:**
+```bash
+export HF_TOKEN="your_hugging_face_token"
+
+python upload_dataset_to_hf.py \
+    --folder_path /path/to/local/images \
+    --repo_id imageomics/dataset-name \
+    --path_in_repo images \
+    --branch main
+```
+
+**Parameters:**
+- `--folder_path`: Local directory containing files to upload
+- `--repo_id`: Hugging Face repository identifier (org/repo-name)
+- `--path_in_repo`: Subdirectory within the repository (default: "images")
+- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
+- `--branch`: Target branch name (default: "main")
+
 ---

 ## 🛠️ Installation

 ### Prerequisites

-- **Python 3.10+** (for Python scripts and notebooks)
-- **R 4.0+** (for R scripts)
-- **Git** (for version control)
+- **Python 3.10+**
+- **R 4.0+**
 - **CUDA-capable GPU** (recommended for Grounding DINO, but not required)

 ### Python Setup

 1. **Clone the repository:**
    ```bash
-   git clone https://github.com/mridulk97/carabidae_beetle_processing.git
+   git clone git@github.com:Imageomics/carabidae_beetle_processing.git
    cd carabidae_beetle_processing
    ```
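
The quality-control sections in this hunk report RMSE, R², and average bias for pairs of measurement series. A minimal sketch of those three metrics is given here, using only the standard library. It is an illustration, not the code in `inter_annotator.py` or `calipers_vs_toras.py`: the function name is hypothetical, R² is assumed to mean the squared Pearson correlation (the README calls it "correlation strength"), bias is assumed to be the mean of `second - first`, and the sample values are invented.

```python
import math


def agreement_metrics(first, second):
    """Return RMSE, squared Pearson correlation, and mean bias for two series."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(first)

    # Signed differences drive both RMSE and bias.
    diffs = [b - a for a, b in zip(first, second)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n

    # Squared Pearson correlation (assumed meaning of R² here).
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    r_squared = (cov * cov) / (var_a * var_b)

    return {"rmse": rmse, "r2": r_squared, "bias": bias}


# Invented example: the second annotator reads 0.5 units higher throughout.
annotator_1 = [1.0, 2.0, 3.0, 4.0]
annotator_2 = [1.5, 2.5, 3.5, 4.5]
metrics = agreement_metrics(annotator_1, annotator_2)
print(metrics)
```

The example shows why the three numbers are reported together: a constant offset leaves the correlation perfect while RMSE and bias both expose the 0.5-unit disagreement.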

@@ -212,71 +249,11 @@ For R script (`Figure6and10.R`):

 ---

-## 🚀 Usage
-
-### 1. Individual Beetle Extraction
-
-Extract individual beetles from group images using CVAT annotations:
-
-```bash
-python scripts/2018_neon_beetles_get_individual_images.py \
-    --xml_file annotations/2018_neon_beetles_bbox.xml \
-    --images_dir /path/to/group_images/ \
-    --output_dir /path/to/individual_beetles/ \
-    --padding 0
-```
-
-Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
-
-### 2. Zero-Shot Object Detection
-
-Run automated beetle detection:
-
-```bash
-python scripts/beetle_detection.py \
-    --csv_path data/metadata.csv \
-    --image_dir data/group_images \
-    --save_folder data/individual_images \
-    --output_csv data/processed.csv
-```
-
-Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See Pipeline Components section for parameter details.
-
-### 3. Quality Control and Validation
-
-#### Inter-Annotator Agreement
-
-```bash
-python scripts/inter_annotator.py
-```
-
-Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
-
-#### Human vs. Automated System
-
-```bash
-python scripts/calipers_vs_toras.py
-```
-
-Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
-
-### 4. Data Visualization
-
-Run R script for NEON data analysis:
-
-```bash
-Rscript scripts/Figure6and10.R
-```
-
-Requires NEON API token saved in `NEON_Token.txt` and BeetlePalooza metadata. Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
-
----
-
 ## 📊 Data Sources

 ### Hugging Face Datasets (Primary Access Point)

-The processed datasets from this pipeline are available on Hugging Face:
+The processed datasets from this pipeline are available on Hugging Face along with the original data:

 #### 1. Hawaii Beetles Dataset
 **Repository:** [imageomics/Hawaii-beetles](https://huggingface.co/datasets/imageomics/Hawaii-beetles)
@@ -313,7 +290,7 @@ If you use this code or methodology, please cite both this repository and our pa
 @software{Rayeed_Carabidae_Beetle_Processing_2025,
   author = {Rayeed, S M and Khurana, Mridul and East, Alyson and Campolongo, Elizabeth G. and Stevens, Samuel and Wu, Jiaman and Taylor, Graham W.},
   license = {MIT},
-  month = nov,
+  month = dec,
   title = {{Carabidae Beetle Processing Pipeline}},
   url = {https://github.com/Imageomics/carabidae_beetle_processing},
   version = {1.0.0},