
Commit 6dd7e04

restructure presentation to remove redundancy
1 parent 7ac4060 commit 6dd7e04


1 file changed (+77, -100 lines)


README.md

Lines changed: 77 additions & 100 deletions
@@ -9,12 +9,14 @@
 - [Overview](#-overview)
 - [Project Structure](#-project-structure)
 - [Pipeline Components](#-pipeline-components)
+- [1. Image Annotation and Extraction](#1-image-annotation-and-extraction)
+- [2. Traditional Bounding Box Cropping (Individual Beetle Extraction)](#2-traditional-bounding-box-cropping)
+- [3. Image Resizing with Uniform Scaling](#3-image-resizing-with-uniform-scaling)
+- [4. Zero-Shot Object Detection](#4-zero-shot-object-detection)
+- [5. Quality Control and Validation](#5-quality-control-and-validation)
+- [6. NEON Data Analysis and Visualization](#6-neon-data-analysis-and-visualization)
+- [7. Dataset Upload to Hugging Face](#7-dataset-upload-to-hugging-face)
 - [Installation](#%EF%B8%8F-installation)
-- [Usage](#-usage)
-- [1. Individual Beetle Extraction](#1-individual-beetle-extraction)
-- [2. Zero-Shot Object Detection](#2-zero-shot-object-detection)
-- [3. Quality Control and Validation](#3-quality-control-and-validation)
-- [4. Data Visualization](#4-data-visualization)
 - [Data Sources](#-data-sources)
 - [Citation](#-citation)
 - [Acknowledgements](#acknowledgments)
@@ -62,6 +64,8 @@ carabidae_beetle_processing/
 
 ## 🔬 Pipeline Components
 
+The pipeline components and their usage instructions are described below. Set up the environment needed for the part of the pipeline you intend to run (see [Installation](#%EF%B8%8F-installation) for detailed guidance).
+
 ### 1. **Image Annotation and Extraction**
 
 **File:** `2018_neon_beetles_bbox.xml`
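
The `2018_neon_beetles_bbox.xml` file named above is a CVAT XML export. The sketch below shows one way its bounding boxes can become padded crops named `{original_name}_specimen_{N}.png`, the naming used by `2018_neon_beetles_get_individual_images.py`. It rests on three assumptions:

- the file follows the standard "CVAT for images 1.1" layout (`<image name width height>` elements holding `<box xtl ytl xbr ybr>` children);
- specimens are numbered from 1;
- the sample XML is invented.

The helpers `parse_cvat_boxes` and `crop_jobs` are hypothetical names, not functions from the repository.

```python
import os
import xml.etree.ElementTree as ET


def parse_cvat_boxes(xml_text):
    """Map each image name to (width, height, [(xtl, ytl, xbr, ybr), ...]).

    Assumes the CVAT for images 1.1 layout; hypothetical helper.
    """
    root = ET.fromstring(xml_text)
    annotations = {}
    for image in root.iter("image"):
        boxes = [
            tuple(float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr"))
            for box in image.findall("box")
        ]
        annotations[image.get("name")] = (
            int(image.get("width")),
            int(image.get("height")),
            boxes,
        )
    return annotations


def crop_jobs(annotations, padding=0):
    """Yield (image_name, crop_box, output_name) for every annotated specimen.

    Coordinates are truncated to whole pixels, expanded by `padding`, and
    clamped to the image bounds so padded boxes never leave the frame.
    """
    for name, (width, height, boxes) in annotations.items():
        stem = os.path.splitext(os.path.basename(name))[0]
        # Assumption: specimen numbering starts at 1.
        for n, (xtl, ytl, xbr, ybr) in enumerate(boxes, start=1):
            crop_box = (
                max(0, int(xtl) - padding),
                max(0, int(ytl) - padding),
                min(width, int(xbr) + padding),
                min(height, int(ybr) + padding),
            )
            yield name, crop_box, f"{stem}_specimen_{n}.png"


# Invented sample in the assumed CVAT layout.
SAMPLE = """<annotations>
  <image id="0" name="group_01.jpg" width="5568" height="3712">
    <box label="beetle" xtl="10.5" ytl="20.0" xbr="110.2" ybr="220.9"/>
    <box label="beetle" xtl="5500.0" ytl="3700.0" xbr="5560.0" ybr="3710.0"/>
  </image>
</annotations>"""

for job in crop_jobs(parse_cvat_boxes(SAMPLE), padding=10):
    print(job)
```

Each `crop_box` uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects. With Pillow installed, saving a crop would be `Image.open(path).crop(crop_box).save(output_name)`. Pillow is left out here to keep the sketch dependency-free.
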
@@ -70,7 +74,6 @@ CVAT (Computer Vision Annotation Tool) annotations containing:
 - 577 annotated images
 - Bounding box coordinates for individual beetles in group images
 - Image dimensions (5568 × 3712 pixels)
-- Created: April 2025
 
 **Format:**
 ```xml
@@ -86,42 +89,32 @@ CVAT (Computer Vision Annotation Tool) annotations containing:
 
 Extracts individual beetle specimens from group images using CVAT XML bounding box annotations. Parses coordinates, crops specimens with optional padding, and saves as numbered PNG files with progress tracking.
 
+#### Usage Instructions
+
+Extract individual beetles from group images using CVAT annotations:
+
+```bash
+python scripts/2018_neon_beetles_get_individual_images.py \
+--xml_file annotations/2018_neon_beetles_bbox.xml \
+--images_dir /path/to/group_images/ \
+--output_dir /path/to/individual_beetles/ \
+--padding 0
+```
+
+Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
+
 ### 3. **Image Resizing with Uniform Scaling**
 
 **Script:** `resizing_individual_beetle_images.py`
 
-Aligns individual beetle crops with BeetlePalooza's Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens.
+Aligns individual beetle crops with the 2018-NEON-Beetles Zooniverse-processed group images by applying uniform scaling factors. This enables accurate transfer of citizen science measurements from resized group images to individual specimens. Set the base directory paths at the top of the script before use.
 
 **Workflow:**
 1. Calculate uniform scaling factors (average of x and y) between original and resized group images
 2. Apply scaling to all individual specimen images
 3. Save scaling metadata and processing statistics to JSON
 
-### 4. **Dataset Upload to Hugging Face**
-
-**Script:** `upload_dataset_to_hf.py`
-
-Utility script for uploading processed beetle datasets to Hugging Face Hub for public access and reproducibility.
-
-**Usage:**
-```bash
-export HF_TOKEN="your_hugging_face_token"
-
-python upload_dataset_to_hf.py \
---folder_path /path/to/local/images \
---repo_id imageomics/dataset-name \
---path_in_repo images \
---branch main
-```
-
-**Parameters:**
-- `--folder_path`: Local directory containing files to upload
-- `--repo_id`: Hugging Face repository identifier (org/repo-name)
-- `--path_in_repo`: Subdirectory within the repository (default: "images")
-- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
-- `--branch`: Target branch name (default: "main")
-
-### 5. **Zero-Shot Object Detection**
+### 4. **Zero-Shot Object Detection**
 
 **Script:** `beetle_detection.py` | **Notebook:** `grounding_dino.ipynb`
 
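
Step 1 of the resizing workflow above averages the x and y scaling factors, so a single factor applies to both axes of every crop. The sketch below illustrates that arithmetic and the JSON metadata of step 3. The following are illustrative assumptions:

- the resized group-image size;
- the crop size;
- the metadata keys.

`resizing_individual_beetle_images.py` may round or store values differently.

```python
import json


def uniform_scale(original_size, resized_size):
    """Average the x and y factors so one factor scales both axes."""
    (orig_w, orig_h), (new_w, new_h) = original_size, resized_size
    return (new_w / orig_w + new_h / orig_h) / 2


def scaled_size(crop_size, scale):
    """Apply the uniform factor to a crop, keeping at least 1 pixel per side."""
    width, height = crop_size
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical sizes: a 5568 x 3712 group image resized to 1392 x 929.
scale = uniform_scale((5568, 3712), (1392, 929))
crop = (400, 300)
metadata = {  # hypothetical metadata keys
    "scale_factor": scale,
    "original_crop_size": list(crop),
    "scaled_crop_size": list(scaled_size(crop, scale)),
}
print(json.dumps(metadata, indent=2))
```

Averaging keeps each crop's aspect ratio fixed even when the group image was not resized exactly proportionally. In this example the x factor is 0.25 and the y factor is about 0.2503.
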
@@ -141,43 +134,87 @@ Optional parameters: `--model_id` (default: `IDEA-Research/grounding-dino-base`)
 
 The pipeline detects beetles using text prompts, filters by adaptive area thresholds, validates measurement points, applies NMS to remove duplicates, and selects optimal bounding boxes before saving crops and metadata.
 
-### 6. **Inter-Annotator Agreement**
+### 5. Quality Control and Validation
+
+#### Inter-Annotator Agreement
 
 **Script:** `inter_annotator.py`
 
 Quantifies measurement consistency between human annotators using three pairwise comparisons. Computes RMSE (measurement disagreement), R² (correlation strength), and average bias (systematic tendencies). Generates `InterAnnotatorAgreement.pdf` with scatter plots and console metrics report.
 
-### 7. **Human vs. Automated System Validation**
+```bash
+python scripts/inter_annotator.py
+```
+
+Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
+
+#### Human vs. Automated System
 
 **Script:** `calipers_vs_toras.py`
 
 Validates automated TORAS measurements against human caliper measurements (gold standard). Compares three annotators individually and averaged against the automated system using RMSE, R², and bias metrics. Generates `CalipersVsToras.pdf` with comparison plots.
 
+```bash
+python scripts/calipers_vs_toras.py
+```
+
+Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
 
-### 8. **NEON Data Analysis and Visualization**
+### 6. **NEON Data Analysis and Visualization**
 
 **Script:** `Figure6and10.R`
 
 Analyzes NEON beetle data from PUUM site (Pu'u Maka'ala Natural Area Reserve, Hawaii) integrated with BeetlePalooza citizen science measurements. Retrieves data via NEON API, merges taxonomic identifications with morphometric measurements, and generates species abundance visualizations. Produces `BeetlePUUM_abundance.png` showing imaging status and merged analysis dataset.
 
+Run the R script for NEON data analysis:
+
+```bash
+Rscript scripts/Figure6and10.R
+```
+
+Requires a NEON API token saved in `NEON_Token.txt` (see [NEON token instructions](#neon-api-token)) and the BeetlePalooza metadata (2018-NEON-Beetles `individual_metadata.csv`). Edit paths in the script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
+
 **Requirements:** R packages: `ggplot2`, `dplyr`, `ggpubr`, `neonUtilities`
 
+### 7. **Dataset Upload to Hugging Face**
+
+**Script:** `upload_dataset_to_hf.py`
+
+Utility script used to upload the processed beetle datasets to the Hugging Face Hub for public access and reproducibility.
+
+**Usage:**
+```bash
+export HF_TOKEN="your_hugging_face_token"
+
+python upload_dataset_to_hf.py \
+--folder_path /path/to/local/images \
+--repo_id imageomics/dataset-name \
+--path_in_repo images \
+--branch main
+```
+
+**Parameters:**
+- `--folder_path`: Local directory containing files to upload
+- `--repo_id`: Hugging Face repository identifier (org/repo-name)
+- `--path_in_repo`: Subdirectory within the repository (default: "images")
+- `--repo_type`: Repository type - "dataset" or "model" (default: "dataset")
+- `--branch`: Target branch name (default: "main")
+
 ---
 
 ## 🛠️ Installation
 
 ### Prerequisites
 
-- **Python 3.10+** (for Python scripts and notebooks)
-- **R 4.0+** (for R scripts)
-- **Git** (for version control)
+- **Python 3.10+**
+- **R 4.0+**
 - **CUDA-capable GPU** (recommended for Grounding DINO, but not required)
 
 ### Python Setup
 
 1. **Clone the repository:**
 ```bash
-git clone https://github.com/mridulk97/carabidae_beetle_processing.git
+git clone git@github.com:Imageomics/carabidae_beetle_processing.git
 cd carabidae_beetle_processing
 ```
 
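
The Zero-Shot Object Detection description lists non-maximum suppression (NMS) as the step that removes duplicate detections. The function below is a textbook greedy NMS in plain Python. It is shown only to illustrate what an IoU threshold such as the `--iou_threshold` option controls. It is not taken from `beetle_detection.py`, which may rely on a library implementation, and the 0.5 default here is an assumption.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes in descending score order and keep each one
    unless it overlaps an already-kept box by more than `iou_threshold`.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU = 81/119
```

Raising the threshold makes suppression less aggressive. At `iou_threshold=0.7` all three boxes survive, because the 81/119 (about 0.68) overlap no longer counts as a duplicate.
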
@@ -212,71 +249,11 @@ For R script (`Figure6and10.R`):
 
 ---
 
-## 🚀 Usage
-
-### 1. Individual Beetle Extraction
-
-Extract individual beetles from group images using CVAT annotations:
-
-```bash
-python scripts/2018_neon_beetles_get_individual_images.py \
---xml_file annotations/2018_neon_beetles_bbox.xml \
---images_dir /path/to/group_images/ \
---output_dir /path/to/individual_beetles/ \
---padding 0
-```
-
-Outputs individual beetle images named `{original_name}_specimen_{N}.png`.
-
-### 2. Zero-Shot Object Detection
-
-Run automated beetle detection:
-
-```bash
-python scripts/beetle_detection.py \
---csv_path data/metadata.csv \
---image_dir data/group_images \
---save_folder data/individual_images \
---output_csv data/processed.csv
-```
-
-Optional parameters include `--model_id`, `--text` (detection prompt), `--box_threshold`, `--text_threshold`, `--iou_threshold`, and `--padding`. See Pipeline Components section for parameter details.
-
-### 3. Quality Control and Validation
-
-#### Inter-Annotator Agreement
-
-```bash
-python scripts/inter_annotator.py
-```
-
-Edit `DATA_PATH` and `ANNOTATOR_PAIRS` in the script to configure input data and comparisons. Outputs `InterAnnotatorAgreement.pdf` and console metrics.
-
-#### Human vs. Automated System
-
-```bash
-python scripts/calipers_vs_toras.py
-```
-
-Edit configuration variables in the script for data paths and comparison pairs. Generates `CalipersVsToras.pdf` with validation metrics.
-
-### 4. Data Visualization
-
-Run R script for NEON data analysis:
-
-```bash
-Rscript scripts/Figure6and10.R
-```
-
-Requires NEON API token saved in `NEON_Token.txt` and BeetlePalooza metadata. Edit paths in script as needed. Produces `BeetlePUUM_abundance.png` showing species distributions.
-
----
-
 ## 📊 Data Sources
 
 ### Hugging Face Datasets (Primary Access Point)
 
-The processed datasets from this pipeline are available on Hugging Face:
+The processed datasets from this pipeline are available on Hugging Face along with the original data:
 
 #### 1. Hawaii Beetles Dataset
 **Repository:** [imageomics/Hawaii-beetles](https://huggingface.co/datasets/imageomics/Hawaii-beetles)
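
The `upload_dataset_to_hf.py` utility documented in this commit takes `--folder_path`, `--repo_id`, `--path_in_repo`, `--repo_type`, and `--branch`. The sketch below builds a command line with those flags and their documented defaults, then forwards the values to `huggingface_hub`'s `HfApi.upload_folder`. It makes three assumptions about `upload_dataset_to_hf.py`, none confirmed by the source:

- the two flags without documented defaults are required;
- `--branch` maps to `upload_folder`'s `revision` argument;
- the helper names `build_parser`, `upload`, and `main` are hypothetical.

The library reads the `HF_TOKEN` environment variable on its own. The import is deferred so the argument parsing can be tried without `huggingface_hub` installed.

```python
import argparse


def build_parser():
    """CLI mirroring the documented flags and defaults (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Upload a folder to the Hugging Face Hub.")
    # Assumption: flags without documented defaults are required.
    parser.add_argument("--folder_path", required=True, help="Local directory containing files to upload")
    parser.add_argument("--repo_id", required=True, help="Repository identifier (org/repo-name)")
    parser.add_argument("--path_in_repo", default="images", help="Subdirectory within the repository")
    parser.add_argument("--repo_type", default="dataset", choices=["dataset", "model"])
    parser.add_argument("--branch", default="main", help="Target branch name")
    return parser


def upload(args):
    # Deferred import: only needed when actually uploading.
    from huggingface_hub import HfApi

    # Authentication comes from the HF_TOKEN environment variable.
    HfApi().upload_folder(
        folder_path=args.folder_path,
        repo_id=args.repo_id,
        path_in_repo=args.path_in_repo,
        repo_type=args.repo_type,
        revision=args.branch,  # assumption: --branch maps to `revision`
    )


def main(argv=None):
    """Parse `argv` (or sys.argv when None) and upload."""
    upload(build_parser().parse_args(argv))
```

To run this as a script, add `if __name__ == "__main__": main()` at the bottom.
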
