Commit c9816df

Merge pull request #3 from Tvenver/collect-prepare-train
Collect prepare train
2 parents 4a8b519 + c3dbb25 commit c9816df

File tree

100 files changed

+16548
-751
lines changed

B++ CV model/CV_model_notes.md

Lines changed: 0 additions & 6 deletions
This file was deleted.
-55.3 KB
Binary file not shown.

README.md

Lines changed: 121 additions & 45 deletions
@@ -12,66 +12,142 @@ This repo can be used to quickly generate YOLOv8 models for biodiversity monitor
1212

1313
All code is tested on macOS with Python 3.12, without a GPU. A GPU would accelerate the steps below; Ultralytics should automatically select an available GPU if there is one.
1414

15-
## GitHub
16-
https://github.com/Tvenver/Bplusplus
15+
# How does it work?
1716

17+
To train your own insect detection model with the bplusplus package, we provide four functions: `collect()`, `prepare()`, `train()`, and `validate()`. Training an object detection model requires a dataset of labeled insect images. The `collect()` function downloads images from the GBIF database (https://doi.org/10.15468/dl.dk9czq). The images are then run through a pretrained *insect detection model*, which defines the bounding boxes for the insects, and the scientific name is added from the file path. In this way, we can prepare a full dataset of labeled and classified insect data for training.
1818

19-
# How does it work?
19+
![Bplusplus overview](./bplusplus2-overview.png)
20+
21+
### Install package
22+
23+
```bash
24+
pip install bplusplus
25+
```
26+
27+
### bplusplus.collect()
28+
29+
This function takes three arguments:
30+
- **search_parameters: dict[str, Any]** - List of scientific names of the species you want to collect from the GBIF database
31+
- **images_per_group: int** - Number of images per species collected for training
32+
- **output_directory: str** - Directory to store collected images
33+
34+
Example run:
35+
```python
36+
species_list=[ "Vanessa atalanta", "Gonepteryx rhamni", "Bombus hortorum"]
37+
images_per_group=20
38+
output_directory="/dataset/selected-species"
39+
40+
# Collect data from GBIF
41+
bplusplus.collect(
42+
search_parameters=species_list,
43+
images_per_group=images_per_group,
44+
output_directory=output_directory
45+
)
46+
```
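The overview above notes that the scientific name is taken from the file path. A minimal sketch of that idea follows, assuming the collected images sit in one sub-folder per species under `output_directory`. That layout, and the helper names `build_class_mapping` and `species_from_path`, are illustrative assumptions and not part of the bplusplus API.

```python
from pathlib import Path
import tempfile


def build_class_mapping(dataset_dir: Path) -> dict[str, int]:
    """Map each species sub-folder name to a numeric class id.

    Folder names are sorted so the ids are stable between runs.
    """
    species = sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(species)}


def species_from_path(image_path: Path) -> str:
    """Read the scientific name from the image's parent folder (assumed layout)."""
    return image_path.parent.name


# Demo on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["Vanessa atalanta", "Gonepteryx rhamni", "Bombus hortorum"]:
        (root / name).mkdir()
        (root / name / "example.jpg").touch()

    mapping = build_class_mapping(root)
    sample = root / "Vanessa atalanta" / "example.jpg"
    sample_species = species_from_path(sample)
    sample_class = mapping[sample_species]

print(mapping)
print(sample_species, sample_class)
```

Sorting the folder names keeps the class ids reproducible, which matters if labels are regenerated after more images are collected.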
47+
48+
### bplusplus.prepare()
49+
50+
Prepares the dataset for training by performing the following steps:
51+
1. Copies images from the input directory to a temporary directory.
52+
2. Deletes corrupted images.
53+
3. Downloads YOLOv5 weights for *insect detection* if not already present.
54+
4. Runs YOLOv5 inference to generate labels for the images.
55+
5. Deletes orphaned images and inferences.
56+
6. Updates labels based on class mapping.
57+
7. Splits the data into train, test, and validation sets.
58+
8. Counts the total number of images across all splits.
59+
9. Makes a YAML configuration file for YOLOv8.
60+
61+
This function takes three arguments:
62+
- **input_directory: str** - The path to the input directory containing the images.
63+
- **output_directory: str** - The path to the output directory where the prepared dataset will be saved.
64+
- **with_background: bool = False** - Whether to download and include background images. Leave as False to skip them.
65+
66+
```python
67+
# Prepare data
68+
bplusplus.prepare(
69+
input_directory='/dataset/selected-species',
70+
output_directory='/dataset/prepared-data',
71+
with_background=False
72+
)
73+
```
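The split into train, test, and validation sets listed in the steps above can be sketched as follows. The 80/10/10 ratios, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration; the ratios that `prepare()` actually uses are not stated here.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items reproducibly and cut them into train/val/test lists.

    Whatever is left after the train and val cuts goes to test.
    """
    items = sorted(items)          # fixed starting order, independent of input order
    rng = random.Random(seed)      # local generator, leaves the global state alone
    rng.shuffle(items)

    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }


# Hypothetical file names standing in for prepared images.
images = [f"img_{i:03d}.jpg" for i in range(20)]
splits = split_dataset(images)
print({name: len(files) for name, files in splits.items()})
```

Splitting per species instead of over the whole pool would keep rare classes represented in every set; the sketch above does not do that.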
2074

21-
![Figure 9](https://github.com/user-attachments/assets/a01f513b-0609-412d-a633-3aee1e5dded6)
75+
### bplusplus.train()
2276

23-
1. Select scientific names you want to train your model on. For now, only scientific names are supported as training categories.
24-
2. Select the parameters you want to use to filter your dataset (using the [parameters available in the GBIF Occurrence Search API](https://techdocs.gbif.org/en/openapi/v1/occurrence)).
25-
3. Decide how many images you want to use for training and validation per category.
26-
4. Select a directory to output the model information.
27-
5. Pass the above information to the `build_model` function.
77+
This function takes five arguments:
78+
- **input_yaml: str** - Path to the YAML file created by `prepare()`
79+
- **output_directory: str** - Directory where the training output is saved
80+
- **epochs: int = 30** - Number of epochs to train the model
81+
- **imgsz: int = 640** - Image size
82+
- **batch: int = 16** - Batch size for training
2883

29-
You have created a YOLOv8 model for bug classification.
84+
```python
85+
# Train model
86+
model = bplusplus.train(
87+
input_yaml="/dataset/prepared-data/dataset.yaml", # Make sure to add the correct path
88+
output_directory="trained-model",
89+
epochs=30,
90+
batch=16
91+
)
92+
```
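The `input_yaml` file is the dataset configuration that `prepare()` generates. Its exact contents are not shown here, so the following is only a sketch of what an Ultralytics-style dataset file typically contains: a root `path`, the relative image folders for each split, and a `names` mapping from class id to label. The folder names (`train/images`, `valid/images`, `test/images`) and the helper `write_dataset_yaml` are assumptions for illustration.

```python
from pathlib import Path
import tempfile


def write_dataset_yaml(dataset_dir: Path, class_names: list[str]) -> Path:
    """Write a minimal Ultralytics-style dataset YAML and return its path."""
    lines = [
        f"path: {dataset_dir.as_posix()}",  # dataset root
        "train: train/images",              # split folders, relative to path (assumed names)
        "val: valid/images",
        "test: test/images",
        "names:",
    ]
    # One "id: label" entry per class, in the order given.
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]

    yaml_path = dataset_dir / "dataset.yaml"
    yaml_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return yaml_path


with tempfile.TemporaryDirectory() as tmp:
    yaml_file = write_dataset_yaml(
        Path(tmp),
        ["Bombus hortorum", "Gonepteryx rhamni", "Vanessa atalanta"],
    )
    yaml_text = yaml_file.read_text(encoding="utf-8")

print(yaml_text)
```

Opening the generated file before training is a quick way to confirm that the paths and class names match the prepared dataset.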
3093

31-
The training and validation is done using Ultralytics. Please visit the Ultralytics YOLOv8 documentation for more information.
94+
### bplusplus.validate()
3295

33-
# Pretrained Model
96+
This function takes two arguments:
97+
- **model** - The trained YOLO model
98+
- **Path to the YAML dataset file** - The same file passed to `train()` as `input_yaml`
99+
100+
```python
101+
metrics = bplusplus.validate(model, '/dataset/prepared-data/dataset.yaml')
102+
print(metrics)
103+
```
34104

35-
There is also a pretrained YOLOv8 classification model, containing 2584 species, included in this repo under B++ CV Model. The included species are listed in a separate file.
36-
1. Download the pretrained model from the Google Drive link listed in the folder B++ CV Model
37-
2. Take the notebooks/run_model.py script, specify the path to the downloaded .pt file, and run the model.
38105

39-
# Example Usage
40-
## Using search options
106+
107+
## Full example
41108
```python
42-
import os
43109
import bplusplus
44-
from typing import Any
45-
46-
names = [
47-
"Nabis rugosus",
48-
"Forficula auricularia",
49-
"Calosoma inquisitor",
50-
"Bombus veteranus",
51-
"Glyphotaelius pellucidus",
52-
"Notoxus monoceros",
53-
"Cacoxenus indagator",
54-
"Chorthippus mollis",
55-
"Trioza remota"
56-
]
57-
58-
search: dict[str, Any] = {
59-
"scientificName": names,
60-
"country": ["US", "NL"]
61-
}
62-
63-
bplusplus.build_model(
64-
group_by_key=bplusplus.Group.scientificName,
65-
search_parameters=search,
66-
images_per_group=150,
67-
model_output_folder=os.path.join('model')
110+
111+
species_list=[ "Vanessa atalanta", "Gonepteryx rhamni", "Bombus hortorum"]
112+
images_per_group=20
113+
output_directory="/dataset/selected-species"
114+
115+
# Collect data from GBIF
116+
bplusplus.collect(
117+
search_parameters=species_list,
118+
images_per_group=images_per_group,
119+
output_directory=output_directory
120+
)
121+
122+
# Prepare data
123+
bplusplus.prepare(
124+
input_directory='/dataset/selected-species',
125+
output_directory='/dataset/prepared-data',
126+
with_background=False
127+
)
128+
129+
# Train model
130+
model = bplusplus.train(
131+
input_yaml="/dataset/prepared-data/dataset.yaml", # Make sure to add the correct path
132+
output_directory="trained-model",
133+
epochs=30,
134+
batch=16
68135
)
136+
137+
# Validate model
138+
metrics = bplusplus.validate(model, '/dataset/prepared-data/dataset.yaml')
139+
print(metrics)
140+
69141
```
70142

71-
# Pending Improvements
143+
You have created a YOLOv8 model for insect detection.
144+
145+
# Earlier releases
146+
147+
There is also a pretrained YOLOv8 classification model covering 2584 species, from an earlier release and the accompanying paper.
148+
The CV model as presented in the paper can be downloaded from: https://drive.google.com/file/d/1wxAIdSzx5nhTOk4izc0RIycoecSdug_Q/view?usp=sharing
72149

73-
* The Ultralytics parameters should be surfaced to the user of the package so they have more control over the training process.
74-
* The GBIF API documentation claims that you can filter on a dataset in your search, however it does not work in my current testing. This would be nice to allow users to create datasets on the GBIF website then pass that DOI directly here, so may warrant a closer look.
150+
To run the model, please consult the Ultralytics documentation.
75151

76152

77153
# Citation

bplusplus2-overview.png

456 KB