Commit 0b182fe

johnbradley, hlapp, and egrace479 committed
Improve performance and prediction results
Introduces two classes that cache the model, so creating predictions for multiple images loads the model and the tree of life embeddings only once.

Changes the prediction return values to be more consistent. One deviation from what was requested in issue #1 is flattening the results to simplify creating a pandas DataFrame.

Fixes #1

Simplifies the command line interface to default to csv format and process multiple images.

Adds tests that create predictions. These tests require downloading the model, so they may be inappropriate for automated testing. Tests can be run with pytest.

Co-authored-by: Hilmar Lapp <[email protected]>
Co-authored-by: Elizabeth Campolongo <[email protected]>
1 parent e001ed1 commit 0b182fe
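
The speed-up described in the commit message comes from constructing a classifier once and then calling `predict` for each image. A minimal sketch of that pattern follows. It uses a hypothetical `StubClassifier` in place of the real model-backed classes; the stub, its `load_count` counter, its fixed scores, and the `predict_all` helper are illustrative assumptions and not part of pybioclip.

```python
from typing import Dict, List


class StubClassifier:
    """Hypothetical stand-in for a model-backed classifier (not the pybioclip API)."""

    load_count = 0  # counts how many times the pretend model is loaded

    def __init__(self) -> None:
        # Expensive setup happens once per instance, not once per prediction.
        StubClassifier.load_count += 1
        self._labels = ["bear", "cat"]

    def predict(self, image_path: str) -> List[Dict[str, object]]:
        # Return flat dicts, one per label, so rows can be tabulated directly.
        return [
            {"file_name": image_path, "classification": label, "score": 0.5}
            for label in self._labels
        ]


def predict_all(classifier: StubClassifier, image_paths: List[str]) -> List[Dict[str, object]]:
    """Accumulate predictions for many images using a single classifier instance."""
    rows: List[Dict[str, object]] = []
    for path in image_paths:
        rows.extend(classifier.predict(path))
    return rows


classifier = StubClassifier()
rows = predict_all(classifier, ["Ursus-arctos.jpeg", "Felis-catus.jpeg"])
```

Because every row is a flat dict with the same keys, the accumulated list can be handed to a tabular writer without further reshaping, which appears to be the motivation for flattening the results.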

7 files changed: +379 -207 lines changed

README.md

Lines changed: 120 additions & 91 deletions
@@ -12,9 +12,8 @@ Command line tool and python package to simplify using [BioCLIP](https://imageom
 **Table of Contents**

 - [Installation](#installation)
-- [Command Line Usage](#command-line-usage)
 - [Python Package Usage](#python-package-usage)
-- [License](#license)
+- [Command Line Usage](#command-line-usage)
 - [Acknowledgments](#acknowledgments)
 - [License](#license)

@@ -29,132 +28,162 @@ pip install git+https://github.com/Imageomics/pybioclip

 If you have any issues with installation, please first upgrade pip by running `pip install --upgrade pip`.

-## Command Line Usage
+## Python Package Usage
+### Predict species classification

-### Predict classification
+```python
+from bioclip import TreeOfLifeClassifier, Rank

-#### Example: Predict species for an image
-The example image used below is [`Ursus-arctos.jpeg`](https://huggingface.co/spaces/imageomics/bioclip-demo/blob/ef075807a55687b320427196ac1662b9383f988f/examples/Ursus-arctos.jpeg) from the [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo).
+classifier = TreeOfLifeClassifier()
+predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)

-Predict species for an `Ursus-arctos.jpeg` file:
-```console
-bioclip predict Ursus-arctos.jpeg
-```
-Output:
+for prediction in predictions:
+    print(prediction["species"], "-", prediction["score"])
 ```
-+----------------------------------------------------------------------------------------+-----------------------+
-| Taxon | Probability |
-+----------------------------------------------------------------------------------------+-----------------------+
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear) | 0.9356034994125366 |
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) | 0.05616999790072441 |
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus | 0.004126196261495352 |
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus | 0.0024959812872111797 |
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear) | 0.0005009894957765937 |
-+----------------------------------------------------------------------------------------+-----------------------+
-```
-
----

-To save as a CSV or JSON file you can use the `--format <file type>` and `--output <filename>` arguments with `csv` or `json`, respectively.
-
-To save the JSON output to `ursus.json` run:
+Output:
 ```console
-bioclip predict --format json --output ursus.json Ursus-arctos.jpeg
-```
+Ursus arctos - 0.9356034994125366
+Ursus arctos syriacus - 0.05616999790072441
+Ursus arctos bruinosus - 0.004126196261495352
+Ursus arctus - 0.0024959812872111797
+Ursus americanus - 0.0005009894957765937
+```
+
+Output from the `predict()` method showing the dictionary structure:
+```
+[{
+    'kingdom': 'Animalia',
+    'phylum': 'Chordata',
+    'class': 'Mammalia',
+    'order': 'Carnivora',
+    'family': 'Ursidae',
+    'genus': 'Ursus',
+    'species_epithet': 'arctos',
+    'species': 'Ursus arctos',
+    'common_name': 'Kodiak bear',
+    'score': 0.9356034994125366
+}]
+```
+
+The output from the predict function can be converted into a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) like so:
+```python
+import pandas as pd
+from bioclip import TreeOfLifeClassifier, Rank

-To save the CSV output to `ursus.csv` run:
-```console
-bioclip predict --format csv --output ursus.csv Ursus-arctos.jpeg
+classifier = TreeOfLifeClassifier()
+predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
+df = pd.DataFrame(predictions)
 ```

-#### Predict genus for an image
+### Predict from a list of classes
+```python
+from bioclip import CustomLabelsClassifier

-Predict genus for image `Ursus-arctos.jpeg`, restricted to the top 3 predictions:
-```console
-bioclip predict --rank genus --k 3 Ursus-arctos.jpeg
+classifier = CustomLabelsClassifier()
+predictions = classifier.predict("Ursus-arctos.jpeg", ["duck","fish","bear"])
+for prediction in predictions:
+    print(prediction["classification"], prediction["score"])
 ```
 Output:
+```console
+duck 1.0306726583309e-09
+fish 2.932403668845507e-12
+bear 1.0
 ```
-+---------------------------------------------------------+------------------------+
-| Taxon | Probability |
-+---------------------------------------------------------+------------------------+
-| Animalia Chordata Mammalia Carnivora Ursidae Ursus | 0.9994320273399353 |
-| Animalia Chordata Mammalia Artiodactyla Cervidae Cervus | 0.00032594642834737897 |
-| Animalia Chordata Mammalia Artiodactyla Cervidae Alces | 7.803700282238424e-05 |
-+---------------------------------------------------------+------------------------+
+
+## Command Line Usage
 ```
+bioclip predict [options] [IMAGE_FILE...]

-#### Optional arguments for predicting classifications:
-- `--rank RANK` - rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
-- `--k K` - number of top predictions to show [default: 5]
-- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
-- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]
+Arguments:
+IMAGE_FILE input image file

+Options:
+-h --help
+--format=FORMAT format of the output (table or csv) [default: csv]
+--rank=RANK rank of the classification (kingdom, phylum, class, order, family, genus, species)
+[default: species]
+--k=K number of top predictions to show [default: 5]
+--cls=CLS comma separated list of classes to predict, when specified the --rank and
+--k arguments are ignored [default: all]
+--output=OUTFILE print output to file OUTFILE [default: stdout]
+```

-### Predict from a list of classes
+### Predict classification

-Create predictions for 3 classes (cat, bird, and bear) for image `Ursus-arctos.jpeg`:
+#### Predict species for an image
+The example images used below are [`Ursus-arctos.jpeg`](https://huggingface.co/spaces/imageomics/bioclip-demo/blob/ef075807a55687b320427196ac1662b9383f988f/examples/Ursus-arctos.jpeg)
+and [`Felis-catus.jpeg`](https://huggingface.co/spaces/imageomics/bioclip-demo/blob/ef075807a55687b320427196ac1662b9383f988f/examples/Felis-catus.jpeg), both from the [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo).
+
+Predict species for an `Ursus-arctos.jpeg` file:
 ```console
-bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg
+bioclip predict Ursus-arctos.jpeg
 ```
 Output:
 ```
-+-------+-----------------------+
-| Taxon | Probability |
-+-------+-----------------------+
-| cat | 4.581644930112816e-08 |
-| bird | 3.051998476166773e-08 |
-| bear | 0.9999998807907104 |
-+-------+-----------------------+%
+bioclip predict Ursus-arctos.jpeg
+file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
 ```

-#### Optional arguments for predicting from a list of classes:
-- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
-- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]
-- `--cls CLS` - comma separated list of classes to predict, when specified the `--rank` and `--k` arguments are ignored [default: all]
+#### Predict species for multiple images saving to a file

-
-### View command line help
+To make predictions for files `Ursus-arctos.jpeg` and `Felis-catus.jpeg`, saving the output to a file named `predictions.csv`:
 ```console
-bioclip --help
+bioclip predict --output predictions.csv Ursus-arctos.jpeg Felis-catus.jpeg
+```
+The contents of `predictions.csv` will look like this:
+```
+file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
+Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
+Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,silvestris,Felis silvestris,European Wildcat,0.7221033573150635
+Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus,Felis catus,Domestic Cat,0.19810837507247925
+Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,margarita,Felis margarita,Sand Cat,0.02798456884920597
+Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Lynx,felis,Lynx felis,,0.021829601377248764
+Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,bieti,Felis bieti,Chinese desert cat,0.010979168117046356
 ```

-## Python Package Usage
-### Predict species classification
-
-```python
-from bioclip import predict_classification, Rank
-
-predictions = predict_classification("Ursus-arctos.jpeg", Rank.SPECIES)
-
-for species_name, probability in predictions.items():
-    print(species_name, probability)
+#### Predict top 3 genera for an image and display output as a table
+```console
+bioclip predict --format table --k 3 --rank=genus Ursus-arctos.jpeg
 ```

 Output:
-```console
-Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear) 0.9356034994125366
-Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) 0.05616999790072441
-Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus 0.004126196261495352
-Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus 0.0024959812872111797
-Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear) 0.0005009894957765937
+```
++-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
+| file_name | kingdom | phylum | class | order | family | genus | score |
++-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
+| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | 0.9994320273399353 |
+| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Cervus | 0.00032594642834737897 |
+| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Alces | 7.803700282238424e-05 |
++-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
 ```

 ### Predict from a list of classes
-```python
-from bioclip import predict_classifications_from_list, Rank
-
-predictions = predict_classifications_from_list("Ursus-arctos.jpeg",
-                                                ["duck","fish","bear"])
-
-for cls, probability in predictions.items():
-    print(cls, probability)
+Create predictions for 3 classes (cat, bird, and bear) for image `Ursus-arctos.jpeg`:
+```console
+bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg
 ```
 Output:
+```
+file_name,classification,score
+Ursus-arctos.jpeg,cat,4.581644930112816e-08
+Ursus-arctos.jpeg,bird,3.051998476166773e-08
+Ursus-arctos.jpeg,bear,0.9999998807907104
+```
+
+### View command line help
 ```console
-duck 1.0306726583309e-09
-fish 2.932403668845507e-12
-bear 1.0
+bioclip --help
 ```

 ## License

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -32,6 +32,7 @@ dependencies = [
   'torch',
   'docopt-ng',
   'prettytable',
+  'pandas',
 ]

 [project.urls]
@@ -90,3 +91,8 @@ exclude_lines = [
   "if __name__ == .__main__.:",
   "if TYPE_CHECKING:",
 ]
+
+[tool.pytest.ini_options]
+pythonpath = [
+  "src"
+]

src/bioclip/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # SPDX-FileCopyrightText: 2024-present John Bradley <[email protected]>
 #
 # SPDX-License-Identifier: MIT
-from bioclip.predict import predict_classification, Rank, predict_classifications_from_list
+from bioclip.predict import TreeOfLifeClassifier, Rank, CustomLabelsClassifier

-__all__ = ["predict_classification", "Rank", "predict_classifications_from_list"]
+__all__ = ["TreeOfLifeClassifier", "Rank", "CustomLabelsClassifier"]

src/bioclip/__main__.py

Lines changed: 30 additions & 26 deletions
@@ -1,4 +1,4 @@
-"""Usage: bioclip predict [options] IMAGE_FILE
+"""Usage: bioclip predict [options] [IMAGE_FILE...]

 Use BioCLIP to generate predictions for an IMAGE_FILE.

@@ -7,36 +7,41 @@

 Options:
 -h --help
---format=FORMAT format of the output (table, json, or csv) [default: table]
+--format=FORMAT format of the output (table or csv) [default: csv]
 --rank=RANK rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
 --k=K number of top predictions to show [default: 5]
 --cls=CLS comma separated list of classes to predict, when specified the --rank and --k arguments are ignored [default: all]
---output=OUTFILE save output to a filename instead of printing it [default: stdout]
+--output=OUTFILE print output to file OUTFILE [default: stdout]

 """
 from docopt import docopt
-from bioclip import predict_classification, predict_classifications_from_list, Rank
+from bioclip import TreeOfLifeClassifier, Rank, CustomLabelsClassifier
 import json
 import sys
 import prettytable as pt
 import csv
+import pandas as pd


-def write_results(result, format, outfile):
+def write_results(data, format, output):
+    df = pd.DataFrame(data)
+    if output == 'stdout':
+        write_results_to_file(df, format, sys.stdout)
+    else:
+        with open(output, 'w') as outfile:
+            write_results_to_file(df, format, outfile)
+
+
+def write_results_to_file(df, format, outfile):
     if format == 'table':
         table = pt.PrettyTable()
-        table.field_names = ['Taxon', 'Probability']
-        for taxon, prob in result.items():
-            table.add_row([taxon, prob])
+        table.field_names = df.columns
+        for index, row in df.iterrows():
+            table.add_row(row)
         outfile.write(str(table))
         outfile.write('\n')
-    elif format == 'json':
-        json.dump(result, outfile, indent=2)
     elif format == 'csv':
-        writer = csv.writer(outfile)
-        writer.writerow(['Taxon', 'Probability'])
-        for taxon, prob in result.items():
-            writer.writerow([taxon, prob])
+        df.to_csv(outfile, index=False)
     else:
         raise ValueError(f"Invalid format: {format}")

@@ -48,22 +53,21 @@ def main():
     output = x['--output']
     image_file = x['IMAGE_FILE']
     cls = x['--cls']
-    if not format in ['table', 'json', 'csv']:
+    if not format in ['table', 'csv']:
         raise ValueError(f"Invalid format: {format}")
     rank = Rank[x['--rank'].upper()]
     if cls == 'all':
-        result = predict_classification(img=image_file,
-                                        rank=rank,
-                                        k=int(x['--k']))
-    else:
-        result = predict_classifications_from_list(img=image_file,
-                                                   cls_ary=cls.split(','))
-    outfile = sys.stdout
-    if output == 'stdout':
-        write_results(result, format, sys.stdout)
+        classifier = TreeOfLifeClassifier()
+        data = []
+        for image_path in image_file:
+            data.extend(classifier.predict(image_path=image_path, rank=rank, k=int(x['--k'])))
+        write_results(data, format, output)
     else:
-        with open(output, 'w') as outfile:
-            write_results(result, format, outfile)
+        classifier = CustomLabelsClassifier()
+        data = []
+        for image_path in image_file:
+            data.extend(classifier.predict(image_path=image_path, cls_ary=cls.split(',')))
+        write_results(data, format, output)


 if __name__ == '__main__':
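
The refactored `write_results` in this file takes a flat list of dicts and sends it either to stdout or to a named file. The same routing can be sketched with only the standard library, assuming `csv.DictWriter` as a swapped-in replacement for the `DataFrame.to_csv` call the commit actually uses. The `write_rows` helper and the sample rows below are hypothetical and exist only for illustration.

```python
import csv
import io
import sys
from typing import Dict, List, TextIO


def write_rows(rows: List[Dict[str, object]], outfile: TextIO) -> None:
    """Write flat dicts as CSV, taking the header from the first row's keys."""
    if not rows:
        return
    writer = csv.DictWriter(outfile, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def write_results(rows: List[Dict[str, object]], output: str = "stdout") -> None:
    """Send rows to stdout, or to the file named by `output`."""
    if output == "stdout":
        write_rows(rows, sys.stdout)
    else:
        # newline="" lets the csv module control line endings itself.
        with open(output, "w", newline="") as outfile:
            write_rows(rows, outfile)


# Illustrative rows only; the scores are made up, not model output.
sample = [
    {"file_name": "Ursus-arctos.jpeg", "classification": "bear", "score": 0.99},
    {"file_name": "Ursus-arctos.jpeg", "classification": "cat", "score": 0.01},
]
buffer = io.StringIO()
write_rows(sample, buffer)
```

Separating the destination choice (`write_results`) from the serialization (`write_rows`) mirrors the split between `write_results` and `write_results_to_file` in the diff, and makes the serializer easy to exercise against an in-memory buffer.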
