Commit ca624fe

johnbradley, hlapp, and egrace479 committed
Add bioclip command line tool
Adds Python package with command line bioclip tool.
The code in predict.py is based on app.py and supporting code from https://huggingface.co/spaces/imageomics/bioclip-demo commit ef075807a55687b320427196ac1662b9383f988f.
Excluding Python 3.12 since PyTorch doesn't support it.

Co-authored-by: Hilmar Lapp <[email protected]>
Co-authored-by: Elizabeth Campolongo <[email protected]>
1 parent 02080b8 commit ca624fe

7 files changed (+561, -1 lines)

README.md

Lines changed: 164 additions & 1 deletion
@@ -1,2 +1,165 @@
 # pybioclip
-Python package to simplify use of BioCLIP
+
+
+[![PyPI - Version](https://img.shields.io/pypi/v/bioclip.svg)](https://pypi.org/project/bioclip)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/bioclip.svg)](https://pypi.org/project/bioclip)
+
+-----
+
+Command line tool and Python package to simplify using [BioCLIP](https://imageomics.github.io/bioclip/).
+
+
+**Table of Contents**
+
+- [Requirements](#requirements)
+- [Installation](#installation)
+- [Command Line Usage](#command-line-usage)
+- [Python Package Usage](#python-package-usage)
+- [License](#license)
+- [Acknowledgments](#acknowledgments)
+
+## Requirements
+- Python compatible with [PyTorch](https://pytorch.org/get-started/locally/#linux-python)
+
+## Installation
+
+```console
+pip install git+https://github.com/Imageomics/pybioclip
+```
+
+If you have any issues with installation, please first upgrade pip by running `pip install --upgrade pip`.
+
+## Command Line Usage
+
+### Predict classification
+
+#### Example: Predict species for an image
+The example image used below is [`Ursus-arctos.jpeg`](https://huggingface.co/spaces/imageomics/bioclip-demo/blob/ef075807a55687b320427196ac1662b9383f988f/examples/Ursus-arctos.jpeg) from the [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo).
+
+Predict species for an `Ursus-arctos.jpeg` file:
+```console
+bioclip predict Ursus-arctos.jpeg
+```
+Output:
+```
++----------------------------------------------------------------------------------------+-----------------------+
+| Taxon | Probability |
++----------------------------------------------------------------------------------------+-----------------------+
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear) | 0.9356034994125366 |
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) | 0.05616999790072441 |
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus | 0.004126196261495352 |
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus | 0.0024959812872111797 |
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear) | 0.0005009894957765937 |
++----------------------------------------------------------------------------------------+-----------------------+
+```
+
+---
+
+To save the output as a CSV or JSON file, pass `--format csv` or `--format json` together with `--output <filename>`.
+
+To save the JSON output to `ursus.json` run:
+```console
+bioclip predict --format json --output ursus.json Ursus-arctos.jpeg
+```
+
+To save the CSV output to `ursus.csv` run:
+```console
+bioclip predict --format csv --output ursus.csv Ursus-arctos.jpeg
+```
+
+#### Predict genus for an image
+
+Predict genus for image `Ursus-arctos.jpeg`, restricted to the top 3 predictions:
+```console
+bioclip predict --rank genus --k 3 Ursus-arctos.jpeg
+```
+Output:
+```
++---------------------------------------------------------+------------------------+
+| Taxon | Probability |
++---------------------------------------------------------+------------------------+
+| Animalia Chordata Mammalia Carnivora Ursidae Ursus | 0.9994320273399353 |
+| Animalia Chordata Mammalia Artiodactyla Cervidae Cervus | 0.00032594642834737897 |
+| Animalia Chordata Mammalia Artiodactyla Cervidae Alces | 7.803700282238424e-05 |
++---------------------------------------------------------+------------------------+
+```
+
+#### Optional arguments for predicting classifications:
+- `--rank RANK` - rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
+- `--k K` - number of top predictions to show [default: 5]
+- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
+- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]
+
+
+### Predict from a list of classes
+
+Create predictions for 3 classes (cat, bird, and bear) for image `Ursus-arctos.jpeg`:
+```console
+bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg
+```
+Output:
+```
++-------+-----------------------+
+| Taxon | Probability |
++-------+-----------------------+
+| cat | 4.581644930112816e-08 |
+| bird | 3.051998476166773e-08 |
+| bear | 0.9999998807907104 |
++-------+-----------------------+
+```
+
+#### Optional arguments for predicting from a list of classes:
+- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
+- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]
+- `--cls CLS` - comma separated list of classes to predict, when specified the `--rank` and `--k` arguments are ignored [default: all]
+
+
+### View command line help
+```console
+bioclip --help
+```
+
+## Python Package Usage
+### Predict species classification
+
+```python
+from bioclip import predict_classification, Rank
+
+predictions = predict_classification("Ursus-arctos.jpeg", Rank.SPECIES)
+
+for species_name, probability in predictions.items():
+    print(species_name, probability)
+```
+
+Output:
+```console
+Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear) 0.9356034994125366
+Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) 0.05616999790072441
+Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus 0.004126196261495352
+Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus 0.0024959812872111797
+Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear) 0.0005009894957765937
+```
+
+### Predict from a list of classes
+```python
+from bioclip import predict_classifications_from_list, Rank
+
+predictions = predict_classifications_from_list("Ursus-arctos.jpeg",
+                                                 ["duck","fish","bear"])
+
+for cls, probability in predictions.items():
+    print(cls, probability)
+```
+Output:
+```console
+duck 1.0306726583309e-09
+fish 2.932403668845507e-12
+bear 1.0
+```
+
+## License
+
+`pybioclip` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
+
+## Acknowledgments
+The [prediction code in this repo](src/bioclip/predict.py) is based on work by [@samuelstevens](https://github.com/samuelstevens) in [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo/tree/ef075807a55687b320427196ac1662b9383f988f).
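
Note on consuming the saved output: the README shows how to write predictions with `--format json`, and `__main__.py` in this commit passes the result straight to `json.dump`. Assuming the result is a plain mapping of taxon label to probability (which is how `write_results` iterates it), a saved file could be read back along these lines. The JSON text below is a hand-copied stand-in for `ursus.json`, not real captured output, and `best_taxon` / `best_prob` are illustrative names.

```python
import json

# Stand-in for the contents of ursus.json. Assumed shape: {taxon: probability},
# with values copied from the README's example table.
saved = """
{
  "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear)": 0.9356034994125366,
  "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear)": 0.05616999790072441,
  "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus": 0.004126196261495352
}
"""

predictions = json.loads(saved)

# Pick the entry with the highest probability.
best_taxon, best_prob = max(predictions.items(), key=lambda item: item[1])

print(best_taxon, best_prob)
```

With a real file, `json.loads(saved)` would become `json.load(open_file)` on the path given to `--output`.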

pyproject.toml

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/bioclip"]
+
+[project]
+name = "pybioclip"
+dynamic = ["version"]
+description = 'Python package that simplifies using the BioCLIP foundation model.'
+readme = "README.md"
+requires-python = ">=3.8"
+license = "MIT"
+keywords = []
+authors = [
+  { name = "John Bradley", email = "[email protected]" },
+]
+classifiers = [
+  "Development Status :: 4 - Beta",
+  "Programming Language :: Python",
+  "Programming Language :: Python :: 3.8",
+  "Programming Language :: Python :: 3.9",
+  "Programming Language :: Python :: 3.10",
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: Implementation :: CPython",
+  "Programming Language :: Python :: Implementation :: PyPy"
+]
+dependencies = [
+  'open_clip_torch',
+  'torchvision',
+  'torch',
+  'docopt-ng',
+  'prettytable',
+]
+
+[project.urls]
+Documentation = "https://github.com/Imageomics/pybioclip#readme"
+Issues = "https://github.com/Imageomics/pybioclip/issues"
+Source = "https://github.com/Imageomics/pybioclip"
+
+[project.scripts]
+bioclip = "bioclip.__main__:main"
+
+[tool.hatch.version]
+path = "src/bioclip/__about__.py"
+
+[tool.hatch.envs.default]
+dependencies = [
+  "coverage[toml]>=6.5",
+  "pytest",
+]
+[tool.hatch.envs.default.scripts]
+test = "pytest {args:tests}"
+test-cov = "coverage run -m pytest {args:tests}"
+cov-report = [
+  "- coverage combine",
+  "coverage report",
+]
+cov = [
+  "test-cov",
+  "cov-report",
+]
+
+[[tool.hatch.envs.all.matrix]]
+python = ["3.8", "3.9", "3.10", "3.11"]
+
+[tool.hatch.envs.types]
+dependencies = [
+  "mypy>=1.0.0",
+]
+[tool.hatch.envs.types.scripts]
+check = "mypy --install-types --non-interactive {args:src/bioclip tests}"
+
+[tool.coverage.run]
+source_pkgs = ["bioclip", "tests"]
+branch = true
+parallel = true
+omit = [
+  "src/bioclip/__about__.py",
+]
+
+[tool.coverage.paths]
+bioclip = ["src/bioclip", "*/bioclip/src/bioclip"]
+tests = ["tests", "*/bioclip/tests"]
+
+[tool.coverage.report]
+exclude_lines = [
+  "no cov",
+  "if __name__ == .__main__.:",
+  "if TYPE_CHECKING:",
+]
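
Note on the Python version range: the commit message says Python 3.12 is excluded, but in this file that exclusion appears only in the classifiers and the hatch test matrix. `requires-python = ">=3.8"` on its own would still let pip install the package on 3.12. A minimal sketch of a runtime check against the tested versions follows; `TESTED_MINORS` and `is_tested_python` are hypothetical names and are not part of this commit.

```python
import sys

# Minor versions listed in [[tool.hatch.envs.all.matrix]] of this pyproject.toml.
TESTED_MINORS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the tested matrix."""
    return (version_info[0], version_info[1]) in TESTED_MINORS


if not is_tested_python():
    # Warn rather than fail: an untested interpreter may still work.
    print("warning: this Python version is outside the tested matrix",
          file=sys.stderr)
```

An alternative that needs no code would be an upper bound in `requires-python`; whether that is desirable is a packaging decision for the maintainers.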

src/bioclip/__about__.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
+# SPDX-FileCopyrightText: 2024-present John Bradley <[email protected]>
+#
+# SPDX-License-Identifier: MIT
+__version__ = "0.0.1"

src/bioclip/__init__.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
+# SPDX-FileCopyrightText: 2024-present John Bradley <[email protected]>
+#
+# SPDX-License-Identifier: MIT
+from bioclip.predict import predict_classification, Rank, predict_classifications_from_list
+
+__all__ = ["predict_classification", "Rank", "predict_classifications_from_list"]
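
Note on `Rank`: the command line entry point in this commit converts the `--rank` string with `Rank[x['--rank'].upper()]`, which raises a bare `KeyError` for an unknown rank. The sketch below shows that name-based lookup with a friendlier error. The `Rank` class here is a hypothetical stand-in, since `predict.py` is not shown in this part of the diff: its member names follow the ranks listed in the help text and its values are assumed. `parse_rank` is also an illustrative helper, not part of the package.

```python
from enum import Enum


class Rank(Enum):
    # Hypothetical stand-in for bioclip.Rank; the numeric values are assumed.
    KINGDOM = 0
    PHYLUM = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6


def parse_rank(text):
    """Look up a Rank member by name, ignoring case."""
    try:
        return Rank[text.upper()]
    except KeyError:
        valid = ", ".join(member.name.lower() for member in Rank)
        raise ValueError(f"Invalid rank: {text} (expected one of: {valid})")


print(parse_rank("genus"))
```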

src/bioclip/__main__.py

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
+"""Usage: bioclip predict [options] IMAGE_FILE
+
+Use BioCLIP to generate predictions for an IMAGE_FILE.
+
+Arguments:
+  IMAGE_FILE       input image file
+
+Options:
+  -h --help
+  --format=FORMAT  format of the output (table, json, or csv) [default: table]
+  --rank=RANK      rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
+  --k=K            number of top predictions to show [default: 5]
+  --cls=CLS        comma separated list of classes to predict, when specified the --rank and --k arguments are ignored [default: all]
+  --output=OUTFILE save output to a filename instead of printing it [default: stdout]
+
+"""
+from docopt import docopt
+from bioclip import predict_classification, predict_classifications_from_list, Rank
+import json
+import sys
+import prettytable as pt
+import csv
+
+
+def write_results(result, format, outfile):
+    if format == 'table':
+        table = pt.PrettyTable()
+        table.field_names = ['Taxon', 'Probability']
+        for taxon, prob in result.items():
+            table.add_row([taxon, prob])
+        outfile.write(str(table))
+        outfile.write('\n')
+    elif format == 'json':
+        json.dump(result, outfile, indent=2)
+    elif format == 'csv':
+        writer = csv.writer(outfile)
+        writer.writerow(['Taxon', 'Probability'])
+        for taxon, prob in result.items():
+            writer.writerow([taxon, prob])
+    else:
+        raise ValueError(f"Invalid format: {format}")
+
+
+def main():
+    # entry point for the `bioclip` console script
+    x = docopt(__doc__)  # parse arguments based on docstring above
+    format = x['--format']
+    output = x['--output']
+    image_file = x['IMAGE_FILE']
+    cls = x['--cls']
+    if format not in ['table', 'json', 'csv']:
+        raise ValueError(f"Invalid format: {format}")
+    rank = Rank[x['--rank'].upper()]
+    if cls == 'all':
+        result = predict_classification(img=image_file,
+                                        rank=rank,
+                                        k=int(x['--k']))
+    else:
+        result = predict_classifications_from_list(img=image_file,
+                                                   cls_ary=cls.split(','))
+    # print to stdout unless an output filename was given
+    if output == 'stdout':
+        write_results(result, format, sys.stdout)
+    else:
+        with open(output, 'w') as outfile:
+            write_results(result, format, outfile)
+
+
+if __name__ == '__main__':
+    main()
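
Note on testing `write_results`: its `json` and `csv` branches use only the standard library, so they can be exercised without loading the model or installing `prettytable`. The sketch below copies those two branches into a standalone function and writes to in-memory buffers. The `sample` dictionary holds made-up probabilities, and the `table` branch is deliberately left out.

```python
import csv
import io
import json


def write_results(result, format, outfile):
    """Stdlib-only copy of the json and csv branches from __main__.py."""
    if format == 'json':
        json.dump(result, outfile, indent=2)
    elif format == 'csv':
        writer = csv.writer(outfile)
        writer.writerow(['Taxon', 'Probability'])
        for taxon, prob in result.items():
            writer.writerow([taxon, prob])
    else:
        raise ValueError(f"Invalid format: {format}")


# Made-up predictions; real values come from the model.
sample = {"bear": 0.75, "cat": 0.25}

csv_buffer = io.StringIO()
write_results(sample, 'csv', csv_buffer)
csv_text = csv_buffer.getvalue()

json_buffer = io.StringIO()
write_results(sample, 'json', json_buffer)
json_text = json_buffer.getvalue()

print(csv_text)
print(json_text)
```

Because the function takes any file-like object, the same pattern works for `sys.stdout` and for files opened with `open(output, 'w')`, which is how `main()` calls it.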
