Commit e001ed1

Merge pull request #3 from Imageomics/initial-version
Add bioclip command line tool
2 parents 02080b8 + ca624fe commit e001ed1

7 files changed, 561 additions and 1 deletion

README.md

Lines changed: 164 additions & 1 deletion
@@ -1,2 +1,165 @@
11
# pybioclip
2-
Python package to simplify use of BioCLIP
2+
[![PyPI - Version](https://img.shields.io/pypi/v/bioclip.svg)](https://pypi.org/project/bioclip)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/bioclip.svg)](https://pypi.org/project/bioclip)

-----

Command line tool and Python package to simplify using [BioCLIP](https://imageomics.github.io/bioclip/).


**Table of Contents**

- [Requirements](#requirements)
- [Installation](#installation)
- [Command Line Usage](#command-line-usage)
- [Python Package Usage](#python-package-usage)
- [License](#license)
- [Acknowledgments](#acknowledgments)

## Requirements
- Python compatible with [PyTorch](https://pytorch.org/get-started/locally/#linux-python)

## Installation

```console
pip install git+https://github.com/Imageomics/pybioclip
```

If installation fails, first upgrade pip by running `pip install --upgrade pip`, then retry.

## Command Line Usage

### Predict classification

#### Example: Predict species for an image
The example image used below is [`Ursus-arctos.jpeg`](https://huggingface.co/spaces/imageomics/bioclip-demo/blob/ef075807a55687b320427196ac1662b9383f988f/examples/Ursus-arctos.jpeg) from the [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo).

Predict the species in `Ursus-arctos.jpeg`:
```console
bioclip predict Ursus-arctos.jpeg
```
Output:
```
+----------------------------------------------------------------------------------------+-----------------------+
|                                         Taxon                                          |      Probability      |
+----------------------------------------------------------------------------------------+-----------------------+
|        Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear)         |   0.9356034994125366  |
| Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) |  0.05616999790072441  |
|          Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus           |  0.004126196261495352 |
|               Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus                | 0.0024959812872111797 |
|  Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear)  | 0.0005009894957765937 |
+----------------------------------------------------------------------------------------+-----------------------+
```

---

To save the predictions to a file instead of printing them, pass `--format csv` or `--format json` together with `--output <filename>`.

To save the JSON output to `ursus.json` run:
```console
bioclip predict --format json --output ursus.json Ursus-arctos.jpeg
```

To save the CSV output to `ursus.csv` run:
```console
bioclip predict --format csv --output ursus.csv Ursus-arctos.jpeg
```
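
The JSON file is a single object that maps each taxon string to its probability (this is what the `json.dump(result, ...)` call in `src/bioclip/__main__.py` writes), so it can be read back with the standard library alone. The following is a minimal sketch rather than part of the package: `load_predictions` and `best_taxon` are hypothetical helper names introduced here for illustration.

```python
import json


def load_predictions(path):
    """Load a {taxon: probability} mapping saved with `--format json`."""
    with open(path) as infile:
        return json.load(infile)


def best_taxon(predictions):
    """Return the (taxon, probability) pair with the highest probability."""
    return max(predictions.items(), key=lambda item: item[1])
```

Assuming `ursus.json` was produced by the command above, `best_taxon(load_predictions("ursus.json"))` would return the top-ranked taxon together with its probability.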

#### Predict genus for an image

Predict genus for image `Ursus-arctos.jpeg`, restricted to the top 3 predictions:
```console
bioclip predict --rank genus --k 3 Ursus-arctos.jpeg
```
Output:
```
+---------------------------------------------------------+------------------------+
|                          Taxon                          |      Probability       |
+---------------------------------------------------------+------------------------+
|    Animalia Chordata Mammalia Carnivora Ursidae Ursus   |   0.9994320273399353   |
| Animalia Chordata Mammalia Artiodactyla Cervidae Cervus | 0.00032594642834737897 |
|  Animalia Chordata Mammalia Artiodactyla Cervidae Alces | 7.803700282238424e-05  |
+---------------------------------------------------------+------------------------+
```

#### Optional arguments for predicting classifications:
- `--rank RANK` - rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
- `--k K` - number of top predictions to show [default: 5]
- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]


### Predict from a list of classes

Create predictions for 3 classes (cat, bird, and bear) for image `Ursus-arctos.jpeg`:
```console
bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg
```
Output:
```
+-------+-----------------------+
| Taxon |      Probability      |
+-------+-----------------------+
|  cat  | 4.581644930112816e-08 |
|  bird | 3.051998476166773e-08 |
|  bear |   0.9999998807907104  |
+-------+-----------------------+
```

#### Optional arguments for predicting from a list of classes:
- `--format FORMAT` - format of the output (table, json, or csv) [default: table]
- `--output OUTPUT` - save output to a filename instead of printing it [default: stdout]
- `--cls CLS` - comma-separated list of classes to predict; when specified, the `--rank` and `--k` arguments are ignored [default: all]
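
The probabilities in the `--cls` output above add up to roughly one across the listed classes. That is what one would expect from CLIP-style zero-shot classification, where a softmax is taken over image-text similarity scores. This README does not show how `predict_classifications_from_list` computes its scores, so the following is only a sketch of that general technique in plain Python; the `logits` values are made up for illustration and are not BioCLIP outputs.

```python
import math


def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


classes = ["cat", "bird", "bear"]
logits = [2.0, 1.5, 19.0]  # hypothetical image-text similarity scores
probabilities = dict(zip(classes, softmax(logits)))

for name, probability in probabilities.items():
    print(name, probability)
```

If scores are normalized this way, each probability is relative to the classes that were listed: a list that omits the true label still yields values that sum to one.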


### View command line help
```console
bioclip --help
```

## Python Package Usage
### Predict species classification

```python
from bioclip import predict_classification, Rank

predictions = predict_classification("Ursus-arctos.jpeg", Rank.SPECIES)

for species_name, probability in predictions.items():
    print(species_name, probability)
```

Output:
```console
Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear) 0.9356034994125366
Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear) 0.05616999790072441
Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus 0.004126196261495352
Animalia Chordata Mammalia Carnivora Ursidae Ursus arctus 0.0024959812872111797
Animalia Chordata Mammalia Carnivora Ursidae Ursus americanus (Louisiana black bear) 0.0005009894957765937
```
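
Because the result behaves like a mapping from taxon string to probability, ordinary dictionary code is enough to post-process it. The sketch below keeps only predictions above a cutoff; `confident_predictions` and the `0.05` threshold are hypothetical choices for illustration, and the `predictions` dictionary is a stand-in copied from the output above so that the snippet runs without loading the model.

```python
def confident_predictions(predictions, threshold=0.05):
    """Keep only the taxa whose probability is at least `threshold`."""
    return {
        taxon: probability
        for taxon, probability in predictions.items()
        if probability >= threshold
    }


# Stand-in for the result of predict_classification(...), copied from the output above.
predictions = {
    "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos (Kodiak bear)": 0.9356034994125366,
    "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos syriacus (syrian brown bear)": 0.05616999790072441,
    "Animalia Chordata Mammalia Carnivora Ursidae Ursus arctos bruinosus": 0.004126196261495352,
}

for taxon, probability in confident_predictions(predictions).items():
    print(f"{probability:.3f}  {taxon}")
```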

### Predict from a list of classes
```python
from bioclip import predict_classifications_from_list

predictions = predict_classifications_from_list("Ursus-arctos.jpeg",
                                                ["duck","fish","bear"])

for cls, probability in predictions.items():
    print(cls, probability)
```
Output:
```console
duck 1.0306726583309e-09
fish 2.932403668845507e-12
bear 1.0
```

## License

`pybioclip` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.

## Acknowledgments
The [prediction code in this repo](src/bioclip/predict.py) is based on work by [@samuelstevens](https://github.com/samuelstevens) in [bioclip-demo](https://huggingface.co/spaces/imageomics/bioclip-demo/tree/ef075807a55687b320427196ac1662b9383f988f).

pyproject.toml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/bioclip"]

[project]
name = "pybioclip"
dynamic = ["version"]
description = 'Python package that simplifies using the BioCLIP foundation model.'
readme = "README.md"
requires-python = ">=3.8"
license = "MIT"
keywords = []
authors = [
  { name = "John Bradley", email = "[email protected]" },
]
classifiers = [
  "Development Status :: 4 - Beta",
  "Programming Language :: Python",
  "Programming Language :: Python :: 3.8",
  "Programming Language :: Python :: 3.9",
  "Programming Language :: Python :: 3.10",
  "Programming Language :: Python :: 3.11",
  "Programming Language :: Python :: Implementation :: CPython",
  "Programming Language :: Python :: Implementation :: PyPy"
]
dependencies = [
  'open_clip_torch',
  'torchvision',
  'torch',
  'docopt-ng',
  'prettytable',
]

[project.urls]
Documentation = "https://github.com/Imageomics/pybioclip#readme"
Issues = "https://github.com/Imageomics/pybioclip/issues"
Source = "https://github.com/Imageomics/pybioclip"

[project.scripts]
bioclip = "bioclip.__main__:main"

[tool.hatch.version]
path = "src/bioclip/__about__.py"

[tool.hatch.envs.default]
dependencies = [
  "coverage[toml]>=6.5",
  "pytest",
]
[tool.hatch.envs.default.scripts]
test = "pytest {args:tests}"
test-cov = "coverage run -m pytest {args:tests}"
cov-report = [
  "- coverage combine",
  "coverage report",
]
cov = [
  "test-cov",
  "cov-report",
]

[[tool.hatch.envs.all.matrix]]
python = ["3.8", "3.9", "3.10", "3.11"]

[tool.hatch.envs.types]
dependencies = [
  "mypy>=1.0.0",
]
[tool.hatch.envs.types.scripts]
check = "mypy --install-types --non-interactive {args:src/bioclip tests}"

[tool.coverage.run]
source_pkgs = ["bioclip", "tests"]
branch = true
parallel = true
omit = [
  "src/bioclip/__about__.py",
]

[tool.coverage.paths]
bioclip = ["src/bioclip", "*/bioclip/src/bioclip"]
tests = ["tests", "*/bioclip/tests"]

[tool.coverage.report]
exclude_lines = [
  "no cov",
  "if __name__ == .__main__.:",
  "if TYPE_CHECKING:",
]
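
The `[project.scripts]` entry `bioclip = "bioclip.__main__:main"` asks the installer to generate a `bioclip` executable that imports the module `bioclip.__main__` and calls its `main` function. As a rough illustration of how such a `module:attribute` string is resolved (this is not the installer's actual implementation), here is a standard-library sketch. `resolve_entry_point` is a hypothetical helper, and it is demonstrated with `json:dumps` so that it runs without pybioclip installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attribute_path = spec.partition(":")
    target = importlib.import_module(module_name)
    for part in attribute_path.split("."):
        target = getattr(target, part)
    return target


# Stand-in target; with pybioclip installed the spec would be "bioclip.__main__:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"bear": 1.0}))
```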

src/bioclip/__about__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# SPDX-FileCopyrightText: 2024-present John Bradley <[email protected]>
#
# SPDX-License-Identifier: MIT
__version__ = "0.0.1"
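
In `pyproject.toml`, `dynamic = ["version"]` together with `[tool.hatch.version] path = "src/bioclip/__about__.py"` means the build backend takes the package version from this file instead of from a literal in the project table. Hatch's exact matching rules are not reproduced here; the sketch below uses a simplified regular expression of its own to show the idea, and `read_version` is a hypothetical helper name.

```python
import re

# Simplified pattern (an assumption, not Hatch's own): a quoted string assigned to __version__.
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(source_text):
    """Return the string assigned to __version__, or raise if it is missing."""
    match = VERSION_PATTERN.search(source_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


about_source = '''# SPDX-License-Identifier: MIT
__version__ = "0.0.1"
'''
print(read_version(about_source))
```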

src/bioclip/__init__.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: 2024-present John Bradley <[email protected]>
#
# SPDX-License-Identifier: MIT
from bioclip.predict import predict_classification, Rank, predict_classifications_from_list

__all__ = ["predict_classification", "Rank", "predict_classifications_from_list"]

src/bioclip/__main__.py

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
"""Usage: bioclip predict [options] IMAGE_FILE

Use BioCLIP to generate predictions for an IMAGE_FILE.

Arguments:
  IMAGE_FILE           input image file

Options:
  -h --help
  --format=FORMAT      format of the output (table, json, or csv) [default: table]
  --rank=RANK          rank of the classification (kingdom, phylum, class, order, family, genus, species) [default: species]
  --k=K                number of top predictions to show [default: 5]
  --cls=CLS            comma separated list of classes to predict, when specified the --rank and --k arguments are ignored [default: all]
  --output=OUTFILE     save output to a filename instead of printing it [default: stdout]

"""
from docopt import docopt
from bioclip import predict_classification, predict_classifications_from_list, Rank
import json
import sys
import prettytable as pt
import csv


def write_results(result, format, outfile):
    # result maps each taxon (or class name) to its probability
    if format == 'table':
        table = pt.PrettyTable()
        table.field_names = ['Taxon', 'Probability']
        for taxon, prob in result.items():
            table.add_row([taxon, prob])
        outfile.write(str(table))
        outfile.write('\n')
    elif format == 'json':
        json.dump(result, outfile, indent=2)
    elif format == 'csv':
        writer = csv.writer(outfile)
        writer.writerow(['Taxon', 'Probability'])
        for taxon, prob in result.items():
            writer.writerow([taxon, prob])
    else:
        raise ValueError(f"Invalid format: {format}")


def main():
    x = docopt(__doc__)  # parse arguments based on docstring above
    format = x['--format']
    output = x['--output']
    image_file = x['IMAGE_FILE']
    cls = x['--cls']
    if format not in ['table', 'json', 'csv']:
        raise ValueError(f"Invalid format: {format}")
    rank = Rank[x['--rank'].upper()]
    if cls == 'all':
        # no class list given: rank predictions against the full taxonomy
        result = predict_classification(img=image_file,
                                        rank=rank,
                                        k=int(x['--k']))
    else:
        result = predict_classifications_from_list(img=image_file,
                                                   cls_ary=cls.split(','))
    if output == 'stdout':
        write_results(result, format, sys.stdout)
    else:
        with open(output, 'w') as outfile:
            write_results(result, format, outfile)


if __name__ == '__main__':
    # execute only if run as the entry point into the program
    main()
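
`main` converts the `--rank` string with `Rank[x['--rank'].upper()]`, which raises a bare `KeyError` when the rank name is not recognized. The real `Rank` is defined in `src/bioclip/predict.py`, which is not shown in this view, so the sketch below defines a stand-in enum with the seven ranks listed in the help text (the member values are an assumption) and shows one possible way to turn the failed lookup into a readable message. `parse_rank` is a hypothetical helper, not part of the package.

```python
from enum import Enum


class Rank(Enum):
    # Stand-in for bioclip.Rank; the member values here are an assumption.
    KINGDOM = 0
    PHYLUM = 1
    CLASS = 2
    ORDER = 3
    FAMILY = 4
    GENUS = 5
    SPECIES = 6


def parse_rank(name):
    """Map a case-insensitive rank name to a Rank member with a clear error."""
    try:
        return Rank[name.upper()]
    except KeyError:
        valid = ", ".join(member.name.lower() for member in Rank)
        raise ValueError(f"Invalid rank: {name!r} (expected one of: {valid})") from None


print(parse_rank("genus"))
```

Raising `ValueError` here mirrors how the script already reports an invalid `--format` value.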
