Commit f9add06

Merge pull request #22 from bioscan-ml/fix/pin-datasets-version
fix: pin dependency versions for Python and datasets compatibility
2 parents 35bd3b4 + 0b58505 commit f9add06

3 files changed: 18 additions and 2 deletions

README.md

Lines changed: 12 additions & 1 deletion
@@ -37,10 +37,21 @@ features = output.mean(1)
 
 ### Reproducing the results from the paper
 
-0. Clone this repository and install the required libraries by running
+0. Clone this repository and install the required libraries.
+The instructions below assume a working `pip`- or
+[`uv`](https://docs.astral.sh/uv/)-managed Python environment.
+Requires **Python 3.11 or 3.12**
+(`torchtext`, a deprecated dependency,
+does not provide wheels for 3.13+).
+See [`pyproject.toml`](pyproject.toml) for the full list of
+pinned dependencies.
 ```shell
 pip install -e .
 ```
+Or, using `uv`:
+```shell
+uv sync
+```
 
 1. Download the data from our Hugging Face Dataset [repository](https://huggingface.co/datasets/bioscan-ml/CanadianInvertebrates-ML)
 ```shell

data/download_HF_CanInv.py

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 from datasets import load_dataset
 
+# trust_remote_code=True is required because the HF dataset uses a custom
+# loading script. This requires datasets<4 (v4+ removed script support).
+# See: https://github.com/bioscan-ml/BarcodeBERT/issues/21
 ds = load_dataset("bioscan-ml/CanadianInvertebrates-ML", trust_remote_code=True)
 print("Formatting the data into CSV files ...")
 
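The comment added to the download script says the custom loading script only works with `datasets<4`. A minimal sketch of a guard that could run before `load_dataset` is shown below. The helper names (`major_version`, `supports_loading_scripts`, `require_script_capable_datasets`) are hypothetical and not part of this commit, and the version parsing assumes a plain `MAJOR.MINOR...` version string.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '3.6.0'."""
    return int(version_string.split(".")[0])


def supports_loading_scripts(datasets_version: str) -> bool:
    """True when the version is below 4, the bound stated in the script comment."""
    return major_version(datasets_version) < 4


def require_script_capable_datasets() -> str:
    """Return the installed `datasets` version, or raise if it is missing or too new."""
    try:
        installed = version("datasets")
    except PackageNotFoundError as exc:
        raise RuntimeError("`datasets` is not installed") from exc
    if not supports_loading_scripts(installed):
        raise RuntimeError(
            f"datasets {installed} found, but the loading script needs datasets<4"
        )
    return installed
```

Calling `require_script_capable_datasets()` at the top of the download script, before `load_dataset(...)`, would surface an incompatible install as an explicit error rather than a failure inside the loader.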

pyproject.toml

Lines changed: 3 additions & 1 deletion
@@ -9,14 +9,16 @@ authors = [
 ]
 description = "repository for training and evaluating BarcodeBERT"
 readme = "README.md"
-requires-python = ">=3.11"
+requires-python = ">=3.11,<3.13"
 dependencies = [
 "accelerate>=0.23.0",
 "boto3>=1.28.51",
 "botocore>=1.31.51",
 "cmake>=3.27.5",
+"datasets>=2.16,<4",
 "einops>=0.6.1",
 "matplotlib",
+"numba>=0.59",
 "numpy>=1.25.0",
 "omegaconf>=2.3.0",
 "opt-einsum>=3.3.0",

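The `requires-python = ">=3.11,<3.13"` bound in the `pyproject.toml` change is enforced by the installer, but it can also be checked at runtime. The sketch below is one way to do that with the standard library only. The constant and function names are hypothetical and do not appear in the repository.

```python
import sys

# Bounds mirrored from `requires-python = ">=3.11,<3.13"` in pyproject.toml.
MIN_PYTHON = (3, 11)  # inclusive lower bound
MAX_PYTHON = (3, 13)  # exclusive upper bound


def python_is_supported(version_info=None) -> bool:
    """Return True when (major, minor) falls inside the supported range."""
    if version_info is None:
        version_info = sys.version_info  # the running interpreter
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON
```

For example, `python_is_supported((3, 12))` is true and `python_is_supported((3, 13))` is false, matching the README note that only Python 3.11 and 3.12 are supported.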