Commit d79771f

DOC: update README.md
1 parent f777140 commit d79771f

2 files changed: 24 additions and 19 deletions

README.md

Lines changed: 24 additions & 17 deletions
@@ -10,36 +10,43 @@ A PyTorch implementation of BarcodeMAE, a model for enhancing DNA foundation mod
 
 #### Model checkpoint is available here: [BarcodeMAE](https://drive.google.com/file/d/18TqKC_gLYYDZEFfkMBRvWTHTT8Vb74Wv/view?usp=drive_link)
 
-### Reproducing the results
-
-0. Download the checkpoint and copy it to the model_checkpoints directory
-1. Run KNN evaluation
-
-```shell
-python barcodebert/knn_probing.py \
---run-name knn_evaluation \
---data-dir ./data/ \
---pretrained-checkpoint "./model_checkpoints/best_pretraining.pt"\
---log-wandb \
---dataset BIOSCAN-5M \
-```
+### Setup
 
-### Pretraining from scratch
-
-0. Clone this repository and install the required libraries by running.
+0. Clone this repository
+1. Install the required libraries
 
 ```shell
+pip install -r requirements.txt
 pip install -e .
 ```
 
+### Preparing the data
+
 1. Download the [metadata file](https://drive.google.com/drive/u/0/folders/1TLVw0P4MT_5lPrgjMCMREiP8KW-V4nTb) and copy it into the data folder
 2. Split the metadata file into smaller files according to the different partitions as presented in the [BIOSCAN-5M paper](https://arxiv.org/abs/2406.12723)
 
 ```shell
 cd data/
 python data_split.py BIOSCAN-5M_Dataset_metadata.tsv
 ```
-3. Pretrain BarcodeMAE
+
+### Reproducing the results
+
+1. Download the checkpoint and copy it to the model_checkpoints directory
+2. Run KNN evaluation
+
+```shell
+python barcodebert/knn_probing.py \
+--run-name knn_evaluation \
+--data-dir ./data/ \
+--pretrained-checkpoint "./model_checkpoints/best_pretraining.pt"\
+--log-wandb \
+--dataset BIOSCAN-5M
+```
+
+### Pretraining from scratch
+
+1. Run pretraining
 
 ```shell
 python barcodebert/pretraining.py \

data/data_split.py

Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
 import warnings
 
 import pandas as pd
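
The README change in this commit describes splitting the metadata file into one file per partition with `data_split.py`, but the body of that script is not part of this diff. As a rough illustration only, a partition split could look like the sketch below. It assumes the metadata TSV has a column named `split` that holds the partition label. The function name `split_metadata`, the `<partition>.tsv` output naming, and the use of the standard-library `csv` module (the real script imports pandas) are all assumptions made for this example, not the repository's implementation.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def split_metadata(tsv_path, out_dir, column="split"):
    """Stream a TSV and write one file per distinct value of `column`.

    Hypothetical helper: the column name and output naming are assumptions.
    Returns a dict mapping each partition label to its row count.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers, counts = {}, {}

    # ExitStack closes the source file and every partition file on exit.
    with ExitStack() as stack:
        source = stack.enter_context(open(tsv_path, newline="", encoding="utf-8"))
        reader = csv.DictReader(source, delimiter="\t")
        for row in reader:
            label = row[column]
            if label not in writers:
                # First row for this partition: open its file, write the header.
                target = stack.enter_context(
                    open(out_dir / f"{label}.tsv", "w", newline="", encoding="utf-8")
                )
                writers[label] = csv.DictWriter(
                    target,
                    fieldnames=reader.fieldnames,
                    delimiter="\t",
                    lineterminator="\n",
                )
                writers[label].writeheader()
            writers[label].writerow(row)
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Rows are streamed rather than loaded at once, which keeps memory use flat for a metadata file with millions of rows. A call such as `split_metadata("BIOSCAN-5M_Dataset_metadata.tsv", "partitions")`, where the `partitions` directory name is hypothetical, would write one TSV per partition label and return the per-partition row counts.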
