Commit 332f724

Merge pull request #23 from bioscan-ml/docs/add-contributing
DOC: Add CONTRIBUTING.md
2 parents f9add06 + 7ac7796 commit 332f724

2 files changed, +152 -0 lines changed

CONTRIBUTING.md

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Contributing to BarcodeBERT

Thank you for your interest in contributing to BarcodeBERT!
This document covers the standards and process for contributing.

## Getting started

### Prerequisites

- **Python 3.11 or 3.12** (`torchtext` lacks wheels for 3.13+)
- `pip` or [`uv`](https://docs.astral.sh/uv/)

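A quick way to confirm that an interpreter falls in the supported range is a check like the one below. This is only a minimal sketch based on the versions listed above; the helper name `is_supported_python` is hypothetical and not part of BarcodeBERT.

```python
import sys

# Supported versions taken from the prerequisites above: Python 3.11 or 3.12.
SUPPORTED_VERSIONS = {(3, 11), (3, 12)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair is one of the supported versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```
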
### Local setup

Clone the repository using HTTPS or SSH:

```bash
# HTTPS
git clone https://github.com/bioscan-ml/BarcodeBERT.git

# SSH
git clone git@github.com:bioscan-ml/BarcodeBERT.git
```

With `pip`, activate a virtual environment first, then install the package in editable mode:

```bash
cd BarcodeBERT
pip install -e .
```

Or, using `uv` (creates and manages the environment automatically):

```bash
cd BarcodeBERT
uv sync
```

Install the pre-commit hooks:

```bash
pip install pre-commit  # or: uv tool install pre-commit
pre-commit install
```

## Code style

Pre-commit hooks enforce all formatting automatically on commit.
You can also run them manually:

```bash
pre-commit run --all-files         # full repo
pre-commit run --files <file> ...  # specific files
```

Key settings:

| Tool   | Config           | Notes                             |
| ------ | ---------------- | --------------------------------- |
| black  | `pyproject.toml` | Line length 120                   |
| isort  | pre-commit       | `--profile=black`                 |
| flake8 | `.flake8`        | Line length 140, numpy docstrings |

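For reference, a NumPy-style docstring has the general shape shown below. The function `gc_content` is a hypothetical illustration and is not part of the `barcodebert` package; only the docstring layout is the point here.

```python
def gc_content(sequence):
    """
    Compute the fraction of G and C bases in a DNA sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence; matching is case-insensitive.

    Returns
    -------
    float
        Fraction of characters that are ``G`` or ``C``,
        or ``0.0`` for an empty sequence.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
```
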
### Flake8 suppressions

Certain warnings are suppressed project-wide (see `.flake8`):

- **E203**: whitespace before `:` (conflicts with black)
- **E402**: module-level import not at top of file (permits lazy imports)
- **E731**: assigning a lambda expression to a name
- **D100–D107**: missing docstrings

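To make the suppressions concrete, the snippet below shows the kinds of patterns that a default flake8 run would flag as E402, E731, and E203, and which the suppressions listed above allow. It is an illustrative sketch only; the names `EXAMPLE_FLAG`, `reverse`, and `window` are hypothetical.

```python
import os

# Module-level code that runs before a later import.
os.environ.setdefault("EXAMPLE_FLAG", "1")  # hypothetical variable name

import json  # E402: module-level import not at top of file

# E731: a lambda expression bound to a name.
reverse = lambda seq: seq[::-1]


def window(seq, start, width):
    """Return ``width`` characters of ``seq`` beginning at ``start``."""
    # E203: black puts spaces around ":" when slice bounds are expressions.
    return seq[start : start + width]


print(json.dumps({"reversed": reverse("ACGT"), "window": window("ACGTAC", 1, 3)}))
```
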
## Repository structure

The `barcodebert/` directory is the core Python package.
The editable install (`pip install -e .` or `uv sync`)
makes it importable so that scripts elsewhere in the repo
can use it.
The `baselines/` directory contains standalone evaluation scripts
that are **not** part of the package;
they import from `barcodebert` but are run directly.

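One way to confirm that the editable install worked is to ask Python whether it can locate the package, without actually importing it. The sketch below uses the standard-library `importlib.util.find_spec`; the helper name `is_importable` is hypothetical, and it only handles top-level names.

```python
from importlib.util import find_spec


def is_importable(package_name):
    """Return True if Python can locate a top-level package or module."""
    return find_spec(package_name) is not None


if __name__ == "__main__":
    if is_importable("barcodebert"):
        print("barcodebert is importable")
    else:
        print("barcodebert not found; run `pip install -e .` or `uv sync` first")
```
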
## Contribution process

1. **Open an issue** for significant changes
   (bug reports, feature proposals, refactors).
2. **Create a feature branch** from `main`
   using a conventional prefix
   (e.g., `fix/`, `feat/`, `docs/`).
3. **Make your changes** following the code style above.
4. **Run pre-commit** to ensure formatting passes.
5. **Submit a pull request** against `main` with:
   - A clear description of the change
   - Reference to the related issue (if any)

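The branch-naming step can be checked mechanically. The sketch below assumes only the three prefixes given as examples in step 2; the project may accept others, and the helper `has_conventional_prefix` is hypothetical.

```python
# Prefixes named as examples in step 2; the project may accept others.
EXAMPLE_PREFIXES = ("fix/", "feat/", "docs/")


def has_conventional_prefix(branch, prefixes=EXAMPLE_PREFIXES):
    """Return True if the branch starts with a known prefix followed by a name."""
    return any(branch.startswith(p) and len(branch) > len(p) for p in prefixes)
```

For example, `has_conventional_prefix("docs/add-contributing")` is true, while `"main"` and a bare `"fix/"` are rejected.
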
### Commit messages

This project follows the
[NumPy-style commit message convention](https://numpy.org/doc/stable/dev/development_workflow.html#writing-the-commit-message):

```
PREFIX: Short description
```

Common prefixes
(following [NumPy's list](https://numpy.org/doc/stable/dev/development_workflow.html#writing-the-commit-message)):

| Prefix | Meaning                                            |
| ------ | -------------------------------------------------- |
| `API`  | An (incompatible) API change                       |
| `BUG`  | Bug fix                                            |
| `CI`   | Continuous integration                             |
| `DEP`  | Deprecate something, or remove a deprecated object |
| `DEV`  | Development tool or utility                        |
| `DOC`  | Documentation                                      |
| `ENH`  | Enhancement                                        |
| `MNT`  | Maintenance (refactoring, typos, etc.)             |
| `REL`  | Related to releasing                               |
| `REV`  | Revert an earlier commit                           |
| `STY`  | Style fix (whitespace, PEP 8)                      |
| `TST`  | Addition or modification of tests                  |
| `TYP`  | Static typing                                      |
| `WIP`  | Work in progress, do not merge                     |

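The `PREFIX: Short description` format lends itself to a simple automated check, for instance in a local commit-msg hook. The following is a minimal sketch that validates only the first line against the prefixes in the table above; the function name `has_valid_prefix` is hypothetical, and the project does not necessarily enforce the convention this way.

```python
import re

# Prefixes copied from the table above.
PREFIXES = (
    "API", "BUG", "CI", "DEP", "DEV", "DOC", "ENH",
    "MNT", "REL", "REV", "STY", "TST", "TYP", "WIP",
)

# "PREFIX: Short description" on the first line of the message.
_PATTERN = re.compile(r"^(?:%s): \S" % "|".join(PREFIXES))


def has_valid_prefix(message):
    """Return True if the first line of the message matches ``PREFIX: description``."""
    lines = message.splitlines()
    return bool(lines) and _PATTERN.match(lines[0]) is not None
```

The commit that added this file, `DOC: Add CONTRIBUTING.md`, passes this check; a message without a prefix, or with no space after the colon, does not.
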
### Pull request guidelines

- Keep PRs focused on a single topic.
- Ensure pre-commit checks pass.
- PRs are squash-merged to maintain a clean `main` history.

## Dependency notes

Version constraints in `pyproject.toml` exist
for specific compatibility reasons:

- `datasets>=2.16,<4`:
  The Hugging Face dataset relies on a custom loading script,
  and `datasets` v4 removed support for loading scripts.
- `numba>=0.59`:
  Prevents the `uv` resolver from backtracking
  to old `llvmlite` versions incompatible with Python 3.11+.
- `torchtext>=0.15.2`:
  `torchtext` is deprecated and archived; it constrains Python to <3.13.

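To illustrate how a bounded constraint such as `>=2.16,<4` is evaluated, here is a small sketch that compares release numbers as integer tuples. It handles only plain numeric versions (no pre-release or local tags), and the helpers `_release` and `in_range` are hypothetical; real tooling would normally rely on the `packaging` library or the resolver itself.

```python
def _release(text, width=3):
    """Convert a numeric dotted version such as "2.16" into a zero-padded tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def in_range(version, lower, upper):
    """Return True if lower <= version < upper, comparing release tuples."""
    return _release(lower) <= _release(version) < _release(upper)
```

Comparing integer tuples rather than strings matters here: as strings, `"2.9"` sorts after `"2.16"`, whereas `(2, 9, 0)` correctly compares below `(2, 16, 0)`.
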
See [issue #21](https://github.com/bioscan-ml/BarcodeBERT/issues/21)
for background.

README.md

Lines changed: 6 additions & 0 deletions
@@ -112,6 +112,12 @@ python barcodebert/pretraining.py
--checkpoint=model_checkpoints/CANADA-1.5M/4_4_4/checkpoint_pretraining.pt
```

## Contributing

If you'd like to contribute to BarcodeBERT,
please read our [Contributing Guidelines](CONTRIBUTING.md)
for information about setup, code style, and the submission process.

## Citation

If you find BarcodeBERT useful in your research, please consider citing:
