Automated Validation Framework for Conservation AI
Bio-Sentinel is a standardised, open-source test framework for conservation AI systems (camera-trap detectors, bioacoustic classifiers, and more) that catches regressions and verifies model robustness across environmental edge cases.
Conservation models like MegaDetector ship with aggregate "Average Precision" scores, but a field biologist needs to know: "Does this version perform 10% worse in heavy rain than the last one?"
Bio-Sentinel answers that question automatically, on every commit.
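As a concrete illustration, the kind of per-commit check this automates can be sketched as a plain pytest test. This is not Bio-Sentinel's actual test code, and the scores and threshold below are made up:

```python
# Illustrative sketch of a per-commit robustness gate: fail the build if the
# candidate model scores more than 10% below the stored baseline under rain.
def test_rain_regression():
    baseline_score = 0.88   # stored score of the previous release (illustrative)
    candidate_score = 0.86  # candidate model's score under the rain distorter (illustrative)
    assert candidate_score >= 0.9 * baseline_score, \
        "Candidate is more than 10% worse in heavy rain than the baseline"
```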
┌─────────────────────────────────────────────────┐
│ Bio-Sentinel │
├──────────┬──────────────┬───────────────────────┤
│  Models  │  Distorters  │       Datasets        │
│ (plugin  │   (plugin    │  golden/              │
│ wrappers)│   wrappers)  │  synthetic fallback   │
├──────────┴──────────────┴───────────────────────┤
│ Test Pyramid (pytest) │
│ L1 Unit — tensor shapes, contracts │
│ L2 Regression — golden dataset baselines │
│ L3 Robustness — rain, fog, low-light, etc. │
│ L4 Edge Cases — iWildCam, hard sets │
├─────────────────────────────────────────────────┤
│ Reporting: pytest-html │ CI: GitHub Actions │
└─────────────────────────────────────────────────┘
```bash
# 1. Clone
git clone https://github.com/isaksmith/Bio-Sentinel.git
cd Bio-Sentinel

# 2. Create a virtual environment
python -m venv .venv && source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the full test suite
pytest
# The HTML report will be at reports/report.html
```

Compare models under environmental distortions with a single command:
```bash
# Quick demo with the mock model
python -m bio_sentinel compare --models mock

# Compare real MegaDetector versions (requires requirements-models.txt)
python -m bio_sentinel compare \
    --models mdv5a,mdv6-yolov9c \
    --dataset data/golden \
    --output reports/comparison.json

# List available model keys
python -m bio_sentinel list-models
```

Output:

```
Model: MockMegaDetector (v0.1.0-mock)
[PASS] baseline       mean=0.878  min=0.783
[PASS] rain@0.5       mean=0.889  min=0.812
[PASS] fog@0.5        mean=0.895  min=0.838
[PASS] low_light@0.5  mean=0.920  min=0.920
[PASS] occlusion@0.5  mean=0.874  min=0.775
```
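For intuition, a pass/fail line like those above boils down to comparing the mean and minimum per-image scores against floors. The sketch below is not the framework's real gating logic, and the floor values are illustrative:

```python
def gate(scores, mean_floor=0.85, min_floor=0.75):
    """Sketch of a PASS/FAIL gate over per-image confidence scores.

    The floor values are illustrative; the real thresholds live in
    Bio-Sentinel's regression baselines.
    """
    mean = sum(scores) / len(scores)
    lowest = min(scores)
    status = "PASS" if mean >= mean_floor and lowest >= min_floor else "FAIL"
    return status, round(mean, 3), round(lowest, 3)
```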
```
bio_sentinel/
├── core/          # ABC, Prediction dataclass, plugin registry
├── distorters/    # Rain, fog, low-light, occlusion plugins
├── models/        # Model wrappers (mock + MegaDetector v5/v6/v6-MIT/v6-Apache)
├── datasets/      # Golden dataset loader + synthetic generator
├── reporting/     # pytest-html hooks + JSON comparison reports
└── cli.py         # Command-line interface
tests/
├── level1_unit/         # Tensor shapes, prediction contracts
├── level2_regression/   # Golden dataset baseline checks
├── level3_robustness/   # Environmental distortion smoke tests
├── level4_edge_cases/   # Hard-dataset integration (placeholders)
└── phase2_integration/  # CLI, report builder, real model tests
docs/       # Architecture docs & contribution guides
examples/   # Quick-start demo script
```
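The plugin registry mentioned under `core/` follows a common pattern: a mapping from string keys to classes. A minimal sketch (only `register` appears elsewhere in this README; the `create` helper is a hypothetical addition):

```python
class ModelRegistry:
    """Minimal sketch of a plugin registry mapping keys to model classes."""
    _models = {}

    @classmethod
    def register(cls, key, model_cls):
        cls._models[key] = model_cls
        return model_cls  # returning the class lets this double as a decorator

    @classmethod
    def create(cls, key, **kwargs):
        # Hypothetical convenience helper, not confirmed Bio-Sentinel API.
        return cls._models[key](**kwargs)
```

A `DistorterRegistry` would follow the same shape.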
To wrap a new model:

- Subclass `bio_sentinel.core.ConservationModel`
- Implement `name`, `version`, and `predict(image) -> Prediction`
- Register it: `ModelRegistry.register("my_model", MyModel)`
- Write tests or reuse the existing parametrised suite
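Put together, a minimal wrapper might look like the sketch below. The stand-in `Prediction` fields and base-class shape are assumptions; check `bio_sentinel/core` for the real contract:

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Stand-in for bio_sentinel.core.Prediction; real fields may differ."""
    label: str
    confidence: float


class ConservationModel:
    """Stand-in for the bio_sentinel.core.ConservationModel ABC."""
    name = "base"
    version = "0.0.0"

    def predict(self, image) -> Prediction:
        raise NotImplementedError


class MyModel(ConservationModel):
    name = "my_model"
    version = "1.0.0"

    def predict(self, image) -> Prediction:
        # A real wrapper would run inference on `image` here.
        return Prediction(label="animal", confidence=0.9)

# Then: ModelRegistry.register("my_model", MyModel)
```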
To add a distorter:

- Subclass `bio_sentinel.distorters.BaseDistorter`
- Implement `name` and `apply(image) -> image`
- Register it: `DistorterRegistry.register("my_distortion", MyDistorter)`
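Following those steps, a distorter could be sketched like this. The `BaseDistorter` shape is assumed from the list above, and the low-light effect here is illustrative (the real plugins live in `bio_sentinel/distorters`):

```python
class BaseDistorter:
    """Stand-in for bio_sentinel.distorters.BaseDistorter."""
    name = "base"

    def apply(self, image):
        raise NotImplementedError


class Darken(BaseDistorter):
    """Scale pixel intensities down to simulate low light (illustrative)."""
    name = "my_distortion"

    def __init__(self, strength=0.5):
        self.strength = strength  # 0 = black, 1 = unchanged

    def apply(self, image):
        # image: nested lists of pixel values in [0, 255]
        return [[int(px * self.strength) for px in row] for row in image]

# Then: DistorterRegistry.register("my_distortion", Darken)
```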
Run specific test levels:

```bash
pytest -m unit        # Level 1 only
pytest -m regression  # Level 2 only
pytest -m robustness  # Level 3 only
pytest -m edge_case   # Level 4 only
```

- Architecture — design overview, data flow, and key decisions
- Adding a Model — step-by-step guide to wrapping a new model
- Adding a Distorter — how to create a new environmental plugin
- Contributing — how to report bugs, suggest features, and submit PRs
| Phase | Focus | Status |
|---|---|---|
| 1 | Core engine — distorters, mock model, test pyramid | ✅ Done |
| 2 | Real model wrappers (MegaDetector v5/v6/v6-MIT/v6-Apache), JSON comparison CLI | ✅ Done |
| 3 | Open-source launch, docs, pip-installable package | ✅ Current |
MIT — see LICENSE for details.