Commit 3be3c5c

refactor: rename EDGE_CASE to OTHER_EYE_DATA, clean training data, bump to 0.1.2

- Rename EDGE_CASE label to OTHER_EYE_DATA across source, tests, and docs
- Rename prob_edge output key to prob_other_eye
- Clean and expand training data from 452 to 474 examples
- Add Zenodo classification results to README
1 parent 03305ee commit 3be3c5c
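
Because this commit renames both a class label and an output key, results saved by an earlier version will still carry the old names. The following is a minimal sketch of a one-time migration for such records. `migrate_record` and the two mapping tables are hypothetical helpers, not part of envision-classifier, and the flat record layout with a `prob_edge` key is an assumption based on the commit message, since the code that emits that key is not shown in this diff.

```python
# Hypothetical migration helper; not part of envision-classifier.
# Mappings taken from the commit message: EDGE_CASE -> OTHER_EYE_DATA,
# prob_edge -> prob_other_eye.
RENAMED_KEYS = {"prob_edge": "prob_other_eye"}
RENAMED_LABELS = {"EDGE_CASE": "OTHER_EYE_DATA"}


def migrate_record(record: dict) -> dict:
    """Return a copy of `record` with the old key and label names replaced."""
    migrated = {}
    for key, value in record.items():
        new_key = RENAMED_KEYS.get(key, key)
        if isinstance(value, str):
            # A label stored as a plain string, e.g. {"label": "EDGE_CASE"}.
            value = RENAMED_LABELS.get(value, value)
        elif isinstance(value, dict):
            # A per-class mapping, e.g. {"probabilities": {"EDGE_CASE": 0.7}}.
            value = {RENAMED_LABELS.get(k, k): v for k, v in value.items()}
        migrated[new_key] = value
    return migrated
```

The helper builds a new dictionary rather than mutating its input, so the original records stay available for comparison if anything looks off after migration.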

8 files changed: +46 -24 lines changed

CHANGELOG.md

Lines changed: 10 additions & 1 deletion
@@ -1,11 +1,20 @@
 # Changelog
 
+## [0.1.2] - 2026-03-17
+
+### Changed
+
+- Renamed `EDGE_CASE` class to `OTHER_EYE_DATA`
+- Renamed `prob_edge` output key to `prob_other_eye`
+- Training data cleaned and expanded to 474 examples
+- Improved spot-check accuracy to 87.9% (29/33)
+
 ## [0.1.0] - 2026-03-03
 
 ### Added
 
 - Initial beta scaffold
-- 4-class SetFit classifier (EYE_IMAGING, EYE_SOFTWARE, EDGE_CASE, NEGATIVE)
+- 4-class SetFit classifier (EYE_IMAGING, EYE_SOFTWARE, OTHER_EYE_DATA, NEGATIVE)
 - CLI with `classify`, `train`, and `info` commands
 - Auto-download of model weights from HuggingFace
 - Batch classification support

README.md

Lines changed: 14 additions & 1 deletion
@@ -67,11 +67,24 @@ envision-classifier info
 ## Model
 
 - **Base model**: `sentence-transformers/all-mpnet-base-v2` (768-dim)
-- **Training data**: 474 curated examples (77 EYE_IMAGING, 48 EYE_SOFTWARE, 79 EDGE_CASE, 270 NEGATIVE)
+- **Training data**: 474 curated examples (77 EYE_IMAGING, 48 EYE_SOFTWARE, 79 OTHER_EYE_DATA, 270 NEGATIVE)
 - **Test accuracy**: 0.937, **macro F1**: 0.902
 - **Spot-check**: 29/33 (87.9%)
 - **Model weights**: [fairdataihub/envision-eye-imaging-classifier](https://huggingface.co/fairdataihub/envision-eye-imaging-classifier)
 
+## Zenodo Classification Results
+
+Applied to 515 Zenodo dataset records via [envision-discovery](https://github.com/EyeACT/envision-discovery):
+
+| Class | Count |
+|-------|-------|
+| EYE_IMAGING | 120 |
+| EYE_SOFTWARE | 66 |
+| OTHER_EYE_DATA | 3 |
+| NEGATIVE | 325 |
+
+Classification is based on metadata only (titles, descriptions, keywords, and file types inspected inside archives via HTTP Range requests) — no dataset files are downloaded.
+
 ## Related
 
 - [envision-discovery](https://github.com/EyeACT/envision-discovery) -- Full pipeline (scraping + classification + export)
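
The new README text says file types inside archives are inspected via HTTP Range requests without downloading the dataset. How envision-discovery does this is not shown in this commit, so the following is only a sketch of the general idea under stated assumptions: a ZIP archive keeps its central directory (the file listing) at the end of the file, so a suffix range such as `bytes=-65536` asks the server for just the tail. `tail_range_request` is a hypothetical name, and the 64 KiB default is an arbitrary illustrative choice.

```python
from urllib.request import Request


def tail_range_request(url: str, tail_bytes: int = 65536) -> Request:
    """Build (but do not send) a request for the last `tail_bytes` bytes of `url`.

    Hypothetical helper: the name and default size are assumptions,
    not taken from envision-discovery.
    """
    if tail_bytes <= 0:
        raise ValueError("tail_bytes must be positive")
    # A suffix byte range ("bytes=-N") asks for the final N bytes of the resource.
    return Request(url, headers={"Range": f"bytes=-{tail_bytes}"})
```

A caller would pass the result to `urllib.request.urlopen` and should check for a `206 Partial Content` status, since a server that ignores the `Range` header answers with `200` and the full body.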

docs/modules/classifier.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ print(result)
 |-------|-------------|
 | `EYE_IMAGING` | Actual eye imaging datasets (fundus, OCT, OCTA, cornea, etc.) |
 | `EYE_SOFTWARE` | Code, tools, models for eye imaging (no actual data) |
-| `EDGE_CASE` | Eye research papers, reviews, borderline items |
+| `OTHER_EYE_DATA` | Eye research papers, reviews, non-imaging data |
 | `NEGATIVE` | Unrelated domains |
 
 ### Batch Classification

envision_classifier/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
 A 4-class SetFit classifier for detecting eye imaging datasets:
 - EYE_IMAGING: Actual eye imaging datasets (fundus, OCT, OCTA, etc.)
 - EYE_SOFTWARE: Code, models, tools for eye imaging
-- EDGE_CASE: Eye research papers, reviews, borderline items
+- OTHER_EYE_DATA: Eye research papers, reviews, non-imaging data
 - NEGATIVE: Unrelated domains
 
 Usage:
@@ -14,7 +14,7 @@
 {'label': 'EYE_IMAGING', 'confidence': 0.999, 'probabilities': {...}}
 """
 
-__version__ = "0.1.1"
+__version__ = "0.1.2"
 __author__ = "James O'Neill"
 
 from .classifier import EyeImagingClassifier, LABELS

envision_classifier/classifier.py

Lines changed: 14 additions & 14 deletions
@@ -5,7 +5,7 @@
 Uses sentence-transformers/all-mpnet-base-v2 sentence transformer with 4-class classification:
 - 3: EYE_IMAGING - Actual eye imaging datasets (fundus, OCT, OCTA, cornea, etc.)
 - 2: EYE_SOFTWARE - Code, tools, models for eye imaging (no actual data)
-- 1: EDGE_CASE - Eye research (papers, reviews, non-imaging data)
+- 1: OTHER_EYE_DATA - Eye research (papers, reviews, non-imaging data)
 - 0: NEGATIVE - Not eye-related at all
 """
 
@@ -22,7 +22,7 @@
 # Model configuration
 BASE_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
 HF_MODEL_REPO = "fairdataihub/envision-eye-imaging-classifier"
-LABELS = ["NEGATIVE", "EDGE_CASE", "EYE_SOFTWARE", "EYE_IMAGING"]
+LABELS = ["NEGATIVE", "OTHER_EYE_DATA", "EYE_SOFTWARE", "EYE_IMAGING"]
 
 # ============================================================
 # TRAINING DATA - Curated examples for few-shot learning
@@ -111,7 +111,7 @@
 ]
 
 # EYE_SOFTWARE (label=2): Code, tools, models for eye imaging (NOT actual data)
-# Added: misplaced software from EYE_IMAGING + EDGE_CASE, spot-check examples
+# Added: misplaced software from EYE_IMAGING + OTHER_EYE_DATA, spot-check examples
 EYE_SOFTWARE_EXAMPLES = [
     "linchundan88/Fundus-image-preprocessing: fundus image preprocessing Python code",
     "NIH-NEI/oct-image-segmentation-models: v0.8.2 trained model weights",
@@ -150,7 +150,7 @@
     "ResNet-50 classifiers and diffusion models trained on retinal fundus images",
     "AMikroulis/octopus OCT image processing dataset",
     "anithaj17/RetinoNet-DR-Classification fundus image dataset",
-    # Moved from EDGE_CASE (clearly software/tools)
+    # Moved from OTHER_EYE_DATA (clearly software/tools)
     "Python package for retinal image preprocessing",
     "Deep learning framework for fundus image segmentation code only",
     "OCT image reconstruction algorithm implementation",
@@ -166,9 +166,9 @@
     "Flexible corneal neurotechnology reveals in-vivo pathological retinal oscillations recording device",
 ]
 
-# EDGE_CASE (label=1): Eye/vision research but NOT actual imaging datasets
+# OTHER_EYE_DATA (label=1): Eye/vision research but NOT actual imaging datasets
 # Cleaned: removed misplaced software→EYE_SOFTWARE, non-eye→NEGATIVE; added eye metabolomics
-EDGE_CASE_EXAMPLES = [
+OTHER_EYE_DATA_EXAMPLES = [
     "A Review of Deep Learning Methods for Diabetic Retinopathy Detection",
     "Survey of Machine Learning Techniques for Glaucoma Diagnosis",
     "Advances in Optical Coherence Tomography Technology Review Article",
@@ -257,7 +257,7 @@
 ]
 
 # NEGATIVE (label=0): Clearly not eye-related
-# Added: non-eye medical imaging from EDGE_CASE, spot-check confounders
+# Added: non-eye medical imaging from OTHER_EYE_DATA, spot-check confounders
 NEGATIVE_EXAMPLES = [
     "Climate change impact on coral reef ecosystems dataset",
     "COVID-19 genome sequencing and variant analysis",
@@ -492,7 +492,7 @@
     "Dataset_1 of AF driver detection in pulmonary vein area cardiac arrhythmia",
     "Data from Dichoptic metacontrast masking functions to infer transmission delay",
     "IRIS Carbon Mapping Project Curated Dataset carbon emissions",
-    # Moved from EDGE_CASE (non-eye medical imaging — clearly NEGATIVE)
+    # Moved from OTHER_EYE_DATA (non-eye medical imaging — clearly NEGATIVE)
     "Brain MRI analysis for Alzheimer's disease detection",
     "Cardiac CT angiography for coronary artery disease",
     "Dermatology skin lesion classification dataset",
@@ -503,7 +503,7 @@
     "Ultrasound imaging for liver disease assessment",
     "PET scan analysis for neurological disorders",
     "Spine MRI for degenerative disc disease",
-    # Moved from EDGE_CASE (non-eye OCT — clearly NEGATIVE)
+    # Moved from OTHER_EYE_DATA (non-eye OCT — clearly NEGATIVE)
     "OCT for industrial material inspection dataset",
     "Optical coherence tomography in dermatology skin imaging",
     "OCT imaging of atherosclerotic plaque in arteries",
@@ -545,7 +545,7 @@ class EyeImagingClassifier:
     Classifies metadata records into 4 classes:
     - EYE_IMAGING: Actual eye imaging datasets (fundus, OCT, OCTA, etc.)
     - EYE_SOFTWARE: Code, tools, models for eye imaging (no actual data)
-    - EDGE_CASE: Eye research papers, reviews, borderline items
+    - OTHER_EYE_DATA: Eye research papers, reviews, borderline items
     - NEGATIVE: Unrelated domains
 
     Usage:
@@ -679,7 +679,7 @@ def _predict_batch(self, texts):
             else:
                 pred_int = {
                     "NEGATIVE": 0,
-                    "EDGE_CASE": 1,
+                    "OTHER_EYE_DATA": 1,
                     "EYE_SOFTWARE": 2,
                     "EYE_IMAGING": 3,
                 }.get(str(pred), 0)
@@ -692,7 +692,7 @@ def _predict_batch(self, texts):
                 "confidence": float(max(probs)),
                 "probabilities": {
                     "NEGATIVE": float(probs[0]),
-                    "EDGE_CASE": float(probs[1]),
+                    "OTHER_EYE_DATA": float(probs[1]),
                     "EYE_SOFTWARE": float(probs[2]),
                     "EYE_IMAGING": float(probs[3]),
                 },
@@ -733,13 +733,13 @@ def train(cls, output_dir=None, device=None, base_model_name=None,
         train_texts = (
             EYE_IMAGING_EXAMPLES
             + EYE_SOFTWARE_EXAMPLES
-            + EDGE_CASE_EXAMPLES
+            + OTHER_EYE_DATA_EXAMPLES
             + NEGATIVE_EXAMPLES
         )
         train_labels = (
             [3] * len(EYE_IMAGING_EXAMPLES)
             + [2] * len(EYE_SOFTWARE_EXAMPLES)
-            + [1] * len(OTHER_EYE_DATA_EXAMPLES)
-            + [1] * len(EDGE_CASE_EXAMPLES)
             + [0] * len(NEGATIVE_EXAMPLES)
         )
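
The rename had to touch `LABELS`, the label-to-integer dictionary in `_predict_batch`, and the `probabilities` dictionary separately, because each spells the class names out by hand. As a possible follow-up, and only as a sketch, both mappings could be derived from `LABELS` so a future rename changes one line. `LABEL_TO_INT` and `format_prediction` are hypothetical names that do not exist in the package. The result keys (`label`, `confidence`, `probabilities`) follow what is visible in the hunks above.

```python
# Label order as defined in classifier.py after this commit.
LABELS = ["NEGATIVE", "OTHER_EYE_DATA", "EYE_SOFTWARE", "EYE_IMAGING"]

# Hypothetical: derive the name -> integer mapping instead of hard-coding it.
LABEL_TO_INT = {name: index for index, name in enumerate(LABELS)}


def format_prediction(probs):
    """Build a result dict from one row of class probabilities.

    `probs` must be ordered like LABELS (index 0 is NEGATIVE, and so on).
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {
        "label": LABELS[best],
        "confidence": float(probs[best]),
        "probabilities": {name: float(p) for name, p in zip(LABELS, probs)},
    }
```

For example, `format_prediction([0.1, 0.2, 0.3, 0.4])` picks `EYE_IMAGING`, the class at index 3.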

envision_classifier/cli.py

Lines changed: 2 additions & 2 deletions
@@ -76,7 +76,7 @@ def info():
         LABELS,
         EYE_IMAGING_EXAMPLES,
         EYE_SOFTWARE_EXAMPLES,
-        EDGE_CASE_EXAMPLES,
+        OTHER_EYE_DATA_EXAMPLES,
         NEGATIVE_EXAMPLES,
     )
 
@@ -86,5 +86,5 @@ def info():
     click.echo(f"Labels: {', '.join(LABELS)}")
     click.echo(f"Training data: {len(EYE_IMAGING_EXAMPLES)} eye_imaging, "
                f"{len(EYE_SOFTWARE_EXAMPLES)} eye_software, "
-               f"{len(EDGE_CASE_EXAMPLES)} edge_case, "
+               f"{len(OTHER_EYE_DATA_EXAMPLES)} other_eye_data, "
                f"{len(NEGATIVE_EXAMPLES)} negative")

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [tool.poetry]
 
 name = "envision-classifier"
-version = "0.1.1"
+version = "0.1.2"
 description = "Few-shot classifier for detecting eye imaging datasets"
 
 packages = [{ include = "envision_classifier" }]

tests/test_classifier.py

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
     LABELS,
     EYE_IMAGING_EXAMPLES,
     EYE_SOFTWARE_EXAMPLES,
-    EDGE_CASE_EXAMPLES,
+    OTHER_EYE_DATA_EXAMPLES,
     NEGATIVE_EXAMPLES,
 )
 
@@ -18,7 +18,7 @@ def test_labels():
 def test_training_data_not_empty():
     assert len(EYE_IMAGING_EXAMPLES) > 0
     assert len(EYE_SOFTWARE_EXAMPLES) > 0
-    assert len(EDGE_CASE_EXAMPLES) > 0
+    assert len(OTHER_EYE_DATA_EXAMPLES) > 0
     assert len(NEGATIVE_EXAMPLES) > 0
 
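
The updated tests confirm that the renamed list exists, but they would not notice a leftover reference to the old name in a docstring or CLI string. One possible complement is a small scan for stale identifiers. `find_stale_names` is a hypothetical helper, and the choice to match case-insensitively (so `edge_case` in CLI output is also caught) is an assumption about what counts as stale.

```python
import re

# Names retired by this commit, per the commit message.
STALE_NAMES = ("EDGE_CASE", "prob_edge")


def find_stale_names(text, stale=STALE_NAMES):
    """Return (line_number, matched_text) pairs for each stale name in `text`."""
    pattern = re.compile(
        "|".join(re.escape(name) for name in stale),
        flags=re.IGNORECASE,
    )
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A test could read each source file, pass its contents to `find_stale_names`, and assert that the result is empty.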
