# Robust Classification under Noisy Labels: A Geometry-Aware Reliability Framework for Foundation Models
This repository provides a unified, extensible framework for evaluating and comparing multiple reliability estimation methods on foundation vision model embeddings (e.g., CLIP-ViT, DINOv2) across benchmark datasets.
## Features

- **Backbone Support:** CLIP-ViT/32 and DINOv2-base
- **Datasets:** CIFAR-10, STL-10, DermaMNIST (easily add more)
- **Reliability Metrics:**
  - k-NN confidence
  - NNK graph diameter ratio (D/Dc)
  - Per-class NNK geometric reliability
  - Ensemble NNK reliability (subsample aggregation)
  - Ensemble-classification NNK (max-support voting)
  - ANN & WANN baselines
  - NNK-Means & k-Means clustering reliability
- **Two-Stage Pipeline** (see the sketch below):
  1. Compute a per-sample reliability score
  2. Classify with votes weighted by reliability
- **Noise Robustness:** inject symmetric and asymmetric label noise
- **Visualization:** accuracy plots
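To make the two-stage pipeline concrete, below is a minimal NumPy/scikit-learn sketch: stage 1 scores each training sample by the label agreement among its k nearest neighbors in embedding space (a simple stand-in for the NNK-based scores this framework computes), and stage 2 classifies test points by reliability-weighted voting. The function names are illustrative only and are not part of this repo's API.

```python
# Illustrative two-stage sketch; NOT the repo's NNK implementation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_reliability(X_train, y_train, k=15):
    """Stage 1: score each training sample by how often its k nearest
    neighbors (excluding itself) share its label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    _, idx = nn.kneighbors(X_train)            # idx[:, 0] is the sample itself
    neighbor_labels = y_train[idx[:, 1:]]      # shape (n_train, k)
    return (neighbor_labels == y_train[:, None]).mean(axis=1)

def weighted_knn_predict(X_train, y_train, reliability, X_test, k=15):
    """Stage 2: predict by summing neighbor reliabilities per class."""
    n_classes = int(y_train.max()) + 1
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)
    votes = np.zeros((len(X_test), n_classes))
    for c in range(n_classes):
        votes[:, c] = (reliability[idx] * (y_train[idx] == c)).sum(axis=1)
    return votes.argmax(axis=1)                # reliability-weighted vote
```

Per the feature list, the ensemble variants aggregate such per-sample scores over random subsamples of the training set before voting.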
## Installation

1. Clone this repo:

   ```bash
   git clone <repository_url>
   ```

2. Create & activate a virtualenv:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

The first run auto-downloads the datasets (CIFAR-10, STL-10, DermaMNIST).
## Usage

```bash
# Benchmark on CIFAR-10 with CLIP and DINO
python run_benchmark.py \
  --dataset cifar10 --models clip dino \
  --n-per-class 100 --batch-size 32 \
  --nnk-K 15 --nnk-chunk 64 \
  --noise-ratio 0.2 --vote-mode weighted \
  --outfile results.csv
```

```bash
# Benchmark all datasets; --no-cache forces feature re-extraction
python run_benchmark.py \
  --dataset cifar10 stl10 dermamnist \
  --models clip dino \
  --n-per-class 100 \
  --batch-size 32 \
  --nnk-K 15 \
  --nnk-chunk 64 \
  --no-cache \
  --noise-ratio 0.2 \
  --vote-mode unweighted \
  --outfile results.csv
```

### Command-Line Flags

| Flag | Description |
|---|---|
| `--dataset` | `cifar10`, `stl10`, `dermamnist` |
| `--models` | `clip`, `dino` |
| `--n-per-class` | Samples per class (for quick tests) |
| `--batch-size` | Backbone forward-pass batch size |
| `--nnk-K` | NNK neighbor count |
| `--nnk-chunk` | Chunk size for feature extraction |
| `--no-cache` | Skip loading cached features (force re-extraction) |
| `--noise-ratio` | Label noise fraction (0.0–0.4) |
| `--vote-mode` | `weighted` or `unweighted` voting |
| `--outfile` | CSV file to append summary results to |
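As intuition for `--noise-ratio`, here is a sketch of symmetric label-noise injection, where a fixed fraction of labels is flipped uniformly at random to a different class (asymmetric noise instead flips each class toward a specific, typically confusable, class). This helper is hypothetical; the benchmark script applies noise internally.

```python
import numpy as np

def inject_symmetric_noise(y, noise_ratio, n_classes, seed=0):
    """Flip a `noise_ratio` fraction of labels uniformly at random,
    always to a class different from the original."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    flip = rng.random(len(y)) < noise_ratio
    # offsets in [1, n_classes) guarantee the new label differs
    offsets = rng.integers(1, n_classes, size=flip.sum())
    y_noisy[flip] = (y[flip] + offsets) % n_classes
    return y_noisy
```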
## Outputs

After each run, per-model folders like `nnk_outputs_cifar10_CLIP-ViT/` contain:

- `acc_nnk.npy`
- `y_pred.npy`
- `reliability.npy`
- `reliability_alt.npy`
- `reliability_ens.npy`
- `reliability_nnkmeans.npy`
- `reliability_kmeans.npy`
- `W_te_mean.npy`
- `W_te_std.npy`
- `err_te_mean.npy`
- `err_te_std.npy`
- `ind_te.npy`

The summary CSV (`results.csv`) is appended with a new row per model.
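For quick inspection of a finished run, the arrays can be loaded directly with NumPy and the summary CSV with pandas (paths below assume the folder layout above; array shapes depend on the run):

```python
import numpy as np
import pandas as pd

out_dir = "nnk_outputs_cifar10_CLIP-ViT"

acc = np.load(f"{out_dir}/acc_nnk.npy")
reliability = np.load(f"{out_dir}/reliability.npy")
print("NNK accuracy:", acc)
print("mean per-sample reliability:", reliability.mean())

# results.csv accumulates one summary row per model per run
df = pd.read_csv("results.csv")
print(df.tail())
```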
## Visualization

Use the helper script for quick analysis:

```bash
python visualize_nnk_features.py \
  --dataset cifar10 stl10 dermamnist \
  --model CLIP-ViT DINOv2 \
  --out nnk_vis_plots
```

## Contributing

1. Fork & clone
2. Create a branch (`git checkout -b feature/my-feature`)
3. Commit changes (`git commit -m 'Add my feature'`)
4. Push (`git push origin feature/my-feature`)
5. Open a Pull Request
## Citation

Bozkurt, Ecem, and Antonio Ortega. "Robust Classification under Noisy Labels: A Geometry-Aware Reliability Framework for Foundation Models." arXiv preprint arXiv:2508.00202 (2025).
