By Samuel Stevens, S M Rayeed, and Jenna Kline.
Code to reproduce our findings in *Mind the Gap: Evaluating Vision Systems in Small Data Applications*.
We looked at a number of recent AI methods papers (DINOv2, Gemini Flash 1.5, Claude Sonnet 3.7, V-JEPA, etc.) and measured how many training samples were used in each reported evaluation task. We found that none of these papers report any tasks with between 100 and 1K training samples.
*Image credit: arxiv.org/pdf/2504.06486*
We decided to use NeWT (Natural World Tasks) to evaluate recent AI methods in this 100-1K training-sample regime and reported our findings.
If you want our raw data, you can download it: `data/results.sqlite.gz` (96 MB) and `data/existing-work.csv` (5 KB).
Unzip the sqlite3 file and move it to `results/`, then run the scripts below.
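On a Unix-like system, this might look like the following sketch (the `results/` destination comes from the instructions above; adjust paths if your checkout differs):

```sh
# Decompress the results database; gunzip drops the .gz suffix in place.
gunzip data/results.sqlite.gz

# Move it where the scripts expect it.
mkdir -p results
mv data/results.sqlite results/

# Optional sanity check: list the tables (the schema is not documented here,
# so inspect it yourself before querying).
sqlite3 results/results.sqlite '.tables'
```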
To reproduce our work, follow the instructions below:
## With uv
```sh
uv run benchmark.py --help
```

This will run the benchmark script, which will force a `.venv` to be created.
Set the `$NFS` and `$USER` variables or edit the configs in `configs/*.toml` to point towards your NeWT data.
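For example, a minimal sketch (the values below are placeholders for wherever your NeWT data actually lives):

```sh
# Placeholder values: substitute your own shared-storage path and username.
export NFS=/path/to/your/nfs
export USER="$(whoami)"
```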
You can download the data using:
```sh
uv run mindthegap/download.py --help
```

Then run:
```sh
uv run benchmark.py --cfg configs/mllms.toml
```

This will create the results database and check if you have any results (you probably don't). It will report what jobs it has to run.
To actually run the jobs, use:
```sh
uv run benchmark.py --cfg configs/mllms.toml --no-dry-run
```

To recreate our figures, run the `notebooks/figures.py` notebook with marimo.
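For example, assuming marimo is declared as a project dependency so `uv run` can resolve it, something like:

```sh
# Open the figures notebook in marimo's interactive editor.
uv run marimo edit notebooks/figures.py
```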
