This project evaluates how well different AI models perform automated transcription of historical art photographs into structured data. It includes tools for processing images, comparing model outputs, updating ground truth, and visualizing results through an interactive web interface.
A static version for viewing on GitHub is available in the '/static' directory. Installing the project locally lets you update the ground truth files to tune model assessment.
Repository: historical-photograph-AI-annotator-benchmark
The system analyzes photographs of pre-1700 artworks, processing both front and back images to extract detailed metadata (an illustrative record is sketched after this list), including:
- Artwork details (title, artist, date, inscriptions)
- Repository information
- Dimensions
- Material information
- Historical data (exhibitions, provenance, literature)
- Photographer details
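To make the extraction target concrete, here is a hedged illustration of a structured record. The actual output schema is not shown in this README, so every field name and value below is an illustrative assumption rather than the real format:

```python
# Illustrative only: the models' real output schema is not shown in this
# README, so every field name and value here is an assumption.
example_record = {
    "artwork": {
        "title": "Madonna and Child",       # hypothetical artwork
        "artist": "Unknown Florentine",
        "date": "c. 1480",
        "inscriptions": ["verso: inv. no. 1234"],
    },
    "repository": "Example Museum",
    "dimensions": "45 x 32 cm",
    "material": "tempera on panel",
    "history": {
        "exhibitions": [],
        "provenance": [],
        "literature": [],
    },
    "photographer": "Unknown",
}
```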
The benchmark compares different AI models including:
- gpt-4o
- gpt-4o-mini
- o1
- Claude 3.5 Sonnet
- Gemini 2.5 Preview
- Python 3.8+
- Node.js 18+
- npm
- API keys for:
  - OpenAI (for the GPT-4o and o1 models)
  - Anthropic (for Claude)
- Clone the repository:
git clone git@github.com:lklic/historical-photograph-AI-annotator-benchmark.git
cd historical-photograph-AI-annotator-benchmark
- Create and set up API key files:
# For OpenAI API key
echo "your-openai-key" > key.secret
# For Anthropic API key
echo "your-anthropic-key" > claudekey.secret- Install Python dependencies:
pip install anthropic openai httpx.
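The processing scripts presumably read these key files at startup. A minimal sketch of how the API clients could be built from them (the file names match the setup commands above, but the loading code itself is an assumption, not an excerpt from process_images.py):

```python
from pathlib import Path

import anthropic
from openai import OpenAI

# Read the API keys created during setup; strip the trailing newline that
# echo adds to each file.
openai_key = Path("key.secret").read_text().strip()
anthropic_key = Path("claudekey.secret").read_text().strip()

openai_client = OpenAI(api_key=openai_key)
claude_client = anthropic.Anthropic(api_key=anthropic_key)
```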
├── analysis_script.py # Main analysis script
├── process_images.py # Image processing script
├── produce_ground_truth.py # Ground truth generation script
├── prompt.txt # Prompt template for models
├── test-images.md # List of test image IDs
├── App.jsx # React frontend application
├── package.json # Node.js dependencies
├── vite.config.js # Vite configuration
└── run-benchmark.sh # Benchmark runner script
The workflow consists of three main steps:
- Generate ground truth files:
python produce_ground_truth.py
This will create ground truth annotations for the test images using Claude 3.5 Sonnet.
- Process images with different models:
python process_images.py benchmark prompt.txt
This will process all images in test-images.md with each configured model (a sketch of this loop follows the steps below).
- Run the benchmark and start the visualization server:
./run-benchmark.sh
This will:
- Set up the web application structure
- Run the analysis comparing all model outputs
- Start a local development server
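Conceptually, the benchmark step iterates over the image IDs in test-images.md and runs every configured model on each image. A hedged sketch of that loop, with run_model as a stand-in stub since the internals of process_images.py are not shown here:

```python
import json
from pathlib import Path

# Hedged sketch of the benchmark loop; the internals of process_images.py
# are not shown in this README, and run_model below is a stand-in stub.
MODELS = ["gpt-4o", "gpt-4o-mini", "o1", "claude3.5"]  # names used in the docs

def run_model(model_name: str, image_id: str, prompt: str) -> dict:
    """Placeholder for the per-model vision API call."""
    return {"model": model_name, "image_id": image_id}

def run_benchmark(prompt_file: str = "prompt.txt") -> None:
    prompt = Path(prompt_file).read_text()
    # Assumed format: test-images.md lists one image ID per line.
    image_ids = [line.strip()
                 for line in Path("test-images.md").read_text().splitlines()
                 if line.strip()]
    for model_name in MODELS:
        for image_id in image_ids:
            result = run_model(model_name, image_id, prompt)
            out = Path("results") / model_name / f"{image_id}.json"  # assumed layout
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_text(json.dumps(result, indent=2))
```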
For processing individual images:
python process_images.py single <model> <image_id> <prompt_file>
Example:
python process_images.py single claude3.5 "32044103326807!32044156028839" prompt.txt
The analysis generates the following outputs (a sketch for inspecting them follows the list):
- Individual JSON files for each image analysis
- A comprehensive analysis.json file with comparative metrics
- A web interface for visualizing results
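The schema of analysis.json is not documented here; assuming it keys a per-model summary the way the web interface presents it (accuracy and cost), a quick inspection script might look like this, where the key names are guesses:

```python
import json
from pathlib import Path

# Assumption: analysis.json holds a per-model summary; the key names below
# ("models", "accuracy", "total_cost") are illustrative guesses, not the
# documented schema.
analysis = json.loads(Path("analysis.json").read_text())
for model_name, summary in analysis.get("models", {}).items():
    accuracy = summary.get("accuracy")
    cost = summary.get("total_cost")
    print(f"{model_name}: accuracy={accuracy}, cost=${cost}")
```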
The web interface provides:
- Model comparison summary with accuracy and cost metrics
- Detailed view of individual image analyses
- Side-by-side comparison of model outputs
- Interactive image viewer with zoom capability
Models can be configured in process_images.py (a sketch of the ProcessingConfig structure follows at the end of this section):
MODEL_CONFIGS = {
'gpt-4o': ProcessingConfig(
api_type='openai',
model='gpt-4o',
input_cost_per_million=2.5,
output_cost_per_million=10.0
),
'claude3.5': ProcessingConfig(
api_type='claude',
model='claude-3-5-sonnet-20241022',
input_cost_per_million=3.0,
output_cost_per_million=15.0
)
# ... other models
}

The analysis script (analysis_script.py) includes configurable parameters for:
- Field comparison logic
- Metrics calculation
- Output formatting
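ProcessingConfig itself is not shown in this README. A minimal sketch consistent with the fields used in MODEL_CONFIGS above, plus a cost helper, could look like the following; the dataclass body and the helper are assumptions, not the actual source:

```python
from dataclasses import dataclass

# Hedged reconstruction: the real ProcessingConfig in process_images.py is
# not shown in this README; this matches the fields used in MODEL_CONFIGS.
@dataclass
class ProcessingConfig:
    api_type: str                   # 'openai' or 'claude'
    model: str                      # provider-specific model identifier
    input_cost_per_million: float   # USD per 1M input tokens
    output_cost_per_million: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Estimate the USD cost of one request from token counts."""
        return (input_tokens * self.input_cost_per_million
                + output_tokens * self.output_cost_per_million) / 1_000_000

# Example: a gpt-4o call with 1,200 input and 400 output tokens.
cfg = ProcessingConfig('openai', 'gpt-4o', 2.5, 10.0)
print(f"${cfg.cost(1200, 400):.4f}")  # $0.0070
```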