reliability: implement checkpoint and resume for interrupted runs

If processing crashes partway through a run (for example at image 800 of 1246), the next run starts again from the first image. On large datasets this throws away all completed work and wastes significant compute time, which is a reliability gap for production use.

**Scope:**
- Create `internal/checkpoint/checkpoint.go` managing a `checkpoint.json` in the output directory
- Track completed file paths with their output hash
- On engine startup, load existing checkpoint and skip already-processed files
- Atomic checkpoint writes (write to temp file, rename) to prevent corruption on crash
- Add `-no-resume` flag to force full reprocessing

**checkpoint.json shape:**
```json
{
  "version": 1,
  "started_at": "2026-02-18T10:00:00Z",
  "completed": ["images/photo1.jpg", "images/photo2.png"],
  "total_processed": 800
}
```
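
As a starting point for discussion, the shape above could map to a Go struct like the one sketched here. The type name `Checkpoint` and the functions `Parse` and `Pending` are assumptions for illustration, not an agreed API. The sketch follows the JSON exactly as shown, so `completed` holds paths only; storing an output hash per file would need a different field type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Checkpoint mirrors the checkpoint.json shape from this issue.
type Checkpoint struct {
	Version        int       `json:"version"`
	StartedAt      time.Time `json:"started_at"` // RFC 3339 timestamp
	Completed      []string  `json:"completed"`
	TotalProcessed int       `json:"total_processed"`
}

// Parse decodes checkpoint data and rejects versions this sketch does not know.
func Parse(data []byte) (*Checkpoint, error) {
	var cp Checkpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return nil, fmt.Errorf("decode checkpoint: %w", err)
	}
	if cp.Version != 1 {
		return nil, fmt.Errorf("unsupported checkpoint version %d", cp.Version)
	}
	return &cp, nil
}

// Pending returns the inputs not yet listed in cp.Completed, preserving order.
func Pending(inputs []string, cp *Checkpoint) []string {
	done := make(map[string]struct{}, len(cp.Completed))
	for _, p := range cp.Completed {
		done[p] = struct{}{}
	}
	var pending []string
	for _, p := range inputs {
		if _, ok := done[p]; !ok {
			pending = append(pending, p)
		}
	}
	return pending
}

func main() {
	raw := []byte(`{
	  "version": 1,
	  "started_at": "2026-02-18T10:00:00Z",
	  "completed": ["images/photo1.jpg", "images/photo2.png"],
	  "total_processed": 2
	}`)
	cp, err := Parse(raw)
	if err != nil {
		panic(err)
	}
	inputs := []string{"images/photo1.jpg", "images/photo2.png", "images/photo3.jpg"}
	fmt.Println(Pending(inputs, cp)) // prints [images/photo3.jpg]
}
```

Building a set from `completed` keeps the skip check constant-time per file, which matters once the list reaches thousands of entries.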

**Acceptance Criteria:**
- Interrupted run resumes from last checkpoint on restart
- `checkpoint.json` is never left in a corrupted state
- `-no-resume` flag bypasses checkpoint entirely
- Checkpoint file excluded from output metrics
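
For the `-no-resume` criterion, one possible wiring with the standard `flag` package is sketched below. The flag set name `engine` and the helper `parseNoResume` are hypothetical; only the flag name itself comes from this issue.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseNoResume (hypothetical helper) reports whether -no-resume was passed.
func parseNoResume(args []string) (bool, error) {
	fs := flag.NewFlagSet("engine", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	noResume := fs.Bool("no-resume", false, "ignore checkpoint.json and reprocess every file")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *noResume, nil
}

func main() {
	for _, args := range [][]string{{}, {"-no-resume"}} {
		noResume, err := parseNoResume(args)
		if err != nil {
			panic(err)
		}
		// When noResume is true the engine would skip loading the checkpoint.
		fmt.Printf("args=%v resume=%t\n", args, !noResume)
	}
}
```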
