Calibre Awards Tagger

Automatically enrich your Calibre eBook library by identifying and tagging books that have won or been shortlisted for major literary awards.

Overview

This tool uses a deterministic approach to award tagging:

It ships with award data up to 2025, but also includes tools
Builds a canonical awards database
Uses fuzzy string matching to correlate your Calibre library against this database
Applies award tags to matched books

Supported Awards:

Sci-Fi/Fantasy: Hugo, Nebula, Locus, Arthur C. Clarke
Literary/General: Booker Prize, International Booker, National Book Award, Scotiabank Giller Prize
Mystery: Agatha Award

Included Award Database

This repository includes a pre-built canonical awards database (data/canonical_awards.csv) with:

Agatha: 437 records (1988–2024)
Arthur C. Clarke: 239 records (1987–2025)
Booker: 470 records (1969–2025)
Giller: 327 records (1994–2025)
Hugo: 363 records (1953–2025)
International Booker: 92 records (2016–2025)
Locus: 2512 records (1978–2025)
National Book Award: 469 records (1950–2024)
Nebula: 376 records (1966–2025)

You can use this database immediately without scraping. To refresh it with the latest awards, run python build_db.py --scrape-all (per-award caching), or selectively update with python build_db.py --scrape hugo,nebula.

Requirements

Python 3.10+
Calibre installed with calibredb command available in PATH
Internet connection (optional - only needed to rebuild the award database; a pre-built database is included)

Installation

Clone this repository:

git clone <repository-url>
cd calibre-awards-tagger

Install dependencies:

pip install -r requirements.txt

Verify Calibre is installed:

calibredb --version

Safety & Backups

Automatic Backup: When applying tags, this tool automatically creates a backup of your Calibre metadata.db file with a timestamp (e.g., metadata.db.backup-20260106_143022). This backup is only a few MB and takes less than a second.

How to Restore: If anything goes wrong:

Close Calibre
Navigate to your Calibre Library folder
Copy metadata.db.backup-TIMESTAMP over metadata.db

Alternative Recovery: Calibre automatically maintains OPF backup files for each book in the book's folder. You can restore the entire database from these backups:

Open Calibre → Preferences → Library Maintenance → Restore database

What This Tool Does: This tool only adds tags to books - it never deletes books, removes existing tags, or modifies other metadata. The risk of data loss is minimal, but we backup metadata.db as a precaution.

Dry-Run Mode: Always use --dry-run first to preview changes before applying them.

Usage

Basic Usage

Run in dry-run mode to see proposed changes without applying them:

python main.py --skip-scrape --dry-run

Apply tags to your Calibre library:

python main.py --skip-scrape

Advanced Options

python main.py [OPTIONS]

Options:
  --dry-run       Show proposed changes without applying them
  --rebuild-db    Rebuild the awards database from scratch (otherwise updates incrementally)
  --skip-scrape   Use existing database without re-scraping (faster for testing)

Examples

# First run: Preview changes using included database
python main.py --skip-scrape --dry-run

# Apply tags to library
python main.py --skip-scrape

# Update database with latest awards and re-tag
python main.py --rebuild-db

# Test matching without scraping again
python main.py --skip-scrape --dry-run

Updating the Database (Optional)

A pre-built database is included, so this is optional. Use it to get the latest award winners:

# Refresh caches and scrape all awards
python build_db.py --scrape-all

# Update just a couple awards and load everything else from cache
python build_db.py --scrape hugo,nebula

# Rebuild from cache only (no network calls)
python build_db.py

This is useful for:

Updating award data after new winners are announced
Running on a schedule (e.g., monthly cron job to stay current)
Contributing updated database back to the repository

The included database already has all awards through 2025.

How It Works

Phase 1: Knowledge Acquisition

Scraping: Fetches award data from Wikipedia pages
Normalization: Cleans and standardizes book titles and author names
Database Building: Creates data/canonical_awards.csv

Phase 2: Library Enrichment

Extraction: Reads your Calibre library metadata
Matching: Uses fuzzy string matching with high thresholds:
- Author matching (95%+ threshold)
- Title matching (90%+ threshold, only after author match)
Tag Generation: Creates tags in format: Award: Hugo [Winner]
Application: Appends tags to matched books

Tag Format

Tags follow this pattern:

Award: Hugo [Winner] - Hugo Award winner
Award: Booker [Shortlist] - Booker Prize shortlist
Award: Agatha, Best First Novel [Nominee] - Agatha Award nominee

Project Structure

calibre-awards-tagger/
├── main.py                 # Main orchestration script
├── build_db.py            # Database building and scraping
├── lib_interface.py       # Calibre library interface
├── matcher.py             # Fuzzy matching engine
├── report.py              # Report generation
├── schema.py              # Data schema definitions
├── author_normalizer.py   # Author name normalization
├── scrapers/              # Web scraping modules
│   ├── __init__.py
│   ├── base_wiki.py       # Base Wikipedia utilities
│   ├── general_fiction.py # General fiction awards
│   ├── scifi.py          # Sci-fi/fantasy awards
│   └── specialized.py     # Specialized awards
├── tests/                 # Test suite
│   ├── test_base_wiki.py
│   ├── test_matcher.py
│   ├── test_author_normalizer.py
│   └── test_integration.py
├── data/                  # Generated data files
│   ├── canonical_awards.csv
│   ├── author_aliases.json
│   └── proposed_changes_*.json
└── logs/                  # Log files

Data Files

data/canonical_awards.csv - Canonical awards database
data/author_aliases.json - Cached author name normalizations
data/proposed_changes_*.json - Reports of proposed tag changes (generated whenever main.py finds matches, in both dry-run and apply modes)
logs/calibre_tagger_*.log - Timestamped execution logs

Testing

Run the test suite:

pytest tests/

Run specific test files:

pytest tests/test_matcher.py
pytest tests/test_integration.py

Configuration

Edit schema.py to:

Add or remove awards
Modify tag format
Adjust award categories

Edit matcher.py to adjust matching thresholds:

AUTHOR_EXACT_THRESHOLD = 95  # Author matching
TITLE_FUZZY_THRESHOLD = 90   # Title matching

Troubleshooting

calibredb not found

Ensure Calibre is installed and calibredb is in your PATH:

macOS: Add /Applications/calibre.app/Contents/MacOS/ to PATH
Linux: Install via package manager
Windows: Add Calibre installation directory to PATH

No matches found

Ensure your Calibre library has books that won awards
Check that author names and titles are accurate in your library
Try lowering matching thresholds in matcher.py

Scraping fails

Check internet connection
Wikipedia structure may have changed (inspect logs)
Use --skip-scrape to use existing database

Incremental Updates

The tool supports incremental updates:

Awards database is cached in data/canonical_awards.csv
Re-running without --rebuild-db adds new records to existing database
Author aliases are cached to avoid redundant normalization

Contributing

When adding new awards:

Add award configuration to schema.py in AWARDS_CONFIG
Create scraper in appropriate module (scrapers/)
Add tests for the new scraper
Update this README

License

This project is for personal use. Award data is sourced from Wikipedia under Creative Commons licenses.

Credits

Built following the blueprint in plan.md. Uses:

BeautifulSoup4 for web scraping
pandas for data processing
rapidfuzz for fuzzy string matching
pytest for testing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Calibre Awards Tagger

Overview

Included Award Database

Requirements

Installation

Safety & Backups

Usage

Basic Usage

Advanced Options

Examples

Updating the Database (Optional)

How It Works

Phase 1: Knowledge Acquisition

Phase 2: Library Enrichment

Tag Format

Project Structure

Data Files

Testing

Configuration

Troubleshooting

calibredb not found

No matches found

Scraping fails

Incremental Updates

Contributing

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
data		data
scrapers		scrapers
tests		tests
.gitignore		.gitignore
EFFICIENT_SCRAPING.md		EFFICIENT_SCRAPING.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
author_normalizer.py		author_normalizer.py
build_db.py		build_db.py
lib_interface.py		lib_interface.py
main.py		main.py
matcher.py		matcher.py
plan.md		plan.md
report.py		report.py
requirements.txt		requirements.txt
schema.py		schema.py

Folders and files

Latest commit

History

Repository files navigation

Calibre Awards Tagger

Overview

Included Award Database

Requirements

Installation

Safety & Backups

Usage

Basic Usage

Advanced Options

Examples

Updating the Database (Optional)

How It Works

Phase 1: Knowledge Acquisition

Phase 2: Library Enrichment

Tag Format

Project Structure

Data Files

Testing

Configuration

Troubleshooting

calibredb not found

No matches found

Scraping fails

Incremental Updates

Contributing

License

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages