Automatically enrich your Calibre eBook library by identifying and tagging books that have won or been shortlisted for major literary awards.
This tool uses a deterministic approach to award tagging:
- Ships with award data through 2025, and includes scrapers to refresh it from Wikipedia
- Builds a canonical awards database
- Uses fuzzy string matching to correlate your Calibre library against this database
- Applies award tags to matched books
Supported Awards:
- Sci-Fi/Fantasy: Hugo, Nebula, Locus, Arthur C. Clarke
- Literary/General: Booker Prize, International Booker, National Book Award, Scotiabank Giller Prize
- Mystery: Agatha Award
This repository includes a pre-built canonical awards database (data/canonical_awards.csv) with:
- Agatha: 437 records (1988–2024)
- Arthur C. Clarke: 239 records (1987–2025)
- Booker: 470 records (1969–2025)
- Giller: 327 records (1994–2025)
- Hugo: 363 records (1953–2025)
- International Booker: 92 records (2016–2025)
- Locus: 2512 records (1978–2025)
- National Book Award: 469 records (1950–2024)
- Nebula: 376 records (1966–2025)
You can use this database immediately without scraping. To refresh it with the latest awards, run `python build_db.py --scrape-all` (per-award caching), or selectively update with `python build_db.py --scrape hugo,nebula`.
- Python 3.10+
- Calibre installed, with the `calibredb` command available in PATH
- Internet connection (optional - only needed to rebuild the award database; a pre-built database is included)
- Clone this repository:

```bash
git clone <repository-url>
cd calibre-awards-tagger
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Verify Calibre is installed:

```bash
calibredb --version
```

Automatic Backup: When applying tags, this tool automatically creates a timestamped backup of your Calibre `metadata.db` file (e.g., `metadata.db.backup-20260106_143022`). The backup is only a few MB and takes less than a second to create.
How to Restore: If anything goes wrong:
- Close Calibre
- Navigate to your Calibre Library folder
- Copy `metadata.db.backup-TIMESTAMP` over `metadata.db`
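On macOS or Linux, the restore amounts to the following (the library path and `TIMESTAMP` are placeholders for your own values):

```shell
# Close Calibre first, then overwrite the live database with the backup.
cd ~/Calibre\ Library                         # adjust to your library location
cp metadata.db.backup-TIMESTAMP metadata.db
```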
Alternative Recovery: Calibre automatically maintains OPF backup files for each book in the book's folder. You can restore the entire database from these backups:
- Open Calibre → Preferences → Library Maintenance → Restore database
What This Tool Does: This tool only adds tags to books - it never deletes books, removes existing tags, or modifies other metadata. The risk of data loss is minimal, but we backup metadata.db as a precaution.
Dry-Run Mode: Always use `--dry-run` first to preview changes before applying them.
Run in dry-run mode to see proposed changes without applying them:
```bash
python main.py --skip-scrape --dry-run
```

Apply tags to your Calibre library:

```bash
python main.py --skip-scrape
```

Usage:

```bash
python main.py [OPTIONS]
```
Options:
- `--dry-run` - Show proposed changes without applying them
- `--rebuild-db` - Rebuild the awards database from scratch (otherwise updates incrementally)
- `--skip-scrape` - Use the existing database without re-scraping (faster for testing)

```bash
# First run: preview changes using the included database
python main.py --skip-scrape --dry-run

# Apply tags to library
python main.py --skip-scrape

# Update database with latest awards and re-tag
python main.py --rebuild-db

# Test matching without scraping again
python main.py --skip-scrape --dry-run
```

A pre-built database is included, so this step is optional. Use it to get the latest award winners:
```bash
# Refresh caches and scrape all awards
python build_db.py --scrape-all

# Update just a couple of awards and load everything else from cache
python build_db.py --scrape hugo,nebula

# Rebuild from cache only (no network calls)
python build_db.py
```

This is useful for:
- Updating award data after new winners are announced
- Running on a schedule (e.g., monthly cron job to stay current)
- Contributing updated database back to the repository
The included database already has all awards through 2025.
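For the scheduled-update case, a crontab entry along these lines would work; the repository path is a placeholder for wherever you cloned the project:

```shell
# Refresh award data and re-tag on the 1st of each month at 03:00.
0 3 1 * * cd /path/to/calibre-awards-tagger && python main.py --rebuild-db >> logs/cron.log 2>&1
```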
- Scraping: Fetches award data from Wikipedia pages
- Normalization: Cleans and standardizes book titles and author names
- Database Building: Creates `data/canonical_awards.csv`
- Extraction: Reads your Calibre library metadata
- Matching: Uses fuzzy string matching with high thresholds:
- Author matching (95%+ threshold)
- Title matching (90%+ threshold, only after author match)
- Tag Generation: Creates tags in the format `Award: Hugo [Winner]`
- Application: Appends tags to matched books
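The two-stage match above can be sketched in Python. The project uses rapidfuzz; here `difflib` from the standard library stands in as an illustration, so the exact scores will differ from the real matcher:

```python
from difflib import SequenceMatcher

AUTHOR_EXACT_THRESHOLD = 95  # percent, as in matcher.py
TITLE_FUZZY_THRESHOLD = 90

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stdlib stand-in for rapidfuzz)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100

def is_match(lib_author: str, lib_title: str,
             award_author: str, award_title: str) -> bool:
    """The author must clear its threshold first; the title is only checked after."""
    if similarity(lib_author, award_author) < AUTHOR_EXACT_THRESHOLD:
        return False
    return similarity(lib_title, award_title) >= TITLE_FUZZY_THRESHOLD

print(is_match("N. K. Jemisin", "The Fifth Season",
               "N.K. Jemisin", "The Fifth Season"))  # True
```

Gating the title comparison on a strong author match is what keeps false positives low even though book titles are often short and generic.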
Tags follow this pattern:
- `Award: Hugo [Winner]` - Hugo Award winner
- `Award: Booker [Shortlist]` - Booker Prize shortlist
- `Award: Agatha, Best First Novel [Nominee]` - Agatha Award nominee
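If you need to work with these tags programmatically (say, in a post-processing script), the pattern is straightforward to parse. The regex below is an illustration, not part of the tool:

```python
import re

# "Award: <name>[, <category>] [<status>]" - the category part is optional.
TAG_RE = re.compile(
    r"^Award: (?P<award>[^,\[]+?)(?:, (?P<category>[^\[]+?))? \[(?P<status>[^\]]+)\]$"
)

m = TAG_RE.match("Award: Agatha, Best First Novel [Nominee]")
print(m.group("award"), m.group("category"), m.group("status"))
# Agatha Best First Novel Nominee
```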
calibre-awards-tagger/
├── main.py # Main orchestration script
├── build_db.py # Database building and scraping
├── lib_interface.py # Calibre library interface
├── matcher.py # Fuzzy matching engine
├── report.py # Report generation
├── schema.py # Data schema definitions
├── author_normalizer.py # Author name normalization
├── scrapers/ # Web scraping modules
│ ├── __init__.py
│ ├── base_wiki.py # Base Wikipedia utilities
│ ├── general_fiction.py # General fiction awards
│ ├── scifi.py # Sci-fi/fantasy awards
│ └── specialized.py # Specialized awards
├── tests/ # Test suite
│ ├── test_base_wiki.py
│ ├── test_matcher.py
│ ├── test_author_normalizer.py
│ └── test_integration.py
├── data/ # Generated data files
│ ├── canonical_awards.csv
│ ├── author_aliases.json
│ └── proposed_changes_*.json
└── logs/ # Log files
- `data/canonical_awards.csv` - Canonical awards database
- `data/author_aliases.json` - Cached author name normalizations
- `data/proposed_changes_*.json` - Reports of proposed tag changes (generated whenever `main.py` finds matches, in both dry-run and apply modes)
- `logs/calibre_tagger_*.log` - Timestamped execution logs
Run the test suite:
```bash
pytest tests/
```

Run specific test files:

```bash
pytest tests/test_matcher.py
pytest tests/test_integration.py
```

Edit `schema.py` to:
- Add or remove awards
- Modify tag format
- Adjust award categories
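For orientation, a new entry in `AWARDS_CONFIG` might look roughly like this. The field names below are illustrative assumptions, so mirror the real entries in `schema.py` rather than this sketch:

```python
# Hypothetical AWARDS_CONFIG entry - field names are assumptions, not the real schema.
AWARDS_CONFIG = {
    "pulitzer_fiction": {
        "display_name": "Pulitzer, Fiction",  # name used in the "Award: ..." tag
        "scraper": "general_fiction",         # module under scrapers/
        "statuses": ["Winner", "Finalist"],   # allowed [Status] values
    },
}
```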
Edit `matcher.py` to adjust matching thresholds:

```python
AUTHOR_EXACT_THRESHOLD = 95  # Author matching
TITLE_FUZZY_THRESHOLD = 90   # Title matching
```

Ensure Calibre is installed and `calibredb` is in your PATH:
- macOS: Add `/Applications/calibre.app/Contents/MacOS/` to PATH
- Linux: Install via your package manager
- Windows: Add Calibre installation directory to PATH
- Ensure your Calibre library has books that won awards
- Check that author names and titles are accurate in your library
- Try lowering the matching thresholds in `matcher.py`
- Check internet connection
- Wikipedia structure may have changed (inspect logs)
- Use `--skip-scrape` to run from the existing database
The tool supports incremental updates:
- The awards database is cached in `data/canonical_awards.csv`
- Re-running without `--rebuild-db` adds new records to the existing database
- Author aliases are cached to avoid redundant normalization
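The merge step can be pictured like this; the record fields are assumptions, since the real CSV schema is defined in `schema.py`:

```python
# Assumed record shape: (award, year, title, status) - the real columns may differ.
existing = [("Hugo", 2024, "Some Novel", "Winner")]
scraped = [
    ("Hugo", 2024, "Some Novel", "Winner"),     # already in the database
    ("Hugo", 2025, "Another Novel", "Winner"),  # new record
]

# Append only records not already present, preserving the original order.
seen = set(existing)
merged = list(existing)
for rec in scraped:
    if rec not in seen:
        seen.add(rec)
        merged.append(rec)

print(len(merged))  # 2
```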
When adding new awards:
- Add the award configuration to `AWARDS_CONFIG` in `schema.py`
- Create a scraper in the appropriate module under `scrapers/`
- Add tests for the new scraper
- Update this README
This project is for personal use. Award data is sourced from Wikipedia under Creative Commons licenses.
Built following the blueprint in plan.md. Uses:
- BeautifulSoup4 for web scraping
- pandas for data processing
- rapidfuzz for fuzzy string matching
- pytest for testing