Auto Metadata Fetch System

Jo McMillan edited this page Nov 9, 2025 · 2 revisions

🏷️ Auto-Metadata Fetch System

Currently available only in the :dev version.

The Auto-Metadata Fetch System automatically enriches newly ingested books with comprehensive metadata from multiple online sources, ensuring your library has complete and accurate book information.

🌟 Overview

When books are added to your Calibre-Web Automated library, the Auto-Metadata Fetch System automatically searches multiple metadata providers to find and apply detailed information including titles, authors, descriptions, covers, publication dates, and more. This eliminates the need for manual metadata entry and ensures consistent, high-quality book information.

⚙️ How It Works

  1. 📚 Book Detection: The system identifies newly ingested books with incomplete or poor metadata
  2. 🔍 Provider Search: Searches configured metadata providers in priority order
  3. 📝 Metadata Application: Applies found metadata based on administrator-configured rules
  4. ⭐ Quality Enhancement: Improves book discoverability and organization

🔧 Administrator Configuration

🔄 Enabling Auto-Metadata Fetch

⚠️ Important: Metadata fetching is controlled entirely by server administrators. Users cannot enable or disable this feature individually.

Administrators can enable the system in CWA Settings:

  1. Navigate to CWA Settings
  2. Check "Enable Auto-Metadata Fetch"
  3. Configure Smart Metadata Application (optional)
  4. Set up Metadata Provider Hierarchy
  5. Save settings

🧠 Smart Metadata Application

The system offers two metadata application modes with granular field control:

🎯 Direct Replacement (Default)

  • ⚡ Behavior: Takes metadata from the preferred provider exactly as provided
  • 💭 Philosophy: "Just take the metadata that comes from the preferred metadata provider as is"
  • 🎯 Use Case: When you trust your primary provider and want consistent results
  • 📋 Result: Complete replacement of existing metadata with provider data for selected fields

🤖 Smart Application (Optional)

  • 🧠 Behavior: Applies intelligent criteria when replacing existing metadata
  • 💭 Philosophy: Only improve metadata when the new data is demonstrably better
  • 🎯 Use Case: When you want to preserve good existing metadata while enhancing poor data
  • 📏 Criteria: Applied only to selected fields:
    • 📖 Titles: Only replace if new title is longer/more descriptive
    • 📝 Descriptions: Only replace if new description is longer/more detailed
    • 🏢 Publishers: Only replace if current publisher field is empty
    • 🖼️ Covers: Only replace if new cover has higher resolution
    • ✍️ Authors: Always update (typically improves consistency)
    • 🏷️ Tags/Series: Always add (enhances discoverability)
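
The criteria above can be sketched as a small merge function. This is only an illustration under stated assumptions: the `BookMetadata` class, its field names, and `smart_apply` are hypothetical and are not taken from the CWA codebase. Cover resolution is approximated by a pixel count, series handling is left out for brevity, and keeping the current authors when the provider returns none is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BookMetadata:
    """Hypothetical container; field names are illustrative only."""
    title: str = ""
    authors: list[str] = field(default_factory=list)
    description: str = ""
    publisher: str = ""
    tags: set[str] = field(default_factory=set)
    cover_pixels: int = 0  # width * height, used as a stand-in for resolution


def smart_apply(current: BookMetadata, new: BookMetadata) -> BookMetadata:
    """Merge `new` into `current` following the Smart Application criteria."""
    return BookMetadata(
        # Title and description: replace only if the new value is longer.
        title=new.title if len(new.title) > len(current.title) else current.title,
        description=(new.description
                     if len(new.description) > len(current.description)
                     else current.description),
        # Publisher: fill in only when the current field is empty.
        publisher=current.publisher or new.publisher,
        # Cover: keep whichever image has the higher resolution.
        cover_pixels=max(current.cover_pixels, new.cover_pixels),
        # Authors: always updated (assumption: keep current if none were returned).
        authors=new.authors or current.authors,
        # Tags: always added to the existing set.
        tags=current.tags | new.tags,
    )
```

With these rules, a long existing description survives a shorter provider description, while an empty publisher field is filled in from the provider.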

✅ Selective Field Updates

New Feature: Administrators can now choose exactly which metadata fields should be updated during automatic fetching.

📝 Available Field Controls

Each metadata field can be individually enabled or disabled:

  • 📖 Title - Book title and subtitle
  • ✍️ Authors - Author names and contributors
  • 📝 Description - Plot summary and book description
  • 🏢 Publisher - Publishing house and imprint
  • 🏷️ Tags/Genres - Subject tags and genre classifications
  • 📚 Series - Series name and position/index
  • ⭐ Rating - Star ratings and reviews
  • 📅 Publication Date - Original publication date
  • 🔢 Identifiers - ISBN, ASIN, and other book IDs
  • 🖼️ Cover Image - Book cover artwork

🎯 Default Behavior

  • All fields are enabled by default - Maintains existing functionality
  • Unchecked fields are never modified - Preserves your existing data
  • Works with both Smart and Direct modes - Applies to your chosen application method
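
As a rough sketch of how field selection might act on fetched data, the snippet below drops every value whose field is not enabled before either application mode runs. The field keys and the `filter_enabled_fields` helper are hypothetical names chosen for this example; the real setting names in CWA may differ.

```python
# Hypothetical field keys; the actual CWA setting names may differ.
ALL_FIELDS = frozenset({
    "title", "authors", "description", "publisher", "tags",
    "series", "rating", "pubdate", "identifiers", "cover",
})


def filter_enabled_fields(fetched: dict, enabled: frozenset = ALL_FIELDS) -> dict:
    """Keep only the fetched values whose field is enabled."""
    return {name: value for name, value in fetched.items() if name in enabled}


# Example: an administrator who enables only identifiers and publication date.
enabled = frozenset({"identifiers", "pubdate"})
fetched = {
    "title": "New Title",
    "pubdate": "1965",
    "identifiers": {"isbn": "EXAMPLE-ISBN"},  # placeholder value
}
selected = filter_enabled_fields(fetched, enabled)
```

Here `selected` contains the publication date and identifiers only, so the existing title is never touched.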

💡 Use Cases

🏛️ Academic Libraries:

✅ Authors, Publisher, Identifiers, Publication Date
❌ Title, Description, Tags, Rating, Cover

Preserve manual cataloging while updating bibliographic data

📚 Fiction Collections:

✅ Description, Tags, Series, Cover, Rating
❌ Title, Authors, Publisher, Publication Date

Enhance discoverability while keeping core attribution intact

🦸 Comic Collections:

✅ Series, Cover, Tags, Rating
❌ Title, Authors, Description, Publisher

Update series information and artwork while preserving original titles

🔒 Curated Libraries:

✅ Identifiers, Publication Date
❌ All other fields

Only add missing bibliographic identifiers

📋 Metadata Provider Hierarchy

Configure the order in which providers are searched:

  1. 🖱️ Drag and Drop: Reorder providers by dragging them in the CWA Settings interface
  2. 🏆 Priority System: The system tries providers from top to bottom until one succeeds
  3. 🥇 First Success Wins: Once a provider returns usable data, the search stops

🌐 Available Providers

📚 Google Books

  • 💪 Strengths: Comprehensive database, good for popular books, excellent cover images
  • 🎯 Best For: Fiction, popular non-fiction, recently published books
  • 🌍 Coverage: International, multiple languages

📖 Internet Archive (archive.org)

  • 💪 Strengths: Extensive catalog, good for older/rare books, academic works
  • 🎯 Best For: Classic literature, academic texts, out-of-print books
  • 📚 Coverage: Historical and academic works

🇩🇪 Deutsche Nationalbibliothek (DNB)

  • 💪 Strengths: Authoritative German library catalog, excellent for German-language books
  • 🎯 Best For: German books, academic works, official publication data
  • 🌍 Coverage: German-language publications

🦸 ComicVine

  • 💪 Strengths: Specialized comic book database, detailed series information
  • 🎯 Best For: Comic books, graphic novels, manga
  • 📖 Coverage: Comics and graphic literature

🇨🇳 Douban

  • 💪 Strengths: Chinese book database, good for Asian literature
  • 🎯 Best For: Chinese books, Asian literature, translated works
  • 🌏 Coverage: Chinese and East Asian publications

🏆 Recommended Provider Orders

🏠 General Purpose Libraries:

  1. 📚 Google Books (broad coverage)
  2. 📖 Internet Archive (older/rare books)
  3. 🇩🇪 DNB (German books)
  4. 🦸 ComicVine (comics)
  5. 🇨🇳 Douban (Asian literature)

🎓 Academic Libraries:

  1. 🇩🇪 DNB (authoritative data)
  2. 📖 Internet Archive (academic works)
  3. 📚 Google Books (recent publications)

🦸 Comic/Graphic Novel Collections:

  1. 🦸 ComicVine (specialized)
  2. 📚 Google Books (mainstream comics)
  3. 📖 Internet Archive (older comics)

👤 User Experience

👀 What Users See

Users benefit from auto-metadata fetch without any configuration:

  • 📖 Complete Book Information: Books automatically have proper titles, authors, descriptions
  • 📂 Better Organization: Consistent metadata improves browsing and searching
  • 🔍 Enhanced Discovery: Tags and series information help find related books
  • 🎨 Professional Appearance: High-quality covers and complete details

🚀 When Metadata is Fetched

The system fetches metadata for:

  • ✅ Newly uploaded books with minimal metadata
  • ✅ Books imported from external sources
  • ✅ Books with obviously incorrect or incomplete information
  • ❌ Books that already have complete, high-quality metadata (in Smart mode)

🎯 System Behavior

🔍 Search Process

  1. 🔧 Build Search Query: Combines existing title and author information
  2. 🌐 Provider Search: Queries providers in configured order
  3. ⭐ Result Evaluation: Analyzes returned metadata for quality and relevance
  4. 🥇 First Success: Stops searching once a provider returns usable data
  5. 📝 Metadata Application: Applies metadata according to configured mode
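
Step 1 of the search process can be pictured with a short sketch. The `build_search_query` function is a hypothetical helper, and skipping the placeholder author "Unknown" is an assumption made for this example rather than documented CWA behavior.

```python
def build_search_query(title: str, authors: list[str]) -> str:
    """Combine the existing title and author names into one query string."""
    parts = [title.strip()] + [author.strip() for author in authors]
    # Assumption: drop empty pieces and the placeholder author "Unknown".
    parts = [part for part in parts if part and part.lower() != "unknown"]
    return " ".join(parts)
```

A book with little usable metadata produces a short query, which is one reason very generic titles can return incorrect matches.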

📊 Quality Criteria

The system evaluates metadata quality based on:

  • ✅ Completeness: Number of filled fields
  • 🎯 Accuracy: Relevance to search terms
  • 📏 Detail Level: Depth of description and information
  • 🖼️ Image Quality: Cover image resolution and clarity
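
Two of these criteria lend themselves to a simple illustration. The sketch below measures completeness as the share of filled fields and approximates relevance with a string similarity ratio from Python's standard `difflib` module. Both functions are hypothetical and show one possible way to score results, not the scoring CWA actually uses.

```python
from difflib import SequenceMatcher


def completeness(metadata: dict) -> float:
    """Fraction of fields holding a non-empty value, from 0.0 to 1.0."""
    if not metadata:
        return 0.0
    empty_values = (None, "", [], {})
    filled = sum(1 for value in metadata.values() if value not in empty_values)
    return filled / len(metadata)


def title_relevance(query_title: str, result_title: str) -> float:
    """Case-insensitive similarity between the searched and returned titles."""
    return SequenceMatcher(None, query_title.lower(), result_title.lower()).ratio()
```

A result could then be treated as usable only when both scores exceed thresholds chosen by the implementer.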

🔗 Processing Integration

Metadata fetching integrates with other CWA systems:

  • 📧 Before Auto-Send: Metadata is fetched before books are sent to eReaders
  • 🔄 After Auto-Convert: Metadata is applied after format conversion
  • 📥 During Ingest: Runs as part of the book ingestion pipeline

⚙️ Technical Details

🔍 Search Algorithm

For each newly ingested book:
  If metadata_fetch_enabled:
    For each provider in hierarchy:
      🔍 Search provider with book title + author
      If results found:
        📝 Apply metadata based on application mode
        📊 Log success and stop searching
      Else:
        ⏭️ Try next provider
    If no providers returned data:
      ⚠️ Log failure, book keeps original metadata
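
The pseudocode above could be realized in Python roughly as follows. This is a minimal sketch, not the actual CWA implementation: providers are modelled as plain callables, the `fetch_metadata` name and its parameters are hypothetical, and treating a provider exception as "no results" is an added assumption. Applying the returned metadata according to the configured mode is left to the caller.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("metadata_fetch_sketch")

# A provider is modelled as a callable: (title, author) -> metadata dict or None.
Provider = Callable[[str, str], Optional[dict]]


def fetch_metadata(title: str, author: str,
                   providers: list[tuple[str, Provider]],
                   enabled: bool = True) -> Optional[dict]:
    """Return metadata from the first provider that yields results, else None."""
    if not enabled:
        return None
    for name, search in providers:
        try:
            result = search(title, author)
        except Exception as exc:
            # Assumption: a failing provider counts as "no results".
            log.warning("Provider %s failed: %s", name, exc)
            continue
        if result:
            log.info("Metadata for %r found via %s", title, name)
            return result  # first success wins, remaining providers are skipped
    log.warning("No provider returned data for %r; keeping original metadata", title)
    return None
```

Because the loop returns on the first non-empty result, providers lower in the hierarchy are queried only when every provider above them comes back empty.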

📋 Metadata Fields

The system can fetch and apply:

📚 Core Information:

  • 📖 Title and subtitle
  • ✍️ Author(s) and contributors
  • 📅 Publication date
  • 🏢 Publisher and imprint
  • 🔢 ISBN and other identifiers

📝 Descriptive Data:

  • 📖 Plot summary/description
  • 🏷️ Tags and genres
  • 📚 Series information and position
  • 🌍 Language and edition details

🎨 Visual Elements:

  • 🖼️ Cover images (high resolution preferred)
  • 🏢 Publisher logos and branding

📊 Cataloging Data:

  • 📚 Library classifications
  • 🏷️ Subject headings
  • 📖 Academic citations

💾 Database Integration

  • 💾 Metadata Storage: Integrated with Calibre's metadata database
  • 🔍 Search Indexing: New metadata immediately improves search capabilities
  • 📊 Version Tracking: Changes are logged for audit purposes

🔧 Troubleshooting

📚 Metadata Not Being Fetched

🔧 Check Administrator Settings:

  • Verify "Enable Auto-Metadata Fetch" is checked in CWA Settings
  • Confirm at least one provider is configured in the hierarchy
  • Check field selections - Ensure desired fields are enabled
  • Check system logs for provider connectivity issues

📝 Field-Specific Issues:

  • No updates happening: Check whether every relevant field has been disabled
  • Partial updates only: Verify which fields are enabled in settings
  • Core fields not updating: Ensure Title, Authors, Description are enabled
  • Missing cover images: Check if Cover Image field is enabled

🌐 Provider-Specific Issues:

  • Network Connectivity: Ensure server can reach external metadata sources
  • ⏱️ API Limits: Some providers have rate limits or access restrictions
  • 🔍 Search Terms: Very obscure books may not be found by any provider

⭐ Poor Quality Results

🔍 Search Term Issues:

  • Books with very short or generic titles may return incorrect matches
  • Non-English books may not be found by English-language providers
  • Academic or technical books may need specialized providers

📋 Provider Selection:

  • Reorder providers to prioritize sources better suited to your collection
  • Consider disabling providers that consistently return poor results
  • Add provider-specific delays if rate limiting occurs

🧠 Smart Mode Behavior

🤔 Understanding Smart Criteria:

  • Smart mode is conservative - it only replaces data when confident the new data is better
  • Some fields (like authors) are always updated for consistency
  • Covers are only replaced if demonstrably higher quality

❓ When Smart Mode Doesn't Update:

  • Existing metadata may already be good quality
  • New metadata may not meet the improvement criteria
  • This is normal behavior - the system is preserving your existing data

💡 Best Practices

🔧 For Administrators

📋 Provider Configuration:

  • Order providers based on your collection's characteristics
  • Test with representative books before enabling system-wide
  • Monitor logs to identify provider success rates

🎯 Application Mode Selection:

  • Use 🎯 Direct Replacement for new libraries or when starting fresh
  • Use 🧠 Smart Application for established libraries with existing good metadata
  • Consider your collection's metadata quality when choosing

✅ Field Selection Strategy:

  • Start with all fields enabled - Test with a small subset of books first
  • Disable fields you've manually curated - Preserve your hard work
  • Enable fields that are often missing - Like descriptions, tags, series information
  • Consider your workflow - Match field selection to your cataloging practices

⚡ Performance Optimization:

  • Limit the number of active providers to reduce processing time
  • Configure appropriate delays between provider requests
  • Monitor system resources during peak ingestion periods
  • Enable only the fields you need; fewer enabled fields means faster processing

📚 Collection-Specific Recommendations

📖 Fiction Libraries:

  • Providers: Prioritize 📚 Google Books and 📖 Internet Archive
  • Mode: Enable 🧠 Smart Application to preserve manual curation
  • Fields: Focus on ✅ Description, Tags, Series, Cover, Rating
  • Skip: ❌ Title, Authors if you have good existing data

🎓 Academic Collections:

  • Providers: Lead with 🇩🇪 DNB and 📖 Internet Archive
  • Mode: Use 🎯 Direct Replacement for consistency
  • Fields: Prioritize ✅ Authors, Publisher, Identifiers, Publication Date
  • Skip: ❌ Descriptions, Tags that may not match academic standards

🦸 Comic Collections:

  • Providers: Put 🦸 ComicVine first
  • Mode: Use 🧠 Smart Application to preserve series organization
  • Fields: Focus on ✅ Series, Cover, Tags, Rating
  • Skip: ❌ Title, Authors when your comics use complex numbering schemes

🌍 Multilingual Libraries:

  • Providers: Include language-appropriate providers (🇩🇪 DNB for German, 🇨🇳 Douban for Chinese)
  • Mode: Test both modes with each language
  • Fields: Enable ✅ All fields but test quality per language
  • Strategy: Consider separate configurations per language section

🔒 Privacy and Legal Considerations

🌐 Data Sources

  • All metadata providers are public services
  • No personal information is transmitted to providers
  • Only book title and author information is used for searches

📚 Intellectual Property

  • Metadata fetching uses publicly available bibliographic data
  • Cover images are linked, not stored, respecting copyright
  • All provider terms of service are respected

⏱️ Rate Limiting

  • System respects provider API limits and usage policies
  • Automatic delays prevent overwhelming external services
  • Failed requests are logged but not indefinitely retried
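
The combination of automatic delays and bounded retries might look like the sketch below. The `call_with_retries` helper, its default values, and the linearly growing pause are all assumptions chosen for illustration; the document does not specify the delays or retry counts CWA uses.

```python
import time
from typing import Callable, Optional


def call_with_retries(request: Callable[[], Optional[dict]],
                      max_attempts: int = 3,
                      delay_seconds: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call `request` up to `max_attempts` times, pausing between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception as exc:
            # Log the failure, then wait before the next attempt (if any).
            print(f"attempt {attempt}/{max_attempts} failed: {exc}")
            if attempt < max_attempts:
                sleep(delay_seconds * attempt)  # linearly growing pause
    return None  # give up instead of retrying indefinitely
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.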

🚀 Advanced Configuration

🔧 Custom Provider Integration

  • System architecture supports adding new metadata providers
  • Contact your administrator for custom provider development
  • API documentation available for advanced integrations

📊 Bulk Metadata Updates

  • System focuses on newly ingested books
  • Existing books can be updated through Calibre-Web's standard metadata editing
  • Consider batch operations for large collection updates

🏷️ The Auto-Metadata Fetch System ensures your library maintains high-quality, complete metadata automatically.

🛠️ For technical support or advanced configuration options, please contact your system administrator.