Auto Metadata Fetch System
Currently available only in the :dev version.
The Auto-Metadata Fetch System automatically enriches newly ingested books with comprehensive metadata from multiple online sources, ensuring your library has complete and accurate book information.
When books are added to your Calibre-Web Automated library, the Auto-Metadata Fetch System automatically searches multiple metadata providers to find and apply detailed information including titles, authors, descriptions, covers, publication dates, and more. This eliminates the need for manual metadata entry and ensures consistent, high-quality book information.
- 📚 Book Detection: System identifies newly ingested books with incomplete or poor metadata
- 🔍 Provider Search: Searches configured metadata providers in priority order
- 📝 Metadata Application: Applies found metadata based on administrator-configured rules
- ⭐ Quality Enhancement: Improves book discoverability and organization
Administrators can enable the system in CWA Settings:
- Navigate to CWA Settings
- Check "Enable Auto-Metadata Fetch"
- Configure Smart Metadata Application (optional)
- Set up Metadata Provider Hierarchy
- Save settings
The system offers two metadata application modes with granular field control:
🎯 Direct Replacement:
- ⚡ Behavior: Takes metadata from the preferred provider exactly as provided
- 💭 Philosophy: "Just take the metadata that comes from the preferred metadata provider as is"
- 🎯 Use Case: When you trust your primary provider and want consistent results
- 📋 Result: Complete replacement of existing metadata with provider data for selected fields
🧠 Smart Application:
- 🧠 Behavior: Applies intelligent criteria when replacing existing metadata
- 💭 Philosophy: Only improve metadata when the new data is demonstrably better
- 🎯 Use Case: When you want to preserve good existing metadata while enhancing poor data
- 📏 Criteria (applied only to selected fields):
- 📖 Titles: Only replace if new title is longer/more descriptive
- 📝 Descriptions: Only replace if new description is longer/more detailed
- 🏢 Publishers: Only replace if current publisher field is empty
- 🖼️ Covers: Only replace if new cover has higher resolution
- ✍️ Authors: Always update (typically improves consistency)
- 🏷️ Tags/Series: Always add (enhances discoverability)
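The Smart criteria above can be expressed as a per-field decision function. This is a minimal sketch, not CWA's implementation: the field keys, the dict-based book record, representing covers as `(width, height)` tuples, and the rule that fields without a documented criterion (rating, publication date, identifiers) are only filled when empty are all assumptions.

```python
from typing import Any

# Assumption: hypothetical field keys; covers are (width, height) tuples.
EMPTY = (None, "", [], ())


def smart_value(field: str, current: Any, new: Any) -> Any:
    """Return the value Smart mode would keep for one field (illustrative only)."""
    if new in EMPTY:
        return current  # provider returned nothing usable for this field
    if field in ("title", "description"):
        # Replace only if the new text is longer / more descriptive
        return new if len(new) > len(current or "") else current
    if field == "publisher":
        return current if current else new  # only fill an empty publisher
    if field == "cover":
        if not current:
            return new
        # Replace only if the new cover has more pixels (higher resolution)
        return new if new[0] * new[1] > current[0] * current[1] else current
    if field == "tags":
        # "Always add": merge new tags without dropping existing ones
        merged = list(current or [])
        for tag in new:
            if tag not in merged:
                merged.append(tag)
        return merged
    if field in ("authors", "series"):
        return new  # always updated
    # Assumption: fields with no documented rule are only filled when empty
    return current if current else new
```

For example, `smart_value("publisher", "Ace", "Chilton")` keeps `"Ace"` because the publisher field is already filled, while Direct Replacement would simply take `"Chilton"`.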
New Feature: Administrators can now choose exactly which metadata fields should be updated during automatic fetching.
Each metadata field can be individually enabled or disabled:
- 📖 Title - Book title and subtitle
- ✍️ Authors - Author names and contributors
- 📝 Description - Plot summary and book description
- 🏢 Publisher - Publishing house and imprint
- 🏷️ Tags/Genres - Subject tags and genre classifications
- 📚 Series - Series name and position/index
- ⭐ Rating - Star ratings and reviews
- 📅 Publication Date - Original publication date
- 🔢 Identifiers - ISBN, ASIN, and other book IDs
- 🖼️ Cover Image - Book cover artwork
- All fields are enabled by default, so existing behavior is unchanged
- Unchecked fields are never modified, so your existing data in those fields is preserved
- Works with both Smart and Direct modes: the field selection applies to whichever mode you choose
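Field selection composes with either mode: unchecked fields are skipped entirely, and the mode only decides what happens to the checked ones. The sketch below assumes hypothetical field keys and a dict-based record; the `choose` callback defaults to Direct behavior and can be swapped for a Smart-style rule. It is an illustration, not CWA's code.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical keys for the ten checkboxes listed above.
ALL_FIELDS = ("title", "authors", "description", "publisher", "tags",
              "series", "rating", "pubdate", "identifiers", "cover")


def apply_selected_fields(
    book: Dict[str, Any],
    fetched: Dict[str, Any],
    enabled: Iterable[str] = ALL_FIELDS,
    choose: Optional[Callable[[str, Any, Any], Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of `book` with only the enabled fields updated.

    `choose(field, current, new)` picks the final value; when omitted, the
    provider value is taken as-is (Direct mode).
    """
    result = dict(book)  # never mutate the caller's record
    for field in enabled:
        if field not in fetched:
            continue  # provider had nothing for this field: leave it alone
        new = fetched[field]
        result[field] = choose(field, book.get(field), new) if choose else new
    return result
```

With `enabled={"description", "publisher"}`, a fetched title is ignored even if the provider returned one, which is what "Unchecked fields are never modified" means in practice.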
🏛️ Academic Libraries:
✅ Authors, Publishers, Identifiers, Publication Date
❌ Title, Description, Tags, Rating, Cover
Preserve manual cataloging while updating bibliographic data
📚 Fiction Collections:
✅ Description, Tags, Series, Cover, Rating
❌ Title, Authors, Publisher, Publication Date
Enhance discoverability while keeping core attribution intact
🦸 Comic Collections:
✅ Series, Cover, Tags, Rating
❌ Title, Authors, Description, Publisher
Update series information and artwork while preserving original titles
🔒 Curated Libraries:
✅ Identifiers, Publication Date
❌ All other fields
Only add missing bibliographic identifiers
Configure the order in which providers are searched:
- 🖱️ Drag and Drop: Reorder providers by dragging them in the CWA Settings interface
- 🏆 Priority System: System tries providers from top to bottom until successful
- 🥇 First Success Wins: Once a provider returns usable data, the search stops
📚 Google Books
- 💪 Strengths: Comprehensive database, good for popular books, excellent cover images
- 🎯 Best For: Fiction, popular non-fiction, recently published books
- 🌍 Coverage: International, multiple languages
📖 Internet Archive (archive.org)
- 💪 Strengths: Extensive catalog, good for older/rare books, academic works
- 🎯 Best For: Classic literature, academic texts, out-of-print books
- 📚 Coverage: Historical and academic works
🇩🇪 Deutsche Nationalbibliothek (DNB)
- 💪 Strengths: Authoritative German library catalog, excellent for German-language books
- 🎯 Best For: German books, academic works, official publication data
- 🌍 Coverage: German-language publications
🦸 ComicVine
- 💪 Strengths: Specialized comic book database, detailed series information
- 🎯 Best For: Comic books, graphic novels, manga
- 📖 Coverage: Comics and graphic literature
🇨🇳 Douban
- 💪 Strengths: Chinese book database, good for Asian literature
- 🎯 Best For: Chinese books, Asian literature, translated works
- 🌏 Coverage: Chinese and East Asian publications
🏠 General Purpose Libraries:
- 📚 Google Books (broad coverage)
- 📖 Internet Archive (older/rare books)
- 🇩🇪 DNB (German books)
- 🦸 ComicVine (comics)
- 🇨🇳 Douban (Asian literature)
🎓 Academic Libraries:
- 🇩🇪 DNB (authoritative data)
- 📖 Internet Archive (academic works)
- 📚 Google Books (recent publications)
🦸 Comic/Graphic Novel Collections:
- 🦸 ComicVine (specialized)
- 📚 Google Books (mainstream comics)
- 📖 Internet Archive (older comics)
Regular users benefit from Auto-Metadata Fetch without having to configure anything:
- ✅ 📖 Complete Book Information: Books automatically have proper titles, authors, descriptions
- ✅ 📂 Better Organization: Consistent metadata improves browsing and searching
- ✅ 🔍 Enhanced Discovery: Tags and series information help find related books
- ✅ 🎨 Professional Appearance: High-quality covers and complete details
The system fetches metadata for:
- ✅ Newly uploaded books with minimal metadata
- ✅ Books imported from external sources
- ✅ Books with obviously incorrect or incomplete information
- ❌ Books that already have complete, high-quality metadata (in Smart mode)
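A hypothetical check for which books get looked up. The source names the categories but not the exact tests, so the core-field list, the placeholder author values, and the assumption that Direct mode looks up every new book are illustrative choices (Calibre does record books with no known author as "Unknown").

```python
from typing import Any, Dict

# Assumptions: which fields count as "core" and what counts as a placeholder.
CORE_FIELDS = ("title", "authors", "description", "cover")
PLACEHOLDER_AUTHORS = {"", "Unknown"}


def needs_fetch(book: Dict[str, Any], mode: str) -> bool:
    """Decide whether a newly ingested book should be sent to providers (illustrative)."""
    authors = [a for a in (book.get("authors") or []) if a not in PLACEHOLDER_AUTHORS]
    if not book.get("title") or not authors:
        return True  # obviously incomplete: no usable title or author
    if mode == "smart":
        # Smart mode skips books whose core fields are already filled
        return any(not book.get(f) for f in CORE_FIELDS)
    return True  # Direct mode (assumed): every new book is looked up
```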
🔄 Fetch Process:
- 🔧 Build Search Query: Combines existing title and author information
- 🌐 Provider Search: Queries providers in configured order
- ⭐ Result Evaluation: Analyzes returned metadata for quality and relevance
- 🥇 First Success: Stops searching once a provider returns usable data
- 📝 Metadata Application: Applies metadata according to configured mode
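The first step, building the search query, might look like the sketch below. The source only says that title and author are combined; the cleanup rules (dropping bracketed tags such as `(epub)`, turning underscores into spaces, using only the first known author) are assumptions chosen for illustration.

```python
import re
from typing import List

# Assumption: bracketed release tags like "(epub)" or "[retail]" are noise.
BRACKETED = re.compile(r"[(\[][^)\]]*[)\]]")


def build_query(title: str, authors: List[str]) -> str:
    """Combine the existing title and first known author into one search string."""
    cleaned = BRACKETED.sub(" ", title).replace("_", " ")
    parts = [cleaned]
    known = [a for a in authors if a and a != "Unknown"]
    if known:
        parts.append(known[0])  # one author is usually enough to disambiguate
    return " ".join(" ".join(parts).split())  # collapse repeated whitespace
```

For example, `build_query("Dune (epub) [retail]", ["Frank Herbert"])` returns `"Dune Frank Herbert"`.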
The system evaluates metadata quality based on:
- ✅ Completeness: Number of filled fields
- 🎯 Accuracy: How closely the result matches the search terms
- 📏 Detail Level: Depth of description and information
- 🖼️ Image Quality: Cover image resolution and clarity
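One way to turn these four criteria into a single comparable number is a weighted score. This is a sketch only: the weights, the 1,000-character description cap, the 1000x1500 cover reference size, and the field keys are arbitrary assumptions, not values taken from CWA.

```python
from typing import Any, Dict

# Hypothetical field keys; cover is a (width, height) tuple.
SCORED_FIELDS = ("title", "authors", "description", "publisher", "tags",
                 "series", "pubdate", "identifiers", "cover")
# Assumed weights for completeness, accuracy, detail and image quality.
WEIGHTS = (0.3, 0.4, 0.2, 0.1)


def quality_score(result: Dict[str, Any], query: str) -> float:
    """Score a provider result between 0.0 and 1.0 (illustrative heuristic)."""
    completeness = sum(1 for f in SCORED_FIELDS if result.get(f)) / len(SCORED_FIELDS)

    # Accuracy: share of query words found in the returned title/authors
    query_words = set(query.lower().split())
    found = " ".join([result.get("title") or ""] + list(result.get("authors") or []))
    found_words = set(found.lower().split())
    accuracy = len(query_words & found_words) / len(query_words) if query_words else 0.0

    # Detail level: longer descriptions score higher, capped at 1.0
    detail = min(len(result.get("description") or "") / 1000, 1.0)

    # Image quality: pixel count relative to the assumed reference size
    cover = result.get("cover")
    image = min(cover[0] * cover[1] / (1000 * 1500), 1.0) if cover else 0.0

    w = WEIGHTS
    return w[0] * completeness + w[1] * accuracy + w[2] * detail + w[3] * image
```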
Metadata fetching integrates with other CWA systems:
- 📧 Before Auto-Send: Metadata is fetched before books are sent to eReaders
- 🔄 After Auto-Convert: Metadata is applied after format conversion
- 📥 During Ingest: Runs as part of the book ingestion pipeline
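The ordering matters because Auto-Send should deliver the corrected metadata. The toy sketch below uses stub functions; `convert`, `fetch_metadata`, `send_to_ereader` and `ingest` are hypothetical stand-ins, not CWA's actual functions. The only point it shows is that Auto-Send runs last and therefore sees the converted file with its newly applied metadata.

```python
from typing import Any, Dict, List, Tuple

Book = Dict[str, Any]


def convert(book: Book) -> Book:
    # Stand-in for Auto-Convert: pretend the file is now EPUB
    return {**book, "format": "epub"}


def fetch_metadata(book: Book) -> Book:
    # Stand-in for the provider lookup: pretend a clean title was found
    return {**book, "title": "Dune"}


def send_to_ereader(book: Book, outbox: List[Tuple[str, str]]) -> Book:
    # Stand-in for Auto-Send: record what would be delivered
    outbox.append((book["title"], book["format"]))
    return book


def ingest(book: Book, outbox: List[Tuple[str, str]]) -> Book:
    book = convert(book)                  # 1. Auto-Convert
    book = fetch_metadata(book)           # 2. metadata applied to the converted file
    return send_to_ereader(book, outbox)  # 3. Auto-Send sees the final metadata
```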
For each newly ingested book:
    If metadata_fetch_enabled:
        For each provider in hierarchy:
            🔍 Search provider with book title + author
            If results found:
                📝 Apply metadata based on application mode
                📊 Log success and stop searching
            Else:
                ⏭️ Try next provider
        If no providers returned data:
            ⚠️ Log failure, book keeps original metadata
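A runnable Python rendering of the pseudocode above, as a sketch: the provider signature `(title, author) -> dict | None`, the `apply` callback, and treating `OSError` as "try the next provider" are assumptions, not CWA's internal API.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

log = logging.getLogger("auto_metadata")

# Hypothetical provider signature: (title, author) -> metadata dict or None
Search = Callable[[str, str], Optional[Dict[str, Any]]]
Apply = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]


def fetch_for_book(book: Dict[str, Any], providers: List[Tuple[str, Search]],
                   apply: Apply, enabled: bool = True) -> Dict[str, Any]:
    """First-success provider loop mirroring the pseudocode above."""
    if not enabled:
        return book
    for name, search in providers:  # providers are already in hierarchy order
        try:
            result = search(book["title"], book.get("author", ""))
        except OSError as exc:  # assumption: network errors just mean "try next"
            log.warning("provider %s failed: %s", name, exc)
            result = None
        if result:
            log.info("metadata for %r found via %s", book["title"], name)
            return apply(book, result)  # first success wins: stop searching
    log.warning("no provider returned data for %r; keeping original metadata",
                book["title"])
    return book
```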
The system can fetch and apply:
📚 Core Information:
- 📖 Title and subtitle
- ✍️ Author(s) and contributors
- 📅 Publication date
- 🏢 Publisher and imprint
- 🔢 ISBN and other identifiers
📝 Descriptive Data:
- 📖 Plot summary/description
- 🏷️ Tags and genres
- 📚 Series information and position
- 🌍 Language and edition details
🎨 Visual Elements:
- 🖼️ Cover images (high resolution preferred)
- 🏢 Publisher logos and branding
📊 Cataloging Data:
- 📚 Library classifications
- 🏷️ Subject headings
- 📖 Academic citations
🗄️ Storage and Tracking:
- 💾 Metadata Storage: Integrated with Calibre's metadata database
- 🔍 Search Indexing: New metadata immediately improves search capabilities
- 📊 Version Tracking: Changes are logged for audit purposes
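The source does not say how changes are logged. As a sketch under that uncertainty, an audit record could store a field-level diff as one JSON line per update; the record layout and the names `metadata_changes` and `audit_entry` are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def metadata_changes(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Field-level diff: only fields whose value actually changed."""
    return {
        field: {"old": before.get(field), "new": value}
        for field, value in after.items()
        if before.get(field) != value
    }


def audit_entry(book_id: int, provider: str,
                before: Dict[str, Any], after: Dict[str, Any]) -> str:
    """Serialize one change record as a JSON line for an audit log."""
    return json.dumps({
        "book_id": book_id,
        "provider": provider,
        "time": datetime.now(timezone.utc).isoformat(),
        "changes": metadata_changes(before, after),
    })
```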
🔧 Check Administrator Settings:
- Verify "Enable Auto-Metadata Fetch" is checked in CWA Settings
- Confirm at least one provider is configured in the hierarchy
- Check field selections - Ensure desired fields are enabled
- Check system logs for provider connectivity issues
📝 Field-Specific Issues:
- No updates happening: Check whether the relevant fields have been disabled
- Partial updates only: Verify which fields are enabled in settings
- Core fields not updating: Ensure Title, Authors, Description are enabled
- Missing cover images: Check if Cover Image field is enabled
🌐 Provider-Specific Issues:
- Network Connectivity: Ensure server can reach external metadata sources
- ⏱️ API Limits: Some providers have rate limits or access restrictions
- 🔍 Search Terms: Very obscure books may not be found by any provider
🔍 Search Term Issues:
- Books with very short or generic titles may return incorrect matches
- Non-English books may not be found by English-language providers
- Academic or technical books may need specialized providers
📋 Provider Selection:
- Reorder providers to prioritize sources better suited to your collection
- Consider disabling providers that consistently return poor results
- Add provider-specific delays if rate limiting occurs
🤔 Understanding Smart Criteria:
- Smart mode is conservative - it only replaces data when confident the new data is better
- Some fields (like authors) are always updated for consistency
- Covers are only replaced if demonstrably higher quality
❓ When Smart Mode Doesn't Update:
- Existing metadata may already be good quality
- New metadata may not meet the improvement criteria
- This is normal behavior - the system is preserving your existing data
📋 Provider Configuration:
- Order providers based on your collection's characteristics
- Test with representative books before enabling system-wide
- Monitor logs to identify provider success rates
🎯 Application Mode Selection:
- Use 🎯 Direct Replacement for new libraries or when starting fresh
- Use 🧠 Smart Application for established libraries with existing good metadata
- Consider your collection's metadata quality when choosing
✅ Field Selection Strategy:
- Start with all fields enabled - Test with a small subset of books first
- Disable fields you've manually curated - Preserve your hard work
- Enable fields that are often missing - Like descriptions, tags, series information
- Consider your workflow - Match field selection to your cataloging practices
⚡ Performance Optimization:
- Limit the number of active providers to reduce processing time
- Configure appropriate delays between provider requests
- Monitor system resources during peak ingestion periods
- Fewer enabled fields = faster processing
📖 Fiction Libraries:
- Providers: Prioritize 📚 Google Books and 📖 Internet Archive
- Mode: Enable 🧠 Smart Application to preserve manual curation
- Fields: Focus on ✅ Description, Tags, Series, Cover, Rating
- Skip: ❌ Title, Authors if you have good existing data
🎓 Academic Collections:
- Providers: Lead with 🇩🇪 DNB and 📖 Internet Archive
- Mode: Use 🎯 Direct Replacement for consistency
- Fields: Prioritize ✅ Authors, Publisher, Identifiers, Publication Date
- Skip: ❌ Description and Tags, which may not match academic cataloging standards
🦸 Comic Collections:
- Providers: Put 🦸 ComicVine first
- Mode: Use 🧠 Smart Application to preserve series organization
- Fields: Focus on ✅ Series, Cover, Tags, Rating
- Skip: ❌ Title and Authors, especially if your titles encode complex issue numbering
🌍 Multilingual Libraries:
- Providers: Include language-appropriate providers (🇩🇪 DNB for German, 🇨🇳 Douban for Chinese)
- Mode: Test both modes with each language
- Fields: Enable ✅ All fields but test quality per language
- Strategy: Consider separate configurations per language section
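The settings describe a single provider hierarchy, so per-language ordering is a hypothetical illustration of "separate configurations per language". The sketch moves language-appropriate providers to the front while keeping the rest of the configured order; the provider identifiers and the three-letter language codes (as Calibre stores them, e.g. `deu`, `zho`) are assumptions.

```python
from typing import Dict, List

# Assumption: language codes mapped to hypothetical provider identifiers.
LANGUAGE_PROVIDERS: Dict[str, List[str]] = {
    "deu": ["dnb"],     # German
    "zho": ["douban"],  # Chinese
}


def order_for_language(configured: List[str], language: str) -> List[str]:
    """Move language-appropriate providers to the front, keep the rest in order."""
    preferred = [p for p in LANGUAGE_PROVIDERS.get(language, []) if p in configured]
    return preferred + [p for p in configured if p not in preferred]
```

Books in a language with no mapping simply keep the administrator's configured order.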
🔒 Privacy:
- All metadata providers are public services
- No personal information is transmitted to providers
- Only book title and author information is used for searches
⚖️ Legal and Copyright:
- Metadata fetching uses publicly available bibliographic data
- Cover images are linked, not stored, respecting copyright
- All provider terms of service are respected
🤝 Fair Use of Provider Services:
- System respects provider API limits and usage policies
- Automatic delays prevent overwhelming external services
- Failed requests are logged but not indefinitely retried
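A bounded retry with exponential backoff is one common way to implement "automatic delays" and "not indefinitely retried". This is a sketch, not CWA's code: the attempt count, the base delay, and retrying only on `OSError` (which covers `ConnectionError`, `TimeoutError` and urllib's `URLError`) are assumptions. `sleep` is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request()` a bounded number of times with exponential backoff.

    After the last attempt the error is re-raised so the caller can log it
    and move on instead of retrying forever.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except retry_on:
            if attempt == attempts - 1:
                raise  # give up: bounded, never indefinite
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ... the base delay
    raise AssertionError("unreachable")
```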
🧩 Extensibility:
- System architecture supports adding new metadata providers
- Contact your administrator for custom provider development
- API documentation available for advanced integrations
📚 Scope:
- System focuses on newly ingested books
- Existing books can be updated through Calibre-Web's standard metadata editing
- Consider batch operations for large collection updates
🏷️ The Auto-Metadata Fetch System ensures your library maintains high-quality, complete metadata automatically. 🛠️ For technical support or advanced configuration options, please contact your system administrator.