Work Ingestion Pipeline #164
Open
Labels
- area/cli: CLI tool
- area/sdk: Core SDK (database, storage, ebooks)
- priority/high: High priority
- type/improvement: Enhancement to existing feature
Description
Complete ebook ingestion system with metadata extraction, duplicate detection, and entity creation.
Implemented
- ✅ Metadata extraction (EPUB/MOBI/PDF)
- ✅ Cover image processing with blurhash
- ✅ Asset deduplication by checksum
- ✅ Duplicate detection (checksum, ISBN, ASIN, fuzzy title)
- ✅ Creator/publisher normalization with fuzzy matching
- ✅ Series and tag extraction
- ✅ CLI commands (add, import, inspect)
- ✅ 119+ tests
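As a rough illustration of the duplicate-detection order listed above (checksum, then ISBN, then ASIN, then fuzzy title), here is a minimal sketch. It is not the code in `packages/sdk/src/ingestion/`: the `WorkCandidate` shape, the `findDuplicate` name, the Levenshtein-based similarity, and the 0.9 threshold are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real SDK types may differ.
interface WorkCandidate {
  checksum: string; // hex digest of the file contents
  title: string;
  isbn?: string;
  asin?: string;
}

type DuplicateMatch =
  | { kind: "checksum" | "isbn" | "asin"; existing: WorkCandidate }
  | { kind: "fuzzy-title"; existing: WorkCandidate; similarity: number };

// Strip separators so "978-0-..." and "9780..." compare equal.
// Simplification: ISBN-10 and ISBN-13 forms of the same book are not reconciled.
function normalizeId(id: string): string {
  return id.replace(/[\s-]/g, "").toUpperCase();
}

// ASCII-only normalization (a simplification; non-Latin titles collapse to "").
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; empty normalized titles are treated as not comparable.
function titleSimilarity(a: string, b: string): number {
  const x = normalizeTitle(a);
  const y = normalizeTitle(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 0 : 1 - levenshtein(x, y) / longest;
}

// Check the strongest signal first and fall back to weaker ones.
function findDuplicate(
  incoming: WorkCandidate,
  library: WorkCandidate[],
  threshold: number = 0.9 // assumed cutoff, not taken from the project
): DuplicateMatch | null {
  for (const work of library) {
    if (work.checksum === incoming.checksum) return { kind: "checksum", existing: work };
  }
  if (incoming.isbn !== undefined) {
    const isbn = normalizeId(incoming.isbn);
    for (const work of library) {
      if (work.isbn !== undefined && normalizeId(work.isbn) === isbn) {
        return { kind: "isbn", existing: work };
      }
    }
  }
  if (incoming.asin !== undefined) {
    const asin = normalizeId(incoming.asin);
    for (const work of library) {
      if (work.asin !== undefined && normalizeId(work.asin) === asin) {
        return { kind: "asin", existing: work };
      }
    }
  }
  let bestWork: WorkCandidate | undefined;
  let bestSimilarity = threshold;
  for (const work of library) {
    const similarity = titleSimilarity(incoming.title, work.title);
    if (similarity >= bestSimilarity) {
      bestWork = work;
      bestSimilarity = similarity;
    }
  }
  return bestWork ? { kind: "fuzzy-title", existing: bestWork, similarity: bestSimilarity } : null;
}
```

Checking exact identifiers before fuzzy titles keeps the expensive comparison as a last resort, and returning the match kind lets a caller decide whether to skip the file outright (checksum) or ask for confirmation (fuzzy title).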
Remaining Work
- Metadata enrichment integration (wire up providers)
- Enrichment UI (review/accept metadata)
- External identifier linking (VIAF, ISNI)
- Cover fetching from OpenLibrary
- Language detection
- Calibre library import
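For the cover-fetching item, a small sketch of building a cover URL from an ISBN may help scope the work. The `openLibraryCoverUrl` helper and `CoverSize` type are hypothetical names, and the URL pattern reflects the publicly documented Open Library Covers API as best understood here; it should be checked against the current Open Library documentation before being relied on.

```typescript
// Hypothetical helper; not part of the existing SDK.
type CoverSize = "S" | "M" | "L";

function openLibraryCoverUrl(isbn: string, size: CoverSize = "L"): string {
  // Keep only ISBN characters (digits and the ISBN-10 check character X).
  const cleaned = isbn.replace(/[^0-9Xx]/g, "").toUpperCase();
  if (cleaned.length !== 10 && cleaned.length !== 13) {
    throw new Error("Expected a 10- or 13-character ISBN, got: " + isbn);
  }
  // default=false asks the service for a 404 rather than a placeholder image
  // when no cover exists (an assumption to verify against the docs).
  return "https://covers.openlibrary.org/b/isbn/" + cleaned + "-" + size + ".jpg?default=false";
}
```

The helper only builds the URL; downloading the image and passing it to the existing cover-processing and blurhash step would be handled separately.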
Files
- Plan: plans/work-ingestion.md
- Ingestion: packages/sdk/src/ingestion/
- CLI: apps/cli/src/commands/works/
Getting Started
- Review existing ingestion pipeline
- Wire up metadata enrichment
- Build enrichment review UI