Work Ingestion Pipeline #164

@Radiergummi

Description

Complete ebook ingestion system with metadata extraction, duplicate detection, and entity creation.

Implemented

  • ✅ Metadata extraction (EPUB/MOBI/PDF)
  • ✅ Cover image processing with blurhash
  • ✅ Asset deduplication by checksum
  • ✅ Duplicate detection (checksum, ISBN, ASIN, fuzzy title)
  • ✅ Creator/publisher normalization with fuzzy matching
  • ✅ Series and tag extraction
  • ✅ CLI commands (add, import, inspect)
  • ✅ 119+ tests
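
To illustrate how the duplicate-detection signals listed above (checksum, ISBN, ASIN, fuzzy title) could be combined, here is a minimal sketch. It is not the pipeline's actual code: the `Candidate` shape, the function names, the SHA-256 choice, the `difflib` similarity measure, and the 0.9 threshold are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical record shape; the real schema is not shown in this issue."""
    title: str
    checksum: str
    isbn: Optional[str] = None
    asin: Optional[str] = None


def sha256_checksum(data: bytes) -> str:
    """Hex digest of the file contents (hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def normalize_identifier(value: Optional[str]) -> Optional[str]:
    """Strip hyphens and spaces so '978-0-261-10221-7' equals '9780261102217'."""
    if not value:
        return None
    cleaned = re.sub(r"[^0-9A-Za-z]", "", value).upper()
    return cleaned or None


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    stripped = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(stripped.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_duplicate(
    incoming: Candidate,
    existing: Iterable[Candidate],
    title_threshold: float = 0.9,  # assumed cut-off, tune against real data
) -> Optional[Tuple[str, Candidate]]:
    """Return (reason, match) for the strongest matching signal, or None."""
    existing = list(existing)

    # Exact signals first, ordered from strongest to weakest.
    exact_keys = (
        ("checksum", lambda c: c.checksum),
        ("isbn", lambda c: normalize_identifier(c.isbn)),
        ("asin", lambda c: normalize_identifier(c.asin)),
    )
    for reason, key in exact_keys:
        wanted = key(incoming)
        if wanted is None:
            continue
        for other in existing:
            if key(other) == wanted:
                return reason, other

    # Fall back to fuzzy title comparison against the closest existing title.
    best = max(
        existing,
        key=lambda c: title_similarity(incoming.title, c.title),
        default=None,
    )
    if best is not None and title_similarity(incoming.title, best.title) >= title_threshold:
        return "fuzzy_title", best
    return None
```

Checking exact identifiers before the fuzzy comparison keeps the cheap, high-confidence matches from being shadowed by a merely similar title. A fuzzy match is weaker evidence than an exact one, so it may be better surfaced for review than merged automatically.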

Remaining Work

  • Metadata enrichment integration (wire up providers)
  • Enrichment UI (review/accept metadata)
  • External identifier linking (VIAF, ISNI)
  • Cover fetching from OpenLibrary
  • Language detection
  • Calibre library import
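
For the cover-fetching item, one possible starting point is Open Library's public Covers API, which serves images at `https://covers.openlibrary.org/b/isbn/<isbn>-<size>.jpg`. The sketch below only builds the URL; the function name `cover_url` and the length check are assumptions for illustration, and how the pipeline would download, cache, or rate-limit requests is not specified in this issue.

```python
import re
from typing import Optional

COVERS_BASE = "https://covers.openlibrary.org/b"
VALID_SIZES = ("S", "M", "L")  # small, medium, large


def cover_url(isbn: str, size: str = "L") -> Optional[str]:
    """Build an Open Library cover URL for an ISBN, or None if it looks invalid.

    `default=false` asks the service to answer 404 instead of a blank
    placeholder image when no cover exists.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}")
    # Keep digits and the ISBN-10 check character 'X'; drop hyphens and spaces.
    cleaned = re.sub(r"[^0-9Xx]", "", isbn).upper()
    if len(cleaned) not in (10, 13):
        return None
    return f"{COVERS_BASE}/isbn/{cleaned}-{size}.jpg?default=false"
```

A caller would pass the resulting URL to whatever HTTP client the project already uses and treat a 404 response as "no cover available".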

Getting Started

  1. Review existing ingestion pipeline
  2. Wire up metadata enrichment
  3. Build enrichment review UI
