Paperless AI needs to support rescanning / OCR'ing documents via LLM #847

@anaisbetts

Description

Is your feature request related to a problem? Please describe.

Every feature in Paperless AI is contingent on the text of the document being correct: tagging, AI Chat, everything. Unfortunately, Tesseract OCR is actually Pretty Bad, and for many documents it just generates garbage. This means that everything built on top is functionally unusable: ask "what is my account number for $SERVICE" and it gives you literally the wrong number, because the document was scanned poorly.

Describe the solution you'd like
Co-opt the features of Paperless-GPT, only better: as part of the processing, also re-OCR the document using modern solutions (such as an LLM), save the result as the document text, and then do tagging based on that.
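
Roughly, the flow I have in mind could look like the sketch below. This is only an illustration under assumptions: `Document`, `reprocess`, and the three hooks (`reocr`, `suggest_tags`, `save_text`) are hypothetical names, not existing Paperless AI or Paperless-ngx functions. The hooks stand in for whatever LLM OCR call, tagging step, and storage call the project would actually use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    doc_id: int
    image_bytes: bytes  # the original scan
    text: str           # text currently stored, e.g. Tesseract output


def reprocess(
    doc: Document,
    reocr: Callable[[bytes], str],           # hypothetical: LLM-based OCR
    suggest_tags: Callable[[str], List[str]],  # hypothetical: tagging step
    save_text: Callable[[int, str], None],   # hypothetical: persist new text
) -> List[str]:
    """Re-OCR a document, store the new text, then tag based on it."""
    new_text = reocr(doc.image_bytes)

    # If the re-OCR returned nothing usable, keep the existing text
    # instead of overwriting it with an empty result.
    if new_text.strip():
        save_text(doc.doc_id, new_text)
        doc.text = new_text

    # Tagging always runs on whatever text the document now holds.
    return suggest_tags(doc.text)
```

The point of ordering it this way is that tagging (and anything else, like AI Chat) only ever sees the re-OCR'd text, so a bad first-pass OCR does not propagate into everything built on top.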

Describe alternatives you've considered
Fixing paperless-gpt, but that is not a very good project overall.
