Description
Is your feature request related to a problem? Please describe.
Every feature in Paperless AI depends on the document text being correct: tagging, AI Chat, everything. Unfortunately, Tesseract OCR is pretty bad, and for many documents it just produces garbage. That makes everything built on top of it functionally unusable. Ask "what is my account number for $SERVICE" and it gives you literally the wrong number, because the document was scanned poorly.
Describe the solution you'd like
Take over what Paperless-GPT does, only better: as part of processing, re-OCR the document with a modern OCR solution, save the result as the document text, and then do tagging based on that text.
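A rough sketch of the order of operations I have in mind. Everything here is hypothetical: `Document`, `reprocess`, and the `ocr` and `tagger` callables are placeholder names for illustration, not existing Paperless AI or Paperless-ngx APIs. The point is only that the improved text gets stored before tagging runs, and that an empty OCR result does not overwrite the existing text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical stand-in for a document record."""
    doc_id: int
    content: str  # text currently stored, e.g. the original Tesseract output


def reprocess(
    doc: Document,
    ocr: Callable[[int], str],           # hypothetical: returns fresh text for a document id
    tagger: Callable[[str], List[str]],  # hypothetical: returns tags for a piece of text
) -> List[str]:
    """Re-OCR the document, store the new text, then tag from the stored text."""
    new_text = ocr(doc.doc_id)
    # Keep the old text if the new OCR pass came back empty.
    if new_text.strip():
        doc.content = new_text
    # Tagging always runs on whatever text is stored at this point.
    return tagger(doc.content)
```

In a real integration, `ocr` would wrap whichever OCR backend is chosen, and the update to `doc.content` would be a write back to Paperless rather than an in-memory assignment.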
Describe alternatives you've considered
Fixing Paperless-GPT instead, but it is not a great project overall.