How to programmatically determine whether the text from a page is garbled? #3465
animebing asked this question in Looking for help
I am processing thousands of PDF files. In some of them, the text extracted from some pages is garbled. I would like to detect these pages automatically and then use OCR to get the text from them.
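One possible way to flag such pages is to score the extracted text itself. This is an assumption, not something from the linked issues or a PyMuPDF feature. The score is the share of characters that are the Unicode replacement character U+FFFD (commonly emitted for glyphs that cannot be mapped), private-use, control, or unassigned code points. The function names and the 0.1 threshold are hypothetical and would need tuning on real files. This only catches garbling that surfaces as such characters. Text that decodes to wrong but ordinary letters would need a different check, such as a dictionary-word ratio.

```python
import unicodedata


def garbled_ratio(text):
    """Fraction of non-whitespace characters that look like extraction failures.

    Illustrative heuristic only (hypothetical helper, not part of PyMuPDF).
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = 0
    for c in chars:
        category = unicodedata.category(c)
        # U+FFFD: replacement character for unmappable glyphs.
        # Co: private use area, Cc: control characters, Cn: unassigned.
        if c == "\ufffd" or category in ("Co", "Cc", "Cn"):
            suspicious += 1
    return suspicious / len(chars)


def looks_garbled(text, threshold=0.1):
    """Hypothetical decision rule: flag text whose suspicious share exceeds threshold."""
    return garbled_ratio(text) > threshold
```

With PyMuPDF, the input could be `page.get_text()` for each page, and pages where `looks_garbled` returns True would be sent to OCR.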
What I have done: based on
https://github.com/pymupdf/PyMuPDF/issues/530
and https://github.com/pymupdf/PyMuPDF/issues/365,
I know I can get the fonts of a page and then check whether each one has a /ToUnicode entry.
This works sometimes, but some fonts have no /ToUnicode and their text is still extracted correctly. Is there something I am missing to make this check reliable?