Ignore hidden text? #446
Unanswered
PmE8HW0KRfqa
asked this question in
Q&A
Replies: 1 comment
-
The two text layers may be on separate content streams, in which case you could try ignoring one of them. Alternatively, for each letter in page.Letters, check whether it overlaps with other letters.
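The overlap check could look roughly like the sketch below. It is written in Python with plain bounding boxes rather than against the PdfPig API, so the `Letter` class, `overlap_ratio`, `drop_overlapping`, and the 0.5 threshold are all hypothetical names and values chosen for illustration. It also assumes the letters arrive with the layer you want to keep first, which may not hold for a given file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    """Hypothetical stand-in for an extracted letter and its bounding box."""
    value: str
    left: float
    bottom: float
    right: float
    top: float

    @property
    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.top - self.bottom)


def overlap_ratio(a: Letter, b: Letter) -> float:
    """Intersection area divided by the smaller of the two box areas."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.top, b.top) - max(a.bottom, b.bottom)
    if width <= 0 or height <= 0:
        return 0.0  # boxes do not intersect (or only touch at an edge)
    smaller = min(a.area, b.area)
    return (width * height) / smaller if smaller > 0 else 0.0


def drop_overlapping(letters, threshold=0.5):
    """Keep a letter only if it does not substantially overlap a kept one.

    Assumption: earlier letters belong to the layer worth keeping.
    """
    kept = []
    for letter in letters:
        if all(overlap_ratio(letter, k) < threshold for k in kept):
            kept.append(letter)
    return kept
```

For example, given an original "Hi" followed by a slightly offset OCR copy of the same glyph positions, `drop_overlapping` would return only the first two letters. The comparison is quadratic in the number of letters, which is probably fine per page but could be replaced by a spatial index for very dense pages.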
-
I'm trying to extract text from a PDF file that contains vector text but has also had an OCR text layer added to it (poorly). If I try to copy the text using a GUI tool (like Acrobat), I might select the OCR text, or I might select the original text. It's not predictable, since the two text layers are effectively on top of each other. PdfPig picks up everything. Is there a way to extract only the text that is visible?
Thanks