I think I tried to explain this to you in another Discussions post already:
You are obviously dealing with OCR'ed pages, so you are not looking at actual text but at images!
When you search for or extract text, you only get the information that your OCR engine was able to detect.

This is always error-prone!

The text rectangles may not exactly match the corresponding text in the image, for whatever reason: dirt or skewed scanning may have confused the logic. The same is true for drawings: the OCR engine may mistake a drawing for text. Conversely, your redaction or text insertion may destroy text borders that you actually wish to retain, and so on.
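
To make the rectangle mismatch concrete, here is a minimal sketch in plain Python. It assumes rectangles are `(x0, y0, x1, y1)` tuples in page coordinates; the helper `pad_rect`, the margin value, and the sample coordinates are all hypothetical and not taken from any particular library. It pads an OCR-reported rectangle by a small tolerance so that a slightly misplaced rectangle still covers the visible text:

```python
from typing import Tuple

# Assumed layout: (x0, y0, x1, y1) in page coordinates.
Rect = Tuple[float, float, float, float]


def pad_rect(rect: Rect, margin: float, page: Rect) -> Rect:
    """Grow rect by margin on every side, clamped to the page bounds."""
    x0, y0, x1, y1 = rect
    px0, py0, px1, py1 = page
    return (
        max(px0, x0 - margin),
        max(py0, y0 - margin),
        min(px1, x1 + margin),
        min(py1, y1 + margin),
    )


# Hypothetical values: a rectangle reported by the OCR layer on an A4-sized page.
ocr_rect = (100.0, 200.0, 180.0, 212.0)
page_rect = (0.0, 0.0, 595.0, 842.0)
print(pad_rect(ocr_rect, 2.0, page_rect))
```

Note the trade-off: a larger margin is more likely to cover text that the OCR rectangle missed, but it is also more likely to wipe out neighboring borders or drawings that you wanted to keep.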

So depending on the specific s…

Answer selected by Muhammadraafat1