Replies: 1 comment
-
The problem here is that the font uses a non-standard encoding, so the glyphs (the character shapes that viewers display) cannot be translated back correctly to the Unicode characters they originally represented.
-
Hello.
I am trying to parse a large PDF document into a single Excel table for further processing.
A page from one of those documents is attached: fragment.pdf
The minimal code I use to extract text from a single page:
and then I get this:
The problem is that the extracted text does not match the text shown in the PDF. It is not replaced by placeholders such as ? or tofu symbols either; the characters are simply different.
Are there any suggestions as to why this happens, and is there a solution for it?