Cannot find text in Page.read_contents() #3019
This is not a bug, but a Discussions item.
Things mostly look very different than expected once they have landed in the … So in the end, your text is not missing; you just do not recognize it.
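Inside a content stream, text is not stored as readable words. Text-showing operators such as `Tj` take string operands whose bytes are character codes, frequently written as hex strings like `<5741...>`. For composite fonts those codes are usually glyph IDs that only map back to Unicode through the font's ToUnicode CMap. A byte search over `read_contents()` output can therefore miss text that `get_text()` finds without trouble. The following stdlib-only sketch illustrates this; the sample stream and the helper names `decode_operand` and `shown_codes` are made up for illustration, and it only handles single `Tj` operands of a simple font (no `TJ` arrays, no string escapes):

```python
import re

# A Tj string operand is either a literal string "(...)" or a hex string "<...>".
# Simplified: TJ arrays and escape sequences inside literal strings are not decoded.
TJ_OPERAND = re.compile(rb"(\((?:[^()\\]|\\.)*\)|<[0-9A-Fa-f\s]*>)\s*Tj")


def decode_operand(operand: bytes) -> bytes:
    """Return the raw character codes carried by one Tj string operand."""
    if operand.startswith(b"<"):
        hex_digits = re.sub(rb"\s", b"", operand[1:-1])
        if len(hex_digits) % 2:  # per the PDF spec, an odd final digit is padded with 0
            hex_digits += b"0"
        return bytes.fromhex(hex_digits.decode("ascii"))
    return operand[1:-1]  # literal string, escapes left untouched in this sketch


def shown_codes(stream: bytes) -> list[bytes]:
    """Collect the character codes of every Tj operand in a content stream."""
    return [decode_operand(m.group(1)) for m in TJ_OPERAND.finditer(stream)]


# Hypothetical content stream showing "WATERMARK" via a hex string.
sample = b"BT /F1 48 Tf 100 400 Td <57415445524D41524B> Tj ET"
print(b"WATERMARK" in sample)  # False: the word never appears as plain bytes
print(shown_codes(sample))     # [b'WATERMARK']
```

With a composite font, the same decoding would yield two-byte glyph IDs rather than ASCII, which is exactly the "you do not recognize it" situation; `get_text()` succeeds because MuPDF applies the font's encoding and ToUnicode information for you.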
Description of the bug
Well, I am actually a bit confused. For context: I have a PDF document that was processed in Java with the PDFBox package, which added a watermark as an overlay on the background.
When I open the file and use get_text(), I can see the block containing the watermark text, but when I use read_contents() in order to remove it, the text is missing. Can you please explain what I am missing here?
Also, is there another way to identify this overlaid text? Is there a way to remove a block returned by get_text() without having to access the contents directly?
Thanks in advance!
How to reproduce the bug
Unfortunately, I cannot provide the document because it contains sensitive data, but this is what I am doing:
PyMuPDF version
1.23.9
Operating system
Linux
Python version
3.10