Find a block that is neither an image nor text, and get_text() method cannot extract #2587
-
When I try to extract text from this file, I find something that I can select in WPS but that cannot be extracted by the get_text() method. I just use page.get_text(), and 'BILL OF LADING NO' cannot be found. My PyMuPDF version is 1.18.8. I have also provided a picture showing how this element appears in WPS. I hope you can answer soon. Thank you.
-
On which page is that text "BILL OF LADING NO"? I don't see it anywhere.
-
Also, what are your attached pictures supposed to tell me?
-
Well, this is obviously not an issue, but a Discussions topic.
-
These things are vector graphics, meaning they consist of drawing commands: lines, curves, rectangles. Because they are drawings rather than text characters, get_text() does not return them.
For example, you can draw the capital letter "A" with these lines:
/-\
if you know how to do this. PyMuPDF can extract these too, via the method page.get_drawings(). Please consult the documentation.
Vector graphics have the great advantage that they remain smooth when the page image is zoomed in, whereas raster images become pixelated.
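As a rough illustration of the point above, here is a minimal sketch that compares what get_text() and get_drawings() return for one page. The helper names `summarize_drawings` and `inspect_page` and the file name `input.pdf` are made up for this example. It assumes each path returned by page.get_drawings() is a dict with an "items" list whose entries start with an operator code such as "l" (line), "c" (curve) or "re" (rectangle); please check the documentation for your PyMuPDF version.

```python
def summarize_drawings(paths):
    """Count drawing commands by operator code.

    `paths` is expected to look like the list returned by
    page.get_drawings(): one dict per path, each with an "items" list.
    """
    counts = {}
    for path in paths:
        for item in path["items"]:
            op = item[0]  # e.g. "l" = line, "c" = curve, "re" = rectangle
            counts[op] = counts.get(op, 0) + 1
    return counts


def inspect_page(filename, page_number=0):
    """Return (extracted text, drawing-command counts) for one page."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(filename)
    page = doc[page_number]
    text = page.get_text()        # only real text characters end up here
    paths = page.get_drawings()   # vector graphics: lines, curves, rectangles
    doc.close()
    return text, summarize_drawings(paths)


# Hypothetical usage (file name is an assumption):
# text, counts = inspect_page("input.pdf")
# print("BILL OF LADING NO" in text)
# print(counts)
```

If the phrase is missing from the text but the page shows many line and curve commands, the visible letters are probably drawn as vector graphics rather than stored as text.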