Rich table (or chart?) content is not fully extracted #2798
-
Replies: 2 comments 2 replies
-
There is no possible enhancement here, I'm afraid.
-
We are not looking at a table at all on the shown page. There are only vector graphics with some explanatory text. You have to analyze this situation yourself, for example by checking whether some vector graphic or image is contained in the detected table's bbox:

```python
import fitz  # PyMuPDF

# assumes `page` and the detected table `tab` already exist
paths = page.get_drawings()
tab_bbox = fitz.Rect(tab.bbox)
if any(p["rect"] in tab_bbox for p in paths):  # some graphics are in table area!
    print("not a table")
```
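The containment test itself is simple. Below is a minimal pure-Python sketch of it, assuming boxes are plain `(x0, y0, x1, y1)` tuples rather than `fitz.Rect` objects. In PyMuPDF the drawing boxes would come from the `"rect"` key of each `page.get_drawings()` item and image boxes from the `"bbox"` key of each `page.get_image_info()` item. The helper names `contains` and `has_graphics_inside` are hypothetical, and edge cases such as empty rectangles are not handled the way `fitz.Rect` handles them.

```python
from typing import List, Tuple

# A box is (x0, y0, x1, y1) in page coordinates (hypothetical plain-tuple form).
Box = Tuple[float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies completely inside `outer` (edges may touch)."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def has_graphics_inside(table_bbox: Box, drawing_rects: List[Box], image_bboxes: List[Box]) -> bool:
    """Return True if any vector graphic or image box lies within the table's bbox."""
    return any(contains(table_bbox, box) for box in drawing_rects + image_bboxes)


# Usage: a drawing sits inside the "table" area, so it is probably not a real table.
table = (0.0, 0.0, 100.0, 100.0)
print(has_graphics_inside(table, [(10.0, 10.0, 20.0, 20.0)], []))  # True
```

If the check returns `True`, treating the detected region as a graphic rather than a table is a heuristic, not a guarantee.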
I believe you were taken in the wrong direction by using the verb "crop", which is not applicable here.
What you seem to mean is taking a picture of only the drawing. This works by using the drawing's bbox as a clip area like this:
```python
pix = page.get_pixmap(clip=bbox)
```

Then do `pix.save("some.jpg")` (or `.png`). If you want, you can improve image resolution by using the `dpi` parameter when creating the pixmap.
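As for the `dpi` parameter: page coordinates are in points (72 per inch), and the default rendering is 72 dpi, so rendering at `dpi` scales the clip area by `dpi / 72`. The hypothetical helper `estimated_pixmap_size` below estimates the resulting image size, assuming the clip is given as an `(x0, y0, x1, y1)` tuple. PyMuPDF's own rounding of the pixmap edges may differ by a pixel, so treat this as a rough sketch.

```python
from typing import Tuple


def estimated_pixmap_size(clip: Tuple[float, float, float, float], dpi: int = 72) -> Tuple[int, int]:
    """Approximate (width, height) in pixels of page.get_pixmap(clip=clip, dpi=dpi).

    Page coordinates are in points (1/72 inch), so the zoom factor is dpi / 72.
    """
    x0, y0, x1, y1 = clip
    zoom = dpi / 72
    return round((x1 - x0) * zoom), round((y1 - y0) * zoom)


# A drawing 144 x 72 points large, rendered at 150 dpi.
print(estimated_pixmap_size((0, 0, 144, 72), dpi=150))  # (300, 150)
```

Doubling the `dpi` roughly doubles each pixel dimension, so file size grows about fourfold; pick the lowest value that still looks sharp.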