Yes: the method Page.get_text("dict") extracts both text and images when used with the default flags.
The extracted image and text blocks appear in the same order as in the page's /Contents.

The full sequence of the bounding boxes of everything on the page is given by the list page.get_bboxlog(). The items in this list look like (obj-type, bbox).
So you can take the bbox of an image or a piece of text and then find the index of the bboxlog entry that contains it.
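
A minimal sketch of that lookup might look like this. The helper names (`contains`, `bboxlog_index`, `block_order`) and the tolerance `tol` are illustrative assumptions and not part of PyMuPDF; only `page.get_bboxlog()` and `page.get_text("dict")` come from the answer above. It takes the first bboxlog entry whose rectangle contains the given bbox, so a large entry that covers the page (a background fill, for example) would match first and may need to be filtered out.

```python
def contains(outer, inner, tol=1.0):
    """True if rectangle `outer` contains rectangle `inner`, with `tol` points of slack."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (ox0 - tol <= ix0 and oy0 - tol <= iy0
            and ox1 + tol >= ix1 and oy1 + tol >= iy1)


def bboxlog_index(bboxlog, bbox, tol=1.0):
    """Index of the first bboxlog entry whose bbox contains `bbox`, or -1 if none does."""
    for i, item in enumerate(bboxlog):
        # Each item starts with (obj-type, bbox).
        if contains(item[1], bbox, tol):
            return i
    return -1


def block_order(page, tol=1.0):
    """Pair every block of page.get_text("dict") with its bboxlog index.

    `page` is assumed to be a PyMuPDF Page. Returns tuples of
    (bboxlog index, block type, block bbox); block type 0 is text, 1 is image.
    """
    bboxlog = page.get_bboxlog()
    blocks = page.get_text("dict")["blocks"]
    return [(bboxlog_index(bboxlog, b["bbox"], tol), b["type"], b["bbox"])
            for b in blocks]


# Usage sketch (hypothetical file name):
#   import pymupdf  # older releases: import fitz
#   doc = pymupdf.open("input.pdf")
#   for idx, btype, bbox in sorted(block_order(doc[0])):
#       print(idx, "image" if btype == 1 else "text", bbox)
```

A text block's bbox is the union of its lines, so it may be larger than any single text entry in the bboxlog. If a block returns -1, matching on the bboxes of its lines or spans instead is worth trying.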

Answer selected by more-strive

This discussion was converted from issue #2985 on January 08, 2024 08:35.