How can I map an image block from get_text() one-to-one to an item in the page.get_images() list?
Each page.get_images() list item has an xref, but the image blocks from get_text() don't have an xref property.

A 1:1 correspondence between page.get_images() and the image blocks in page.get_text() cannot be guaranteed and, in general, does not exist; see the documentation here.
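
One way to see the mismatch for a concrete page is to group the images the page actually displays by xref. This is only a sketch: it assumes a PyMuPDF version in which `page.get_image_info(xrefs=True)` is available and returns dictionaries with `"xref"` and `"bbox"` keys, and the helper name `group_displayed_images` is made up for illustration.

```python
from collections import defaultdict


def group_displayed_images(page):
    """Map xref -> list of bounding boxes of images displayed on the page.

    Hypothetical helper. Assumes `page` is a PyMuPDF Page whose
    get_image_info(xrefs=True) returns dicts with "xref" and "bbox" keys.
    An xref of 0 means no xref could be found (for example an inline image).
    """
    shown = defaultdict(list)
    for info in page.get_image_info(xrefs=True):
        shown[info["xref"]].append(tuple(info["bbox"]))
    return dict(shown)
```

An xref that occurs in page.get_images() but is missing from the result is referenced by the page without being displayed, and an xref with several boxes is displayed more than once. Either case rules out a strict 1:1 mapping.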

To recover images the same way a page shows them, you must investigate the transform matrix and use a package like Pillow to back-transform the extracted image accordingly.
I once wrote a little function, matprop, that does this investigation. Here is a ZIP of it:
matrix_property.zip
Let's look at page 8 of your test file and see how to use it:

import 
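
The kind of matrix investigation described above can be sketched in plain Python. This is not the matprop function from the ZIP; `describe_matrix` is a hypothetical helper that assumes the transform is given as the six values (a, b, c, d, e, f) of a 2D affine matrix and reports rotation, flip, and scale.

```python
import math


def describe_matrix(m):
    """Report basic properties of a 2D affine matrix (a, b, c, d, e, f).

    Hypothetical helper for illustration, not PyMuPDF API.
    """
    a, b, c, d, e, f = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    # Length of the transformed x unit vector.
    scale_x = math.hypot(a, b)
    # Area scale divided by scale_x; equals the y scale when there is no shear.
    scale_y = abs(det) / scale_x
    # Direction of the transformed x unit vector, in the matrix's own
    # coordinate system, normalized to [0, 360).
    angle = round(math.degrees(math.atan2(b, a)) % 360, 6) % 360
    return {
        "angle": angle,
        "flipped": det < 0,  # a negative determinant means a mirror image
        "scale_x": scale_x,
        "scale_y": scale_y,
        "offset": (e, f),
    }
```

With the angle and the flip flag known, the extracted image could then be rotated or mirrored with Pillow (for example with `Image.rotate` or `ImageOps.mirror`). The direction of rotation should be checked against the page's coordinate system before relying on it.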

Answer selected by buptyyf
This discussion was converted from issue #2123 on December 14, 2022 18:20.