Match the results of page.get_image_info() and page.get_images() #1659
Hi, I've been working on extracting the individual images in a PDF file and storing them in a remote store after identifying each one. I also need the rotation of each image. The information I need for that, the transformation matrix, can be extracted using the
I am able to calculate the angle to the nearest integer using the c and d values of the transformation matrix. However, I don't see any identifier, such as image_name, that I can use to reference the results of the. Is there any way to match the results of page.get_images() to the results of page.get_image_info()? For now, I am storing the results of
OMG!!! You are right!!! I feel stupid now that I hadn't realized I could use this. Thanks for the help.
Both methods work with completely different approaches. `get_images()` only works on PDFs, whereas `get_image_info()` works for all document types, just like `get_text()`, on which it is based. The sets of images each of them reports are not equal in general. I am discussing the background in detail in the documentation.

To support that matching, `get_image_info()` supports the `xrefs` parameter. If `True`, then `image["xref"]` can be used to locate the item in `get_images()`.

But you can also use `page.get_image_rects(item, transform=True)` to get a list of locations of an image on the page (including the transformation matrix) using one of the items in `get_images()`.
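Both suggestions can be sketched roughly as follows. This is a minimal illustration, not the library's own recipe. The helper names (`index_info_by_xref`, `rotation_degrees`, `match_images`, `rotations_via_rects`) are made up for this example. It assumes that the first element of each `get_images()` item is the xref and the eighth is the symbolic image name, that `get_image_info(xrefs=True)` entries carry `"xref"`, `"bbox"` and `"transform"` keys, and that images without an own PDF object (such as inline images) report an xref of 0. The angle is derived from the first two matrix entries, which assumes a plain rotation plus scaling with no shear or flip; the sign convention may need adjusting for your use case.

```python
import math


def index_info_by_xref(image_infos):
    """Group get_image_info(xrefs=True) entries by their "xref" value."""
    by_xref = {}
    for info in image_infos:
        # Assumption: xref 0 means "no PDF object number", so skip those.
        if info.get("xref", 0) > 0:
            by_xref.setdefault(info["xref"], []).append(info)
    return by_xref


def rotation_degrees(transform):
    """Estimate the rotation, in whole degrees (0..359), from a 6-element matrix."""
    a, b = transform[0], transform[1]
    return round(math.degrees(math.atan2(b, a))) % 360


def match_images(page):
    """Pair every get_images() item with its get_image_info() entries via xref."""
    by_xref = index_info_by_xref(page.get_image_info(xrefs=True))
    matches = []
    for item in page.get_images(full=True):
        xref, name = item[0], item[7]
        for info in by_xref.get(xref, []):
            angle = rotation_degrees(info["transform"])
            matches.append((xref, name, info["bbox"], angle))
    return matches


def rotations_via_rects(page):
    """Alternative: ask for the placements of each get_images() item directly."""
    result = {}
    for item in page.get_images(full=True):
        # With transform=True, each entry is a (rect, matrix) pair.
        pairs = page.get_image_rects(item, transform=True)
        result[item[0]] = [(rect, rotation_degrees(m)) for rect, m in pairs]
    return result


# Possible usage, assuming PyMuPDF is installed and "input.pdf" is a
# placeholder for your own file:
#   import pymupdf  # older versions: import fitz
#   doc = pymupdf.open("input.pdf")
#   for page in doc:
#       print(match_images(page))
```

An image that is placed several times on the same page appears once in `get_images()` but can yield several entries in either approach, which is why both helpers return a list per xref.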