image extract error #2848
-
Describe the bug
You mentioned that it is possible to extract images from a PDF with the `get_text('dict')` method, by selecting the blocks with `type == 1`, and also with the `get_images()` method. In a document I am working with, these two methods return a different number of images. Is this normal? Or is there another way to determine the position in the PDF of the images returned by `get_images()`?
To Reproduce
`import fitz`
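The two counts being compared can be computed side by side. This is a hypothetical illustration, not the reproduction code from the report; `count_images_two_ways` is an invented helper name. It assumes an already opened PyMuPDF page and uses only `page.get_text("dict")`, whose image blocks carry `"type": 1`, and `page.get_images()`, which lists the images the page references.

```python
def count_images_two_ways(page) -> tuple[int, int]:
    """Return (image blocks in get_text("dict"), entries in get_images()).

    `page` is assumed to be a PyMuPDF (fitz) Page. Any object offering the
    same two methods works, which keeps this sketch testable without a PDF.
    """
    # In the "dict" output, image blocks have "type" 1 (text blocks have 0).
    blocks = page.get_text("dict")["blocks"]
    n_dict_images = sum(1 for block in blocks if block["type"] == 1)
    # get_images() returns one entry per image referenced by the page.
    n_listed_images = len(page.get_images())
    return n_dict_images, n_listed_images
```

With PyMuPDF installed, calling `count_images_two_ways(page)` for each page of a document opened with `fitz.open(...)` shows on which pages the two numbers disagree.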
-
It helps a lot.
-
Thank you.
-
There indeed is a difference between the two ways:

* `get_text("dict")` is internally restricted to a clip rectangle equal to the page itself: any image not completely contained in `page.rect` is omitted.
* `page.get_image_info()` does not have this restriction.

You can adjust that by using `clip=fitz.INFINITE_RECT()` in the `get_text()` method.
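The containment rule described above can be checked without re-running the extraction. This is a minimal sketch, assuming that lying completely inside `page.rect` is the only thing deciding whether an entry of `page.get_image_info()` also appears in `get_text("dict")`. Bounding boxes are treated as plain `(x0, y0, x1, y1)` tuples, and the helper names are hypothetical.

```python
Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def is_contained(inner: Rect, outer: Rect) -> bool:
    """True if `inner` lies completely inside `outer` (touching edges count)."""
    x0, y0, x1, y1 = inner
    ox0, oy0, ox1, oy1 = outer
    return ox0 <= x0 and oy0 <= y0 and x1 <= ox1 and y1 <= oy1


def images_dropped_by_page_clip(image_info: list[dict], page_rect: Rect) -> list[dict]:
    """Entries of get_image_info() that a clip equal to the page rectangle
    would exclude, assuming the rule is full containment of the bbox."""
    return [
        info
        for info in image_info
        if not is_contained(tuple(info["bbox"]), page_rect)
    ]
```

With PyMuPDF the inputs would be `page.get_image_info()` and `tuple(page.rect)`; for the entries this reports, `page.get_text("dict", clip=fitz.INFINITE_RECT())` should then also return image blocks. Real coordinates may need a small tolerance in the comparison.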