after crop page, then get_images() #2140
-
Hi, I just want to get the images within a specified range of a page. So I crop the page and then call get_images(), but it still returns all images on the page.
page.set_cropbox(fitz.Rect(100, 100, 550, 700))
How can I get only the images within a specified range of a page?
Replies: 3 comments 14 replies
-
page.get_images() may even list images that are not used by the page at all, let alone limit itself to the area you are asking about. This is because that method looks at PDF definitions only and does not inspect the page's appearance source code.
To at least restrict the above list to images that are actually referenced by this page anywhere on the MediaBox, do a page.clean_contents() first.
To restrict that list to visible images (CropBox) do this:
imglist = page.get_images()
visibles = [item for item in imglist if page.get_image_rects(item[0])[0] in page.cropbox]
get_image_rects walks through the page's appearance instructions to determine each bbox of one image (given by its xref, item[0]) on the page. In…
-
There must be some misconception. Can you let me have the PDF and the page number in question?
-
As I suspected, you have been hiding major information items!
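For later readers, here is a minimal sketch of the filtering idea from the first reply. It is written so that it does not fail when an image has no rectangle on the page, and so that it checks every occurrence of an image rather than only the first. visible_images and rect_inside are hypothetical helper names, not PyMuPDF functions. The sketch only assumes that the page object offers get_images(), get_image_rects(xref) and cropbox as used in this thread, that rectangles expose x0, y0, x1, y1, and that the image rectangles and the clip rectangle are in the same coordinate system.

```python
def rect_inside(inner, outer):
    """Return True if rectangle `inner` lies completely within `outer`.

    Both arguments only need x0, y0, x1, y1 attributes.
    """
    return (
        outer.x0 <= inner.x0
        and outer.y0 <= inner.y0
        and inner.x1 <= outer.x1
        and inner.y1 <= outer.y1
    )


def visible_images(page, clip=None):
    """Hypothetical helper: keep image entries shown fully inside `clip`.

    `clip` defaults to page.cropbox (assumption: same coordinate system
    as the rectangles returned by page.get_image_rects).
    """
    if clip is None:
        clip = page.cropbox
    visibles = []
    for item in page.get_images():
        xref = item[0]  # first entry of each image item is its xref
        rects = page.get_image_rects(xref)  # may be empty if never displayed
        if any(rect_inside(r, clip) for r in rects):
            visibles.append(item)
    return visibles
```

With a real page this would presumably be called as visible_images(page) after page.set_cropbox(...), or with an explicit clip rectangle if cropping the page is not desired. Whether partially overlapping images should also count is a design choice; this sketch keeps the stricter "fully inside" test used in the reply.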