page.get_images() returns the images from other pages #3898

lokman2k5 · 2024-09-26T17:05:42Z


With a certain PDF file, vitas.pdf, when I call page.get_images() on, for example, the first page, it returns every image in the PDF. I have tested other PDFs, and with them it works correctly and returns only the images on that page. How can I solve this?
Here is my code:
```python
import fitz  # PyMuPDF

doc = fitz.open("vitas.pdf")
page = doc[0]
for image in page.get_images():
    xref = image[0]  # first item of each entry is the image's xref
    print(xref)
    rawImg = doc.extract_image(xref)
    imageFile = f"image{xref}.{rawImg['ext']}"
    with open(imageFile, "wb") as imgOut:  # closes the file when done
        imgOut.write(rawImg["image"])
    print(f"{imageFile} is saved")
```

Answered by JorjMcKie


JorjMcKie · 2024-09-26T17:16:52Z

Maintainer

page.get_images() returns the contents of the corresponding part of the page's object definition (the images listed in its resources). It is not a statement about what the page actually shows.
The PDF creator can put whatever they like in there.
You can use page.get_image_info() instead.
Or call page.clean_contents() first. This runs through the page's source code and synchronizes its object definitions with what really happens when the page is displayed.

1 reply

lokman2k5 Sep 26, 2024
Author

Thank you. clean_contents() did the job!
