How to keep the original resolution of page images? #2631

esraa-abdelmaksoud · 2023-08-28T23:11:47Z


Hello,

I've been using page.get_pixmap() to render pages as images for further processing. The problem is that the output page height is always 842 px, whatever the document. I have tried controlling the output resolution with the dpi parameter and zooming with a matrix, but what I need is to keep the original resolution as is. Setting a fixed dpi produces blurred images when the pages were scanned at a lower dpi.

How can I get pages as images in their originally scanned resolution?

Please keep in mind that I have read similar discussion questions, but they were mainly about controlling the output resolution, not about keeping the input resolution.

This is my code snippet:

import fitz

doc = fitz.open(file_path)
for p in range(len(doc)):
    page = doc.load_page(p)
    pix = page.get_pixmap(dpi=300)

Thank you.
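
For context, here is a small sketch of where the 842 px figure probably comes from. It assumes the pages are A4, which the question does not state. PDF page sizes are expressed in points (1/72 inch), and get_pixmap() called without dpi or matrix renders at 72 dpi, so one point becomes one pixel. An A4 page is 595 x 842 points. The helper name rendered_size is hypothetical, and the result is only approximate because the library may round by a pixel differently.

```python
def rendered_size(width_pt, height_pt, dpi=72):
    """Approximate pixmap size in pixels for a page given in points."""
    # 72 points make one inch, so pixels = points * dpi / 72
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


A4 = (595, 842)  # A4 page size in points (assumption about the input files)
print(rendered_size(*A4))           # (595, 842)
print(rendered_size(*A4, dpi=300))  # (2479, 3508)
```

So the fixed height reflects the page size, not the resolution of any scan embedded in the page.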

Answered by JorjMcKie

JorjMcKie · 2023-08-29T11:08:55Z

Maintainer

Document pages in general have no "natural" resolution. You are talking about the (scanned) image from which the page was created, right?
To find the scanned image's resolution, you must first locate it, then extract it to see the resolution values:

In [1]: import fitz
In [2]: doc = fitz.open("ocr-ed.pdf")
In [3]: page = doc[0]
In [4]: page.get_images()
Out[4]: [(12, 0, 1224, 1584, 8, 'DeviceRGB', '', 'R12', 'DCTDecode')]
In [5]: page.get_image_rects(12) # check if covering full page
Out[5]: [Rect(0.0, 0.0, 612.0, 792.0)]
In [6]: # true, so extract image to see its resolutions
In [7]: img = doc.extract_image(12)
In [8]: img["xres"], img["yres"]
Out[8]: (96, 96)
In [9]: # so you can render the page with dpi=96
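
The session above can be wrapped in a loop over all pages. What follows is a minimal sketch, not the maintainer's code. It assumes each scanned page holds a single, unrotated image covering (almost) the whole page. The names effective_dpi, render_at_scan_dpi and fallback_dpi are hypothetical. Instead of reading xres/yres, it derives the resolution from the image's pixel size and the rectangle it occupies on the page, since the two can differ: in the session above, 1224 px across 612 points (8.5 inches) works out to 144 dpi, so rendering at 96 dpi would give about 816 x 1056 px rather than the stored 1224 x 1584. Which value suits your files is something to check.

```python
def effective_dpi(width_px, height_px, width_pt, height_pt):
    """DPI implied by an image's pixel size and its placed size in points."""
    return width_px * 72 / width_pt, height_px * 72 / height_pt


def render_at_scan_dpi(path, fallback_dpi=72):
    """Yield (page number, pixmap), rendering each page at its scan's DPI.

    Assumes a scanned page holds one unrotated image covering the page;
    pages without such an image are rendered at fallback_dpi.
    """
    import fitz  # PyMuPDF; imported here so effective_dpi works without it

    doc = fitz.open(path)
    for page in doc:
        dpi = fallback_dpi
        for item in page.get_images():
            xref, width_px, height_px = item[0], item[2], item[3]
            rects = page.get_image_rects(xref)
            if not rects:
                continue
            r = rects[0]
            # accept only images that cover at least 95% of the page
            if (r.width >= 0.95 * page.rect.width
                    and r.height >= 0.95 * page.rect.height):
                dpi_x, dpi_y = effective_dpi(width_px, height_px,
                                             r.width, r.height)
                dpi = round(max(dpi_x, dpi_y))
                break
        yield page.number, page.get_pixmap(dpi=dpi)


# Values from the session above: 1224 x 1584 px placed on 612 x 792 points
print(effective_dpi(1224, 1584, 612, 792))  # (144.0, 144.0)
```

A possible use, with a placeholder file name: for number, pix in render_at_scan_dpi("ocr-ed.pdf"): pix.save(f"page-{number}.png").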

1 reply

esraa-abdelmaksoud Aug 31, 2023
Author

Thank you so much. It works perfectly!
