Closed
Description of the bug
Continuing the discussion from #479: I expected extract_image() to report each image's actual resolution, but xres and yres still come back as 96:
$ poetry run python
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pymupdf
>>> pymupdf.__version__
'1.25.5'
>>> doc = pymupdf.open("i-9-paper-version.pdf")
>>> i = doc.extract_image(4)
>>> i['xres']
96
>>> i['yres']
96
>>> i['width']
201
>>> i['height']
199
>>> i['bpc']
8
vs.
$ pdfimages -list i-9-paper-version.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     200   199  index   1   8  image  no        47  0   300   301 16.5K  42%
   1     1 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   1     2 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   2     3 image     201   199  index   1   8  image  no         4  0   300   301 16.3K  42%
At the moment, I am forced to shell out to pdfimages to retrieve the per-image x-ppi/y-ppi values. Could PyMuPDF report these itself in the future?
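For reference, here is a rough sketch of a possible workaround that avoids the subprocess. It assumes that the figure pdfimages prints is the effective resolution of each image as placed on the page, i.e. pixel size divided by displayed size in inches, and that Page.get_image_info(xrefs=True) returns the "width", "height", "bbox" and "xref" keys for every displayed image. The helper names effective_ppi and list_image_ppi are made up for illustration. For rotated or skewed images the bbox is only the enclosing rectangle, so the numbers may differ from what pdfimages reports.

```python
def effective_ppi(width_px, height_px, bbox):
    """Return (x_ppi, y_ppi) for an image of the given pixel size shown in bbox.

    bbox is (x0, y0, x1, y1) in PDF points, where 1 point = 1/72 inch.
    """
    x0, y0, x1, y1 = bbox
    width_in = abs(x1 - x0) / 72.0
    height_in = abs(y1 - y0) / 72.0
    if width_in == 0 or height_in == 0:
        raise ValueError("image has an empty bounding box")
    return width_px / width_in, height_px / height_in


def list_image_ppi(path):
    """Collect (page number, xref, x_ppi, y_ppi) for every image shown in a PDF.

    Hypothetical helper: it relies on the assumptions stated above and is not
    an official PyMuPDF recipe.
    """
    import pymupdf  # imported here so effective_ppi() works without PyMuPDF

    rows = []
    doc = pymupdf.open(path)
    try:
        for page in doc:
            # xrefs=True adds the "xref" key to each image dictionary
            for info in page.get_image_info(xrefs=True):
                x_ppi, y_ppi = effective_ppi(
                    info["width"], info["height"], info["bbox"]
                )
                rows.append((page.number + 1, info["xref"], x_ppi, y_ppi))
    finally:
        doc.close()
    return rows
```

Calling list_image_ppi("i-9-paper-version.pdf") should then give one row per displayed image, which can be compared against the x-ppi and y-ppi columns above. I have not verified that the values match pdfimages exactly for this file.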
How to reproduce the bug
Run the code above with the attached PDF file.
PyMuPDF version
1.25.5
Operating system
Linux
Python version
3.11