Still unable to retrieve PDF-extracted image xres/yres #4485

@wohali

Description of the bug

Continuing the discussion from #479: I expected extract_image() to report the image's actual resolution, but it still returns 96 for both xres and yres:

$ poetry run python
Python 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import pymupdf
>>> pymupdf.__version__
'1.25.5'
>>> doc = pymupdf.open("i-9-paper-version.pdf")
>>> i = doc.extract_image(4)
>>> i['xres']
96
>>> i['yres']
96
>>> i['width']
201
>>> i['height']
199
>>> i['bpc']
8

vs.

$ pdfimages -list i-9-paper-version.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     200   199  index   1   8  image  no        47  0   300   301 16.5K  42%
   1     1 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   1     2 image      71    71  index   1   8  image  no        48  0   300   304 1510B  30%
   2     3 image     201   199  index   1   8  image  no         4  0   300   301 16.3K  42%

At the moment, I have to shell out to pdfimages to retrieve the per-image resolution (the x-ppi and y-ppi columns above). Could PyMuPDF expose these values in the future?
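
For reference, here is a rough sketch of a possible interim workaround. It is an assumption on my part, not a confirmed PyMuPDF feature. The idea is to derive the effective PPI from the image's pixel size and the rectangle it is drawn into on the page, which appears to be roughly what pdfimages reports. It assumes Page.get_image_info(xrefs=True) returns "width", "height", "bbox" and "xref" for each displayed image, and that bbox is in points (1/72 inch). The helper names effective_ppi and list_image_ppi are made up for illustration, and rotated or skewed placements will give inaccurate values because only the bounding box is used.

```python
def effective_ppi(pixel_width, pixel_height, bbox):
    """Return (x_ppi, y_ppi) for an image drawn into bbox.

    bbox is (x0, y0, x1, y1) in points, assuming 72 points per inch.
    """
    x0, y0, x1, y1 = bbox
    width_in = abs(x1 - x0) / 72.0   # displayed width in inches
    height_in = abs(y1 - y0) / 72.0  # displayed height in inches
    if width_in == 0 or height_in == 0:
        raise ValueError("image has a degenerate bounding box")
    return pixel_width / width_in, pixel_height / height_in


def list_image_ppi(path):
    """Return (page number, xref, x_ppi, y_ppi) for each displayed image."""
    import pymupdf  # imported lazily so effective_ppi works without it

    results = []
    with pymupdf.open(path) as doc:
        for page in doc:
            # xrefs=True asks for the image's xref in each info dict
            for info in page.get_image_info(xrefs=True):
                x_ppi, y_ppi = effective_ppi(
                    info["width"], info["height"], info["bbox"]
                )
                results.append(
                    (page.number + 1, info["xref"], round(x_ppi), round(y_ppi))
                )
    return results
```

Calling list_image_ppi("i-9-paper-version.pdf") should produce one row per image placement, comparable to the pdfimages listing, though I have not verified that the numbers match exactly.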

How to reproduce the bug

Run the code above with the attached PDF file.

i-9-paper-version.pdf

PyMuPDF version

1.25.5

Operating system

Linux

Python version

3.11
