Recovery Image in PDF bug #2492
OS: Ubuntu. Bug description: the PDF file contains some images, but the data extracted for these images contains only color, not the words. The PDF is attached below:
Page 0 contains 4 images with masks, where each image and its mask have different resolutions.
In [1]: import fitz
In [2]: doc=fitz.open("zf1.pdf")
In [3]: page=doc[0]
In [4]: page.get_images()
Out[4]:
[(28, 29, 2, 2, 1, 'Indexed', '', 'Image28', ''),
(39, 40, 2, 2, 1, 'Indexed', '', 'Image39', ''),
(41, 42, 2, 2, 1, 'Indexed', '', 'Image41', ''),
(43, 44, 2, 2, 1, 'Indexed', '', 'Image43', '')]
In [5]: print(doc.xref_object(28)) # base image
<<
/Type /XObject
/Subtype /Image
/Width 2 # <===
/Height 2 # <===
/ColorSpace [ /Indexed /DeviceRGB 1 <FF0000FFFFFF> ]
/BitsPerComponent 1
/Interpolate false
/SMask 29 0 R
/Length 2
>>
In [6]: print(doc.xref_object(29)) # SMask
<<
/Type /XObject
/Subtype /Image
/Width 3186 # <===
/Height 401 # <===
/ColorSpace /DeviceGray
/BitsPerComponent 1
/Filter /FlateDecode
/Length 10321
>>
There is no way in PyMuPDF to combine a base image and its SMask in this case.
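To check other pages or files for the same situation, the comparison done by hand above can be sketched roughly as follows. This is only an illustration, not part of PyMuPDF: `find_mismatched_masks` and `pdf_image_size` are hypothetical helper names, and `pdf_image_size` assumes that /Width and /Height are stored as direct integers in the image object. The sample data is copied from the session above.

```python
def find_mismatched_masks(image_items, size_of):
    """Return (xref, smask_xref, image_size, mask_size) for every image
    whose SMask has different pixel dimensions than the image itself.

    image_items: tuples shaped like the entries of page.get_images(),
                 where item[0] is the image xref and item[1] the SMask xref.
    size_of:     callable mapping an xref to a (width, height) tuple.
    """
    mismatches = []
    for item in image_items:
        xref, smask = item[0], item[1]
        if smask == 0:  # get_images() reports 0 when there is no SMask
            continue
        image_size = size_of(xref)
        mask_size = size_of(smask)
        if image_size != mask_size:
            mismatches.append((xref, smask, image_size, mask_size))
    return mismatches


def pdf_image_size(doc, xref):
    """Hypothetical helper: read /Width and /Height of an image object.

    Assumes both keys hold direct integer values (as in the objects
    printed above); doc is an open PyMuPDF document.
    """
    width = int(doc.xref_get_key(xref, "Width")[1])
    height = int(doc.xref_get_key(xref, "Height")[1])
    return width, height


# Data copied from the session above: image 28 and its SMask 29.
items = [(28, 29, 2, 2, 1, "Indexed", "", "Image28", "")]
sizes = {28: (2, 2), 29: (3186, 401)}

result = find_mismatched_masks(items, sizes.__getitem__)
print(result)  # [(28, 29, (2, 2), (3186, 401))]

# With a real file (assumes PyMuPDF is installed and zf1.pdf is present):
#   import fitz
#   doc = fitz.open("zf1.pdf")
#   find_mismatched_masks(doc[0].get_images(), lambda x: pdf_image_size(doc, x))
```

Any image reported by this check has a mask of a different size than the base image, which is the case described above.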
Page 0 contains 4 images with masks, where each image and its mask have different resolutions.
PyMuPDF does not support this.