pixmap must be grayscale or rgb to write as png #1880
-
Hi. I've tried to use this guide. When I try to read all images from the PDF, I get this error.
Do you have any idea, or can you help me?
Replies: 14 comments 1 reply
-
There obviously are images in the PDF with more than 3 color components. You must either store those in some CMYK image format (PNG won't work) or convert them to RGB first. I'm travelling at the moment, so please have a look at the example extract-imga.py, where this is done, too.
Sent from my iPhone
On 18.03.2020 at 12:02, IT Engineer <[email protected]> wrote:
Hi. I've tried to use this guide.
https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/examples/extract-imga.py
When I try to read all images from the PDF, I get this error.
    mupdf: pixmap must be grayscale or rgb to write as png
    Traceback (most recent call last):
      File "image.py", line 97, in <module>
        imgdata = pix.getPNGData()
      File "/home/qwe/ocr-env/lib/python3.6/site-packages/fitz/fitz.py", line 4170, in getPNGData
        barray = self._getImageData(1)
      File "/home/qwe/ocr-env/lib/python3.6/site-packages/fitz/fitz.py", line 4151, in _getImageData
        return _fitz.Pixmap__getImageData(self, format)
    RuntimeError: pixmap must be grayscale or rgb to write as png
Do you have any idea or can you help me?
I attached the PDF file that I tried.
Thank you.
BCBSMI EOB.pdf<https://github.com/pymupdf/PyMuPDF/files/4349984/BCBSMI.EOB.pdf>
-
Hi, I've already run https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/examples/extract-imga.py and it shows the above error. Please advise.
-
Ok, I am back home now. Let me check your PDF.
-
Thank you.
-
Aha, resolved the issue: The latest PyMuPDF also accepts the ICC color system, therefore colorspaces may be present which do have the right number of color components but still are neither DeviceGRAY nor DeviceRGB. This required an adjustment of
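For readers hitting the same error, here is a minimal sketch of the kind of check described above: decide by colorspace name, not by component count, and convert to RGB before asking for PNG data. The helper names `png_writable` and `pixmap_to_png` are made up for illustration and are not part of PyMuPDF. The sketch assumes a recent PyMuPDF where `Pixmap.tobytes("png")` replaces the older `getPNGData()` seen in the traceback, and it is not the exact change made to the example script.

```python
from typing import Optional

# Assumed to match fitz.csGRAY.name and fitz.csRGB.name in PyMuPDF.
PNG_COLORSPACES = ("DeviceGray", "DeviceRGB")


def png_writable(colorspace_name: Optional[str]) -> bool:
    """Return True if a pixmap with this colorspace name can be written as PNG."""
    return colorspace_name in PNG_COLORSPACES


def pixmap_to_png(pix):
    """Return PNG bytes for a fitz.Pixmap, converting to RGB first if needed."""
    import fitz  # PyMuPDF; imported here so png_writable() has no dependency

    # Pixmaps without a colorspace (e.g. masks) are passed through unchanged
    # in this sketch.
    cs_name = pix.colorspace.name if pix.colorspace else None
    if cs_name is not None and not png_writable(cs_name):
        # CMYK, ICC-based and other colorspaces: convert to RGB.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    return pix.tobytes("png")
```

With a pixmap made from an image xref, for example `pix = fitz.Pixmap(doc, xref)`, calling `pixmap_to_png(pix)` should then avoid the "pixmap must be grayscale or rgb" error, at the cost of a possible color shift from the RGB conversion.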
-
This is the part that I can't detect with this library.
-
Your PDF is a complex example! You almost have to rewrite a PDF viewer for a full analysis. Here are my findings:
Here is a more advanced text extraction script, which should extract the text in the correct reading sequence:
Here is a script to extract XObjects (like the red box on page 2):
-
I've also read the full documentation; however, I didn't work out how to get the x, y, width, and height of the image.
-
I don't know your full motivation behind all this. But here are a few hints that may help:

    import fitz  # PyMuPDF
    doc = fitz.open("BCB...")
    page = doc[1]  # page 2
    imglist = doc.getPageImageList(1, True)  # full image list of that page
    bbox = page.getImageBbox(imglist[0])  # this is img-46.png
    # just to demonstrate we do have it:
    page.addRectAnnot(bbox)

gives this:

When we extract the images, the mask is automatically detected and applied! My own extraction produced this. Not 100% the same colors, but good. The difference is probably caused by the conversion to RGB.

You can also try not converting to RGB in that script. Use an image format which supports CMYK, like PAM or Photoshop image (PSD):

    ...
    if pix.colorspace.name not in (fitz.csGRAY.name, fitz.csRGB.name):
        pix.writeImage("xxxx.pam")

But in your case this does not work either. If you use the MuPDF command line tool
So I guess that is what you can get from me ...
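To get from the bounding box to the x, y, width, and height asked about above, a small sketch may help. It assumes a recent PyMuPDF with the snake_case names `Page.get_images()` and `Page.get_image_bbox()` (the snippet above uses the older `getPageImageList` and `getImageBbox`). The helpers `rect_to_xywh` and `image_boxes` are hypothetical names chosen for this illustration.

```python
def rect_to_xywh(rect):
    """Convert a rectangle (x0, y0, x1, y1) into (x, y, width, height)."""
    x0, y0, x1, y1 = tuple(rect)
    return x0, y0, x1 - x0, y1 - y0


def image_boxes(pdf_path, page_number):
    """Return (image name, (x, y, width, height)) for each image on a page."""
    import fitz  # PyMuPDF; imported here so rect_to_xywh() has no dependency

    boxes = []
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for item in page.get_images(full=True):
            bbox = page.get_image_bbox(item)  # fitz.Rect in page coordinates
            boxes.append((item[7], rect_to_xywh(bbox)))  # item[7]: image name
    return boxes
```

The values are in PDF points in the page's coordinate system, so they may need scaling if they are to be matched against a rendered page image at a different resolution.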
-
@devpro9219 - Assuming your questions were answered.
-
Hi, this script is working fine for me, but it extracts the images in gray. In my PDF file all images are in CMYK format. Can you help me solve this?
-
You must find a file format supporting CMYK. There are a select few directly supported by Pixmap.save(). If none suits you, check the Pillow documentation for one that does, and then use Pixmap.pil_save(...) with the right parameters. Again, please consult the Pillow documentation to choose the right parameters in place of the "...".
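As a rough sketch of the first option, the file extension can be picked from the pixmap's colorspace so that CMYK data is written unconverted. PSD is used here only because it was named earlier in this thread as a CMYK-capable format; `output_name` and `save_keeping_colorspace` are hypothetical helper names, and the colorspace names are assumed to match `fitz.csGRAY.name` and `fitz.csRGB.name`.

```python
def output_name(stem, colorspace_name):
    """Pick a file name whose extension suits the pixmap's colorspace."""
    # Gray and RGB can be written as PNG; anything else goes to PSD here.
    ext = "png" if colorspace_name in ("DeviceGray", "DeviceRGB") else "psd"
    return f"{stem}.{ext}"


def save_keeping_colorspace(pix, stem):
    """Save a fitz.Pixmap without converting its colorspace."""
    cs_name = pix.colorspace.name if pix.colorspace else None
    filename = output_name(stem, cs_name)
    pix.save(filename)  # Pixmap.save() infers the format from the extension
    return filename
```

If PSD does not suit, the same decision point is where a call to `Pixmap.pil_save(...)` would go, with the format and parameters taken from the Pillow documentation.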