Illegal dimensions for pixmap #1327
-
Hi! I am getting a MuPDF error for some specific documents:
>>> doc[0].get_text("html")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-17-0ab91b87cb83> in <module>
----> 1 doc[0].get_text("html")
~/miniconda3/envs/layoutlm38/lib/python3.8/site-packages/fitz/utils.py in get_text(page, option, clip, flags, textpage)
672 tp = textpage
673 if tp is None:
--> 674 tp = page.get_textpage(clip=clip, flags=flags)
675 elif getattr(tp, "parent") != page:
676 raise ValueError("not a textpage of this page")
~/miniconda3/envs/layoutlm38/lib/python3.8/site-packages/fitz/fitz.py in get_textpage(self, clip, flags)
5604 self.set_rotation(0)
5605 try:
-> 5606 textpage = self._get_text_page(clip, flags=flags)
5607 finally:
5608 if old_rotation != 0:
~/miniconda3/envs/layoutlm38/lib/python3.8/site-packages/fitz/fitz.py in _get_text_page(self, clip, flags)
5593 self, clip: rect_like = None, flags: int = 0
5594 ) -> "TextPage":
-> 5595 val = _fitz.Page__get_text_page(self, clip, flags)
5596 val.thisown = True
5597
RuntimeError: Illegal dimensions for pixmap 0 -128
Same for
This is probably due to a poorly written PDF with some negative pixmap dimension.
Replies: 8 comments 6 replies
-
Can you share the file, or just one problem page of it?
-
Sure!
Jorj
On Sunday, 17 October 2021 11:02, Victor wrote:
> Can I send that to ***@***.***? I can't make it public
-
Looked at it.
-
This is also not a PyMuPDF problem but a MuPDF issue. Doing this under Linux:
$ mutool draw -o test-linux.html test-illegal-pixmap.pdf
page test-illegal-pixmap.pdf 1error: Illegal dimensions for pixmap 0 -128
warning: Ignoring error during interpretation
page test-illegal-pixmap.pdf 2
Whereas under Windows it does work:
>mutool draw -o test-windows.html test-illegal-pixmap.pdf
page test-illegal-pixmap.pdf 1
page test-illegal-pixmap.pdf 2
Although some images are shown upside down etc.
-
The "image upside down" issue can at least be clarified under Windows. Under Linux, this error pops up again:
>>> imglist = page.get_images(True)
>>> pprint(page.get_image_bbox(imglist[0], transform=True))
(Rect(456.37799072265625, 554.771728515625, 597.3779907226562, 586.771728515625),
Matrix(141.0, 0.0, -0.0, -32.0, 456.37799072265625, 586.771728515625))
>>> pprint(page.get_image_bbox(imglist[1], transform=True))
(Rect(445.0393981933594, 29.196847915649414, 583.93701171875, 49.10923767089844),
Matrix(138.89759826660156, 0.0, -0.0, -19.912389755249023, 445.0393981933594, 49.10923767089844))
>>>
Both transformation matrices have a negative d value and a positive a value, showing that the original images were inserted upside down in the PDF.
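The reading above can be checked without PyMuPDF. This is a minimal sketch, assuming the matrix is given as a plain 6-tuple (a, b, c, d, e, f) and that it maps the unit square onto the image's bbox, as the numbers in the output suggest. Both helper names are made up for illustration.

```python
def unit_square_bbox(m):
    """Map the unit square through m = (a, b, c, d, e, f) and return
    the enclosing rectangle as (x0, y0, x1, y1)."""
    a, b, c, d, e, f = m
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def is_upside_down(m, eps=1e-6):
    """True for an axis-aligned matrix with positive a and negative d.
    Hypothetical helper: rotated or sheared images are not covered."""
    a, b, c, d, _, _ = m
    return abs(b) < eps and abs(c) < eps and a > 0 and d < 0


# First matrix from the session above.
first = (141.0, 0.0, -0.0, -32.0, 456.37799072265625, 586.771728515625)
print(unit_square_bbox(first))
print(is_upside_down(first))
```

For the first matrix, the computed rectangle should agree with the Rect printed next to it, which supports reading the negative d value as a vertical flip.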
-
Located the problem a bit better:
bboxes = page.get_bboxlog()
for i, item in enumerate(bboxes):
    print(i, item)
38 ('fill-shade', (-2147483648.0, -2147483648.0, 2147483520.0, 2147483520.0))
What you see in the bbox tuple is the new definition of an infinite rectangle. Then look at the page definition:
In [11]: print(doc.xref_object(page.xref))
<<
/Contents 28 0 R
/MediaBox [ 0 0 612 792 ]
/Parent 27 0 R
/Resources <<
/Font 1 0 R
/ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
/Shading <<
/Sh0 2 0 R
>>
/XObject <<
/FormXob.093205580d257a5e510e567391a4c4de 4 0 R
/FormXob.7c82326558f3ec6161464e23f3002a66 3 0 R
>>
>>
/Rotate 0
/Trans <<
>>
/TrimBox [ 0 0 612 792 ]
/Type /Page
>>
The page definition exhibits a shade definition at xref 2, named "Sh0".
>>> page.get_contents()  # get /Contents xrefs
28
>>> cont = bytearray(page.read_contents())
>>> # looking for command invoking that shading:
>>> cont.find(b"/Sh0 sh")
2751
>>> # now remove this byte string:
>>> cont[2751:2758] = b""
>>> # write back to page contents
>>> doc.update_stream(28, cont)
>>> doc.save("no-shade.pdf")
The resulting PDF then shows no error ... and looks almost the same 😎.
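The manual edit above removes a single occurrence at a known offset. Below is a minimal sketch of the same idea for the general case, assuming the content stream is available as bytes (for example from page.read_contents()) and that the shading is invoked as `/<name> sh`. `strip_shading_calls` is a hypothetical helper, not a PyMuPDF function, and a plain byte search like this ignores PDF string literals and inline image data.

```python
import re


def strip_shading_calls(stream: bytes, name: bytes = b"Sh0") -> bytes:
    """Remove every '/<name> sh' operator call from a content stream."""
    pattern = re.compile(rb"/" + re.escape(name) + rb"\s+sh\b")
    return pattern.sub(b"", stream)


# Made-up content stream fragment, for illustration only.
sample = b"q\n/Sh0 sh\nQ\nBT /F1 12 Tf (Heading) Tj ET\n"
cleaned = strip_shading_calls(sample)
print(cleaned)
```

The cleaned bytes would then be written back with doc.update_stream(), as in the session above.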
-
That's OK!
On Sun, 17 Oct 2021 at 17:12, Jorj X. McKie <***@***.***> wrote:
… It worked! Here is a mutilated PDF with all text removed except for the
headings. Only the first page is left.
But the error is still there:
$ mutool draw -o cleaned.svg cleaned.pdf
page cleaned.pdf 1error: Overly large image
warning: Ignoring error during interpretation
$ mutool draw -o cleaned.html cleaned.pdf
page cleaned.pdf 1error: Illegal dimensions for pixmap 0 -128
warning: Ignoring error during interpretation
This is what the page looks like:
[image: grafik]
<https://user-images.githubusercontent.com/8290722/137643289-ce7f85f8-a1e5-49fd-a9a7-9e8d5d8e4c66.png>
If you are ok with this, I would like to submit this file to MuPDF ...
Located the problem a bit better: page.get_bboxlog() shows one operation which creates an infinite bbox, namely the instruction that creates the yellow shaded area near the top.
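The entry with the infinite bbox can also be picked out programmatically. This is a minimal sketch, assuming items shaped like the (type, bbox) tuple shown in this thread. The threshold and the helper name are illustrative choices, not part of PyMuPDF.

```python
# Coordinates at or beyond this magnitude are treated as "infinite".
# The value mirrors the 2147483520.0 seen in the thread's output.
INFINITE_LIMIT = 2147483520.0


def find_infinite_bboxes(bboxlog, limit=INFINITE_LIMIT):
    """Return (index, type) pairs for entries whose bbox reaches the limit."""
    hits = []
    for i, item in enumerate(bboxlog):
        kind, bbox = item[0], item[1]
        if any(abs(v) >= limit for v in bbox):
            hits.append((i, kind))
    return hits


log = [
    ("fill-text", (72.0, 90.0, 300.0, 104.0)),  # made-up ordinary entry
    ("fill-shade", (-2147483648.0, -2147483648.0, 2147483520.0, 2147483520.0)),
]
print(find_infinite_bboxes(log))  # → [(1, 'fill-shade')]
```

With a real page, the list passed in would be the result of page.get_bboxlog().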