What precisely do you mean by "forces my script to exit"? You need to paste the actual error; otherwise, I can't really judge what this may be about. Is it a Python traceback, or even a C crash? However, I have two notes already:
Thank you very much for following up and reporting the bug. FYI: I'm on a Windows 11 machine with Python 3, and I ran the script in VSCode, in IDLE, and on the Windows command line, all with the same result. I tried your patch from the devel_new branch, but it produced the same result.

Good catch that it's the pages with transparency that fail; I also noticed that if I print the ImageNotExtractableError in the …

The only other file the script failed with is from the same creator: 1143CabdGhaniNabulusi.HadraUnsiyya.pdf

My script is part of a pipeline that should process dozens of texts at a time. Although the problem seems to be with pdfium, it would be great if the pypdfium call to …
Brilliant, it worked! My pipeline now works without the annoying exit.
I'm trying to extract images from PDFs by looping over the page objects. If a page contains exactly one image object and no text objects, I try to extract the image directly (to retain its original quality); in all other cases, I render the page and save the rendered page as an image (see the code below).
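The selection rule in this paragraph can be sketched as a small, library-independent function over page-object type codes. This is an illustrative sketch, not the poster's code: `choose_strategy` and `page_object_types` are hypothetical names, the constants mirror PDFium's `FPDF_PAGEOBJ_TEXT` (1) and `FPDF_PAGEOBJ_IMAGE` (3), and the page helper assumes the pypdfium2 v4 helper API, where objects yielded by `PdfPage.get_objects()` carry a `type` attribute.

```python
# Mirrors of PDFium's page-object type codes (FPDF_PAGEOBJ_* in fpdf_edit.h).
FPDF_PAGEOBJ_TEXT = 1
FPDF_PAGEOBJ_IMAGE = 3


def choose_strategy(object_types):
    """Return "extract" if there is exactly one image object and no text
    object among `object_types`, otherwise "render".

    Other object kinds (paths, shadings, forms) do not affect the decision,
    matching the rule as stated in the question.
    """
    n_images = sum(1 for t in object_types if t == FPDF_PAGEOBJ_IMAGE)
    n_texts = sum(1 for t in object_types if t == FPDF_PAGEOBJ_TEXT)
    return "extract" if n_images == 1 and n_texts == 0 else "render"


def page_object_types(page):
    """Collect the type codes of the objects on a pypdfium2 PdfPage.

    Assumption: pypdfium2 v4, whose page objects expose a `.type` attribute.
    """
    return [obj.type for obj in page.get_objects()]
```

With a pypdfium2 page, the call would look like `choose_strategy(page_object_types(page))`. Keeping the rule separate from the PDF calls makes it easy to test and to adjust (for example, if vector path objects should also force rendering).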
In some cases, pypdfium cannot extract the image directly (without rendering it); this is a limitation of pdfium itself, as described here: https://issues.chromium.org/issues/42270939. In those cases, I render the page instead and store the rendered page as an image.
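The fallback described here amounts to "try direct extraction, render on failure". A minimal sketch, assuming `extract_image` and `render_page` are hypothetical zero-argument callables standing in for the real pypdfium2 calls (for instance `PdfImage.get_bitmap()` and `PdfPage.render()`), and that the set of recoverable exceptions is the caller's choice. Note that `try`/`except` only sees Python exceptions; it cannot intercept a crash inside the native PDFium library.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def extract_or_render(
    extract_image: Callable[[], T],
    render_page: Callable[[], T],
    recoverable: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, T]:
    """Try direct extraction first; fall back to rendering on a Python-level error.

    Returns a (method, result) pair so the caller can log which path was taken.
    Exceptions not listed in `recoverable` propagate unchanged.
    """
    try:
        return "extracted", extract_image()
    except recoverable:
        # e.g. the binding raising because PDFium cannot hand back the raw image
        return "rendered", render_page()
```

Returning the method alongside the result makes it easy to count, per document, how many pages lost their original image quality.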
However, with some PDFs, this approach doesn't work either, because the call to pdfium_c.FPDFImageObj_GetBitmap() makes my script exit instead of returning an error, as it is designed to do:
Does anyone have an idea why the code exits instead of raising an error, and how I could solve this?
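A Python `try`/`except` cannot catch a crash inside native PDFium code, because the whole interpreter process dies. One common way to contain such a crash (not necessarily what resolved the issue in this thread) is to run the risky call in a separate Python process and inspect its exit code. The sketch below assumes that approach: `run_isolated` and `EXTRACT_WORKER` are hypothetical names, and the worker script assumes the pypdfium2 v4 helper API (`PdfDocument`, `PdfPage.get_objects`, `PdfImage.get_bitmap`) plus Pillow for `to_pil()`.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical worker: saves the first image object on one page.
# Assumes pypdfium2 v4 and Pillow are installed in the same environment.
EXTRACT_WORKER = r"""
import sys
import pypdfium2 as pdfium
import pypdfium2.raw as pdfium_c

pdf_path, page_index, out_path = sys.argv[1], int(sys.argv[2]), sys.argv[3]
pdf = pdfium.PdfDocument(pdf_path)
page = pdf[page_index]
images = list(page.get_objects(filter=(pdfium_c.FPDF_PAGEOBJ_IMAGE,)))
images[0].get_bitmap().to_pil().save(out_path)
"""


def run_isolated(script: str, *args, timeout: float = 120) -> Optional[int]:
    """Run `script` in a fresh interpreter and return its exit code.

    0 means success. Any other code means failure, including a native crash
    (on POSIX, death by signal N is reported as -N). None means the child
    exceeded `timeout` and was killed. The calling process survives in every case.
    """
    try:
        proc = subprocess.run(
            # With -c, the child's sys.argv is ["-c", *args].
            [sys.executable, "-c", script, *map(str, args)],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode
```

In the pipeline, a result other than `0` for a page would then trigger the rendering fallback (or a log entry) instead of ending the whole run, e.g. `if run_isolated(EXTRACT_WORKER, "0309Hallaj.Diwan.pdf", 0, "page0.png") != 0: ...`. The trade-off is one interpreter start-up per isolated call, so it may be worth isolating only the calls known to be risky.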
This is an example of a PDF where this fails: 0309Hallaj.Diwan.pdf
This is my (slightly simplified) code: