Can I convert from pix to PIL without saving to disk? #1678
-
Hi! I am currently doing this to extract images from PDFs and then feeding those to Tesseract OCR. It seems that I'm wasting time and quality, and not optimizing the code, when I write to PNG and then read the PNG back in the next line. Is there any way to feed a pixmap directly to an Image element?
Thank you!
Replies: 11 comments 1 reply
-
Sure there is!
To improve the OCR detection, you can render the PDF page at higher quality by using a matrix in
-
Thank you @JorjMcKie, it's a single image per page and I'm already using matrix 2,2 with great results.
-
Option 2 is about one third faster than option 1 with a similar memory footprint ... Option 1 is important for feeding image data to some packages, most notably
-
Thanks A LOT @JorjMcKie, it works great! Could I have any problems if the source image is grayscale or B&W? Will it "scale" to RGB?
-
You are welcome!
-
I'm VERY happy with the result currently using
Thank you a lot!
-
Close issue?
-
Can you please give a code snippet for
-
@rakesh4real @alejandrofm -
-
For those who want a numpy array rather than a PIL.Image:
-
@canklot - there is an even more direct and much faster way for ndarrays: see here.
@alejandrofm Can you please give a code snippet for fitz.csGRAY and mode R