Mingling Korean Doc Image with Korean Text #2372
-
Hi, I have the following issue: my current OCR does not support Korean, but it does seem to support text extraction from a text PDF containing Korean characters. So I thought I could use another OCR engine for Korean and create a text PDF that my existing OCR would pick up. So far, I managed to:
However, my OCR does not pick it up. Am I missing something fundamental? Thanks
-
What is your "current OCR"? easyocr? Why don't you try PyMuPDF's built-in interface to Tesseract-OCR (which must, of course, be installed)? When invoking it, make sure to pass it the language spec for Korean.
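A minimal sketch of the built-in Tesseract route, assuming PyMuPDF is installed and Tesseract is installed with the Korean traineddata (`kor`) discoverable (for example via `TESSDATA_PREFIX`). `Page.get_textpage_ocr()` returns an OCR'd TextPage that `Page.get_text()` can read from; the helper names `tesseract_language_spec` and `ocr_page_text`, the file path, and the `dpi=300` default are illustrative assumptions, not part of the original reply.

```python
def tesseract_language_spec(languages):
    """Hypothetical helper: Tesseract expects several languages joined by '+'."""
    return "+".join(languages)


def ocr_page_text(pdf_path, page_number=0, languages=("kor", "eng"), dpi=300):
    """OCR one page via PyMuPDF's Tesseract interface and return its text.

    Assumes Tesseract plus the 'kor' language data are installed.
    """
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    language = tesseract_language_spec(languages)
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # full=True renders the whole page and OCRs that image
        textpage = page.get_textpage_ocr(language=language, dpi=dpi, full=True)
        # read the text from the OCR'd TextPage instead of the page's own text layer
        return page.get_text(textpage=textpage)
```

Usage would be along the lines of `print(ocr_page_text("scan.pdf"))`, where `scan.pdf` is a placeholder name.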
-
Ah, sorry, I did not fully understand your comment at first:
No: an OCR engine only interprets images, so regular text is not taken into account.
In PyMuPDF, there are ways to deal with a mixture of OCRed and regular text.
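One way this mixture might be handled, as a sketch: `Page.get_textpage_ocr()` accepts `full=False`, which (per the PyMuPDF documentation) OCRs only the images displayed on the page rather than the whole page rendering, so the page's existing text layer is kept alongside the OCR result. The heuristic `should_ocr_full_page` (OCR the full page only when it has no text layer at all), the function names, and the `dpi=300` default are assumptions for illustration; Tesseract with `kor` data is assumed to be installed.

```python
def should_ocr_full_page(existing_text):
    """Hypothetical heuristic: OCR the whole page only if it has no text layer."""
    return not existing_text.strip()


def extract_mixed_text(pdf_path, language="kor", dpi=300):
    """Return one string per page, combining regular text with OCR of images.

    Assumes PyMuPDF plus Tesseract with the Korean ('kor') language data.
    """
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    results = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # decide per page, based on the page's ordinary (non-OCR) text
            full = should_ocr_full_page(page.get_text())
            # full=False: keep existing text and OCR only the page's images
            textpage = page.get_textpage_ocr(language=language, dpi=dpi, full=full)
            results.append(page.get_text(textpage=textpage))
    return results
```

The per-page choice avoids re-recognizing text that is already present as real characters, which OCR could only reproduce less accurately.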