Change Visibility of OCR'd pdf text layer #3537
OCR-ed text may have been made invisible in a number of different ways. Sometimes the text is written in the "background", such that the image covers it. You can locate the respective PDF command. The only way I see is this approach:
This is what comes out in your test case: pretty ugly ... 🤷♂️
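One way to see which kind of hiding applies is to look at the text render mode operator `Tr` in the page's content stream. In the PDF specification, render mode 3 means glyphs are neither filled nor stroked, and many OCR tools use it for their text layer. The sketch below is a regex heuristic, not a real content-stream parser. It assumes you already have the decompressed stream bytes, for example from PyMuPDF's `doc.xref_stream(xref)` for each xref returned by `page.get_contents()`. The function name `render_mode_counts` is hypothetical.

```python
import re
from collections import Counter

# Matches a render-mode operation such as "3 Tr" as a separate token.
# Heuristic only: it does not parse PDF syntax, so identical bytes inside
# a string literal would also be counted.
TR_PATTERN = re.compile(rb"(?<![\w.\-])([0-7])\s+Tr(?![A-Za-z0-9])")


def render_mode_counts(stream: bytes) -> Counter:
    """Count how often each text render mode (0-7) is set in a decompressed content stream."""
    return Counter(int(m.group(1)) for m in TR_PATTERN.finditer(stream))


if __name__ == "__main__":
    sample = b"BT /F1 12 Tf 3 Tr 72 700 Td (Hello) Tj ET"
    print(render_mode_counts(sample))  # Counter({3: 1})
```

If mode 3 dominates, the text is hidden by its render mode rather than by a covering image. That would also explain why changing only the font color has no visible effect.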
Is your feature request related to a problem? Please describe.
I have OCR'd an image to generate a text layer over it. This text layer is invisible in the PDF. I then use Ghostscript to remove the image and vector data and keep only the text layer. This further reduces file size while keeping the page's textual structure intact.
TestOCR.pdf - the OCR'd image as a PDF.
TestOCR_textonly.pdf - image and vector data removed using Ghostscript (-dFILTERIMAGE -dFILTERVECTOR). Highlighting over this "blank" PDF shows that the text layer is still there.
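The Ghostscript step above could be driven from Python roughly as follows. The post does not give the full command line, so this argument list is an assumption. It uses the `pdfwrite` device plus the two filter switches mentioned. `gs` is the usual executable name on Linux/macOS; on Windows it is typically `gswin64c`. The helper name `ghostscript_textonly_cmd` is hypothetical.

```python
import shlex


def ghostscript_textonly_cmd(src: str, dst: str) -> list[str]:
    """Build an (assumed) Ghostscript argv that drops images and vector graphics but keeps text."""
    return [
        "gs",                   # executable name; differs on Windows (assumption)
        "-o", dst,              # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",    # write the result as a new PDF
        "-dFILTERIMAGE",        # drop raster images
        "-dFILTERVECTOR",       # drop vector graphics (paths, fills)
        src,
    ]


if __name__ == "__main__":
    cmd = ghostscript_textonly_cmd("TestOCR.pdf", "TestOCR_textonly.pdf")
    print(shlex.join(cmd))
    # To actually run it (requires Ghostscript on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```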
Describe the solution you'd like
Make this text layer visible in TestOCR_textonly.pdf. I want the OCR'd text to be visible, following the same layout as the input.
Can I change the render mode or color of all the text in this PDF so that it becomes visible?
My pipeline will eventually deal with very large PDF files, so I would like the solution to be performant as well.
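If the text is hidden with render mode 3, one possible answer to the render-mode question is to rewrite `3 Tr` into `0 Tr` (fill) directly in the page content streams. This avoids extracting and re-inserting the text. Below is a minimal sketch that assumes PyMuPDF's `page.get_contents()`, `doc.xref_stream()` and `doc.update_stream()`. The helper names are hypothetical, the regex is a heuristic rather than a parser, and Form XObjects are not visited. For performance, each stream is scanned once with a precompiled pattern and only changed streams are written back.

This only helps if the embedded OCR font has real glyph shapes. Tesseract, for example, embeds a `GlyphLessFont` which, as far as I know, draws no visible outlines. In that case switching the render mode shows nothing, and re-inserting the text (as in the answer above) remains the fallback.

```python
import re

# Matches the operation "3 Tr" (render mode 3 = invisible) as a separate token.
# Heuristic only: identical bytes inside a string literal would also be rewritten.
INVISIBLE_TR = re.compile(rb"(?<![\w.\-])3(\s+Tr)(?![A-Za-z0-9])")


def make_stream_visible(stream: bytes) -> bytes:
    """Rewrite every '3 Tr' (invisible) into '0 Tr' (fill) in a single regex pass."""
    return INVISIBLE_TR.sub(rb"0\1", stream)


def make_document_visible(doc) -> int:
    """Rewrite the page content streams of an open PyMuPDF document in place.

    Assumes PyMuPDF's page.get_contents(), doc.xref_stream() and
    doc.update_stream(). Only streams that actually change are written back;
    Form XObjects and annotations are NOT visited. Returns the number of
    streams updated.
    """
    updated = 0
    for page in doc:
        for xref in page.get_contents():
            data = doc.xref_stream(xref)
            new_data = make_stream_visible(data)
            if new_data != data:
                doc.update_stream(xref, new_data)
                updated += 1
    return updated
```

A typical run would be `doc = pymupdf.open("TestOCR_textonly.pdf")`, then `make_document_visible(doc)`, then `doc.save("TestOCR_visible.pdf", garbage=3, deflate=True)`. Use `import pymupdf`, or `import fitz` in older PyMuPDF releases. The output name is only an example. The glyphs are painted with whatever fill color is current, so if the OCR layer also sets an unusual fill color, that would need changing too.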
@JorjMcKie I have tried your solutions for changing the text font color found here, but to no avail. I would really appreciate any support.