Tesseract is installing with a simple |
I understand there's no control over how the tesseract-ocr packages are installed. My point is that the install/update script might need to be updated to follow the Stirling-PDF documentation on where the tesseract-ocr data is placed: https://docs.stirlingpdf.com/Advanced%20Configuration/OCR/ What I described is a good workaround for anyone who wants to use the OCR feature and runs into the same problem I did.
I was trying to clean up various PDF files for use with a local LLM. When the nordic encoder is used, it reads a lot of the data as a garbled mess. To fix this, I went to my local Stirling-PDF install to try to clean up the OCR, but when I tried, it didn't show anything!
Digging some more, I found that the tesseract-data was not installed correctly. A couple of things I found out:
- The data is in `/usr/share/tesseract-ocr/5`, while Stirling-PDF is looking for `/usr/share/tessdata` by default.
- `/opt/Stirling-PDF/.env` does not contain `TESSDATA_PREFIX`, which is the path to the data location. (You could use the existing path, but the Stirling-PDF docs suggest the `tessdata` directory.)

I fixed these items and restarted the service, and everything appears to work!
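For anyone who wants to check their own install for the two problems above, here is a rough diagnostic sketch in Python. The paths are the ones from this post; the helper names (`has_tessdata_prefix`, `find_traineddata_dirs`, `diagnose`) are made up for illustration, and it assumes the `.env` file uses plain `KEY=VALUE` lines and that the language data are `*.traineddata` files somewhere under `/usr/share/tesseract-ocr`. It only reads files and changes nothing.

```python
from pathlib import Path


def has_tessdata_prefix(env_file: Path) -> bool:
    """Return True if env_file has an uncommented, non-empty TESSDATA_PREFIX line."""
    if not env_file.is_file():
        return False
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        # Tolerate an optional leading "export " (assumption about the file style).
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == "TESSDATA_PREFIX" and value.strip():
            return True
    return False


def find_traineddata_dirs(root: Path) -> list[Path]:
    """List directories under root that contain *.traineddata files."""
    if not root.is_dir():
        return []
    return sorted({p.parent for p in root.rglob("*.traineddata")})


def diagnose(env_file: Path, expected_dir: Path, search_root: Path) -> list[str]:
    """Collect human-readable findings for the two issues described in the post."""
    problems = []
    if not has_tessdata_prefix(env_file):
        problems.append(f"{env_file} does not set TESSDATA_PREFIX")
    has_data = expected_dir.is_dir() and any(expected_dir.glob("*.traineddata"))
    if not has_data:
        found = find_traineddata_dirs(search_root)
        where = ", ".join(str(p) for p in found) or f"nowhere under {search_root}"
        problems.append(f"no *.traineddata in {expected_dir}; found in: {where}")
    return problems


if __name__ == "__main__":
    # Paths taken from the post; adjust them for your own install.
    findings = diagnose(
        env_file=Path("/opt/Stirling-PDF/.env"),
        expected_dir=Path("/usr/share/tessdata"),
        search_root=Path("/usr/share/tesseract-ocr"),
    )
    for finding in findings:
        print("WARNING:", finding)
    if not findings:
        print("TESSDATA_PREFIX is set and language data is in the expected directory.")
```

If it reports the data in a different directory, that is the path to put in `TESSDATA_PREFIX` (or to make available at the `tessdata` location the docs suggest) before restarting the service.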
I think the Stirling-PDF install/update script should be adding this info to make it work out of the box.
Anyone else have this issue? Or is it my install being bad? I did make this LXC before the refactor.