Stirling-PDF OCR clean-up yields no visible UI (with fix) #7225

stiggy87 · 2025-08-26T17:17:30Z

I was trying to clean up various PDF files for use with a local LLM. When the nordic encoder is used, it reads a lot of the data as a garbled mess. To fix this, I went to my local Stirling-PDF install to clean up the OCR, but when I tried, nothing showed up in the UI!

Digging some more, I found that the tesseract-data was not installed correctly. A couple of things I found out:

The tesseract-data was installed in /usr/share/tesseract-ocr/5, but Stirling-PDF looks for /usr/share/tessdata by default.
The /opt/Stirling-PDF/.env file does not contain TESSDATA_PREFIX, which is the path to the data location. (You could use the existing path, but the Stirling-PDF docs suggest the tessdata directory.)
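
To confirm which of the two locations actually holds language data, something like the following might help. It is only a sketch: count_traineddata is a made-up helper name, and it assumes the language files are Tesseract's usual *.traineddata files sitting directly in the directory you pass in.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stirling-PDF or Tesseract):
# print how many *.traineddata files sit directly inside a directory.
# Prints 0 when the directory does not exist.
count_traineddata() {
    dir=$1
    if [ -d "$dir" ]; then
        # -H follows the directory itself if it is a symlink;
        # tr strips the padding some wc implementations add.
        find -H "$dir" -maxdepth 1 -name '*.traineddata' | wc -l | tr -d ' '
    else
        echo 0
    fi
}
```

For example, count_traineddata /usr/share/tessdata printing 0 while count_traineddata /usr/share/tesseract-ocr/5/tessdata prints a non-zero number would match the situation described above. If the tesseract binary is on the PATH, tesseract --list-langs is another way to see which languages it can find.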

I fixed these items and restarted the service, and everything appears to work!
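
For anyone wanting to reproduce the .env part of the fix, here is a minimal sketch. The function name ensure_tessdata_prefix is invented for illustration, and it assumes the .env file uses plain KEY=value lines; check your own file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: append TESSDATA_PREFIX to an env file
# unless a TESSDATA_PREFIX= line is already present.
# Usage: ensure_tessdata_prefix ENV_FILE TESSDATA_DIR
ensure_tessdata_prefix() {
    env_file=$1
    tessdata_dir=$2
    if grep -q '^TESSDATA_PREFIX=' "$env_file" 2>/dev/null; then
        echo "TESSDATA_PREFIX already set in $env_file"
        return 0
    fi
    # If the file is non-empty and lacks a trailing newline, add one
    # so the new entry does not get glued onto the last line.
    if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
        printf '\n' >> "$env_file"
    fi
    printf 'TESSDATA_PREFIX=%s\n' "$tessdata_dir" >> "$env_file"
    echo "added TESSDATA_PREFIX=$tessdata_dir to $env_file"
}
```

On the LXC this would be called as ensure_tessdata_prefix /opt/Stirling-PDF/.env /usr/share/tessdata, followed by a restart of the Stirling-PDF service. Running it twice is harmless because it skips the append when the line already exists.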

I think the Stirling-PDF install/update script should be adding this info to make it work out of the box.

Anyone else have this issue? Or is it my install being bad? I did make this LXC before the refactor.

tremor021 · 2025-08-26T20:21:03Z

Collaborator

Tesseract is installed with a simple apt-get install -y 'tesseract-ocr-*'. We don't have control over how the Tesseract deb packages are installed.

stiggy87 · 2025-08-26T20:28:30Z

Author

I understand there's no control over the installation of the tesseract-ocr packages.

What I'm bringing up is that the install/update script might need to be updated to follow the Stirling-PDF documentation on where the tesseract-ocr data is placed: https://docs.stirlingpdf.com/Advanced%20Configuration/OCR/

What I mentioned is a good workaround if people want to use the OCR feature and run into the problem I did.

1 reply

MickLesk Aug 27, 2025
Maintainer

Isn't it much easier to just create a symlink?

smarthomelawyer · 2025-09-17T16:28:53Z

Same issue - do you mind explaining how you fixed it?

1 reply

smarthomelawyer Sep 17, 2025

Never mind - I found a prior discussion on tteck's GitHub that fixed the issue: https://github.com/tteck/Proxmox/discussions/2538

In short, I ran: cp -r /usr/share/tesseract-ocr/5/* /usr/share/

derreisende77 · 2025-11-30T17:52:20Z

ln -s /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata is enough in the LXC to make the languages available. No need to copy the files.
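
If you want the symlink step to be safe to re-run (for example from an update script), a guarded version could look roughly like this. link_tessdata is a made-up name, and the sketch assumes you only want a link when nothing exists at the destination yet.

```shell
#!/bin/sh
# Hypothetical helper: symlink DEST to SRC, but only when SRC is a
# directory and nothing (file, directory or link) exists at DEST.
# Usage: link_tessdata SRC DEST
link_tessdata() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory $src not found" >&2
        return 1
    fi
    # -L also catches a dangling symlink, which -e would miss.
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "$dest already exists, leaving it alone"
        return 0
    fi
    ln -s "$src" "$dest"
    echo "linked $dest -> $src"
}
```

With the paths from this thread the call would be link_tessdata /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata, run as root inside the LXC.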
