-
I assume this is a Tesseract thing and not a fault with SubtitleEdit: does everybody else have to constantly battle with I's and L's being recognised as a pipe/bar when doing OCR? I have never in my life seen a subtitle that contains the pipe character, so why does it get used as a guess so often? I would find others' thoughts on this, and any tricks to reduce it, very interesting. In my opinion, the pipe symbol should never, ever be used as a guess when trying to OCR, and I cannot fathom why it currently is. Cheers!
Replies: 6 comments
-
Yes, the pipes are from Tesseract.
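Since the pipes come from the OCR engine, one possible workaround is to clean them up in the exported text afterwards. The snippet below is only a rough sketch of that idea, not how SubtitleEdit or Tesseract handle it: `fix_pipes` and `_neighbour` are hypothetical helper names, and the rule (lowercase `l` after a lowercase letter or between an uppercase and a lowercase letter, uppercase `I` otherwise) is an assumed heuristic that will get some cases wrong, for example contractions such as "I'll".

```python
import re


def _neighbour(s: str, i: int, step: int) -> str:
    """Return the nearest non-pipe character to the left (step=-1) or right (step=1)."""
    i += step
    while 0 <= i < len(s) and s[i] == "|":
        i += step
    return s[i] if 0 <= i < len(s) else ""


def fix_pipes(text: str) -> str:
    """Replace each '|' with 'l' or 'I' using a simple neighbour heuristic (an assumption)."""

    def repl(m: re.Match) -> str:
        prev = _neighbour(m.string, m.start(), -1)
        nxt = _neighbour(m.string, m.start(), 1)
        # After a lowercase letter the pipe is most likely an 'l' inside a word ("he|p").
        if prev.islower():
            return "l"
        # Between an uppercase and a lowercase letter it is also likely an 'l' ("A|most").
        if prev.isupper() and nxt.islower():
            return "l"
        # Otherwise (start of a word, all-caps text) guess a capital 'I'.
        return "I"

    return re.sub(r"\|", repl, text)


if __name__ == "__main__":
    for line in ["| don't know", "he|p me", "wi|| you", "WA|T"]:
        print(line, "->", fix_pipes(line))
```

Running the cleaned text through a spell check afterwards would still be sensible, since the heuristic only looks at neighbouring characters.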
-
Hi Nik, sorry for the long delay. I've been OCRing again recently and hit a few pipes in The Goonies (UHD); .sup attached.
-
Thx for the file - love OCR'ing new BD sups :) Actually, with the latest SE beta, which uses Tesseract 5.0.1, I got no pipes, yay. nOCR also works well with this file.
-
Great news! I will update to the beta and continue my current OCR mission. I'm doing my whole UHD collection because when I rip a disc and watch it in HDR via Kodi/Plex, the subs are far too bright. Using various players' options to get grey/dark grey subs helps but doesn't go far enough imo. Anyway, if I come across any particularly bad pipe cases going forward, I'll upload the .sup and give you a nudge here.
-
Hm, I might need a little more help here; I must have misconfigured something. I've updated to the beta and tried OCRing The Goonies again, but I'm still getting pipes. Here's the result of the OCR, which also shows my settings: https://i.imgur.com/xtso5fV.jpg

I thought it might be because I have "Engine mode" set to "Original Tesseract only" (I like italics), so I tried again with "Default" and unticked the "Fallback to Tesseract 3.02" option. This looks to solve the pipe problem, but it now makes many basic mistakes, for example: https://i.imgur.com/5MSRLU9.jpg

Is it just a case of picking my poison, or have I done something obviously wrong here?

Edit: unticking the [Italic] box has stopped Default mode making lots of mistakes. It looks very accurate now, albeit without any italics at all. So I guess I have to choose between general accuracy and italic support?
-
Works fine here. You probably need the beta dictionary/ocr folder too...
