[FIX] DVB OCR: Memory Leak & Quantization Issues #1675
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 9e2a594...:
All tests passing on the master branch were passed completely. NOTE: The following tests have been failing on the master branch as well as the PR:
Check the result page for more info.
This seems reasonable. Hopefully we'll have a working test platform soon to verify :-( @canihavesomecoffee
That is what I'm currently working on :)
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 9e2a594...:
All tests passing on the master branch were passed completely. NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
Check the result page for more info.
* fix: do not free ocr text before return
* fix(OCR): erode and dilate function
Closes #985
DVB subtitle extraction is currently broken on the latest master build. I've verified this by testing it on the following few files:

* `09-ITV_Red_Heat.ts`
* `2016-12-15-BBC4.ts`
* `CHANNEL_4_2016-06-21.ts`
* `chan7_BBC NEWS.ts`

I've found two root issues that cause this:

* The text produced by `ocr_bitmap()` is freed before being returned. Simply removing this free causes memory leaks (as pointed out by Memory leak on OCR code #1511).
* The `quantize_map()` function. Running with `--quant 0` (or `2`) together with the first fix enables proper extraction of DVB subtitles.

I've spent the past two days trying to understand this function and have narrowed the problem down to the `erode()` function introduced in PR 1510. I believe this is better explained visually, so here are the subtitle bitmaps before and after the `erode()` call for two different video files:

[Figure: subtitle bitmaps before and after the `erode()` call for two video files; images not preserved]

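The first issue is an ownership problem: a buffer is released before its pointer is handed back, so the caller reads freed memory, while dropping the free without adding one elsewhere leaks. A minimal sketch of the usual resolution, where the function returns the buffer and the caller frees it, is shown below. The names `recognize_text` and `consume_text` are hypothetical and are not CCExtractor functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an OCR routine: returns a heap-allocated,
 * NUL-terminated copy of the recognized text, or NULL on failure.
 * Ownership passes to the caller, so the function must not free it. */
char *recognize_text(const char *raw)
{
	if (raw == NULL)
		return NULL;

	size_t len = strlen(raw);
	char *text = malloc(len + 1);
	if (text == NULL)
		return NULL;
	memcpy(text, raw, len + 1);

	/* Bug pattern being avoided: calling free(text) here would hand
	 * the caller a dangling pointer. */
	return text;
}

/* Hypothetical caller: uses the text, then releases it exactly once. */
size_t consume_text(const char *raw)
{
	char *text = recognize_text(raw);
	if (text == NULL)
		return 0;

	size_t len = strlen(text);
	free(text); /* the caller owns the buffer, so the caller frees it */
	return len;
}
```

With this split of responsibilities, the returned pointer stays valid for as long as the caller needs it, and every successful call is paired with exactly one `free()`.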
Fixes
In the `erode()` function, I noticed that the text was being eroded based on transparency rather than on the text background color. That approach only works for bitmaps whose quantized text color is transparent.

I've modified the erode and dilate functions so that they now use the text color and the text background color rather than the alpha channel.
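To illustrate the idea of keying erosion and dilation on colors instead of alpha, here is a generic morphological sketch on an indexed bitmap. It is not the code from this PR: the function names, the flat `uint8_t` palette-index layout, and the 4-neighbour rule are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if any in-bounds 4-neighbour of (x, y) has palette index `color`. */
static int has_neighbour(const uint8_t *px, int w, int h, int x, int y, uint8_t color)
{
	static const int dx[4] = {1, -1, 0, 0};
	static const int dy[4] = {0, 0, 1, -1};

	for (int i = 0; i < 4; i++) {
		int nx = x + dx[i], ny = y + dy[i];
		if (nx >= 0 && nx < w && ny >= 0 && ny < h && px[ny * w + nx] == color)
			return 1;
	}
	return 0;
}

/* Repaints every `from` pixel that touches a `to` pixel with `to`.
 * Reads from a snapshot so the result does not depend on scan order.
 * Returns 0 on success, -1 if the snapshot cannot be allocated. */
static int grow_color(uint8_t *px, int w, int h, uint8_t from, uint8_t to)
{
	size_t n = (size_t)w * (size_t)h;
	if (n == 0)
		return 0;

	uint8_t *src = malloc(n);
	if (src == NULL)
		return -1;
	memcpy(src, px, n);

	for (int y = 0; y < h; y++)
		for (int x = 0; x < w; x++)
			if (src[y * w + x] == from && has_neighbour(src, w, h, x, y, to))
				px[y * w + x] = to;

	free(src);
	return 0;
}

/* Erosion: text pixels that touch a background pixel become background. */
int erode_by_color(uint8_t *px, int w, int h, uint8_t text, uint8_t bg)
{
	return grow_color(px, w, h, text, bg);
}

/* Dilation: background pixels that touch a text pixel become text. */
int dilate_by_color(uint8_t *px, int w, int h, uint8_t text, uint8_t bg)
{
	return grow_color(px, w, h, bg, text);
}
```

Because both operations compare palette indices, pixels of any other color (for example a transparent border) are left untouched, and the outcome no longer depends on which palette entry happens to be transparent.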
I'm getting these colors from the loop that populates the `mcit` variable. This approach has been fairly successful in my limited testing; however, it relies on the assumption that the background and text colors will always be the second and third most frequently occurring colors, respectively.
`channel5-2018-02-12.ts` is one exception: in it, the text color is the fourth most frequently occurring color (black, the background color, is repeated twice for some reason). So erosion succeeds but dilation fails. The result is still better than the raw quantized output, but it might be worthwhile to disable quantization by default.
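The frequency heuristic described above can be sketched as follows. This is an illustration only, not the loop that populates `mcit`: `rank_colors`, `pick_text_colors`, and the flat array of palette indices are hypothetical names and layout chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define PALETTE_MAX 256

/* Fills `order` with the palette indices that occur in `px`, sorted by
 * descending pixel count (ties keep ascending index order), and returns
 * how many distinct indices were found. */
int rank_colors(const uint8_t *px, size_t n, int order[PALETTE_MAX])
{
	size_t hist[PALETTE_MAX] = {0};
	for (size_t i = 0; i < n; i++)
		hist[px[i]]++;

	int used = 0;
	for (int c = 0; c < PALETTE_MAX; c++)
		if (hist[c] > 0)
			order[used++] = c;

	/* Insertion sort: stable, so equal counts stay in index order. */
	for (int i = 1; i < used; i++) {
		int key = order[i];
		int j = i - 1;
		while (j >= 0 && hist[order[j]] < hist[key]) {
			order[j + 1] = order[j];
			j--;
		}
		order[j + 1] = key;
	}
	return used;
}

/* Applies the assumption from the description: background = second most
 * frequent color, text = third most frequent color.
 * Returns 0 on success, -1 if fewer than three colors are present. */
int pick_text_colors(const uint8_t *px, size_t n, uint8_t *bg, uint8_t *text)
{
	int order[PALETTE_MAX];
	if (rank_colors(px, n, order) < 3)
		return -1;

	*bg = (uint8_t)order[1];
	*text = (uint8_t)order[2];
	return 0;
}
```

The sketch also shows why the exception breaks the assumption: if the same color is stored under two palette indices, each index is counted separately, every later rank shifts by one, and the third slot no longer holds the text color.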