fix(hardsubx): Fix heap corruption from Rust/C allocator mismatch #1847
This PR triggers a fresh CI run to analyze all failing regression tests and determine whether each needs a ground truth update or a code fix. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The freep() function expects a pointer-to-pointer (void**) so it can dereference, free, and NULL-out the pointer. The code was passing lctx->dec_sub directly instead of &lctx->dec_sub. This caused freep to interpret the first 8 bytes of the cc_subtitle struct as a pointer and attempt to free() it, resulting in a crash (SIGABRT/exit code 134) in the memory allocator.

Fixes Test 241 (Hardsubx) crash on Sample Platform.
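
To make the pointer-to-pointer contract concrete, here is a minimal sketch. It is not the project's code: `freep_sketch`, `demo_subtitle`, `demo_ctx` and `demo_release` are hypothetical names, and the helper is only an approximation of what a freep-style function does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for cc_subtitle and the hardsubx context. */
struct demo_subtitle { int got_output; };
struct demo_ctx { struct demo_subtitle *dec_sub; };

/* Sketch of a freep-style helper (assumption, not the project's source).
 * arg must be the ADDRESS of a pointer variable, never the pointer itself. */
static void freep_sketch(void *arg)
{
    void *target = NULL;
    assert(arg != NULL);
    memcpy(&target, arg, sizeof target);   /* read the pointer stored at arg */
    free(target);                          /* free(NULL) is a no-op */
    target = NULL;
    memcpy(arg, &target, sizeof target);   /* NULL-out the caller's pointer */
}

static void demo_release(struct demo_ctx *ctx)
{
    /* Correct: pass the address of the member. */
    freep_sketch(&ctx->dec_sub);
    /* Wrong: freep_sketch(ctx->dec_sub) would read the first bytes of the
     * struct itself as if they were a pointer and hand them to free(). */
}
```

Because the helper resets the caller's pointer, calling `demo_release` a second time on the same context is harmless.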
1. Remove invalid free(tessdata_path): probe_tessdata_location() returns a pointer to static strings or a getenv() result, not heap memory.
2. Fix alloc-dealloc mismatch in OCR text handling:
   - TessBaseAPIGetUTF8Text() allocates with C++ operator new[]
   - The code was freeing with C free(), causing an allocator mismatch
   - Now properly copies the string and calls TessDeleteText() before returning
   - Unified all OCR text return paths to use Rust-allocated strings
3. Previous fix: freep(&lctx->dec_sub) instead of freep(lctx->dec_sub)

These fixes resolve the Test 241 (Hardsubx) crash on Sample Platform.
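
The two ownership rules in this commit (do not free borrowed pointers; release library-owned text through the library's own deleter) can be sketched as follows. Everything prefixed `demo_` is hypothetical: `demo_lib_get_text` and `demo_lib_delete_text` merely stand in for TessBaseAPIGetUTF8Text() and TessDeleteText(), the environment variable name is made up, and the copy is made with malloc() for simplicity, whereas the commit says the real code returns Rust-allocated strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a BORROWED pointer: either the environment value or a string
 * literal. The caller must not free() it. (Hypothetical variable name.) */
static const char *demo_probe_data_path(void)
{
    const char *from_env = getenv("DEMO_TESSDATA_PREFIX");
    return (from_env != NULL) ? from_env : "./";
}

/* Stand-ins for a library's paired allocate/release functions. */
static int demo_lib_live = 0;   /* library-owned buffers not yet released */

static char *demo_lib_get_text(void)
{
    static const char text[] = "hello";
    char *buf = malloc(sizeof text);
    if (buf == NULL)
        return NULL;
    memcpy(buf, text, sizeof text);
    demo_lib_live++;
    return buf;
}

static void demo_lib_delete_text(char *text)
{
    if (text == NULL)
        return;
    demo_lib_live--;
    free(text);   /* the stand-in happens to use malloc; a real library may not */
}

/* Copy the text into a caller-owned buffer, then release the original
 * through the library's deleter before returning. */
static char *demo_copy_ocr_text(void)
{
    char *lib_text = demo_lib_get_text();
    char *owned = NULL;
    if (lib_text == NULL)
        return NULL;
    owned = malloc(strlen(lib_text) + 1);
    if (owned != NULL)
        strcpy(owned, lib_text);
    demo_lib_delete_text(lib_text);
    return owned;   /* caller frees with free() in this sketch */
}
```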
- Free basefilename in _dinit_hardsubx (allocated by get_basename)
- Free subtitle_text after each frame processing iteration
- Free prev_subtitle_text when replaced and at end of function
- Free sws_ctx with sws_freeContext (was never freed)

Reduces memory leaks from 63,926 bytes to 0 bytes.
The hardsubx code was using C's free() on strings allocated by Rust's CString::into_raw(). Since Rust and C use different memory allocators, this caused heap corruption that manifested as garbage OCR output after processing ~27 subtitle frames.

Changes:
- Export free_rust_c_string() from Rust as extern "C" function
- Declare free_rust_c_string() in hardsubx.h for C code
- Replace free(subtitle_text) with free_rust_c_string(subtitle_text) in hardsubx_decoder.c for Rust-allocated strings
- Fix memory leaks in process_hardsubx_linear_frames_and_normal_subs() where subtitle_text_hard and prev_subtitle_text_hard were not freed
- Remove dummy CI trigger file (no longer needed)

Testing:
- AddressSanitizer: No memory errors detected
- Valgrind: 0 bytes definitely lost, 0 bytes indirectly lost
- Manual testing: OCR output now correct for entire video duration
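
For context, Rust's documentation requires that a pointer obtained from CString::into_raw() be handed back to Rust (via CString::from_raw()) rather than passed to C's free(), which is why a dedicated release function is exported. The sketch below illustrates the resulting C-side discipline in a frame loop. It is an assumption-laden stand-in: `demo_rust_string_new` and `demo_rust_string_free` are hypothetical C functions that play the roles of the Rust OCR call and free_rust_c_string(), so the block compiles without the Rust library and does not reproduce the actual allocator mismatch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int demo_live_strings = 0;   /* foreign-owned strings not yet released */

/* Stand-in for a Rust function that returns CString::into_raw(). */
static char *demo_rust_string_new(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    demo_live_strings++;
    return copy;
}

/* Stand-in for free_rust_c_string(): the ONLY function allowed to release
 * strings produced by demo_rust_string_new(). */
static void demo_rust_string_free(char *ptr)
{
    if (ptr == NULL)
        return;
    demo_live_strings--;
    free(ptr);
}

/* Frame loop pattern: release the previous text when it is replaced,
 * and release the last one after the loop. Returns the number of
 * strings still outstanding (0 means nothing leaked). */
static int demo_process_frames(int frames)
{
    char *prev_text = NULL;
    int i;
    for (i = 0; i < frames; i++) {
        char *text = demo_rust_string_new("subtitle");
        if (text == NULL)
            break;
        demo_rust_string_free(prev_text);   /* never plain free() here */
        prev_text = text;
    }
    demo_rust_string_free(prev_text);
    return demo_live_strings;
}
```

Keeping allocation and release behind one matched pair of functions means the C code never needs to know which allocator the other side uses.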
The rcwt_loop function was returning exit code 10 (no captions) even when CEA-608 captions were successfully extracted from RCWT/BIN format files. This happened because CEA-608 decoding writes directly to the encoder via printdata() without setting dec_sub->got_output.

Add a check after the main loop (similar to general_loop) that also considers enc_ctx->srt_counter, enc_ctx->cea_708_counter, and dec_ctx->saw_caption_block to properly detect when captions were found.

Fixes regression test 217, which was failing with exit code 10.
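
A minimal sketch of the kind of post-loop check described above. The struct layouts, the `demo_` names, and the exact way the conditions are combined are assumptions for illustration; only the field names and the exit code 10 come from the commit message.

```c
#include <assert.h>

#define DEMO_EXIT_OK           0
#define DEMO_EXIT_NO_CAPTIONS 10   /* exit code named in the commit message */

/* Hypothetical, reduced versions of the contexts involved. */
struct demo_subtitle { int got_output; };
struct demo_decoder  { int saw_caption_block; };
struct demo_encoder  { unsigned int srt_counter; unsigned int cea_708_counter; };

/* Captions count as found if ANY of the signals fired, not only got_output. */
static int demo_captions_found(const struct demo_subtitle *sub,
                               const struct demo_decoder *dec,
                               const struct demo_encoder *enc)
{
    return sub->got_output
        || dec->saw_caption_block
        || enc->srt_counter > 0
        || enc->cea_708_counter > 0;
}

static int demo_exit_code(const struct demo_subtitle *sub,
                          const struct demo_decoder *dec,
                          const struct demo_encoder *enc)
{
    return demo_captions_found(sub, dec, enc) ? DEMO_EXIT_OK
                                              : DEMO_EXIT_NO_CAPTIONS;
}
```

With this shape, a run where the encoder wrote captions directly (so `got_output` stays 0 but `srt_counter` is non-zero) no longer reports "no captions".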
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 138ccd0...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
This PR triggers a fresh CI run to verify the combined effect of:
- PR #1847: Hardsubx crash fix, memory leak fixes, rcwt exit code fix
- PR #1848: XDS empty content entries fix
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 138ccd0...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
Summary
Fixes heap corruption in hardsubx (burned-in subtitle extraction) that caused garbage OCR output after processing ~27 subtitle frames.
Root Cause
The C code was calling `free()` on strings allocated by Rust's `CString::into_raw()`. Since Rust and C use different memory allocators, this caused heap corruption that accumulated over time, eventually corrupting the OCR results.
Changes
- Export `free_rust_c_string()` as an `extern "C"` function
- Declare `free_rust_c_string()` for C code
- Replace `free(subtitle_text)` with `free_rust_c_string(subtitle_text)` for Rust-allocated strings
- Fix memory leaks in `process_hardsubx_linear_frames_and_normal_subs()` where `subtitle_text_hard` and `prev_subtitle_text_hard` were not freed
Testing
Before/After
Before (memory corruption):
After (correct output):
🤖 Generated with Claude Code