Skip to content

Conversation

@jstrot
Copy link
Contributor

@jstrot jstrot commented Jan 7, 2024

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Version: 0.94

During OCR of a VOB PS, ccextractor can run out of buffer space if it has to copy all text since the last font tag (which can also be the beginning of the input):

$ ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
...
Error: In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.

I believe the bug existed since that piece of code was introduced way back in 2017 (#844)

The fix simply makes sure the allocated buffer is big enough for this extra string.

Example crash under gdb:

$ gdb --args ./ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt
(gdb) run                     
Starting program: /home/jst/tools/src/ccextractor/linux/ccextractor -1 -cc2 -out=srt -utf8 test.vob -o test.srt  
[Thread debugging using libthread_db enabled]  
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".  
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.  
Teletext portions taken from Petr Kutalek's telxcc                                                               
--------------------------------------------------------------------------  
Input: test.vob                                                             
[Extract: 1] [Stream mode: Autodetect]                      
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]  
[Timing mode: Auto] [Debug: No] [Buffer input: No]                          
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]  
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]  
[Add font color data: Yes] [Add font typesetting: Yes]         
[Convert case: No][Filter profanity: No] [Video-edit join: No]  
[Extraction start time: not set (from start)]                        
[Extraction end time: not set (to end)]                              
[Live stream: No] [Clock frequency: 90000]              
[Teletext page: Autodetect]                                     
[Start credits text: None]                       
[Quantisation-mode: CCExtractor's internal function]  
                                                 
-----------------------------------------------------------------  
Opening file: test.vob                           
File seems to be a program stream, enabling PS mode   
Analyzing data in general mode                   
                                                                   
                                                 
New video information found                          
[720 * 480] [AR: 02 - 4:3] [FR: 04 - 29.97] [progressive: no]  
   
  0%  |  00:00                                   
...                          
Skip forward to the next Sequence or GOP start.  
 95%  |  19:38  
Skip forward to the next Sequence or GOP start.  
  
Skip forward to the next Sequence or GOP start.  
  
Thread 1 "ccextractor" hit Breakpoint 1, fatal (exit_code=1000, fmt=0x555555ee8da0 "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n") at ../src/lib_ccx/utility.c:272  
272             va_start(args, fmt);  
(gdb) up  
#1  0x00005555557976ed in ocr_bitmap (arg=0x602000008250, palette=0x602000b1c390, alpha=0x602000b1c3b0 "", indata=0x62a000726200 "", w=556, h=42, copy=0x60400003c210) at ../src/lib_ccx/ocr.c:638  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
(gdb) list  
633                                             {  
634                                                     if ((new_text_out_iter - new_text_out) +  
635                                                             (last_font_tag_end - last_font_tag) >  
636                                                         length)  
637                                                     {  
638                                                             fatal(CCX_COMMON_EXIT_BUG_BUG, "In ocr_bitmap: Running out of memory. It shouldn't happen. Please report.\n", errno);  
639                                                     }  
640                                                     memcpy(new_text_out_iter, last_font_tag, last_font_tag_end - last_font_tag);  
641                                                     new_text_out_iter += last_font_tag_end - last_font_tag;  
642                                             }  
(gdb) p new_text_out_iter - new_text_out  
$1 = 96  
(gdb) p last_font_tag_end - last_font_tag  
$2 = 76  
(gdb) p length  
$3 = 158  
(gdb) p new_text_out_iter - new_text_out + last_font_tag_end - last_font_tag  
$4 = 172                                                                                                                                                                                                                                                                         

Before actually reaching this point I also had to fix an ASAN error with process_spu using memcpy on overlapping buffers. I can't say I understand why the buffers would be overlapping but using memmove at least fixes the error.

==611746==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x7fffdf1eae84,0x7fffdf1eb528) and [0x7fffdf1ea800, 0x7fffdf1eaea4) overlap
    #0 0x7ffff786db25 in __interceptor_memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:899
    #1 0x5555556c2302 in process_spu ../src/lib_ccx/dvd_subtitle_decoder.c:387
    #2 0x5555556fe994 in process_data ../src/lib_ccx/general_loop.c:662
    #3 0x555555701650 in process_non_multiprogram_general_loop ../src/lib_ccx/general_loop.c:968
    #4 0x555555702248 in general_loop ../src/lib_ccx/general_loop.c:1062
    #5 0x5555556738ee in api_start ../src/ccextractor.c:204
    #6 0x555555675c39 in main ../src/ccextractor.c:465
    #7 0x7ffff64456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x7ffff6445784 in __libc_start_main_impl ../csu/libc-start.c:360
    #9 0x555555672c50 in _start (/home/jst/tools/src/ccextractor/linux/ccextractor+0x11ec50) (BuildId: 466667d3e95ff9aa8e7b1165aeac946dcfc18371)

@jstrot
Copy link
Contributor Author

jstrot commented Feb 1, 2025

Hi @cfsmp3, the only Rust formatting issue that's left and I don't know how to fix is:

warning: creating a mutable reference to mutable static is discouraged
   --> src/lib.rs:285:5
    |
285 |     ccx_options.copy_from_rust(opt);
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable reference to mutable static
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/static-mut-references.html>
    = note: mutable references to mutable statics are dangerous; it's undefined behavior if any other pointer to the static is used or if any other reference is created for the static while the mutable reference lives
    = note: `#[warn(static_mut_refs)]` on by default

Can you please enlighten me?

@wylited
Copy link

wylited commented Mar 5, 2025

You would need to use an unsafe block, because as the message error states, a race condition could arise.
You could put a small safety message

// safety: the C code is single-threaded, so accessing `ccx_options` is safe.
unsafe {
  ccx_options.copy_from_rust(opt);
}

I believe, though it depends on if the C code is single threaded, which I haven't checked.

@jstrot
Copy link
Contributor Author

jstrot commented Mar 6, 2025

@wylited, thanks for the tip. Unfortunately, it doesn't solve the warning:

diff --git a/src/rust/src/lib.rs b/src/rust/src/lib.rs
index 8b014cd1..597e14e4 100644
--- a/src/rust/src/lib.rs
+++ b/src/rust/src/lib.rs
@@ -282,7 +282,10 @@ pub unsafe extern "C" fn ccxr_parse_parameters(argc: c_int, argv: *mut *mut c_ch
     tlt_config = _tlt_config.to_ctype(&opt);

     // Convert the rust struct (CcxOptions) to C struct (ccx_s_options), so that it can be used by the C code
-    ccx_options.copy_from_rust(opt);
+    // safety: the C code is single-threaded and opt is locally scoped, so accessing `ccx_options` and `opt` is safe.^M
+    unsafe {^M
+        ccx_options.copy_from_rust(opt);^M
+    }^M

     if !_capitalization_list.is_empty() {
         capitalization_list = _capitalization_list.to_ctype();

I did end up finding a solution which is to copy to a temporary ccx_options before assigning that copy to the static. See 6264d5b

But, when I merged with latest master, someone already worked on fixing that issue by completely changing the C call to use a &raw mut pointer. Not as elegant as my solution -- but I'll take it.

A couple more of my formatting fixes got lost in the merge. Oh well. (On second look, all of them :-( )

Please approve if this is up to code!
Thanks

@ccextractor-bot

This comment was marked as off-topic.

@ccextractor-bot

This comment was marked as off-topic.

Copy link
Member

@prateekmedia prateekmedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jstrot
Copy link
Contributor Author

jstrot commented Mar 8, 2025

Looks like the CI - linux check's result never made it back to GH. It's still in pending state several days after completion.

Can someone give it a nudge / restart it / bypass it?
Or I can just push an empty commit to force a new instance myself.

Let me know,
Thanks

@prateekmedia prateekmedia merged commit 9e2a594 into CCExtractor:master Mar 11, 2025
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants