
[FEAT] Port SRT encoder to Rust #2227

Open
DhanushVarma-2 wants to merge 1 commit into CCExtractor:master from DhanushVarma-2:feat/srt-encoder-rust


@DhanushVarma-2 DhanushVarma-2 commented Mar 24, 2026

Ported all 5 functions from ccx_encoders_srt.c to Rust:

  • write_stringz_as_srt → ccxr_write_stringz_as_srt (text subtitles)
  • write_cc_buffer_as_srt → ccxr_write_cc_buffer_as_srt (CEA-608 with autodash)
  • write_cc_subtitle_as_srt → ccxr_write_cc_subtitle_as_srt (subtitle chain + teletext multi-page)
  • write_cc_bitmap_as_srt → ccxr_write_cc_bitmap_as_srt (OCR, behind hardsubx_ocr feature)
  • write_stringz_as_srt_to_output (internal helper)

Rust is called by default. C fallback behind #ifdef DISABLE_RUST.

Tested on Matroska test files (Elephants Dream) with 8 subtitle tracks
including English and Hungarian with UTF-8 accented characters.
All extract correctly.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch 7 times, most recently from d9b3a3a to c6523ce Compare March 25, 2026 11:07
@DhanushVarma-2
Contributor Author

[Screenshot: output, 2026-03-25 9:16:39 PM]

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch 5 times, most recently from af58374 to a1dc78a Compare March 26, 2026 11:07
@DhanushVarma-2
Contributor Author

The bitmap function can't be wired yet. CMake sets ENABLE_HARDSUBX for the C code but never calls corrosion_set_features to pass hardsubx_ocr to the Rust crate, so the Rust function is not compiled in hardsubx builds, and calling it from C causes a linker error. That is what made the OCR Docker CI fail earlier.

The Rust code is there and ready. To actually wire it, someone needs to add corrosion_set_features(ccx_rust hardsubx_ocr) inside the WITH_HARDSUBX block in CMakeLists.txt. I'll do that as a follow-up PR to keep this one clean.

And yes, get_teletext_output and get_teletext_srt_counter both exist in ccx_encoders_common.c (lines 1356 and 1460).
They have been there since the teletext multi-page feature was added.
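
To make the linker problem described above concrete, here is a minimal sketch of how a Cargo feature gate behaves. The function name and body are hypothetical; only the feature name `hardsubx_ocr` comes from this thread. When the feature is not passed to the crate, the gated item is removed before compilation, so no symbol exists for C to link against.

```rust
/// Exists only when the crate is built with the `hardsubx_ocr` feature.
/// Without the feature this function is stripped entirely, which is why
/// a C caller referencing it would fail at link time.
#[cfg(feature = "hardsubx_ocr")]
pub fn write_bitmap_as_srt_sketch() -> i32 {
    0 // placeholder body; the real function performs OCR-based encoding
}

/// Reports at compile time whether the gated function was built in.
pub fn bitmap_encoder_available() -> bool {
    cfg!(feature = "hardsubx_ocr")
}
```

In a build where the feature is not enabled, `bitmap_encoder_available()` evaluates to `false` and `write_bitmap_as_srt_sketch` does not exist at all.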

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 03ad9e8...:
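| Report Name | Tests Passed |
| --- | --- |
| Broken | 10/13 |
| CEA-708 | 1/14 |
| DVB | 4/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 27/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 79/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 34/34 |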

Your PR breaks these cases:

  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Member

@steel-bucket steel-bucket left a comment


Compare the master branch's SP test results with yours.

Comment on lines +9 to +11
#ifndef DISABLE_RUST
extern int ccxr_write_stringz_as_srt(const char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
extern int ccxr_write_cc_buffer_as_srt(struct eia608_screen *data, struct encoder_ctx *context);
Member


DISABLE_RUST is deprecated

Comment on lines +10 to +18
extern "C" {
    fn get_decoder_line_encoded(
        ctx: *mut encoder_ctx,
        buffer: *mut c_uchar,
        line_num: c_int,
        data: *const eia608_screen,
    ) -> c_uint;
}

Member


Import externs the way we do in src/rust/src/lib.rs, not like this.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch from a1dc78a to 43ef45f Compare March 27, 2026 08:34
@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 74e3842...:
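| Report Name | Tests Passed |
| --- | --- |
| Broken | 9/13 |
| CEA-708 | 1/14 |
| DVB | 4/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 22/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 81/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 31/34 |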

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Implement ccxr_write_stringz_as_srt and ccxr_write_cc_buffer_as_srt
in src/rust/src/encoder/srt.rs. Covers:
- Subtitle counter and timestamp formatting with -1ms overlap prevention
- \n unescape handling for multi-line subtitles
- Encoding conversion (UTF-8, Latin1, UCS-2)
- Autodash detection for CEA-608 screen buffers
- Speaker name detection (colon-based)

Uses existing Rust encoder infrastructure (encode_line, write_wrapped)
and calls C get_decoder_line_encoded for CEA-608 line encoding until
that function is also ported.

Exported as #[no_mangle] extern C functions ready to replace the C
versions in ccx_encoders_srt.c.
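
Two items from the commit message above, timestamp formatting with the -1 ms overlap prevention and the `\n` unescape handling, can be sketched as follows. This is an illustrative sketch only, not the code in src/rust/src/encoder/srt.rs: the function names are hypothetical, the line terminator is assumed to be a plain `\n` (the encoder may use a different, configurable terminator), and clamping negative times to zero is an assumption made for the example.

```rust
/// Format a millisecond count as an SRT time code, HH:MM:SS,mmm.
fn srt_timestamp(ms: i64) -> String {
    let ms = ms.max(0); // assumption: clamp negative times to zero
    let (h, rem) = (ms / 3_600_000, ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, milli) = (rem / 1_000, rem % 1_000);
    format!("{:02}:{:02}:{:02},{:03}", h, m, s, milli)
}

/// Build the cue header: the counter line followed by the timing line.
/// The end time is shortened by 1 ms so that a cue ending at T never
/// overlaps a following cue that starts at T.
fn srt_cue_header(counter: u32, ms_start: i64, ms_end: i64) -> String {
    format!(
        "{}\n{} --> {}\n",
        counter,
        srt_timestamp(ms_start),
        srt_timestamp(ms_end - 1)
    )
}

/// Split subtitle text on the literal two-character sequence backslash + 'n',
/// yielding one entry per output line.
fn unescape_lines(text: &str) -> Vec<&str> {
    text.split("\\n").collect()
}
```

For example, a cue numbered 1 that runs from 1000 ms to 2000 ms gets the timing line `00:00:01,000 --> 00:00:01,999`, and the input `Hello\nWorld` (with a literal backslash) is split into the two lines `Hello` and `World`.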
@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch from 43ef45f to 043901b Compare March 28, 2026 09:40