Commit 34cd891

Merge branch 'master' into feat/ssa-ass-precise-positioning
2 parents 25f546a + fc4a14e commit 34cd891

14 files changed, with 431 additions and 123 deletions

.github/workflows/build_appimage.yml

Lines changed: 4 additions & 4 deletions
@@ -42,7 +42,7 @@ jobs:
 
       - name: Checkout repository
         if: steps.should_build.outputs.should_build == 'true'
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
 
       - name: Install base dependencies
         if: steps.should_build.outputs.should_build == 'true'
@@ -93,7 +93,7 @@ jobs:
       - name: Cache GPAC build
         if: steps.should_build.outputs.should_build == 'true'
         id: cache-gpac
-        uses: actions/cache@v4
+        uses: actions/cache@v5
         with:
           path: /usr/local/lib/libgpac*
           key: gpac-v2.4.0-ubuntu22
@@ -143,14 +143,14 @@ jobs:
 
       - name: Upload AppImage artifact
         if: steps.should_build.outputs.should_build == 'true'
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v6
         with:
           name: ${{ steps.appimage_name.outputs.name }}
           path: linux/${{ steps.appimage_name.outputs.name }}
 
       - name: Upload to Release
         if: steps.should_build.outputs.should_build == 'true' && github.event_name == 'release'
-        uses: softprops/action-gh-release@v1
+        uses: softprops/action-gh-release@v2
         with:
           files: linux/${{ steps.appimage_name.outputs.name }}
         env:

.github/workflows/release.yml

Lines changed: 2 additions & 2 deletions
@@ -34,10 +34,10 @@ jobs:
           LLVM_CONFIG_PATH: "C:\\Program Files\\LLVM\\bin\\llvm-config"
           CARGO_TARGET_DIR: "..\\..\\windows"
           BINDGEN_EXTRA_CLANG_ARGS: -fmsc-version=0
-        run: msbuild ccextractor.sln /p:Configuration=Release-Full /p:Platform=Win32
+        run: msbuild ccextractor.sln /p:Configuration=Release-Full /p:Platform=x64
         working-directory: ./windows
       - name: Copy files to directory for installer
-        run: mkdir installer; cp ./Release-Full/ccextractorwinfull.exe ./installer; cp ./Release-Full/*.dll ./installer
+        run: mkdir installer; cp ./x64/Release-Full/ccextractorwinfull.exe ./installer; cp ./x64/Release-Full/*.dll ./installer
         working-directory: ./windows
       - name: install WiX
         run: dotnet tool install --global wix --version 4.0.0-preview.0 && wix extension -g add WixToolset.UI.wixext

docs/CHANGES.TXT

Lines changed: 23 additions & 92 deletions
@@ -1,107 +1,38 @@
-0.96 (2025-12-21)
+0.96 (2025-12-23)
 -----------------
 - New: Added ASS/SSA \pos-based positioning for CEA-608 captions when layout is simple (1-2 rows)
+- New: Multi-page teletext extraction support (#665)
+  - Extract multiple teletext pages simultaneously with separate output files
+  - Use --tpage multiple times (e.g., --tpage 100 --tpage 200)
+  - Output files are named with page suffix (e.g., output_p100.srt, output_p200.srt)
+
 - New: Added --list-tracks (-L) option to list all tracks in media files without processing
-- Fix: Garbled captions from HDHomeRun and I/P-only H.264 streams (#1109)
-- Fix: Enable stdout output for CEA-708 captions on Windows (#1693)
-- Fix: McPoodle DVD raw format read/write - properly handle loop markers (#1524)
-- Fix: Variable shadowing in general_loop causing false "premature end of file" messages
-- Fix: Double-free crash in teletext cleanup when processing multiple files
-- Fix: Uninitialized memory and memory leaks found by Valgrind testing
-- Fix: Dangling pointers in Rust FFI copy_from_rust functions
-- New: Improve -out=report to show detected Teletext subtitle pages (#1034)
-- FIX: Include ATSC VCT virtual channel numbers and call signs in XMLTV output
-- FIX: Restore ATSC XMLTV generation with ETT parsing for extended descriptions, multi-segment handling, extended table ID's (EIT/VCT), corrected <programme> XMLTV formatting, buffer bounds fixes
-- Fix: DVB subtitle extraction improvements for Chinese broadcasts (#224):
-  - Fix crash in parse_PMT() due to missing bounds checks
-  - Fix negative timestamps in DVB subtitle output
-  - Fix crash in ignore_alpha_at_edge() OCR cropping
-  - Improve DVB subtitle OCR accuracy with image inversion
-  - Fix --ocrlang to accept Tesseract language names (chi_tra, chi_sim, etc.)
-  - Add case-insensitive matching for --dvblang parameter
-- FIX: Add HEVC/H.265 stream type recognition to prevent crashes on ATSC 3.0 streams
-- New: Add demuxer and file_functions module in lib_ccxr (#1662)
-- Fix: handle row_count decrease in CEA-708 C decoder
-- Fix: Bounds checks to prevent panic on malformed CEA-708 data
-- Fix: Multiprogram logic in is_decoder_processed_enough() causing false warnings
-- Fix: Write consistent 2-byte UTF-16BE encoding for CEA-708 captions (Japanese/Chinese)
-- New: Add --ttxtforcelatin option to force Latin G0 charset in Teletext
-- Fix: Add fallback for TS files without PAT/PMT tables
-- Fix: PTS jump handling to continue fts_now updates after jump
-- Fix: Null checks for unchecked memory allocations throughout codebase
-- Fix: Null checks and invalid UTF-8 handling in Rust FFI functions
-- Fix: Panics in timing code when processing multiple files
-- Fix: Caption start/end times to match FFmpeg timing in MP4/MPEG/TS
-- Fix: Correctly count and store multiple input files
-- Fix: Handle MP4 c608 tracks and improve garbage frame detection
-- Fix: Update fts_now for each frame in elementary streams
-- Fix: Preserve CR time during pop-on to roll-up transition
-- Fix: Defer min_pts until frame type is known
-- Fix: Skip leading non-I-frames when setting min_pts
-- Fix: Memory leaks in ts_tables_epg, ocr, and ccx_encoders_spupng
-- Fix: Buffer overruns in 708_output, mcc_encoder, utility, xds_decoder
-- Fix: Replace sprintf/strcpy with bounds-checked snprintf/strncpy in encoders
-- Fix: HHMMSSFFF format for ttxt output timestamps
-- Fix: Always emit position codes at start of SCC caption
-- Fix: Memory safety issues in ccx_decoders_common
-- Fix: Null checks after malloc calls in dvb_subtitle_decoder
-- Fix: Memory safety checks and memory leaks in Matroska parser
-
-0.95 (2025-09-15)
------------------
-- Fix: ARM64/aarch64 build failure due to c_char type mismatch in nal.rs
-- Fix: HardSubX OCR on Rust
-- Removed the Share Module
-- Fix: Regression failures on DVD files
-- Fix: Segmentation faults on MP4 files with CEA-708 captions
-- Refactor: Remove API structures from ccextractor
-- New: Add Encoder Module to Rust
-- Fix: Elementary stream regressions
-- Fix: Segmentation faults on XDS files
-- Fix: Clippy Errors Based on Rust 1.88
-- IMPROVEMENT: Refactor and optimize Dockerfile
-- Fix: Improved handling of IETF language tags in Matroska files (#1665)
-- New: Create unit test for rust code (#1615)
-- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
+New: Chinese, Korean, Japanese support - proper encoding and OCR.
+New: Correct McPoodle DVD raw format support
+Fix: Timing is now frame perfect (using FFMpeg timing dump as reference) in all formats.
+Fix: Solved garbling in all the pending issues we had on GitHub.
+Fix: All causes of "premature end of file" messages due to bugs and not actual file cuts.
+Fix: All memory leaks, double frees and usual C nastyness that valgrind could find.
+- Fix Include ATSC VCT virtual channel numbers and call signs in XMLTV output
+- Fix: Restore ATSC XMLTV generation with ETT parsing for extended descriptions, multi-segment handling, extended table ID's (EIT/VCT), corrected <programme> XMLTV formatting, buffer bounds fixes
+- Fix: Add HEVC/H.265 stream type recognition to prevent crashes on ATSC 3.0 streams.
+Fix: Tolerance to damaged streams - recover where possible instead of terminating.
+Issues closed: Over 40! Too many to list here, but each of them was either a bug squashed or a feature implemented.
+
+0.95 (2025-09-15 - never formally packaged)
+-----------------
 - New: Create a Docker image to simplify the CCExtractor usage without any environmental hustle (#1611)
-- New: Add time units module in lib_ccxr (#1623)
-- New: Add bits and levenshtein module in lib_ccxr (#1627)
-- New: Add constants module in lib_ccxr (#1624)
-- New: Add log module in lib_ccxr (#1622)
-- New: Create `lib_ccxr` and `libccxr_exports` (#1621)
-- Fix: Unexpected behavior of get_write_interval (#1609)
-- Update: Bump rsmpeg to latest version for ffmpeg bindings (#1600)
 - New: Add SCC support for CEA-708 decoder (#1595)
-- Fix: respect `-stdout` even if multiple CC tracks are present in a Matroska input file (#1453)
-- Fix: crash in Rust decoder on ATSC1.0 TS Files (#1407)
-- Removed the --with-gui flag for linux/configure and mac/configure (use the Flutter GUI instead)
+Refactor: Lots of code ported to Rust.
+- Fix: Improved handling of IETF language tags in Matroska files (#1665)
+- Breaking: Major argument flags revamp for CCExtractor (#1564 & #1619)
 - Fix: segmentation fault in using hardsubx
-- New: Add function (and command) that extracts closed caption subtitles as well as burnt-in subtitles from a file in a single pass. (As proposed in issue 726)
-- Refactored: the `general_loop` function has some code moved to a new function
 - Fix: WebVTT X-TIMESTAMP-MAP placement (#1463)
-- Disable X-TIMESTAMP-MAP by default (changed option --no-timestamp-map to --timestamp-map)
-- Fix: missing `#` in color attribute of font tag
 - Fix: ffmpeg 5.0, tesseract 5.0 compatibility and remove deprecated methods
 - Fix: tesseract 5.x traineddata location in ocr
-- Fix: fix autoconf tesseract detection problem (#1503)
-- Fix: add missing compile_info_real.h source to Autotools build
-- Fix: add missing `-lavfilter` for hardsubx linking
-- Fix: make webvtt-full work correctly with multi-byte utf-8 characters
-- Fix: encoding of solid block in latin-1 and unicode
-- Fix: McPoodle Broadcast Raw format for field 1
-- Fix: Incorrect skipping of packets
-- Fix: Repeated values for enums
-- Cleanup: Remove the (unmaintained) Nuklear GUI code
-- Cleanup: Reduce the amount of Windows build options in the project file
-- Fix: infinite loop in MP4 file type detector.
-- Improvement: Use Corrosion to build Rust code
 - Improvement: Ignore MXF Caption Essence Container version byte to enhance SRT subtitle extraction compatibility
 - New: Add tesseract page segmentation modes control with `--psm` flag
-- Fix: Resolve compile-time error about implicit declarations (#1646)
-- Fix: fatal out of memory error extracting from a VOB PS
-- Fix: Unit Test Rust failing due to changes in Rust Version 1.86.0
 - Fix: Support for MINGW-w64 cross compiling
-- Fix: Build with ENABLE_FFMPEG to support ffmpeg 5
 
 0.94 (2021-12-14)
 -----------------
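
The page-suffix naming described in the 0.96 entry above (`output_p100.srt`, `output_p200.srt`) can be illustrated with a minimal sketch. The helper `page_output_name` is hypothetical and is not part of CCExtractor; it only assumes the `_p%03d`-style suffix shown in this commit and that page numbers are formatted as plain decimal values.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper (not a CCExtractor function): writes
 * "<base>_pNNN.<ext>" into buf, zero-padding the page to three digits.
 * Returns 0 on success, -1 if the name does not fit in buf. */
static int page_output_name(char *buf, size_t len, const char *base,
                            uint16_t page, const char *ext)
{
    int n = snprintf(buf, len, "%s_p%03u.%s", base, (unsigned)page, ext);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with base `"output"`, page `100` and extension `"srt"`, this helper produces `output_p100.srt`, matching the changelog example. Checking the `snprintf` return value keeps a long base name from silently yielding a truncated file name.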

src/lib_ccx/ccx_common_structs.h

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ struct cc_subtitle
 	/** Raw PTS value when this subtitle started (for DVB timing) */
 	LLONG start_pts;
 
+	/** Teletext page number (for multi-page extraction, issue #665) */
+	uint16_t teletext_page;
+
 	struct cc_subtitle *next;
 	struct cc_subtitle *prev;
 };

src/lib_ccx/ccx_encoders_common.c

Lines changed: 177 additions & 0 deletions
@@ -721,6 +721,9 @@ void dinit_encoder(struct encoder_ctx **arg, LLONG current_fts)
 		write_subtitle_file_footer(ctx, ctx->out + i);
 	}
 
+	// Clean up teletext multi-page output files (issue #665)
+	dinit_teletext_outputs(ctx);
+
 	free_encoder_context(ctx->prev);
 	dinit_output_ctx(ctx);
 	freep(&ctx->subline);
@@ -840,6 +843,15 @@ struct encoder_ctx *init_encoder(struct encoder_cfg *opt)
 	ctx->segment_last_key_frame = 0;
 	ctx->nospupngocr = opt->nospupngocr;
 
+	// Initialize teletext multi-page output arrays (issue #665)
+	ctx->tlt_out_count = 0;
+	for (int i = 0; i < MAX_TLT_PAGES_EXTRACT; i++)
+	{
+		ctx->tlt_out[i] = NULL;
+		ctx->tlt_out_pages[i] = 0;
+		ctx->tlt_srt_counter[i] = 0;
+	}
+
 	ctx->prev = NULL;
 	return ctx;
 }
@@ -1300,3 +1312,168 @@ void switch_output_file(struct lib_ccx_ctx *ctx, struct encoder_ctx *enc_ctx, in
 	enc_ctx->cea_708_counter = 0;
 	enc_ctx->srt_counter = 0;
 }
+
+/**
+ * Get or create the output file for a specific teletext page (issue #665)
+ * Creates output files on-demand with suffix _pNNN (e.g., output_p891.srt)
+ * Returns NULL if we're in stdout mode or if too many pages are being extracted
+ */
+struct ccx_s_write *get_teletext_output(struct encoder_ctx *ctx, uint16_t teletext_page)
+{
+	// If teletext_page is 0, use the default output
+	if (teletext_page == 0 || ctx->out == NULL)
+		return ctx->out;
+
+	// Check if we're sending to stdout - can't do multi-page in that case
+	if (ctx->out[0].fh == STDOUT_FILENO)
+		return ctx->out;
+
+	// Check if we already have an output file for this page
+	for (int i = 0; i < ctx->tlt_out_count; i++)
+	{
+		if (ctx->tlt_out_pages[i] == teletext_page)
+			return ctx->tlt_out[i];
+	}
+
+	// If we only have one teletext page requested, use the default output
+	// (no suffix needed for backward compatibility)
+	extern struct ccx_s_teletext_config tlt_config;
+	if (tlt_config.num_user_pages <= 1 && !tlt_config.extract_all_pages)
+		return ctx->out;
+
+	// Need to create a new output file for this page
+	if (ctx->tlt_out_count >= MAX_TLT_PAGES_EXTRACT)
+	{
+		mprint("Warning: Too many teletext pages to extract (max %d), using default output for page %03d\n",
+		       MAX_TLT_PAGES_EXTRACT, teletext_page);
+		return ctx->out;
+	}
+
+	// Allocate the new write structure
+	struct ccx_s_write *new_out = (struct ccx_s_write *)malloc(sizeof(struct ccx_s_write));
+	if (!new_out)
+	{
+		mprint("Error: Memory allocation failed for teletext output\n");
+		return ctx->out;
+	}
+	memset(new_out, 0, sizeof(struct ccx_s_write));
+
+	// Create the filename with page suffix
+	const char *ext = get_file_extension(ctx->write_format);
+	char suffix[16];
+	snprintf(suffix, sizeof(suffix), "_p%03d", teletext_page);
+
+	char *basefilename = NULL;
+	if (ctx->out[0].filename != NULL)
+	{
+		basefilename = get_basename(ctx->out[0].filename);
+	}
+	else if (ctx->first_input_file != NULL)
+	{
+		basefilename = get_basename(ctx->first_input_file);
+	}
+	else
+	{
+		basefilename = strdup("untitled");
+	}
+
+	if (basefilename == NULL)
+	{
+		free(new_out);
+		return ctx->out;
+	}
+
+	char *filename = create_outfilename(basefilename, suffix, ext);
+	free(basefilename);
+
+	if (filename == NULL)
+	{
+		free(new_out);
+		return ctx->out;
+	}
+
+	// Open the file
+	new_out->filename = filename;
+	new_out->fh = open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY, S_IREAD | S_IWRITE);
+	if (new_out->fh == -1)
+	{
+		mprint("Error: Failed to open output file %s: %s\n", filename, strerror(errno));
+		free(filename);
+		free(new_out);
+		return ctx->out;
+	}
+
+	mprint("Creating teletext output file: %s\n", filename);
+
+	// Store in our array
+	int idx = ctx->tlt_out_count;
+	ctx->tlt_out[idx] = new_out;
+	ctx->tlt_out_pages[idx] = teletext_page;
+	ctx->tlt_srt_counter[idx] = 0;
+	ctx->tlt_out_count++;
+
+	// Write the subtitle file header
+	write_subtitle_file_header(ctx, new_out);
+
+	return new_out;
+}
+
+/**
+ * Get the SRT counter for a specific teletext page (issue #665)
+ * Returns pointer to the counter, or NULL if page not found
+ */
+unsigned int *get_teletext_srt_counter(struct encoder_ctx *ctx, uint16_t teletext_page)
+{
+	// If teletext_page is 0, use the default counter
+	if (teletext_page == 0)
+		return &ctx->srt_counter;
+
+	// Check if we're using multi-page mode
+	extern struct ccx_s_teletext_config tlt_config;
+	if (tlt_config.num_user_pages <= 1 && !tlt_config.extract_all_pages)
+		return &ctx->srt_counter;
+
+	// Find the counter for this page
+	for (int i = 0; i < ctx->tlt_out_count; i++)
+	{
+		if (ctx->tlt_out_pages[i] == teletext_page)
+			return &ctx->tlt_srt_counter[i];
+	}
+
+	// Not found, use default counter
+	return &ctx->srt_counter;
+}
+
+/**
+ * Clean up all teletext output files (issue #665)
+ */
+void dinit_teletext_outputs(struct encoder_ctx *ctx)
+{
+	if (!ctx)
+		return;
+
+	for (int i = 0; i < ctx->tlt_out_count; i++)
+	{
+		if (ctx->tlt_out[i] != NULL)
+		{
+			// Write footer
+			write_subtitle_file_footer(ctx, ctx->tlt_out[i]);
+
+			// Close file
+			if (ctx->tlt_out[i]->fh != -1)
+			{
+				close(ctx->tlt_out[i]->fh);
+			}
+
+			// Free filename
+			if (ctx->tlt_out[i]->filename != NULL)
+			{
+				free(ctx->tlt_out[i]->filename);
+			}
+
+			free(ctx->tlt_out[i]);
+			ctx->tlt_out[i] = NULL;
+		}
+	}
+	ctx->tlt_out_count = 0;
+}