@cfsmp3 cfsmp3 commented Dec 20, 2025

Summary

Problem

UK Freeview DVB recordings from September 2022 onwards would fail at 0% with:

ATTENTION!!!!!!
In switch_to_next_file(): Processing of file.ts ended prematurely 185024 < 20000000

VLC and other players could display the subtitles correctly, but ccextractor would terminate immediately.

Root Cause

When read_video_pes_header() encounters a malformed or truncated PES packet, it returns -1. The calling function, copy_capbuf_demux_data(), previously responded by returning CCX_EOF (-101), which ended processing of the entire file. That was overly aggressive: a single broken PES packet should be skipped, not treated as the end of the file.

Fix

Changed error handling to skip broken packets and continue:

// Before:
return CCX_EOF;  // terminates file

// After:
return CCX_OK;   // skips packet, continues
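
To illustrate why the return value matters, here is a minimal, self-contained sketch of a packet loop that stops on any non-OK status. It is not the actual CCExtractor code: `parse_pes_header()`, `copy_packet()` and `process_all()` are hypothetical stand-ins, the header check is simplified, and the value of `CCX_OK` is an assumption (only `CCX_EOF` = -101 is stated above).

```c
#include <assert.h>
#include <stddef.h>

#define CCX_OK 0        /* assumed value; not stated in this PR */
#define CCX_EOF (-101)  /* value quoted in the PR description */

/* Hypothetical stand-in for read_video_pes_header(): returns the PES header
 * length in bytes, or -1 if the packet is truncated or lacks the 00 00 01
 * start code prefix. */
static int parse_pes_header(const unsigned char *buf, size_t len)
{
    size_t hdr_len;

    if (len < 9 || buf[0] != 0x00 || buf[1] != 0x00 || buf[2] != 0x01)
        return -1;
    hdr_len = 9 + (size_t)buf[8]; /* 9 fixed bytes + PES_header_data_length */
    return hdr_len <= len ? (int)hdr_len : -1;
}

/* Hypothetical stand-in for copy_capbuf_demux_data(): accounts for one
 * packet's payload. A broken header is skipped rather than reported as EOF. */
static int copy_packet(const unsigned char *buf, size_t len, size_t *payload_bytes)
{
    int hdr_len;

    assert(payload_bytes != NULL);
    hdr_len = parse_pes_header(buf, len);
    if (hdr_len < 0)
        return CCX_OK; /* before the fix: return CCX_EOF, ending the file */
    *payload_bytes += len - (size_t)hdr_len;
    return CCX_OK;
}

/* Runs every packet through copy_packet() and returns how many packets were
 * visited before the loop stopped. */
static size_t process_all(const unsigned char **pkts, const size_t *lens,
                          size_t count, size_t *payload_bytes)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (copy_packet(pkts[i], lens[i], payload_bytes) != CCX_OK)
            break;
    return i;
}
```

With the skip behaviour, a malformed packet in the middle of the input no longer prevents the packets after it from being processed. If `copy_packet()` returned `CCX_EOF` instead, the loop would stop at the first broken packet.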

Test Results

| Metric | Before | After |
|---|---|---|
| Progress | 0% (185KB of 20MB) | 100% |
| Error | "ended prematurely" | None |
| Subtitles | 0 | 10 extracted correctly |

Sample subtitle extracted: "of eight 400 ounce gold bars that were stolen in 1998."

All 299 Rust tests pass.

Test plan

🤖 Generated with Claude Code

cfsmp3 and others added 6 commits December 20, 2025 10:34
Fixes #1690 - Captions fail to extract on HEVC video stream

HEVC video streams with embedded EIA-608/708 captions weren't being
extracted, even though VLC/MPV could display them.

Root causes fixed:
1. HEVC stream type (0x24) wasn't recognized for CC extraction
2. HEVC NAL parsing used H.264 format (1-byte) instead of HEVC (2-byte)
3. HEVC SEI types (39/40) weren't handled (only H.264 SEI type 6)
4. CC data accumulation across SEIs caused u8 overflow/garbled output

Changes:
- C code: Add HEVC stream detection, CCX_HEVC buffer type, is_hevc flag
- Rust code: HEVC NAL header parsing (2-byte, type=(byte[0]>>1)&0x3F),
  HEVC SEI handling (PREFIX_SEI=39, SUFFIX_SEI=40), immediate CC flush
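
The difference between the two NAL header formats can be sketched as follows. This is an illustrative C version, not the Rust code from this commit; the function and constant names are hypothetical, and only the bit layout and the SEI type numbers (6, 39, 40) come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    H264_NAL_SEI = 6,         /* H.264 SEI NAL unit type */
    HEVC_NAL_PREFIX_SEI = 39, /* HEVC PREFIX_SEI NAL unit type */
    HEVC_NAL_SUFFIX_SEI = 40  /* HEVC SUFFIX_SEI NAL unit type */
};

/* H.264: 1-byte header, type in the low 5 bits. Returns -1 if too short. */
static int h264_nal_type(const uint8_t *nal, size_t len)
{
    assert(nal != NULL);
    return len >= 1 ? (nal[0] & 0x1F) : -1;
}

/* HEVC: 2-byte header, type in bits 1..6 of the first byte.
 * Returns -1 if fewer than 2 header bytes are available. */
static int hevc_nal_type(const uint8_t *nal, size_t len)
{
    assert(nal != NULL);
    return len >= 2 ? ((nal[0] >> 1) & 0x3F) : -1;
}

/* Returns nonzero if the NAL unit is an SEI for the given codec. */
static int nal_is_sei(const uint8_t *nal, size_t len, int is_hevc)
{
    int type;

    if (is_hevc) {
        type = hevc_nal_type(nal, len);
        return type == HEVC_NAL_PREFIX_SEI || type == HEVC_NAL_SUFFIX_SEI;
    }
    return h264_nal_type(nal, len) == H264_NAL_SEI;
}
```

For example, the byte 0x4E that starts an HEVC prefix SEI yields type 39 under the 2-byte rule but type 14 under the H.264 mask, so a parser applying the H.264 format never recognizes it as SEI.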

Thanks to @trufio465-bot for the initial research in PR #1735.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
HEVC uses B-frames extensively, causing CC data to arrive in decode
order instead of presentation order. This was causing character pairs
to be scrambled (e.g., "MEDIOCRE" became "MIOEDCRE").

Changes:
- Implement PTS-based sequence numbering for HEVC CC data (similar to H.264)
- Change flush logic to only trigger on IDR frames (not every VCL NAL)
- Add HEVC fallback detection for streams without PAT/PMT
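
The reordering idea can be sketched as buffering caption byte pairs together with their PTS and sorting them before output. This is a simplified illustration under assumptions: `struct cc_entry`, `cmp_entry()` and `flush_sorted()` are hypothetical names, and the real implementation uses sequence numbering and IDR-triggered flushing rather than this exact buffer-and-sort routine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One buffered caption byte pair, tagged with its presentation timestamp. */
struct cc_entry {
    int64_t pts;   /* presentation timestamp of the carrying frame */
    unsigned seq;  /* arrival index, used to keep ties in arrival order */
    char pair[2];  /* the two caption bytes */
};

/* Orders entries by PTS, then by arrival index. */
static int cmp_entry(const void *a, const void *b)
{
    const struct cc_entry *x = a;
    const struct cc_entry *y = b;

    if (x->pts != y->pts)
        return x->pts < y->pts ? -1 : 1;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sorts the buffered entries into presentation order and writes their bytes
 * to out, which must hold at least 2 * n + 1 chars. In this sketch the caller
 * decides when to flush (for example, at an IDR frame). */
static void flush_sorted(struct cc_entry *buf, size_t n, char *out)
{
    size_t i;

    assert(out != NULL);
    qsort(buf, n, sizeof *buf, cmp_entry);
    for (i = 0; i < n; i++)
        memcpy(out + 2 * i, buf[i].pair, 2);
    out[2 * n] = '\0';
}
```

Pairs that arrive in decode order as "ME", "OC", "DI", "RE" with PTS values 0, 2, 1, 3 come out as "MEDIOCRE" once sorted. Flushing only at IDR frames gives the buffer a chance to collect a complete reordering window first.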

Fixes #1639 (ATSC 3.0 HEVC caption extraction)
Tested with issue_1639_sample.ts and caption_test_1690.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The HEVC NAL type constants are defined for completeness and reference,
but not all are currently used in the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes #1455

When read_video_pes_header() encounters a malformed or truncated PES
packet (returns -1), copy_capbuf_demux_data() previously returned
CCX_EOF which terminated the entire file processing. This was overly
aggressive - a single broken PES packet should be skipped, not
terminate the entire file.

UK Freeview DVB recordings from September 2022 onwards contain some
malformed PES packets in the DVB subtitle stream that triggered this
condition, causing ccextractor to stop at 0% with "Processing ended
prematurely" error even though VLC could display the subtitles.

The fix changes the error handling to skip the broken packet and
continue processing:
- Before: return CCX_EOF (terminates file)
- After: return CCX_OK (skips packet, continues)

Test results with UK Freeview sample:
- Before: 0% processed, 0 subtitles extracted
- After: 100% processed, 10 subtitles extracted correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@ccextractor-bot

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit b9aabcd...:
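| Report Name | Tests Passed |
|---|---|
| Broken | 13/13 |
| CEA-708 | 9/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 24/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 81/86 |
| Teletext | 21/21 |
| WTV | 13/13 |
| XDS | 34/34 |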

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 1974a299f0...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 7236304cfc..., Last passed: Never
  • ccextractor --out=srt --latin1 611b4a9235..., Last passed: Never
  • ccextractor --out=sami --latin1 --autoprogram --no-goptime 5b4e0a6034..., Last passed: Never
  • ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 5ae2007a79..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 1e44efd810..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 add511677c..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 9a496d3828..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 56c9f34548..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 e9b9008fdf..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c032183ef0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 27e46255f0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 d037c7509e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla c41f73056a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 5d3a29f9f8..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 70000200c0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 6dc772d881..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla adce82fd39..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 15feae9133..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 95dd33c6f1..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla ab9cf8cfad..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --output-field 2 5d3a29f9f8..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --output-field 2 c41f73056a..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --sentencecap c032183ef0..., Last passed: Never
  • ccextractor --autoprogram --out=bin --latin1 c032183ef0..., Last passed: Never
  • ccextractor --hardsubx 1a0302f7fd..., Last passed: Never
  • ccextractor --hauppauge --autoprogram --out=srt --latin1 a03b5b2a56..., Last passed: Never
  • ccextractor --autoprogram --out=srt --hauppauge --latin1 553d78e755..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --hauppauge --ucla --latin1 553d78e755..., Last passed: Never
  • ccextractor --out=dvdraw c83f765c66..., Last passed: Never
  • ccextractor --out=txt c83f765c66..., Last passed: Never
  • ccextractor --out=ttxt c83f765c66..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --goptime c83f765c66..., Last passed: Never
  • ccextractor --unixts 5 --out=txt c83f765c66..., Last passed: Never
  • ccextractor --out=txt --datets c83f765c66..., Last passed: Never
  • ccextractor --out=txt --sects c83f765c66..., Last passed: Never
  • ccextractor --out=txt --lf c83f765c66..., Last passed: Never
  • ccextractor --in=es dc7169d7c4..., Last passed: Never
  • ccextractor --in=bin 988d4e8bba..., Last passed: Never
  • ccextractor --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9..., Last passed: Never
  • ccextractor --endcreditsforatleast 3 --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9..., Last passed: Never
  • ccextractor --endcreditsforatmost 2 --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --datets dcada745de..., Last passed: Never
  • ccextractor --out=srt --latin1 f23a544ba8..., Last passed: Never
  • ccextractor --out=srt --latin1 d7e7dbdf68..., Last passed: Never
  • ccextractor --out=srt --latin1 76734ac4a7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 725a49f871..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla e274a73653..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --xds --ucla c813e713a0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 27fab4dbb6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds bbd5bb52fc..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds b992e0cccb..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds d0291cdcf6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7d2730d38e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds c8dc039a88..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 53339f3455..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 53339f3455..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 83b03036a2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7d3f25c32c..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 7d3f25c32c..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds f41d4c29a1..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 88cd42b89a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 0069dffd21..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit b9aabcd...:
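| Report Name | Tests Passed |
|---|---|
| Broken | 11/13 |
| CEA-708 | 9/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 23/27 |
| Hardsubx | 0/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 81/86 |
| Teletext | 21/21 |
| WTV | 12/13 |
| XDS | 25/34 |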

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --out=srt --latin1 611b4a9235...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 15feae9133...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 95dd33c6f1...
  • ccextractor --hardsubx 1a0302f7fd...
  • ccextractor --out=dvdraw c83f765c66...
  • ccextractor --in=bin 988d4e8bba...
  • ccextractor --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9...
  • ccextractor --endcreditsforatleast 3 --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9...
  • ccextractor --endcreditsforatmost 2 --endcreditstext "CCextractor Ends crdit Testing" addf5e2fc9...
  • ccextractor --out=srt --latin1 d7e7dbdf68...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla e274a73653...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 27fab4dbb6...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds bbd5bb52fc...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds b992e0cccb...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds d0291cdcf6...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7d2730d38e...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 88cd42b89a...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 7236304cfc..., Last passed: Never
  • ccextractor --out=sami --latin1 --autoprogram --no-goptime 5b4e0a6034..., Last passed: Never
  • ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 f1422b8bfe..., Last passed: Never
  • ccextractor --datapid 5603 --autoprogram --out=srt --latin1 --teletext 85c7fc1ad7..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 5ae2007a79..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 1e44efd810..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 add511677c..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 56c9f34548..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 e9b9008fdf..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 d037c7509e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 1974a299f0..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla c41f73056a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 6dc772d881..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla adce82fd39..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla ab9cf8cfad..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --output-field 2 c41f73056a..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --sentencecap c032183ef0..., Last passed: Never
  • ccextractor --autoprogram --out=bin --latin1 c032183ef0..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --hauppauge --autoprogram --out=srt --latin1 a03b5b2a56..., Last passed: Never
  • ccextractor --autoprogram --out=srt --hauppauge --latin1 553d78e755..., Last passed: Never
  • ccextractor --out=txt c83f765c66..., Last passed: Never
  • ccextractor --out=ttxt c83f765c66..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --goptime c83f765c66..., Last passed: Never
  • ccextractor --unixts 5 --out=txt c83f765c66..., Last passed: Never
  • ccextractor --out=txt --datets c83f765c66..., Last passed: Never
  • ccextractor --out=txt --sects c83f765c66..., Last passed: Never
  • ccextractor --out=txt --lf c83f765c66..., Last passed: Never
  • ccextractor --in=es dc7169d7c4..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c0d2fba8c0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 006fdc391a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e92a1d4d2a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 7e4ebf7fd7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 9256a60e4b..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 27d7a43dd6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 297a44921a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 efbe129086..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 eae0077731..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e2e2b501e0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c6407fb294..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --datets dcada745de..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --tpage 398 5d5838bde9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --teletext --tpage 398 3b276ad8bf..., Last passed: Never
  • ccextractor --out=srt --latin1 a226cc302d..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds c8dc039a88..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 53339f3455..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 7d3f25c32c..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit feb2a61 into master Dec 20, 2025
29 of 31 checks passed
@cfsmp3 cfsmp3 deleted the fix/issue-1455-uk-freeview-dvb branch December 21, 2025 08:42

Development

Successfully merging this pull request may close these issues.

[BUG] Processing of file ended prematurely in switch_to_next_file() on UK Freeview DVB subtitles

3 participants