@cfsmp3 cfsmp3 commented Dec 20, 2025

Summary

Problem

The AVC parser failed with a "Leading bytes are non-zero" error when processing HLS/Twitch stream segments. These segments often start mid-stream, so the buffer does not begin with a NAL unit start code.

Root cause: When process_avc encountered non-zero leading bytes, it returned an error and reported 0 bytes processed. The C caller therefore removed nothing from the buffer, so each subsequent packet was appended behind the same corrupt prefix. Every later call failed with the same error, and no captions were extracted.
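
The failure mode in the root cause can be reproduced in a few lines. This is a minimal sketch, not the project's code: `AvcError`, `process_avc_old`, and `feed_packets` are hypothetical names, and the caller loop only approximates how the C side is described as keeping unconsumed bytes.

```rust
/// Hypothetical stand-in for the parser's error type; the real one is only
/// known here from the log line `BrokenStream("Leading bytes are non-zero")`.
#[derive(Debug, PartialEq)]
enum AvcError {
    BrokenStream(&'static str),
}

/// Pre-fix behaviour as described above: reject the buffer outright when it
/// does not begin with 0x00 0x00, which reports nothing as consumed.
fn process_avc_old(buf: &[u8]) -> Result<usize, AvcError> {
    if buf.len() >= 2 && (buf[0] != 0x00 || buf[1] != 0x00) {
        return Err(AvcError::BrokenStream("Leading bytes are non-zero"));
    }
    Ok(buf.len())
}

/// Hypothetical caller: appends each packet to a pending buffer and drops
/// only the bytes the parser reports as consumed.
/// Returns (number of errors, bytes still pending).
fn feed_packets(packets: &[&[u8]]) -> (usize, usize) {
    let mut pending: Vec<u8> = Vec::new();
    let mut errors = 0;
    for packet in packets {
        pending.extend_from_slice(packet);
        match process_avc_old(&pending) {
            Ok(consumed) => {
                pending.drain(..consumed);
            }
            // Nothing is drained, so the garbage stays at the front and
            // every later call sees the same bad prefix.
            Err(_) => errors += 1,
        }
    }
    (errors, pending.len())
}
```

Feeding one packet that starts mid-stream followed by two well-formed packets makes all three calls fail in this model, and the pending buffer keeps growing, which matches the repeated error lines shown in the test plan.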

Solution

  • Add a find_nal_start_code() function that searches for a valid NAL start code (3-byte 0x00 0x00 0x01 or 4-byte 0x00 0x00 0x00 0x01)
  • If the buffer doesn't start with 0x00 0x00, search for the first NAL start code
  • Skip any garbage data before the first valid NAL unit
  • Return the full buffer length when no NAL start code is found (this clears the buffer and prevents accumulation)
  • Change the forbidden_zero_bit error from fatal to skip-and-continue
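
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the merged implementation: only the name `find_nal_start_code` comes from the PR, its signature here is a guess, and `leading_garbage_len` and `parse_nal_header` are hypothetical helpers. The header layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) is the standard H.264 NAL header.

```rust
/// Returns the offset of the first NAL start code in `buf`
/// (00 00 01 or 00 00 00 01), or `None` if there is none.
/// Assumed signature; the PR only names the function.
fn find_nal_start_code(buf: &[u8]) -> Option<usize> {
    // A 4-byte start code contains the 3-byte one at offset +1, so look for
    // the 3-byte pattern and step back one byte if a zero precedes it.
    let pos = buf.windows(3).position(|w| w == [0x00, 0x00, 0x01])?;
    if pos > 0 && buf[pos - 1] == 0x00 {
        Some(pos - 1)
    } else {
        Some(pos)
    }
}

/// Hypothetical helper: number of leading bytes to discard so that the
/// buffer begins at a NAL start code. When no start code exists, the whole
/// buffer is reported as consumed so the caller can clear it.
fn leading_garbage_len(buf: &[u8]) -> usize {
    if buf.starts_with(&[0x00, 0x00]) {
        return 0; // already looks like the start of a NAL unit
    }
    find_nal_start_code(buf).unwrap_or(buf.len())
}

/// Hypothetical helper: splits an H.264 NAL header byte into
/// (nal_ref_idc, nal_unit_type). Returns `None` when forbidden_zero_bit is
/// set, so the caller can skip that unit and continue instead of aborting.
fn parse_nal_header(header: u8) -> Option<(u8, u8)> {
    if header & 0x80 != 0 {
        return None;
    }
    Some(((header >> 5) & 0x03, header & 0x1F))
}
```

Returning the full buffer length in the no-start-code case is the design choice that stops the accumulation: the caller always makes progress, even on a segment that contains no usable NAL unit.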

Test plan

Before fix:

Error in process_avc: BrokenStream("Leading bytes are non-zero")
Error in process_avc: BrokenStream("Leading bytes are non-zero")
... (repeated for every PES packet)

After fix:

Number of NAL_type_7: 5
Total frames time:    00:00:10:000  (600 frames at 60.00fps)
Done, processing time = 0 seconds

Extracted captions:

1
00:00:04,066 --> 00:00:06,332
congratulations to papa plot and
their community for winning ■k71
million in a TV contest through 

2
00:00:06,334 --> 00:00:09,998
their community for winning ■k71
million in a TV contest through 
Twitch. I mean, just wild. Um

🤖 Generated with Claude Code

…ractor#1626)

The AVC parser would fail with "Leading bytes are non-zero" error when
processing HLS/Twitch stream segments that start mid-stream without
proper NAL unit headers at the beginning.

Root cause: When process_avc encountered non-zero leading bytes, it
returned an error with 0 bytes processed. The C code would not remove
any bytes from the buffer, causing subsequent data to accumulate with
the corrupt beginning, leading to infinite errors.

Fix:
- Add find_nal_start_code() to search for valid NAL start codes
- If buffer doesn't start with 0x00 0x00, search for first NAL start
- Skip garbage data before first valid NAL unit
- Return full buffer length when no NAL found (clears the buffer)
- Change forbidden_zero_bit error from fatal to skip-and-continue

Tested with 6 Twitch HLS sample files - all now process correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@ccextractor-bot

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit a0593c6...:
Report Name    Tests Passed
Broken         9/13
CEA-708        8/14
DVB            6/7
DVD            0/3
DVR-MS         2/2
General        4/27
Hardsubx       0/1
Hauppage       0/3
MP4            3/3
NoCC           10/10
Options        72/86
Teletext       20/21
WTV            10/13
XDS            13/34

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 1d9731bd80..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Never
  • ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Never
  • ccextractor --out=srt --latin1 06b3a9237d..., Last passed: Never
  • ccextractor --out=srt --latin1 83f8cceb74..., Last passed: Never
  • ccextractor --out=srt --latin1 b46e9e8e3f..., Last passed: Never
  • ccextractor --out=srt --latin1 89e417e622..., Last passed: Never
  • ccextractor --out=srt --latin1 d59eadc4ed..., Last passed: Never
  • ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Never
  • ccextractor --service 1 --out=txt f17524b53f..., Last passed: Never
  • ccextractor --service 1 --out=txt da904de35d..., Last passed: Never
  • ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Never
  • ccextractor --service 1 --out=srt da904de35d..., Last passed: Never
  • ccextractor --service 1 --out=sami da904de35d..., Last passed: Never
  • ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Never
  • ccextractor --service all da904de35d..., Last passed: Never
  • ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 1020459a86..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 f1422b8bfe..., Last passed: Never
  • ccextractor --datapid 5603 --autoprogram --out=srt --latin1 --teletext 85c7fc1ad7..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Never
  • ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Never
  • ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Never
  • ccextractor --wtvconvertfix --autoprogram --out=srt --latin1 acf871cbfd..., Last passed: Never
  • ccextractor --wtvconvertfix --autoprogram --out=srt --latin1 5cbb21adb6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 1974a299f0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 15feae9133..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --in=mp4 --out=srt --latin1 b2771c84c2..., Last passed: Never
  • ccextractor --in=mp4 --out=srt --latin1 5df914ce77..., Last passed: Never
  • ccextractor --autoprogram --out=srt --bom --latin1 8849331dda..., Last passed: Never
  • ccextractor --mp4vidtrack --autoprogram --out=ttxt --latin1 adc0a818c3..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 08bdf0e2c1..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 bee139671a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 3842d00925..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 80af83c038..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 837b02f722..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 41dab6b2a7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 4b117b4d66..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 26ee6add4d..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds fca0dce412..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --output-field 1 a65d39ccb3..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --output-field 2 a65d39ccb3..., Last passed: Never
  • ccextractor --autoprogram c83f765c66..., Last passed: Never
  • ccextractor --service 1 c83f765c66..., Last passed: Never
  • ccextractor --in=ts c83f765c66..., Last passed: Never
  • ccextractor --out=srt c83f765c66..., Last passed: Never
  • ccextractor --out=sami c83f765c66..., Last passed: Never
  • ccextractor --out=smptett c83f765c66..., Last passed: Never
  • ccextractor --no-goptime c83f765c66..., Last passed: Never
  • ccextractor --fixpadding c83f765c66..., Last passed: Never
  • ccextractor --90090 c83f765c66..., Last passed: Never
  • ccextractor --myth c83f765c66..., Last passed: Never
  • ccextractor --program-number 1 c83f765c66..., Last passed: Never
  • ccextractor --datapid 256 c83f765c66..., Last passed: Never
  • ccextractor --datastreamtype 2 c83f765c66..., Last passed: Never
  • ccextractor --datastreamtype 2 --streamtype 2 c83f765c66..., Last passed: Never
  • ccextractor --no-autotimeref c83f765c66..., Last passed: Never
  • ccextractor --bom c83f765c66..., Last passed: Never
  • ccextractor --no-bom c83f765c66..., Last passed: Never
  • ccextractor --unicode c83f765c66..., Last passed: Never
  • ccextractor --utf8 c83f765c66..., Last passed: Never
  • ccextractor --latin1 c83f765c66..., Last passed: Never
  • ccextractor --no-fontcolor c83f765c66..., Last passed: Never
  • ccextractor --no-typesetting c83f765c66..., Last passed: Never
  • ccextractor --trim c83f765c66..., Last passed: Never
  • ccextractor --sentencecap c83f765c66..., Last passed: Never
  • ccextractor --capfile /repository/Dictionary/MattS_dictionary.txt c83f765c66..., Last passed: Never
  • ccextractor --autodash --trim c83f765c66..., Last passed: Never
  • ccextractor --bufferinput c83f765c66..., Last passed: Never
  • ccextractor --no-bufferinput c83f765c66..., Last passed: Never
  • ccextractor --buffersize 1M c83f765c66..., Last passed: Never
  • ccextractor --dru c83f765c66..., Last passed: Never
  • ccextractor --no-rollup c83f765c66..., Last passed: Never
  • ccextractor --ru1 c83f765c66..., Last passed: Never
  • ccextractor --ru2 c83f765c66..., Last passed: Never
  • ccextractor --ru3 c83f765c66..., Last passed: Never
  • ccextractor --delay 200 c83f765c66..., Last passed: Never
  • ccextractor --startat 4 --endat 7 c83f765c66..., Last passed: Never
  • ccextractor --no-codec dvbsub c83f765c66..., Last passed: Never
  • ccextractor --debug --out=srt c83f765c66..., Last passed: Never
  • ccextractor --608 --out=srt c83f765c66..., Last passed: Never
  • ccextractor --708 --out=srt c83f765c66..., Last passed: Never
  • ccextractor --goppts --out=srt c83f765c66..., Last passed: Never
  • ccextractor --xdsdebug --out=srt c83f765c66..., Last passed: Never
  • ccextractor --vides --out=srt c83f765c66..., Last passed: Never
  • ccextractor --cbraw --out=srt c83f765c66..., Last passed: Never
  • ccextractor --no-sync --out=srt c83f765c66..., Last passed: Never
  • ccextractor --fullbin --out=srt c83f765c66..., Last passed: Never
  • ccextractor --parsedebug --out=srt c83f765c66..., Last passed: Never
  • ccextractor --parsePAT --out=srt c83f765c66..., Last passed: Never
  • ccextractor --parsePMT --out=srt c83f765c66..., Last passed: Never
  • ccextractor --investigate-packets --out=srt c83f765c66..., Last passed: Never
  • ccextractor --in=ps e9b9008fdf..., Last passed: Never
  • ccextractor --in=asf 6395b281ad..., Last passed: Never
  • ccextractor --in=wtv b46e9e8e3f..., Last passed: Never
  • ccextractor --in=raw fb79021542..., Last passed: Never
  • ccextractor --in=mp4 b2771c84c2..., Last passed: Never
  • ccextractor --mp4vidtrack 5df914ce77..., Last passed: Never
  • ccextractor --wtvconvertfix acf871cbfd..., Last passed: Never
  • ccextractor --wtvmpeg2 10f0f77cf4..., Last passed: Never
  • ccextractor --hauppauge d6df1b227a..., Last passed: Never
  • ccextractor --codec dvbsub --out=spupng 85271be4d2..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --tpage 801 4e56e88ba4..., Last passed: Never
  • ccextractor --tverbose 4e56e88ba4..., Last passed: Never
  • ccextractor --teletext 4e56e88ba4..., Last passed: Never
  • ccextractor --out=txt --ucla c83f765c66..., Last passed: Never
  • ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Never
  • ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 4e56e88ba4..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c0d2fba8c0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 006fdc391a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e92a1d4d2a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 b37ce60eb9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 7e4ebf7fd7..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 9256a60e4b..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 27d7a43dd6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 297a44921a..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 efbe129086..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 eae0077731..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e2e2b501e0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 8c1615c1a8..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c6407fb294..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --tpage 398 5d5838bde9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --tpage 299 44c45593fb..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --tpage 299 b8c55aa2e9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --teletext --tpage 398 3b276ad8bf..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 b236a0590b..., Last passed: Never
  • ccextractor --out=srt --latin1 97cc394d87..., Last passed: Never
  • ccextractor --out=srt --latin1 10f0f77cf4..., Last passed: Never
  • ccextractor --out=srt --latin1 df3b4d62d3..., Last passed: Never
  • ccextractor --out=srt --latin1 c791382c94..., Last passed: Never
  • ccextractor --out=srt --latin1 f673b2f916..., Last passed: Never
  • ccextractor --out=srt --latin1 da75bdee47..., Last passed: Never
  • ccextractor --out=srt --latin1 bd6f33a669..., Last passed: Never
  • ccextractor --out=srt --latin1 0e5e6b26be..., Last passed: Never
  • ccextractor --out=srt --latin1 a226cc302d..., Last passed: Never
  • ccextractor --out=srt --latin1 ae6327683e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla d037c7509e..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla d037c7509e..., Last passed: Never
  • ccextractor --autoprogram --out=smptett --latin1 --ucla e274a73653..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla c813e713a0..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 27fab4dbb6..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla bbd5bb52fc..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 7d2730d38e..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla c8dc039a88..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 83b03036a2..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla f41d4c29a1..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 88cd42b89a..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --output-field 2 --ucla 88cd42b89a..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla 7f41299cc7..., Last passed: Never

All tests that pass on the master branch also passed on this PR.

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit 5f0c672 into CCExtractor:master Dec 20, 2025
18 of 19 checks passed
@cfsmp3 cfsmp3 deleted the fix/issue-1626-broken-avc-stream branch December 20, 2025 09:33