fix(hevc): Add HEVC/H.265 caption extraction support with B-frame reordering #1852
Fixes #1690 - Captions fail to extract on HEVC video stream

HEVC video streams with embedded EIA-608/708 captions weren't being extracted, even though VLC/MPV could display them.

Root causes fixed:
1. HEVC stream type (0x24) wasn't recognized for CC extraction
2. HEVC NAL parsing used H.264 format (1-byte) instead of HEVC (2-byte)
3. HEVC SEI types (39/40) weren't handled (only H.264 SEI type 6)
4. CC data accumulation across SEIs caused u8 overflow/garbled output

Changes:
- C code: Add HEVC stream detection, CCX_HEVC buffer type, is_hevc flag
- Rust code: HEVC NAL header parsing (2-byte, type=(byte[0]>>1)&0x3F), HEVC SEI handling (PREFIX_SEI=39, SUFFIX_SEI=40), immediate CC flush

Thanks to @trufio465-bot for the initial research in PR #1735.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
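The difference between the two NAL header layouts in root causes 2 and 3 can be sketched as follows. This is a minimal illustration, not the code in src/rust/src/avc/core.rs; the function and constant names are hypothetical, and only the bit arithmetic and the SEI type values (39/40) come from the commit message.

```rust
// Hypothetical names; only the masks and SEI values are taken from the commit message.
const HEVC_PREFIX_SEI: u8 = 39;
const HEVC_SUFFIX_SEI: u8 = 40;

/// H.264: 1-byte NAL header, type is the low 5 bits.
fn h264_nal_type(header: u8) -> u8 {
    header & 0x1F
}

/// HEVC: 2-byte NAL header, type is the 6 bits after the forbidden_zero_bit
/// in the first byte. The second byte (layer id / temporal id) is ignored here.
fn hevc_nal_type(header: [u8; 2]) -> u8 {
    (header[0] >> 1) & 0x3F
}

/// True for the two HEVC SEI NAL types that can carry caption data.
fn is_hevc_sei(nal_type: u8) -> bool {
    matches!(nal_type, HEVC_PREFIX_SEI | HEVC_SUFFIX_SEI)
}
```

For example, a prefix SEI header starts with byte 0x4E. Read with the HEVC layout it yields type 39, while the H.264 mask yields 14, which is not the H.264 SEI type 6, so the SEI would be skipped.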
HEVC uses B-frames extensively, causing CC data to arrive in decode order instead of presentation order. This was causing character pairs to be scrambled (e.g., "MEDIOCRE" became "MIOEDCRE").

Changes:
- Implement PTS-based sequence numbering for HEVC CC data (similar to H.264)
- Change flush logic to only trigger on IDR frames (not every VCL NAL)
- Add HEVC fallback detection for streams without PAT/PMT

Fixes #1639 (ATSC 3.0 HEVC caption extraction)

Tested with issue_1639_sample.ts and caption_test_1690.ts

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
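The reordering idea can be sketched as a small buffer that collects caption byte pairs keyed by PTS and releases them in presentation order when an IDR frame arrives. The type and method names below are hypothetical and the real sequence-numbering code may differ; the sketch only assumes that each CC pair is tagged with the PTS of the frame that carried it.

```rust
use std::collections::BTreeMap;

/// Hypothetical reorder buffer: caption byte pairs keyed by PTS.
struct CcReorderBuffer {
    pending: BTreeMap<i64, Vec<[u8; 2]>>,
}

impl CcReorderBuffer {
    fn new() -> Self {
        Self { pending: BTreeMap::new() }
    }

    /// Store a pair in arrival (decode) order; the map keeps keys sorted by PTS.
    fn push(&mut self, pts: i64, pair: [u8; 2]) {
        self.pending.entry(pts).or_default().push(pair);
    }

    /// Drain all buffered pairs in ascending PTS order.
    /// In this sketch the caller invokes it only when an IDR frame is seen.
    fn flush(&mut self) -> Vec<[u8; 2]> {
        std::mem::take(&mut self.pending)
            .into_values()
            .flatten()
            .collect()
    }
}

/// Helper for inspection: join the byte pairs into a string.
fn pairs_to_string(pairs: &[[u8; 2]]) -> String {
    pairs.iter().flatten().map(|&b| b as char).collect()
}
```

Pushing the pairs "ME", "RE", "DI", "OC" with PTS values 0, 3, 1, 2 and then flushing returns them in PTS order, which reads "MEDIOCRE". The arrival order here is invented for illustration and is not the ordering found in the issue #1639 sample.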
The HEVC NAL type constants are defined for completeness and reference, but not all are currently used in the codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
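As a rough sketch of how such constants might be used for fallback detection in streams without PAT/PMT, the block below scans a buffer for Annex B start codes and checks whether VPS, SPS and PPS NAL units all appear. The constant values are the ones listed in the PR description; the module, the function names and the "all three parameter sets present" heuristic are assumptions for illustration, not the logic in ts_functions.c.

```rust
/// HEVC NAL unit types as listed in the PR description (names are hypothetical).
#[allow(dead_code)]
mod hevc_nal {
    pub const IDR_W_RADL: u8 = 19;
    pub const IDR_N_LP: u8 = 20;
    pub const CRA: u8 = 21;
    pub const VPS: u8 = 32;
    pub const SPS: u8 = 33;
    pub const PPS: u8 = 34;
    pub const PREFIX_SEI: u8 = 39;
    pub const SUFFIX_SEI: u8 = 40;
}

/// Collect the HEVC NAL type found after each 00 00 01 start code.
fn annexb_hevc_nal_types(data: &[u8]) -> Vec<u8> {
    let mut types = Vec::new();
    let mut i = 0;
    while i + 3 < data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            types.push((data[i + 3] >> 1) & 0x3F);
            i += 3; // skip past the start code
        } else {
            i += 1;
        }
    }
    types
}

/// Assumed heuristic: treat the buffer as HEVC if VPS, SPS and PPS all occur.
fn looks_like_hevc(data: &[u8]) -> bool {
    let types = annexb_hevc_nal_types(data);
    [hevc_nal::VPS, hevc_nal::SPS, hevc_nal::PPS]
        .iter()
        .all(|t| types.contains(t))
}
```

A real detector would likely need to guard against false positives, since a short H.264 or MPEG-2 payload can contain byte patterns that decode to these type values by chance.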
@canihavesomecoffee more samples to add
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit b9aabcd...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit b9aabcd...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
Looks like this now works with MPEG-TS, but I still encounter failures with MP4 and Matroska. MP4 sample: https://drive.google.com/file/d/10X8R95TUnFzAZZkP_M2letRJzvz9OPSm/view?usp=sharing
PR #1852 added HEVC caption extraction for MPEG-TS containers, but MP4/MKV containers weren't supported. This adds HEVC support for MP4 containers using GPAC.

Changes:
- Add HEVC subtype definitions (hev1, hvc1)
- Add process_hevc_sample() to parse HEVC NAL units and extract CC
- Add process_hevc_track() to iterate through HEVC track samples
- Detect and process HEVC tracks in processmp4()
- Add store_hdcc() call to flush buffered CC data after each sample

The key fix was adding store_hdcc() after processing each sample. Without this, CC data was being parsed but never output because store_hdcc() is normally called from slice_header() which is AVC-only.

Also includes plans/HEVC_MKV.md documenting the implementation plan for Matroska container HEVC support (not yet implemented).

Closes #1690 (for MP4 containers)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
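To illustrate what parsing the NAL units of an MP4 sample involves, here is a minimal sketch in Rust. It is not process_hevc_sample(), which is C code built on GPAC. It assumes a 4-byte big-endian length prefix before each NAL unit (the actual prefix size is declared in the track's hvcC configuration), and the function names are hypothetical. The store_hdcc() flush step is not modelled.

```rust
/// Split an MP4 sample into NAL units.
/// Assumption: every NAL unit is preceded by a 4-byte big-endian length.
fn split_length_prefixed(sample: &[u8]) -> Vec<&[u8]> {
    let mut nals = Vec::new();
    let mut pos = 0;
    while pos + 4 <= sample.len() {
        let len = u32::from_be_bytes([
            sample[pos],
            sample[pos + 1],
            sample[pos + 2],
            sample[pos + 3],
        ]) as usize;
        pos += 4;
        // Stop on an empty or truncated unit rather than reading out of bounds.
        if len == 0 || len > sample.len() - pos {
            break;
        }
        nals.push(&sample[pos..pos + len]);
        pos += len;
    }
    nals
}

/// Keep only HEVC SEI NAL units (PREFIX_SEI = 39, SUFFIX_SEI = 40),
/// which are the ones that may carry caption data.
fn sei_nals(sample: &[u8]) -> Vec<&[u8]> {
    split_length_prefixed(sample)
        .into_iter()
        .filter(|nal| nal.len() >= 2 && matches!((nal[0] >> 1) & 0x3F, 39 | 40))
        .collect()
}
```

Unlike MPEG-TS, where NAL units are delimited by start codes, MP4 samples carry explicit lengths, so the TS parsing path cannot be reused as-is for this container.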
Summary
Fixes #1690 - Captions fail to extract on HEVC video stream
Fixes #1639 - CCextractor not finding 608 captions in ATSC3.0 TS (garbled output)
HEVC (H.265) video streams with embedded EIA-608/708 captions weren't being extracted correctly. VLC/MPV could display them, but CCExtractor either failed to find them or produced garbled output.
Root Causes Fixed
Issue #1639 Deep Dive - B-Frame Reordering
The ATSC 3.0 sample from issue #1639 exposed a critical problem: HEVC uses B-frames extensively, and CC data embedded in SEI NAL units was being processed in decode order instead of presentation order.
Evidence from the sample file:
- Garbled output: " BOTH OF MIOEDCRE WNDOHE TRESTH.TCL WWEILATE SEWHNSAP HPE IGAN ME..."
- Expected output: "COME BOTH OF MEDIOCRE DOWN THE STRETCH. WE WILL SEE WHAT HAPPENS IN GAME NUMBER ONE TONIGHT"

The same characters were present but scrambled due to B-frame reordering.
Before fix:
After fix:
Changes
C Code:
- ts_info.c: Include HEVC (0x24) in video stream detection
- ts_functions.c: Add CCX_HEVC buffer type handling, HEVC NAL type detection for streams without PAT/PMT (VPS=32, SPS=33, PPS=34, PREFIX_SEI=39, SUFFIX_SEI=40, IDR=19,20, CRA=21)
- ccx_common_constants.h: Add CCX_HEVC = 11 buffer type
- general_loop.c: Handle CCX_HEVC and set is_hevc flag
- avc_functions.h: Add is_hevc field to avc_ctx

Rust Code (src/rust/src/avc/core.rs):
- … (byte[0] >> 1) & 0x3F
- … (pts_ordering_mode):
- … cc_count to 0 per SEI but preserve cc_data vector length

Test Results
Issue #1690 sample file - now extracts 329 lines of clean, readable captions:
Issue #1639 sample file - garbled output now displays correctly:
Acknowledgments
Thanks to @trufio465-bot for the initial research direction in PR #1735. While that PR was incomplete, it helped identify the key areas that needed modification.
Test plan
🤖 Generated with Claude Code