You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(708): Support Korean EUC-KR encoding in CEA-708 decoder
Korean broadcasts use EUC-KR encoding (variable-width) in CEA-708
captions, where ASCII is 1 byte and Korean characters are 2 bytes.
The decoder was always writing 2 bytes per character (UTF-16BE style),
causing NULL bytes to be inserted before every ASCII character.
Changes:
- Add is_utf16_charset() to detect fixed-width 16-bit encodings
- Modify write_char() to accept use_utf16 flag:
- true: Always 2 bytes (UTF-16BE for Japanese, issue #1451)
- false: 1 byte for ASCII, 2 bytes for extended (EUC-KR for Korean)
- Detect charset type in write_row() before building output buffer
This fixes Korean subtitle extraction when using --service "1[EUC-KR]"
while maintaining compatibility with Japanese UTF-16BE (issue #1451).
Closes#1065
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
0 commit comments