Add PGS to SRT OCR subtitle extraction feature #709

Merged
cdgriffith merged 2 commits into cdgriffith:develop from mikeSGman:feature/pgs-to-srt-ocr on Feb 7, 2026

@mikeSGman
Contributor

Implements OCR-based conversion of PGS (Presentation Graphic Stream) subtitles to SRT format with automatic tool detection.

Features:

  • Auto-detect Tesseract OCR from PATH or Subtitle Edit installations
  • Auto-detect MKVToolNix from standard install locations
  • Support multiple language codes (ISO 639-2/3, language names)
  • GUI checkbox to enable/disable OCR for PGS subtitles
  • Automatic cleanup of intermediate .sup files after conversion
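
For readers curious how the "PATH first, then known install folders" detection described above could work, here is a minimal Python sketch. It is not the code merged in this PR. `find_tool`, `TESSERACT_FALLBACKS`, `MKVTOOLNIX_FALLBACKS`, and the fallback folder names are hypothetical placeholders. Only `shutil.which` (including its `path=` argument) is standard-library API.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_tool(name: str, fallback_dirs: Iterable[Path]) -> Optional[Path]:
    """Locate an executable: first on PATH, then in known install folders."""
    # 1. Anything the user already put on PATH wins.
    found = shutil.which(name)
    if found:
        return Path(found)
    # 2. Fall back to well-known install locations. shutil.which accepts an
    #    explicit search path, so PATHEXT (.exe on Windows) is still honored.
    search_path = os.pathsep.join(str(d) for d in fallback_dirs if Path(d).is_dir())
    if search_path:
        found = shutil.which(name, path=search_path)
        if found:
            return Path(found)
    return None


# Hypothetical fallback locations; adjust to the real install layouts.
_PROGRAM_FILES = Path(os.environ.get("ProgramFiles", r"C:\Program Files"))
TESSERACT_FALLBACKS = [
    Path(os.environ.get("APPDATA", "")) / "Subtitle Edit" / "Tesseract",
    _PROGRAM_FILES / "Tesseract-OCR",
]
MKVTOOLNIX_FALLBACKS = [_PROGRAM_FILES / "MKVToolNix"]

tesseract = find_tool("tesseract", TESSERACT_FALLBACKS)
mkvextract = find_tool("mkvextract", MKVTOOLNIX_FALLBACKS)
```

Reusing `shutil.which` for the fallback folders keeps Windows extension handling consistent with the PATH lookup, instead of hard-coding `tesseract.exe`.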

Dependencies:

  • pgsrip: PGS subtitle OCR engine
  • pytesseract: Tesseract OCR wrapper
  • babelfish: Language code handling
  • opencv-python, cleanit, trakit: Image/metadata processing
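
The PR relies on babelfish to reconcile the different language-code forms. As a stdlib-only illustration of the problem it solves, the sketch below maps a few bibliographic codes (which MKV track tags often carry, e.g. `fre`, `ger`) and English names onto the terminologic codes that Tesseract traineddata files typically use (`fra`, `deu`). `normalize_language` and its tiny table are hypothetical. A real implementation would delegate to babelfish rather than maintain a table by hand.

```python
from typing import Optional

# Hypothetical, deliberately tiny alias table. Keys are lower-case inputs
# (ISO 639-1, ISO 639-2/B, ISO 639-2/T, or English names); values are the
# three-letter codes used to name Tesseract traineddata files.
_TESSERACT_CODES = {
    "en": "eng", "eng": "eng", "english": "eng",
    "fr": "fra", "fre": "fra", "fra": "fra", "french": "fra",
    "de": "deu", "ger": "deu", "deu": "deu", "german": "deu",
    "es": "spa", "spa": "spa", "spanish": "spa",
}


def normalize_language(value: str) -> Optional[str]:
    """Return a Tesseract language code for `value`, or None if unknown."""
    return _TESSERACT_CODES.get(value.strip().lower())


# MKV tracks tagged "fre" and "German" resolve to Tesseract's "fra" and "deu".
print(normalize_language("fre"), normalize_language(" German "))  # fra deu
```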

Known limitation:
This feature works when running from source (python -m fastflix) but not in PyInstaller-built executables due to subprocess environment issues with pgsrip. Users needing PGS OCR should run from source.
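
Given this limitation, one option is to detect a frozen build at runtime and keep the OCR checkbox disabled there, with an explanation. The sketch below assumes PyInstaller's documented convention that the bootloader sets `sys.frozen` and `sys._MEIPASS` in bundled apps. `is_frozen_build` and `pgs_ocr_available` are hypothetical names, not functions from FastFlix.

```python
import sys
from typing import Tuple


def is_frozen_build() -> bool:
    """True inside a PyInstaller bundle, False for `python -m fastflix`."""
    # PyInstaller's bootloader sets sys.frozen and sys._MEIPASS; a normal
    # interpreter run from source has neither attribute.
    return bool(getattr(sys, "frozen", False)) and hasattr(sys, "_MEIPASS")


def pgs_ocr_available() -> Tuple[bool, str]:
    """Hypothetical gate for the "OCR PGS subtitles" checkbox."""
    if is_frozen_build():
        return False, "PGS OCR currently requires running from source: python -m fastflix"
    return True, ""


enabled, reason = pgs_ocr_available()
print("PGS OCR enabled" if enabled else f"PGS OCR disabled: {reason}")
```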

@mikeSGman
Contributor Author

Supersedes #701 — original branch history was rewritten so GitHub can’t reopen that PR.

@mikeSGman
Contributor Author

@cdgriffith, want me to keep hammering on this? It's been a few months since I last looked at it, as I've written my own pipeline (with built-in SRT generation via OCR).

@cdgriffith
Owner

Hey @mikeSGman, thanks again for this! I'm going to resolve the conflicts and just let Claude Code make any changes we may want.

@cdgriffith cdgriffith merged commit 7bcb819 into cdgriffith:develop Feb 7, 2026
@cdgriffith cdgriffith mentioned this pull request Feb 9, 2026
@mikeSGman
Contributor Author

Hey @cdgriffith, just saw this, thanks! Hope someone finds it useful. Glad to contribute, and thanks for the opportunity.
