fix(windows): Bundle tessdata for OCR support out of the box #1908

cfsmp3 · 2025-12-26T14:18:13Z

Summary

The Windows release was missing the Tesseract OCR runtime dependencies (tessdata files) that the HardSubx feature needs. Users had to install Tesseract OCR manually and set the TESSDATA_PREFIX environment variable.

This PR fixes that by:

- Adding get_executable_directory() to ocr.c, which finds the directory where CCExtractor is installed (works on Windows, Linux, and macOS)
- Updating probe_tessdata_location() to search for tessdata in the executable directory, so bundled tessdata is found automatically
- Updating the release workflow to download eng.traineddata and osd.traineddata from tesseract-ocr/tessdata_fast during release builds
- Updating the WiX installer to include the tessdata/ directory with the traineddata files
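
The first item above can be illustrated with a minimal sketch of how an executable's directory might be resolved on the three platforms. This is not the code added to ocr.c: the name `exe_directory_sketch`, its signature, and its return convention are assumptions made for illustration, and the Linux branch assumes `/proc/self/exe` is available.

```c
#include <string.h>

#if defined(_WIN32)
#include <windows.h>
#elif defined(__APPLE__)
#include <mach-o/dyld.h>
#include <stdint.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper (not the PR's get_executable_directory()):
 * writes the directory containing the running executable into buf,
 * without a trailing separator. Returns 0 on success, -1 on failure. */
static int exe_directory_sketch(char *buf, size_t size)
{
    char *sep;

    if (buf == NULL || size < 2)
        return -1;

#if defined(_WIN32)
    {
        DWORD len = GetModuleFileNameA(NULL, buf, (DWORD)size);
        if (len == 0 || len >= size)
            return -1; /* call failed or path was truncated */
    }
#elif defined(__APPLE__)
    {
        uint32_t bufsize = (uint32_t)size;
        if (_NSGetExecutablePath(buf, &bufsize) != 0)
            return -1; /* buffer too small */
    }
#else
    {
        /* Assumes a Linux-style /proc filesystem. */
        ssize_t len = readlink("/proc/self/exe", buf, size - 1);
        if (len <= 0 || (size_t)len >= size - 1)
            return -1; /* call failed or path may be truncated */
        buf[len] = '\0'; /* readlink does not terminate the string */
    }
#endif

    /* Strip the file name, keeping only the directory part. */
    sep = strrchr(buf, '/');
#if defined(_WIN32)
    {
        char *bsep = strrchr(buf, '\\');
        if (bsep != NULL && (sep == NULL || bsep > sep))
            sep = bsep;
    }
#endif
    if (sep == NULL)
        return -1;
    if (sep == buf)
        sep++; /* executable sits in the root directory: keep "/" */
    *sep = '\0';
    return 0;
}
```

A caller would pass a buffer large enough for the longest expected path, for example `char dir[4096]; if (exe_directory_sketch(dir, sizeof dir) == 0) { ... }`, and fall back to other search locations on failure.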

Now the Windows release includes tessdata files, and CCExtractor will automatically find them in the installation directory without requiring users to install Tesseract separately or set environment variables.
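
As a rough sketch of what "finding tessdata in the installation directory" can mean, the check below tests whether a language file exists under a `tessdata/` subdirectory of a given base directory. The `tessdata/<lang>.traineddata` layout matches the files listed in the test plan; the function name `has_bundled_traineddata`, its parameters, and the idea of probing by opening the file are assumptions, not the actual logic of probe_tessdata_location().

```c
#include <stdio.h>

/* Hypothetical helper (not part of CCExtractor): returns 1 if
 * <base_dir>/tessdata/<lang>.traineddata can be opened for reading,
 * 0 otherwise. */
static int has_bundled_traineddata(const char *base_dir, const char *lang)
{
    char path[4096];
    int n;
    FILE *f;

    if (base_dir == NULL || lang == NULL)
        return 0;

    /* Forward slashes are also accepted by fopen() on Windows. */
    n = snprintf(path, sizeof path, "%s/tessdata/%s.traineddata", base_dir, lang);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0; /* formatting error or path too long */

    f = fopen(path, "rb");
    if (f == NULL)
        return 0; /* file missing or unreadable */
    fclose(f);
    return 1;
}
```

In a search routine, a check like this would typically be one candidate among several (for example, after an explicitly configured TESSDATA_PREFIX), with the base directory being the directory of the executable.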

Test plan

- Created a test release on a fork: https://github.com/cfsmp3/ccextractor/releases/tag/v0.97.0-test-tessdata
- Verified that the portable ZIP contains tessdata/eng.traineddata and tessdata/osd.traineddata
- Tested on a Windows VM: HardSubx works out of the box without a Tesseract installation

Fixes #1578

🤖 Generated with Claude Code

The Windows release was missing Tesseract OCR runtime dependencies (tessdata files) needed for the HardSubx feature to work. Users had to manually install Tesseract OCR and set TESSDATA_PREFIX.

Changes:
- Add get_executable_directory() to ocr.c that returns the directory containing the executable (works on Windows, Linux, and macOS)
- Update probe_tessdata_location() to search for tessdata in the executable directory, enabling bundled tessdata to be found
- Update release workflow to download eng.traineddata and osd.traineddata from tesseract-ocr/tessdata_fast during release builds
- Update WiX installer to include tessdata directory with the traineddata files

Now the Windows release includes tessdata files, and CCExtractor will automatically find them in the installation directory without requiring users to install Tesseract separately or set environment variables.

Fixes CCExtractor#1578

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

ccextractor-bot · 2025-12-26T14:57:50Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 45ee03a...:

Report Name	Tests Passed
Broken	13/13
CEA-708	14/14
DVB	7/7
DVD	3/3
DVR-MS	2/2
General	24/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	21/21
WTV	13/13
XDS	34/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 1974a299f0...
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

cfsmp3 merged commit dc352a2 into CCExtractor:master Dec 26, 2025
22 of 24 checks passed

cfsmp3 deleted the fix/issue-1578-bundle-tessdata branch December 26, 2025 14:23
