A small pipeline that captures screen regions (like UI text in games such as The Sims), performs OCR via Tesseract, and translates the recognized text using LibreTranslate.
Inspired by the fact that I accidentally installed Sims in Dutch and did not want to reinstall it.
Unfortunately, I couldn't run the entire app in a fully containerized setup. When I ran the screen capture under WSL, it didn't work (probably because Docker doesn't have access to the screen framebuffer).
Because of that, the architecture is split into two parts.
In the local folder you can find main.py, which takes the screenshot. It depends on mss, pyautogui, Pillow, dotenv, requests, keyboard, and colorama. The Python version used locally was 3.13.0.
After you start the script in a terminal, press Ctrl + Shift + T to take a screenshot. The OCR text and its translation are printed in the same terminal:
Running... Press Ctrl+Shift+T to capture screen region.
Hotkey pressed: Capturing and sending image...
Capturing at (1706, 752)...
==================================================
📝 OCR Text:
Hallo wereld!
--------------------------------------------------
🌍 Translated Text:
Hello, world!
==================================================
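The capture step above could look roughly like the sketch below. It is not the actual main.py: the region size, the function names (`region_around`, `encode_image`, `capture_at_cursor`, `run`), and the choice to center the region on the mouse cursor are all assumptions made for illustration. The third-party imports are kept inside the functions that need them, so the pure helpers can be tried without a display.

```python
import base64


def region_around(x, y, width=600, height=200):
    """Build an mss-style region dict centered on (x, y).

    The origin is clamped to 0, which assumes a single monitor whose
    top-left corner is (0, 0). The default size is an arbitrary choice.
    """
    left = max(0, x - width // 2)
    top = max(0, y - height // 2)
    return {"left": left, "top": top, "width": width, "height": height}


def encode_image(png_bytes):
    """Base64-encode raw PNG bytes so they can travel in a JSON body."""
    return base64.b64encode(png_bytes).decode("ascii")


def capture_at_cursor():
    """Grab the region around the mouse cursor and return it as base64 PNG."""
    import mss
    import mss.tools
    import pyautogui

    x, y = pyautogui.position()
    print(f"Capturing at ({x}, {y})...")
    with mss.mss() as sct:
        shot = sct.grab(region_around(x, y))
        # With no output path, to_png returns the PNG as bytes.
        png = mss.tools.to_png(shot.rgb, shot.size)
    return encode_image(png)


def run():
    """Register the hotkey and block until the process is interrupted."""
    import keyboard

    keyboard.add_hotkey(
        "ctrl+shift+t",
        lambda: print(len(capture_at_cursor()), "base64 characters captured"),
    )
    print("Running... Press Ctrl+Shift+T to capture screen region.")
    keyboard.wait()
```

Calling `run()` registers the shortcut and waits; in the real client the encoded image would then be sent to the OCR server instead of just being measured.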
In the remote directory you can find the two containers that are used: one runs Tesseract to detect text, and the other runs LibreTranslate. The local app sends the image (base64-encoded) to the Tesseract container, which detects the text and forwards it to the LibreTranslate container for translation. LibreTranslate then returns the translation to the Tesseract container, which finally returns it to the local app.
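The request flow on the remote side can be sketched as follows. This is a minimal illustration, not the code in the repository: `handle_image`, `translate_text`, the response keys, and the default `base_url` are hypothetical, and the OCR step is passed in as a callable (for example a thin wrapper around Tesseract) so the flow itself stays independent of how Tesseract is invoked. The only external API assumed is LibreTranslate's `POST /translate` endpoint, which returns a `translatedText` field.

```python
import base64
import json
import urllib.request


def translate_text(text, source="nl", target="en",
                   base_url="http://localhost:5010"):
    """Send text to LibreTranslate's /translate endpoint.

    base_url is an assumption; inside a Compose network it would more
    likely be the LibreTranslate service name than localhost.
    """
    payload = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["translatedText"]


def handle_image(image_b64, ocr, translate=translate_text):
    """Decode the image, run OCR, translate, and return both texts.

    ocr is any callable that takes image bytes and returns a string.
    """
    image_bytes = base64.b64decode(image_b64)
    text = ocr(image_bytes).strip()
    # Skip the translation round trip when OCR found nothing.
    translated = translate(text) if text else ""
    return {"ocr_text": text, "translated_text": translated}
```

Passing the OCR and translation steps in as callables also makes it easy to test the flow with stand-ins before the containers are running.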
- Anaconda (or any Python 3.12 environment)
- The following packages: mss, pyautogui, Pillow, dotenv, requests, keyboard, colorama
- Docker + Docker Compose
git clone https://github.com/v-stamenova/screen-whisper.git
cd translator

Build and run OCR API and LibreTranslate:
docker compose up

The client runs locally on Windows to capture your screen.
pip install mss pyautogui pillow python-dotenv requests keyboard colorama
cd local
python main.py

- If LibreTranslate gives "nl is not supported": make sure LT_LOAD_ONLY includes nl.
- Use localhost:5000 for OCR server and localhost:5010 for LibreTranslate in development.
- Test OCR output with simple screenshots before using complex UI.
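
For the first tip, one way to check which languages the running LibreTranslate instance has loaded is to query its `GET /languages` endpoint. The sketch below assumes the development address localhost:5010 mentioned above; the helper names `fetch_languages` and `supports_language` are made up for this example.

```python
import json
import urllib.request


def fetch_languages(base_url="http://localhost:5010"):
    """Return the list of language entries reported by LibreTranslate."""
    with urllib.request.urlopen(f"{base_url}/languages", timeout=10) as response:
        return json.load(response)


def supports_language(languages, code):
    """Check whether a language code appears in a /languages response."""
    return any(entry.get("code") == code for entry in languages)
```

If `supports_language(fetch_languages(), "nl")` is false, Dutch is not loaded, and LT_LOAD_ONLY is the first thing to check.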