OCR4Linux is a versatile text extraction tool that allows you to take a screenshot of a selected area, extract text using OCR, and copy it to the clipboard. It supports both Wayland and X11 sessions and offers multiple language support.
Note: This script currently targets Arch Linux only. It may work on other Arch-based distributions, but this has not been tested yet.
I couldn't find an easy tool on Linux that does what the PowerTool app does on Windows. This motivated me to create OCR4Linux, a simple and efficient tool that captures a screenshot, extracts the text, and copies it to the clipboard in one seamless process.
- Screenshot Capture
  - Wayland support via `grimblast`
  - X11 support via `scrot`
  - Configurable screenshot directory
- Text Extraction
  - Interactive language selection via `rofi`
  - Multi-language OCR support with custom language combinations
  - Automatic language detection fallback
  - Image preprocessing for better accuracy
  - UTF-8 text output
- Clipboard Integration
  - Wayland: `wl-copy` and `cliphist`
  - X11: `xclip`
- Additional Features
  - Interactive language selection menu
  - Optional screenshot retention
  - Comprehensive logging system
  - Command-line interface
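
The Wayland/X11 split in the feature list can be pictured as a small lookup keyed on the session type. The following is only a minimal sketch, not the project's actual code: it assumes the session is read from the `XDG_SESSION_TYPE` environment variable, and the helper names `pick_backend` and `screenshot_command` as well as the exact tool arguments are illustrative assumptions.

```python
import os

# Tool names come from the feature list; the argument lists are assumptions
# based on each tool's usual command-line usage.
BACKENDS = {
    "wayland": {
        "screenshot": ["grimblast", "save", "area"],
        "clipboard": ["wl-copy"],
    },
    "x11": {
        "screenshot": ["scrot", "--select"],
        "clipboard": ["xclip", "-selection", "clipboard"],
    },
}


def pick_backend(env=None):
    """Return the tool set for the current session (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the session type is exposed through XDG_SESSION_TYPE.
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session not in BACKENDS:
        raise RuntimeError(f"unsupported session type: {session!r}")
    return BACKENDS[session]


def screenshot_command(backend, path):
    """Build the argument list that captures a selected area into `path`."""
    return backend["screenshot"] + [path]
```

Passing the environment in as a dictionary keeps the lookup easy to try without changing the real session, for example `pick_backend({"XDG_SESSION_TYPE": "wayland"})`.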
- Arch Linux or an Arch-based distribution
- Python 3.x
- `yay` package manager (will be installed if needed)
- `tesseract` OCR engine
- `tesseract-data-eng` English language pack
- `tesseract-data-ara` Arabic language pack
- If you need a language other than these two, search for its package with: `sudo pacman -Ss tesseract-data-{lang}`
- `python-pillow`
- `python-pytesseract`
- Wayland:
  - `grimblast-git`
  - `wl-clipboard`
  - `cliphist`
  - `rofi-wayland`
- X11:
  - `scrot`
  - `xclip`
  - `rofi`

Note: `rofi` is required for the interactive language selection feature.
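
Before running the tool, it can help to check that the required programs are actually on `PATH`. The sketch below is an illustration rather than part of `setup.sh`: the function `missing_tools` is hypothetical, and the executable names are assumptions derived from the package list (for example, the `wl-clipboard` package is assumed to provide `wl-copy`, and `rofi-wayland` to provide `rofi`).

```python
import shutil

# Executable names assumed to be installed by the packages listed above.
REQUIRED = {
    "common": ["tesseract"],
    "wayland": ["grimblast", "wl-copy", "cliphist", "rofi"],
    "x11": ["scrot", "xclip", "rofi"],
}


def missing_tools(session, which=shutil.which):
    """Return the required executables that cannot be found on PATH.

    `session` is "wayland" or "x11"; `which` is injectable for testing.
    """
    tools = REQUIRED["common"] + REQUIRED.get(session, [])
    return [tool for tool in tools if which(tool) is None]
```

For example, `missing_tools("x11")` returns an empty list when everything is installed, and otherwise the names of the programs still to be installed.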
- Clone the repository:

      git clone https://github.com/moheladwy/OCR4Linux.git
      cd OCR4Linux

- Run the setup script to install the required packages and copy the necessary files to the configuration directory:

      chmod +x setup.sh
      ./setup.sh

- Run the main script to take a screenshot, extract text, and copy it to the clipboard:

      chmod +x OCR4Linux.sh
      ./OCR4Linux.sh

- The script will:
  - With the `--lang` option: use the specified languages directly (bypasses the rofi menu)
  - Without the `--lang` option: display an interactive language selection menu via `rofi`
  - Allow you to select one or multiple languages for OCR processing
  - Take a screenshot of the selected area after language selection
  - Extract text from the image using the selected languages
  - Copy the extracted text to the clipboard

### Language Selection
You have two options for language selection:

Specify languages directly using the `--lang` option:
- `--lang all` - Use all available languages
- `--lang eng` - Use English only
- `--lang eng+ara+fra` - Use multiple specific languages

When you run the script without `--lang`, a `rofi` menu will appear with:
- ALL: Select all available languages
- Individual languages: Choose specific languages (e.g., eng, ara, fra, deu)
- Multi-select: Hold `Ctrl` and click to select multiple languages
The selected languages will be used by Tesseract for more accurate text recognition in multi-language documents.
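
Both selection paths end in the same kind of value: a `+`-separated language string that Tesseract accepts. A minimal sketch of that normalization is shown below. It is not taken from the project's source; `resolve_languages` and `spec_from_menu` are hypothetical helpers, and the list of available languages is assumed to be supplied by the caller.

```python
def resolve_languages(spec, available):
    """Turn a --lang style value into a Tesseract language string.

    `spec` is "all", a single code such as "eng", or codes joined with "+".
    `available` is the list of installed language codes.
    """
    requested = [part.strip() for part in spec.split("+") if part.strip()]
    if not requested:
        raise ValueError("no language given")
    # "all" (or the menu's "ALL" entry) expands to every installed language.
    if any(part.lower() == "all" for part in requested):
        return "+".join(available)
    unknown = [part for part in requested if part not in available]
    if unknown:
        raise ValueError("unknown language(s): " + ", ".join(unknown))
    # Drop duplicates while keeping the order the user chose.
    return "+".join(dict.fromkeys(requested))


def spec_from_menu(selected_lines):
    """Join the entries picked in a multi-select menu into one spec string."""
    return "+".join(line.strip() for line in selected_lines if line.strip())
```

With `available = ["ara", "eng", "fra"]`, the call `resolve_languages("eng+ara", available)` keeps the requested order, while `resolve_languages(spec_from_menu(["ALL"]), available)` expands to all three languages.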
The complete OCR4Linux workflow:
- Language Selection:
  - Command-line specified languages (with `--lang`) OR
  - Interactive rofi menu displays available languages (without `--lang`)
- Language Processing: Selected languages are validated and formatted
- Screenshot Capture: Area selection and image capture
- OCR Processing: Text extraction using selected languages
- Clipboard Integration: Extracted text copied to system clipboard
- Cleanup: Optional screenshot removal and logging
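
The workflow above can be sketched as a short function in which each stage is a pluggable callable. This is only an illustration of the ordering described here, not the actual shell script: the `Steps` container and `run_workflow` are hypothetical names, and the real capture, OCR, and clipboard commands are deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Steps:
    """Callables for each stage; real implementations would shell out."""
    choose_languages: Callable[[], str]   # interactive menu, "" if cancelled
    capture: Callable[[], str]            # returns the screenshot path
    ocr: Callable[[str, str], str]        # (image path, languages) -> text
    copy: Callable[[str], None]           # put text on the clipboard
    remove: Callable[[str], None]         # delete the screenshot


def run_workflow(steps: Steps, lang: Optional[str] = None,
                 remove_screenshot: bool = False) -> Optional[str]:
    # 1. Language selection: a command-line value bypasses the menu.
    languages = lang if lang else steps.choose_languages()
    if not languages:
        return None  # the user cancelled the menu, so nothing is captured
    # 2. Screenshot capture happens only after languages are known.
    image = steps.capture()
    # 3. OCR with the selected languages, 4. clipboard integration.
    text = steps.ocr(image, languages)
    steps.copy(text)
    # 5. Optional cleanup of the screenshot.
    if remove_screenshot:
        steps.remove(image)
    return text
```

Keeping the stages injectable makes the order of operations easy to exercise with stand-in functions, without a display server or Tesseract installed.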
Option | Description | Default |
---|---|---|
`-r` | Remove screenshot after processing | false |
`-d DIR` | Set screenshot directory | `$HOME/Pictures/screenshots` |
`-l` | Keep logs | false |
`--lang LANGUAGES` | Specify OCR languages (bypasses rofi) | Interactive selection |
`-h` | Show help message | - |
Language Format for `--lang`:
- Use `all` for all available languages
- Use `+` to separate multiple languages (e.g., `eng+ara+fra`)
- Single languages: `eng`, `ara`, `fra`, etc.
Option | Description | Required |
---|---|---|
`image_path` | Path to input image | Yes |
`output_path` | Path to save extracted text | Yes |
`--langs <languages>` | Specify languages for OCR | No |
`-l, --list-langs` | List available OCR languages | No |
`-h, --help` | Show help message | No |

Language Format: Use `+` to separate multiple languages (e.g., `eng+ara+fra`)
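
The option table for the Python script maps naturally onto `argparse`. The sketch below shows one way such an interface could be declared; it is an assumption about the shape of the parser, not a copy of `OCR4Linux.py`. In particular, making the two paths optional at the parser level and checking them afterwards is a design choice made here so that `--list-langs` can run without them.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="OCR4Linux.py",
        description="Extract text from an image with Tesseract.",
    )
    # Positionals are optional here so that --list-langs works on its own.
    parser.add_argument("image_path", nargs="?", help="path to input image")
    parser.add_argument("output_path", nargs="?",
                        help="path to save extracted text")
    parser.add_argument("--langs", metavar="LANGUAGES", default=None,
                        help="languages for OCR, e.g. eng+ara+fra")
    parser.add_argument("-l", "--list-langs", action="store_true",
                        help="list available OCR languages")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Both paths are required unless the user only asks for the language list.
    if not args.list_langs and (args.image_path is None
                                or args.output_path is None):
        parser.error("image_path and output_path are required "
                     "unless --list-langs is given")
    return args
```

`-h`/`--help` is provided by `argparse` automatically, so it does not need its own `add_argument` call.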
# Basic usage (shows interactive rofi menu)
./OCR4Linux.sh
# Direct language specification (bypasses rofi)
./OCR4Linux.sh --lang eng
./OCR4Linux.sh --lang all
./OCR4Linux.sh --lang eng+ara+fra
# Save logs and remove screenshot after processing
./OCR4Linux.sh -l -r
# Custom screenshot directory with logging
./OCR4Linux.sh -d ~/Documents/screenshots -l
# Combine language specification with other options
./OCR4Linux.sh --lang eng -l -r
./OCR4Linux.sh --lang all -d ~/screenshots -l
# Show help
./OCR4Linux.sh -h
# Basic usage (uses all available languages)
python OCR4Linux.py input.png output.txt
# Specify single language
python OCR4Linux.py input.png output.txt --langs eng
# Specify multiple languages
python OCR4Linux.py input.png output.txt --langs eng+ara+fra
# List available languages
python OCR4Linux.py --list-langs
# Show help
python OCR4Linux.py --help
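
Listing languages ultimately depends on what Tesseract reports as installed. As a rough sketch, and assuming the list is obtained from `tesseract --list-langs` (whose output is a header line followed by one language code per line), the codes could be extracted as follows. The function names are hypothetical, and skipping `osd` (orientation and script detection data, not a language) is a choice made for this example.

```python
import subprocess


def parse_list_langs(output, skip=("osd",)):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        name = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not name or name.lower().startswith("list of"):
            continue
        if name in skip:
            continue
        langs.append(name)
    return langs


def available_languages():
    """Run tesseract and return its installed languages (needs tesseract)."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Some versions print the list to stderr, so both streams are scanned.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

The parsing step is kept separate from the subprocess call so it can be checked against sample text without Tesseract installed.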
- Language Selection Options:
  - Command Line: Use `--lang` for automated/scripted usage
    - `--lang all` for maximum compatibility
    - `--lang eng` for English-only documents
    - `--lang eng+ara` for bilingual documents
  - Interactive Menu: Run without `--lang` for manual selection
    - Select "ALL" to use all available languages
    - Select specific languages for better performance
    - Use `Ctrl+Click` to select multiple languages
    - Press `Escape` to cancel the operation
- Performance Optimization:
  - Use fewer specific languages for faster processing
  - Use `--lang all` only when the document language is unknown
  - Command-line specification is faster than interactive selection
- Keyboard Shortcuts: You can create a keyboard shortcut to run the script for easy access.
  - Put the following lines in your `hyprland.conf` file:

        $OCR4Linux = ~/.config/OCR4Linux/OCR4Linux.sh
        $OCR4Linux_ENG = ~/.config/OCR4Linux/OCR4Linux.sh --lang eng
        bind = $mainMod SHIFT, E, exec, $OCR4Linux # OCR4Linux with interactive selection
        bind = $mainMod SHIFT, T, exec, $OCR4Linux_ENG # OCR4Linux with English only

  - Put the following lines in your `config.h` file:

        static const char *ocr4linux[] = { "sh", "-c", "~/.config/OCR4Linux/OCR4Linux.sh", NULL };
        static const char *ocr4linux_eng[] = { "sh", "-c", "~/.config/OCR4Linux/OCR4Linux.sh --lang eng", NULL };
        { MODKEY | ShiftMask, XK_e, spawn, {.v = ocr4linux } }, // OCR4Linux interactive
        { MODKEY | ShiftMask, XK_t, spawn, {.v = ocr4linux_eng } }, // OCR4Linux English only

- Language Optimization: For best results:
  - Select only the languages present in your document
  - Use fewer languages for better performance
  - Install additional Tesseract language packs as needed

- OCR4Linux.py: Python script that preprocesses the image and extracts text using `tesseract`, with support for custom language selection.
- OCR4Linux.sh: Shell script that offers both interactive language selection via rofi and direct command-line language specification. It takes a screenshot, passes it to the Python script with the selected languages, receives the extracted text, and copies it to the clipboard.
- setup.sh: Shell script that installs the required packages and copies the necessary files to the configuration directory (run it only once, after first cloning the repository).
We welcome contributions from the community to help improve OCR4Linux and make it available for all Linux users and distributions. Whether it's reporting bugs, suggesting new features, or submitting patches, your help is greatly appreciated. Please check out our contributing guidelines to get started.
This project is licensed under the MIT License. See the LICENSE file for more details.