OCR to Excel Converter is a Python-based desktop application that processes images using different OCR engines (PaddleOCR, Tesseract, or EasyOCR) and converts the extracted data into an Excel file. The application features a graphical user interface (GUI) built using Tkinter and ttkbootstrap.
- Features
- Installation
- Requirements
- Tesseract Installation
- Setup
- Usage
- Building the Executable
- Project Structure
- Troubleshooting
- License
- Support for multiple OCR engines:
- PaddleOCR (auto-downloads models)
- EasyOCR (auto-downloads models)
- Tesseract (requires manual installation)
- Confidence-based Excel highlighting
- Cross-platform support (Windows/macOS)
- GUI with image upload and screenshot capture
- Automatic path detection for Tesseract
- Clone the repository:
git clone https://github.com/a-sajjad72/Medical_OCR_SS_Tool.git
cd Medical_OCR_SS_Tool
- Set up a virtual environment:
python -m venv venv
- Activate the virtual environment:
- Windows:
venv\Scripts\activate
- macOS/Linux:
source venv/bin/activate
- Install dependencies:
The project contains two requirement files:
- requirements.txt: list of all dependencies with their versions (recommended)
- requirements_paddle.txt: list of key dependencies for PaddleOCR
pip install -r requirements.txt
- Requires Python 3.12 (3.12.6 or newer recommended)
- Minimum Supported Version: Python 3.12.0
- Tested Version: Python 3.12.6
- Download: python.org/downloads
Important Notes:
💡 Installation Tips:
- For Windows users: Check "Add python.exe to PATH" during installation
- For macOS/Linux users: Consider using pyenv for version management
- Verify installation:
python --version
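The same check can be scripted, for example at the top of a setup script. This is a minimal sketch: the helper name `python_is_supported` is hypothetical and not part of the project, and it only enforces the minimum version stated above, not any upper bound.

```python
import sys

# Minimum supported version as stated in the requirements above.
MIN_VERSION = (3, 12, 0)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True when the given (or running) interpreter meets the minimum."""
    current = tuple((version_info or sys.version_info)[:3])
    return current >= minimum


# Report on the interpreter that is running this snippet.
running = ".".join(str(part) for part in sys.version_info[:3])
status = "supported" if python_is_supported() else "too old for this project"
print(f"Python {running}: {status}")
```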
🔗 PyTorch Compatibility Reference:
Official PyTorch Python Support Matrix
- Tesseract binary: required for the Tesseract OCR engine
- Version 5.3.0 or newer
- Installation guides below
All Python package requirements are listed in requirements.txt. Key dependencies include:
- PaddleOCR
- EasyOCR
- pytesseract
- OpenCV
- ttkbootstrap
- openpyxl
- PyTorch
- macOS:
# Install using Homebrew
brew install tesseract
# Verify installation
tesseract --version
- Windows:
  - Download the installer from UB-Mannheim Tesseract
  - Run the installer with default settings, using the recommended installation path (C:\Program Files\Tesseract-OCR)
Create a .env file in the project root for custom paths:
TESS_BINARY_PATH=/path/to/tesseract
TESSDATA_PREFIX=/path/to/tessdata
- Configuration:
  - For non-standard Tesseract installations, edit the .env file
  - Models for PaddleOCR/EasyOCR will auto-download on first run
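To illustrate how a .env override might interact with automatic detection, here is a minimal sketch using only the standard library. It is not the project's `utils.py` implementation: the function names `read_env_file` and `resolve_tesseract_binary` and the lookup order (.env file, then process environment, then PATH) are assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_tesseract_binary(env_file=".env"):
    """Assumed lookup order: .env file, process environment, then PATH."""
    configured = read_env_file(env_file).get("TESS_BINARY_PATH")
    configured = configured or os.environ.get("TESS_BINARY_PATH")
    if configured:
        return configured
    # shutil.which returns None when no tesseract executable is on PATH.
    return shutil.which("tesseract")
```

Calling `resolve_tesseract_binary()` from the project root would return the configured path if one is set, and otherwise whatever `tesseract` executable is found on PATH.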
- Verify paths:
python -c "from utils import get_tessbin_path, get_tessdata_path; print(f'Tesseract: {get_tessbin_path()}\nTessdata: {get_tessdata_path()}')"
- Run the application:
python main.py
- Application workflow:
- Select OCR engine from dropdown
- Adjust confidence thresholds (High/Medium)
- Upload image or capture screenshot
- Processed Excel file saves automatically
- Results shown with bounding box visualization
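The confidence thresholds in the workflow above can be pictured as a simple three-way classification that decides each row's fill color. The sketch below is illustrative only: the threshold defaults (0.90 and 0.70), the colors, and the function names are assumptions, not values taken from the application. It uses openpyxl's `PatternFill` for the highlighting.

```python
# Assumed defaults; the application lets the user adjust the thresholds.
HIGH_THRESHOLD = 0.90
MEDIUM_THRESHOLD = 0.70
# Assumed colors (hex RGB): green, yellow, red.
FILL_COLORS = {"high": "C6EFCE", "medium": "FFEB9C", "low": "FFC7CE"}


def confidence_level(confidence, high=HIGH_THRESHOLD, medium=MEDIUM_THRESHOLD):
    """Classify an OCR confidence score in [0, 1] as high, medium, or low."""
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


def write_highlighted_workbook(rows, path, high=HIGH_THRESHOLD, medium=MEDIUM_THRESHOLD):
    """Write (text, confidence) pairs to an Excel file, coloring each row."""
    # Imported here so the classifier above works without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.styles import PatternFill

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(["Text", "Confidence"])
    for text, confidence in rows:
        sheet.append([text, confidence])
        color = FILL_COLORS[confidence_level(confidence, high, medium)]
        fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
        # sheet[n] yields the cells of row n; max_row is the row just appended.
        for cell in sheet[sheet.max_row]:
            cell.fill = fill
    workbook.save(path)
```

For example, `write_highlighted_workbook([("Hemoglobin", 0.97), ("13.5", 0.62)], "result.xlsx")` would produce one green row and one red row under these assumed thresholds.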
- Windows:
build_windows.bat
- macOS/Linux:
source build_mac.sh
Script Features:
- Automatically detects Tesseract installation paths
- Handles both system-wide and custom installations
- Maintains consistent resource paths between dev and prod
- Validates Tesseract presence before building
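Keeping resource paths consistent between a source checkout and a frozen build usually comes down to one helper that picks the right base directory. The sketch below shows the common PyInstaller pattern based on `sys._MEIPASS`, which PyInstaller sets in the bundled application. The helper name `resource_path` and its `dev_base` parameter are hypothetical and may differ from what the project actually does.

```python
import sys
from pathlib import Path


def resource_path(relative, dev_base=None):
    """Resolve a bundled resource in both development and frozen runs."""
    # In a PyInstaller build, sys._MEIPASS points at the unpacked bundle.
    base = getattr(sys, "_MEIPASS", None)
    if base is None:
        # Development run: fall back to the given directory or the cwd.
        base = dev_base if dev_base is not None else Path.cwd()
    return Path(base) / relative


# Destination used by the build command in this README.
print(resource_path("models/tesseract/tessdata"))
```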
Manual Build Requirements:
# For reference - use scripts instead
python -c "from utils import get_tessbin_path, get_tessdata_path; print(f'--add-binary {get_tessbin_path()}:models/tesseract --add-data {get_tessdata_path()}:models/tesseract/tessdata')"
Why This Works:
- Uses the existing path detection logic from utils.py
- Maintains frozen application structure
- Eliminates hardcoded paths
- Automatically adapts to different installations
- Preserves PyInstaller's resource bundling requirements
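For readers who want to see how the detected paths turn into PyInstaller options, the following sketch builds the same `--add-binary` and `--add-data` arguments as a list. The function name `pyinstaller_resource_args` is hypothetical. The `:` separator mirrors the command above; older PyInstaller releases may expect the platform path separator instead, so it is left configurable.

```python
def pyinstaller_resource_args(tess_bin, tessdata, sep=":"):
    """Build the resource options using the SOURCE<sep>DEST form shown above."""
    return [
        "--add-binary", f"{tess_bin}{sep}models/tesseract",
        "--add-data", f"{tessdata}{sep}models/tesseract/tessdata",
    ]


# Example with placeholder paths; real values would come from path detection.
args = pyinstaller_resource_args("/path/to/tesseract", "/path/to/tessdata")
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(["pyinstaller", *args, "main.py"])`, although the provided build scripts remain the recommended route.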
- Path not found: Verify the installation and check the .env file
- Missing languages: Install tesseract-lang (macOS) or reinstall with additional languages (Windows)
- Version mismatch: Requires Tesseract 5.3.0+
- TESSDATA_PREFIX not set: Verify that the tessdata directory exists
- No module named...: Reinstall the requirements.txt dependencies
- Permission denied: Run as administrator (Windows) or use sudo (macOS)
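When diagnosing a version mismatch, it can help to check the installed Tesseract version programmatically. The sketch below parses the first version number printed by `tesseract --version` and compares it with the 5.3.0 minimum. The function names are hypothetical, and the regular expression assumes the output starts with a line such as `tesseract 5.3.0` or `tesseract v5.3.0...`.

```python
import re
import shutil
import subprocess

MIN_TESSERACT = (5, 3, 0)  # minimum version stated in the requirements


def parse_tesseract_version(output):
    """Extract (major, minor, patch) from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", output)
    if not match:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def installed_tesseract_version(binary="tesseract"):
    """Return the installed version tuple, or None if it cannot be determined."""
    exe = shutil.which(binary)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Some builds print the version to stderr, so both streams are searched.
    return parse_tesseract_version(result.stdout + result.stderr)


def meets_minimum(version, minimum=MIN_TESSERACT):
    """True when a version tuple is known and at least the minimum."""
    return version is not None and version >= minimum
```

Running `meets_minimum(installed_tesseract_version())` returns False both when Tesseract is missing from PATH and when it is older than 5.3.0.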
This project is licensed under the Creative Commons License. See the LICENSE file for details.