OCR to Excel Converter

OCR to Excel Converter is a Python-based desktop application that processes images using different OCR engines (PaddleOCR, Tesseract, or EasyOCR) and converts the extracted data into an Excel file. The application features a graphical user interface (GUI) built using Tkinter and ttkbootstrap.

Features

- Support for multiple OCR engines:
  - PaddleOCR (auto-downloads models)
  - EasyOCR (auto-downloads models)
  - Tesseract (requires manual installation)
- Confidence-based Excel highlighting
- Cross-platform support (Windows/macOS)
- GUI with image upload and screenshot capture
- Automatic path detection for Tesseract
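Confidence-based highlighting can be pictured as sorting each recognised value into a bucket using the High/Medium thresholds set in the GUI, then colouring its cell. The sketch below is only an illustration. The `Thresholds` defaults, the colour codes and all function names are assumptions, not the application's own code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the GUI's High/Medium thresholds (0.0-1.0)."""
    high: float = 0.90    # assumed default, not the app's actual value
    medium: float = 0.70  # assumed default, not the app's actual value

# Assumed ARGB fill colours in the form openpyxl accepts (green/yellow/red).
FILL_COLOURS = {"high": "FFC6EFCE", "medium": "FFFFEB9C", "low": "FFFFC7CE"}

def confidence_level(conf: float, t: Thresholds = Thresholds()) -> str:
    """Bucket a normalised OCR confidence score into 'high', 'medium' or 'low'."""
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {conf}")
    if t.medium > t.high:
        raise ValueError("the Medium threshold must not exceed the High threshold")
    if conf >= t.high:
        return "high"
    if conf >= t.medium:
        return "medium"
    return "low"

def fill_for(conf: float, t: Thresholds = Thresholds()) -> str:
    """Return the fill colour for a confidence score."""
    return FILL_COLOURS[confidence_level(conf, t)]

def highlight_cell(cell, conf: float, t: Thresholds = Thresholds()) -> None:
    """Colour an openpyxl cell; openpyxl is imported lazily so the logic above has no dependencies."""
    from openpyxl.styles import PatternFill
    colour = fill_for(conf, t)
    cell.fill = PatternFill(start_color=colour, end_color=colour, fill_type="solid")
```

PaddleOCR and EasyOCR report confidence on a 0 to 1 scale, while pytesseract's `image_to_data` reports 0 to 100. Tesseract scores would therefore need dividing by 100 before bucketing.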

Installation

Clone the repository:
```
git clone https://github.com/a-sajjad72/Medical_OCR_SS_Tool.git
cd Medical_OCR_SS_Tool
```

Set up a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install dependencies:

The project contains two requirements files:
- requirements.txt – all dependencies with their versions (recommended).
- requirements_paddle.txt – the key dependencies for PaddleOCR.
```
pip install -r requirements.txt
```

Requirements

Python Interpreter

- Requires Python 3.12 (3.12.6 or newer recommended)
- Minimum supported version: Python 3.12.0
- Tested version: Python 3.12.6
- Download: python.org/downloads

Important Notes: ⚠️ PyTorch Compatibility: The application requires PyTorch, and its dependency set (as listed in requirements.txt) is only supported on Python 3.12. Earlier versions (3.11 or below) are not supported due to dependency conflicts.

💡 Installation Tips:

- Windows users: check "Add python.exe to PATH" during installation
- macOS/Linux users: consider using pyenv for version management
- Verify the installation: python --version
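Beyond `python --version`, the interpreter requirement above can be checked programmatically. This is a hypothetical helper (not part of the repository). Treating versions newer than 3.12 as "untested" rather than unsupported is an assumption based on the tested version listed above.

```python
import sys

MINIMUM = (3, 12, 0)     # "Minimum supported version" above
TESTED_SERIES = (3, 12)  # the series this README was tested against

def check_python(version=None) -> str:
    """Classify an interpreter version as 'ok', 'untested' or 'unsupported'."""
    v = tuple((version if version is not None else sys.version_info)[:3])
    if v < MINIMUM:
        return "unsupported"
    if v[:2] != TESTED_SERIES:
        return "untested"  # newer than 3.12: may work, but not verified here
    return "ok"

if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {check_python()}")
```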

🔗 PyTorch Compatibility Reference:
Official PyTorch Python Support Matrix

Tesseract OCR

- Required for the Tesseract OCR engine
- Version 5.3.0 or newer
- See Tesseract Installation below

Python Dependencies

All Python package requirements are listed in requirements.txt. Key dependencies include:

- PaddleOCR
- EasyOCR
- pytesseract
- OpenCV
- ttkbootstrap
- openpyxl
- PyTorch

Tesseract Installation

macOS

```
# Install using Homebrew
brew install tesseract

# Verify installation
tesseract --version
```

Windows

Download the installer from UB-Mannheim Tesseract.
Run the installer with the default settings:
- Use the recommended installation path (C:\Program Files\Tesseract-OCR)

Custom Installations

Create a .env file in the project root for custom paths:
```
TESS_BINARY_PATH=/path/to/tesseract
TESSDATA_PREFIX=/path/to/tessdata
```

Setup

Configuration:
- For non-standard Tesseract installations, edit the .env file
- Models for PaddleOCR/EasyOCR will auto-download on first run

Verify Paths:

```
python -c "from utils import get_tessbin_path, get_tessdata_path; print(f'Tesseract: {get_tessbin_path()}\nTessdata: {get_tessdata_path()}')"
```

Usage

Run the application:
```
python main.py
```
Application workflow:
- Select the OCR engine from the dropdown
- Adjust the confidence thresholds (High/Medium)
- Upload an image or capture a screenshot
- The processed Excel file is saved automatically
- Results are shown with a bounding-box visualization
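To turn OCR output into spreadsheet rows, recognised words have to be placed into rows and columns. This README does not say how the application does this. One common approach, sketched here as an assumption, groups words whose vertical centres are close and then orders each row left to right. `Word`, `from_quad`, `group_into_rows` and the 10-pixel tolerance are all hypothetical.

```python
from typing import NamedTuple

class Word(NamedTuple):
    """Engine-neutral OCR result; this field layout is an assumption for the sketch."""
    x: float     # left edge of the bounding box, in pixels
    y: float     # vertical centre of the bounding box, in pixels
    text: str
    conf: float  # 0.0-1.0

def from_quad(points, text: str, conf: float) -> Word:
    """Convert a four-corner box (as EasyOCR's readtext returns) into a Word."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Word(min(xs), sum(ys) / len(ys), text, float(conf))

def group_into_rows(words: list[Word], y_tolerance: float = 10.0) -> list[list[Word]]:
    """Cluster words within y_tolerance of each row's first word; rows top to bottom."""
    rows: list[list[Word]] = []
    for w in sorted(words, key=lambda w: w.y):
        if rows and abs(w.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Within each row, order cells left to right.
    return [sorted(row, key=lambda w: w.x) for row in rows]

def to_table(rows: list[list[Word]]) -> list[list[str]]:
    """Keep only the text, one inner list per spreadsheet row."""
    return [[w.text for w in row] for row in rows]
```

Each inner list of `to_table` could then be written with openpyxl's `Worksheet.append`, one call per row. A fixed pixel tolerance is the simplest choice. Skewed scans or very large fonts would need a tolerance scaled to box height.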

Building the Executable

Automated Build Scripts

Windows:
```
build_windows.bat
```
macOS/Linux:
```
source build_mac.sh
```

Script Features:

- Automatically detects Tesseract installation paths
- Handles both system-wide and custom installations
- Maintains consistent resource paths between dev and prod
- Validates Tesseract presence before building

Manual Build Requirements:

```
# For reference - use scripts instead
python -c "from utils import get_tessbin_path, get_tessdata_path; print(f'--add-binary {get_tessbin_path()}:models/tesseract --add-data {get_tessdata_path()}:models/tesseract/tessdata')"
```
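The manual build command above prints PyInstaller resource arguments. A small helper could assemble the same arguments as a list instead. It is a sketch with two assumptions. `pyinstaller_resource_args` is a hypothetical name. It also uses `os.pathsep` (`;` on Windows, `:` elsewhere) as the source/destination separator, which is what PyInstaller has traditionally expected on each platform.

```python
import os

def pyinstaller_resource_args(tess_bin: str, tessdata: str) -> list[str]:
    """Build --add-binary/--add-data arguments that bundle Tesseract under
    models/tesseract, mirroring the destinations in the command above."""
    return [
        "--add-binary", f"{tess_bin}{os.pathsep}models/tesseract",
        "--add-data", f"{tessdata}{os.pathsep}models/tesseract/tessdata",
    ]
```

From the project root, `pyinstaller_resource_args(get_tessbin_path(), get_tessdata_path())` could be combined with `main.py` and passed to `PyInstaller.__main__.run([...])`. The build scripts remain the supported route.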

Why This Works:

- Uses the existing path-detection logic in utils.py
- Maintains the frozen application structure
- Eliminates hardcoded paths
- Automatically adapts to different installations
- Preserves PyInstaller's resource-bundling requirements

Troubleshooting

Tesseract Issues

- Path not found: verify the installation and check the .env file
- Missing languages: install tesseract-lang (macOS) or reinstall with additional languages (Windows)
- Version mismatch: Tesseract 5.3.0 or newer is required

Common Errors

- TESSDATA_PREFIX not set: verify that the tessdata directory exists
- No module named...: reinstall the dependencies from requirements.txt
- Permission denied: run as administrator (Windows) or use sudo (macOS)

License

This project is licensed under the Creative Commons License. See the LICENSE file for details.
