PaddleOCR Studio (paddle-ui)

A modern web-based OCR application powered by PaddleOCR 3.x with advanced document parsing capabilities.

Features

OCR Mode (PP-OCRv5)

109+ languages supported - From Chinese, Japanese, Korean to Arabic, Hindi, Thai, and many more
Text detection and recognition with PP-OCRv5 (latest), PP-OCRv4, PP-OCRv3
Image preprocessing (brightness, contrast, saturation, sharpness)
Bounding box visualization
Languages grouped by script/region for easy selection
Export results to JSON

Structure Mode (PP-StructureV3)

PDF and image support - Parse multi-page PDF documents
Layout detection (text, titles, tables, formulas, charts, seals)
Table recognition with HTML output
Formula recognition with LaTeX output
Chart parsing with data extraction
Seal text recognition
Export to Markdown/JSON with preserved structure
Per-page results for PDF documents

VL Mode (PaddleOCR-VL)

Vision-Language Model (0.9B parameters)
109 languages supported
State-of-the-art (SOTA) document parsing performance
Complex element recognition (text, tables, formulas, charts)
Markdown and JSON output

Model Management

Download/delete models on demand
Disk usage tracking
Filter by model type (detection, recognition, classification)
Model registry with version info
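
As a rough illustration of the disk usage tracking listed above, the sketch below sums file sizes under a model directory. The `~/.paddleocr` location is an assumption based on the volume path in the Docker instructions (`/root/.paddleocr`), and `directory_size_bytes` and `format_size` are hypothetical helpers, not functions taken from `model_manager.py`.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Return the total size of all regular files under root (0 if missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so linked files are not counted twice.
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def format_size(num_bytes: int) -> str:
    """Format a byte count with binary units (B, KiB, MiB, GiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


# Assumed model cache location; adjust to wherever models are stored.
MODEL_DIR = Path.home() / ".paddleocr"
```

Calling `format_size(directory_size_bytes(MODEL_DIR))` would give a human-readable total for the assumed cache directory.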

Quick Start

Prerequisites

Python 3.10+
Conda (recommended)

Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/paddle-ui.git
cd paddle-ui

# Create conda environment
conda create -n paddle python=3.10 -y
conda activate paddle

# Install dependencies
pip install paddlepaddle paddleocr flask flask-cors pillow opencv-python-headless numpy requests

# For PDF support (optional but recommended)
pip install pymupdf

# Run the application
python app.py
```

Open http://localhost:5000 in your browser.
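
When the app is started from a script, it can help to wait until the server answers before opening the browser or sending requests. The following is a minimal sketch using only the standard library; `wait_for_server` is a hypothetical helper, and the only detail taken from this README is the `http://localhost:5000` address.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is a local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it returns any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Not accepting connections yet; retry after a short pause.
            time.sleep(interval)
    return False
```

For example, `wait_for_server("http://localhost:5000")` returns `True` once the Flask app responds and `False` if nothing answers within 30 seconds.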

Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/yourusername/paddle-ui.git
cd paddle-ui

# Build and run with Docker Compose
docker-compose up -d

# Or build manually
docker build -t paddle-ui .
docker run -p 5000:5000 -v paddleocr-models:/root/.paddleocr paddle-ui
```

Open http://localhost:5000 in your browser.

Note: First run will download PaddleOCR models (~500MB). Models are persisted in a Docker volume.

Project Structure

paddle-ui/
    app.py                  # Flask application with API endpoints
    ocr_engine.py           # PP-OCR wrapper (109+ languages)
    structure_engine.py     # PP-StructureV3 wrapper (PDF + images)
    vl_engine.py            # PaddleOCR-VL wrapper
    model_manager.py        # Model download/management
    image_processor.py      # Image preprocessing utilities
    templates/
        index.html          # Main UI template
    static/
        css/
            style.css       # Modern dark theme styles
        js/
            app.js          # Frontend logic with dynamic language loading

API Endpoints

Endpoint	Method	Description
/api/ocr	POST	Basic OCR processing
/api/structure	POST	PP-StructureV3 document parsing (images + PDF)
/api/vl	POST	PaddleOCR-VL parsing
/api/languages	GET	List all supported OCR languages
/api/language-groups	GET	Languages organized by script/region
/api/models	GET	List available models
/api/models//download	POST	Download a model
/api/models/	DELETE	Delete a model
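
The endpoints above can be called from any HTTP client. The sketch below shows one possible way to build an image upload for `/api/ocr` and to query `/api/languages` with the Python standard library. The multipart field name `file` and the JSON response format are assumptions, since this README does not document the request or response schema; `encode_multipart`, `build_ocr_request`, and `list_languages` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:5000"


def encode_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_ocr_request(image_path: Path, field: str = "file") -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/ocr.

    The field name "file" is an assumption; check app.py for the real one.
    """
    body, content_type = encode_multipart(field, image_path.name, image_path.read_bytes())
    return urllib.request.Request(
        f"{BASE_URL}/api/ocr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def list_languages() -> object:
    """GET /api/languages and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(f"{BASE_URL}/api/languages", timeout=10) as resp:
        return json.load(resp)
```

A request built this way can be sent with `urllib.request.urlopen(build_ocr_request(Path("scan.png")))` once the server is running.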

Supported Languages (109+)

East Asian

Chinese (Simplified & Traditional), Japanese, Korean

European (Latin Script)

English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Swedish, Norwegian, Danish, Finnish, Czech, Hungarian, Croatian, Slovenian, Slovak, and more...

Cyrillic Script

Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, Mongolian, Kazakh, Kyrgyz, Tajik

Arabic Script

Arabic, Persian/Farsi, Urdu, Pashto, Uyghur, Kurdish, Sindhi

Indic Scripts

Hindi, Marathi, Nepali, Sanskrit, Bengali, Assamese, Gujarati, Punjabi, Odia, Tamil, Telugu, Kannada, Malayalam, Sinhala

Southeast Asian

Thai, Lao, Myanmar/Burmese, Khmer, Vietnamese, Indonesian, Malay, Filipino

Other Scripts

Greek, Hebrew, Amharic, Tigrinya, Georgian, Armenian

Screenshots

OCR Mode

Structure Mode (PP-StructureV3)

VL Mode (PaddleOCR-VL)

Model Management

Chat OCR Mode (PP-ChatOCRv4)

LLM-based key information extraction from documents
Multiple LLM providers: ERNIE (Baidu), OpenAI, Ollama (local)
Built-in extraction templates (invoice, receipt, ID card, business card)
Custom key extraction
Ask natural language questions about document content

Translation Mode (PP-DocTranslation)

Document translation powered by ERNIE 4.5
14 supported languages (English, Chinese, Japanese, Korean, German, French, Spanish, Russian, Arabic, Italian, Portuguese, Vietnamese, Thai, Indonesian)
Translate images, PDFs, and Markdown documents
Preserves document structure during translation

Batch Processing

Process multiple files at once
Supports OCR, Structure, and VL batch operations
Progress tracking with real-time updates
Export all results to JSON
Job management with status tracking

Single-Character Coordinates

Get character-level bounding boxes for precise text positioning
Useful for text overlay, editing, and localization tasks
Available via /api/ocr/chars endpoint
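
To illustrate how character-level boxes can support text overlay or editing, the sketch below merges the boxes of consecutive characters into one box for a matched substring. The input shape (a list of entries with a one-character `char` and a `box` of `x_min, y_min, x_max, y_max`) is an assumption for illustration only; the actual `/api/ocr/chars` response format is not documented here, and `substring_box` is a hypothetical helper.

```python
from typing import Optional, Sequence, TypedDict

Box = tuple[int, int, int, int]  # x_min, y_min, x_max, y_max in pixels


class CharBox(TypedDict):
    char: str  # exactly one recognized character (assumed)
    box: Box   # bounding box of that character (assumed layout)


def substring_box(chars: Sequence[CharBox], needle: str) -> Optional[Box]:
    """Return the union box of the first occurrence of needle, or None."""
    if not needle:
        return None
    # Character positions in text line up with indexes in chars
    # because each entry holds exactly one character.
    text = "".join(c["char"] for c in chars)
    start = text.find(needle)
    if start < 0:
        return None
    boxes = [c["box"] for c in chars[start:start + len(needle)]]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

The returned box could then be used, for instance, to draw a highlight rectangle over a matched word on the source image.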

API Reference

Core Endpoints

Endpoint	Method	Description
/api/ocr	POST	Perform OCR on image
/api/ocr/chars	POST	OCR with character-level coordinates
/api/structure	POST	Document structure analysis
/api/vl	POST	Vision-Language model processing

ChatOCR Endpoints

Endpoint	Method	Description
/api/chatocr/extract	POST	Extract key info from document
/api/chatocr/templates	GET	Get extraction templates
/api/chatocr/providers	GET	Get supported LLM providers
/api/chatocr/ask	POST	Ask question about document

Translation Endpoints

Endpoint	Method	Description
/api/translate/document	POST	Translate document
/api/translate/text	POST	Translate plain text
/api/translate/languages	GET	Get supported languages
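
A plain-text translation call might look like the sketch below. The JSON field names `text` and `target_lang` are assumptions, since the request schema for `/api/translate/text` is not documented in this README; `build_translate_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_translate_request(text: str, target_lang: str) -> urllib.request.Request:
    """Build a JSON POST request for /api/translate/text (field names assumed)."""
    body = json.dumps({"text": text, "target_lang": target_lang}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/translate/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it to a running server; the valid language values should come from `/api/translate/languages`.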

Batch Processing Endpoints

Endpoint	Method	Description
/api/batch/create	POST	Create batch job
/api/batch/<job_id>/process	POST	Start processing
/api/batch/<job_id>	GET	Get job status
/api/batch/<job_id>	DELETE	Delete job
/api/batch/<job_id>/export	POST	Export results
/api/batch	GET	List all jobs
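
The batch endpoints suggest a create, process, poll, export workflow. The sketch below captures the per-job paths from the table and a generic polling loop. The status strings `completed` and `failed` are assumptions, as is the idea that the status endpoint reports a single state string; `batch_paths` and `poll_until_done` are hypothetical helpers.

```python
import time
from typing import Callable


def batch_paths(job_id: str) -> dict[str, tuple[str, str]]:
    """Map each per-job action to its (HTTP method, path) from the table above."""
    base = f"/api/batch/{job_id}"
    return {
        "process": ("POST", f"{base}/process"),
        "status": ("GET", base),
        "export": ("POST", f"{base}/export"),
        "delete": ("DELETE", base),
    }


def poll_until_done(
    fetch_status: Callable[[], str],
    done_states: frozenset[str] = frozenset({"completed", "failed"}),  # assumed names
    interval: float = 1.0,
    max_polls: int = 600,
) -> str:
    """Call fetch_status until it returns a terminal state; return that state."""
    for _ in range(max_polls):
        state = fetch_status()
        if state in done_states:
            return state
        time.sleep(interval)
    raise TimeoutError(f"job still running after {max_polls} polls")
```

Here `fetch_status` is any callable that performs the `GET /api/batch/<job_id>` request and extracts the job state, which keeps the polling logic independent of the HTTP client and easy to test.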

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgments

PaddlePaddle/PaddleOCR - The OCR engine powering this application
Flask - Web framework

