A modern web-based OCR application powered by PaddleOCR 3.x with advanced document parsing capabilities.
- 109+ languages supported - From Chinese, Japanese, Korean to Arabic, Hindi, Thai, and many more
- Text detection and recognition with PP-OCRv5 (latest), PP-OCRv4, PP-OCRv3
- Image preprocessing (brightness, contrast, saturation, sharpness)
- Bounding box visualization
- Languages grouped by script/region for easy selection
- Export results to JSON
- PDF and image support - Parse multi-page PDF documents
- Layout detection (text, titles, tables, formulas, charts, seals)
- Table recognition with HTML output
- Formula recognition with LaTeX output
- Chart parsing with data extraction
- Seal text recognition
- Export to Markdown/JSON with preserved structure
- Per-page results for PDF documents
- Vision-Language Model (0.9B parameters)
- 109 languages supported
- State-of-the-art (SOTA) document parsing performance
- Complex element recognition (text, tables, formulas, charts)
- Markdown and JSON output
- Download/delete models on demand
- Disk usage tracking
- Filter by model type (detection, recognition, classification)
- Model registry with version info
- Python 3.10+
- Conda (recommended)
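```bash
# Clone the repository
git clone https://github.com/yourusername/paddle-ui.git
cd paddle-ui

# Create and activate a Conda environment
conda create -n paddle python=3.10 -y
conda activate paddle

# Install dependencies
pip install paddlepaddle paddleocr flask flask-cors pillow opencv-python-headless numpy requests
pip install pymupdf  # PyMuPDF (PDF library)

# Start the application
python app.py
```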
Open http://localhost:5000 in your browser.
```bash
git clone https://github.com/yourusername/paddle-ui.git
cd paddle-ui

# Option 1: Docker Compose
docker-compose up -d

# Option 2: build and run the image manually
docker build -t paddle-ui .
docker run -p 5000:5000 -v paddleocr-models:/root/.paddleocr paddle-ui
```
Open http://localhost:5000 in your browser.
Note: The first run downloads the PaddleOCR models (~500 MB). The models are persisted in a Docker volume, so later runs reuse them.

    paddle-ui/
      app.py                 # Flask application with API endpoints
      ocr_engine.py          # PP-OCR wrapper (109+ languages)
      structure_engine.py    # PP-StructureV3 wrapper (PDF + images)
      vl_engine.py           # PaddleOCR-VL wrapper
      model_manager.py       # Model download/management
      image_processor.py     # Image preprocessing utilities
      templates/
        index.html           # Main UI template
      static/
        css/
          style.css          # Modern dark theme styles
        js/
          app.js             # Frontend logic with dynamic language loading

| Endpoint | Method | Description |
|---|---|---|
| /api/ocr | POST | Basic OCR processing |
| /api/structure | POST | PP-StructureV3 document parsing (images + PDF) |
| /api/vl | POST | PaddleOCR-VL parsing |
| /api/languages | GET | List all supported OCR languages |
| /api/language-groups | GET | Languages organized by script/region |
| /api/models | GET | List available models |
| /api/models//download | POST | Download a model |
| /api/models/ | DELETE | Delete a model |
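
The endpoints in the table above can be exercised from Python's standard library. The following is a minimal sketch, not part of the project: it assumes the server is reachable at `http://localhost:5000` (as in the setup steps), that request bodies and responses are JSON, and the helper names `build_request` and `call_api` are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # address used in the setup steps


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the API endpoints."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: endpoints that take a body accept JSON.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call_api(method, path, payload=None):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(build_request(method, path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call_api("GET", "/api/languages")` should return the language list and `call_api("GET", "/api/models")` the model registry, if the JSON assumption holds.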
Chinese (Simplified & Traditional), Japanese, Korean
English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Swedish, Norwegian, Danish, Finnish, Czech, Hungarian, Croatian, Slovenian, Slovak, and more...
Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, Mongolian, Kazakh, Kyrgyz, Tajik
Arabic, Persian/Farsi, Urdu, Pashto, Uyghur, Kurdish, Sindhi
Hindi, Marathi, Nepali, Sanskrit, Bengali, Assamese, Gujarati, Punjabi, Odia, Tamil, Telugu, Kannada, Malayalam, Sinhala
Thai, Lao, Myanmar/Burmese, Khmer, Vietnamese, Indonesian, Malay, Filipino
Greek, Hebrew, Amharic, Tigrinya, Georgian, Armenian
- LLM-based key information extraction from documents
- Multiple LLM providers: ERNIE (Baidu), OpenAI, Ollama (local)
- Built-in extraction templates (invoice, receipt, ID card, business card)
- Custom key extraction
- Ask natural language questions about document content
- Document translation powered by ERNIE 4.5
- 14 supported languages (English, Chinese, Japanese, Korean, German, French, Spanish, Russian, Arabic, Italian, Portuguese, Vietnamese, Thai, Indonesian)
- Translate images, PDFs, and Markdown documents
- Preserves document structure during translation
- Process multiple files at once
- Supports OCR, Structure, and VL batch operations
- Progress tracking with real-time updates
- Export all results to JSON
- Job management with status tracking
- Get character-level bounding boxes for precise text positioning
- Useful for text overlay, editing, and localization tasks
- Available via /api/ocr/chars endpoint
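
As an illustration of why character-level boxes help with overlay and localization, the sketch below computes the region occupied by a phrase. The response format of `/api/ocr/chars` is not described here, so the `chars` structure (one `(character, (x1, y1, x2, y2))` tuple per character) and the helpers `span_box` and `find_span` are hypothetical.

```python
def span_box(chars, start, end):
    """Return the axis-aligned box enclosing chars[start:end].

    `chars` is a list of (character, (x1, y1, x2, y2)) tuples
    (assumed shape, not the documented API response).
    """
    boxes = [box for _, box in chars[start:end]]
    if not boxes:
        raise ValueError("empty character span")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def find_span(chars, phrase):
    """Find `phrase` in the recognized text and return its enclosing box, or None.

    Relies on each entry holding exactly one character, so that string
    indices line up with list indices.
    """
    text = "".join(ch for ch, _ in chars)
    idx = text.find(phrase)
    if idx < 0:
        return None
    return span_box(chars, idx, idx + len(phrase))
```

A translated replacement string could then be drawn inside the returned box instead of over the whole text line.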
| Endpoint | Method | Description |
|---|---|---|
| /api/ocr | POST | Perform OCR on image |
| /api/ocr/chars | POST | OCR with character-level coordinates |
| /api/structure | POST | Document structure analysis |
| /api/vl | POST | Vision-Language model processing |
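
The processing endpoints above take an image or PDF, which is usually sent as a `multipart/form-data` upload. The sketch below builds such a request with the standard library only. The form field name `file` and the helper names are assumptions; check `app.py` for the field the server actually reads.

```python
import urllib.request
import uuid


def encode_multipart(field, filename, content, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload(url, image_bytes, filename="page.png", field="file"):
    """Build (but do not send) a POST request that uploads one PNG image."""
    # Assumption: the server reads the upload from a form field named `field`.
    body, ctype = encode_multipart(field, filename, image_bytes, "image/png")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
```

Passing the result of `build_upload("http://localhost:5000/api/ocr", data)` to `urllib.request.urlopen` would send the upload to a running server.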
| Endpoint | Method | Description |
|---|---|---|
| /api/chatocr/extract | POST | Extract key info from document |
| /api/chatocr/templates | GET | Get extraction templates |
| /api/chatocr/providers | GET | Get supported LLM providers |
| /api/chatocr/ask | POST | Ask question about document |
| Endpoint | Method | Description |
|---|---|---|
| /api/translate/document | POST | Translate document |
| /api/translate/text | POST | Translate plain text |
| /api/translate/languages | GET | Get supported languages |
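
A client may want to validate the target language before calling the translation endpoints. The sketch below hard-codes the 14 languages listed in this README. The helper `normalize_language` is illustrative, and whether the API expects language names or codes is not stated here, so `GET /api/translate/languages` remains the authoritative source.

```python
# The 14 translation languages listed in this README.
SUPPORTED = (
    "English", "Chinese", "Japanese", "Korean", "German", "French", "Spanish",
    "Russian", "Arabic", "Italian", "Portuguese", "Vietnamese", "Thai", "Indonesian",
)


def normalize_language(name):
    """Return the canonical language name, or raise ValueError if unsupported."""
    wanted = name.strip().casefold()
    for lang in SUPPORTED:
        if lang.casefold() == wanted:
            return lang
    raise ValueError(f"unsupported translation language: {name!r}")
```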
| Endpoint | Method | Description |
|---|---|---|
| /api/batch/create | POST | Create batch job |
| /api/batch/<job_id>/process | POST | Start processing |
| /api/batch/<job_id> | GET | Get job status |
| /api/batch/<job_id> | DELETE | Delete job |
| /api/batch/<job_id>/export | POST | Export results |
| /api/batch | GET | List all jobs |
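
The batch endpoints above suggest a create, process, poll, export workflow. The polling step might look like the sketch below. The shape of the status response is an assumption (a dict with a `"status"` key whose terminal values are `"completed"` or `"failed"`), and `fetch_status` is an injected callable so the loop can be tried without a server.

```python
import time


def wait_for_job(fetch_status, job_id, interval=1.0, timeout=300.0,
                 done_states=("completed", "failed")):
    """Poll a batch job until it reaches a terminal state or the timeout expires.

    `fetch_status(job_id)` should return the decoded body of
    GET /api/batch/<job_id>. The "status" key and its values are
    hypothetical; adjust them to the real response.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") in done_states:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

Once the job reports a terminal state, the results can be requested from `/api/batch/<job_id>/export`.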
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- PaddlePaddle/PaddleOCR - The OCR engine powering this application
- Flask - Web framework



