
# DeepX OCR - High-Performance C++ OCR Inference Engine


DeepX OCR is a high-performance, multi-threaded asynchronous OCR inference engine based on PP-OCRv5, optimized for DeepX NPU acceleration.


## 📖 Documentation


## ✨ Features

- 🚀 **High Performance**: Asynchronous pipeline optimized for the DeepX NPU.
- 🔄 **Multi-threading**: Efficient thread-pool management for concurrent processing.
- 🛠️ **Modular Design**: Decoupled Detection, Classification, and Recognition modules.
- 🌍 **Multi-language Support**: Built-in FreeType support for rendering multi-language text.
- 📊 **Comprehensive Benchmarking**: Integrated tools for performance analysis.
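The decoupled, thread-pooled pipeline can be illustrated with a small conceptual sketch in Python (the engine itself is C++; the stage names mirror the modules under `src/`, and the function bodies here are placeholders, not the real implementation):

```python
# Conceptual sketch of the detect -> classify -> recognize pipeline
# running stage work on a thread pool. Placeholder logic only.
from concurrent.futures import ThreadPoolExecutor

def detect(image):
    # Placeholder: the real detector returns text-region polygons.
    return [f"{image}:region{i}" for i in range(2)]

def classify(region):
    # Placeholder: the real classifier fixes text orientation.
    return region + ":upright"

def recognize(region):
    # Placeholder: the real recognizer decodes characters.
    return region + ":text"

def ocr(image, pool):
    regions = detect(image)
    oriented = list(pool.map(classify, regions))
    return list(pool.map(recognize, oriented))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = [ocr(img, pool) for img in ["a.png", "b.png"]]
```

In the real engine the stages are additionally pipelined asynchronously, so detection of the next image overlaps with recognition of the current one.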

## ⚡ Quick Start

### 1. Clone & Initialize

```bash
# Clone the repository and initialize submodules
git clone --recursive git@github.com:Chris-godz/DEEPX-OCR.git
cd DEEPX-OCR
```

### 2. Install Dependencies

```bash
# Install FreeType dependencies (for multi-language text rendering)
sudo apt-get install libfreetype6-dev libharfbuzz-dev libfmt-dev
```

### 3. Build & Setup

```bash
# Build the project
./build.sh

# Download/set up the models
./setup.sh

# Set DXRT environment variables (example)
source ./set_env.sh 1 2 1 3 2 4
```

### 4. Run Tests

```bash
# Run the interactive test menu
./run.sh
```

## 🛠️ Build Configuration

This project uses Git Submodules to manage dependencies (nlohmann/json, Clipper2, spdlog, OpenCV, opencv_contrib).

### Option 1: Build OpenCV from Source (Recommended)

Includes opencv_contrib for better text rendering support.

```bash
# Update submodules
git submodule update --init 3rd-party/opencv
git submodule update --init 3rd-party/opencv_contrib

# Build
./build.sh
```

### Option 2: Use System OpenCV

A faster build if you already have OpenCV installed.

```bash
# Set environment variable
export BUILD_OPENCV_FROM_SOURCE=OFF

# Build
./build.sh
```

## 📁 Project Structure

```
OCR/
├── 📂 src/                    # Source Code
│   ├── 📂 common/             # Common Utilities (geometry, visualizer, logger)
│   ├── 📂 preprocessing/      # Preprocessing (uvdoc, image_ops)
│   ├── 📂 detection/          # Text Detection Module
│   ├── 📂 classification/     # Orientation Classification
│   ├── 📂 recognition/        # Text Recognition Module
│   └── 📂 pipeline/           # Main OCR Pipeline
├── 📂 3rd-party/              # Dependencies (Git Submodules)
│   ├── 📦 json                # nlohmann/json
│   ├── 📦 clipper2            # Polygon Clipping
│   ├── 📦 spdlog              # Logging
│   ├── 📦 opencv              # Computer Vision
│   ├── 📦 opencv_contrib      # Extra Modules (freetype)
│   ├── 📦 crow                # HTTP Framework
│   ├── 📦 poppler             # PDF Rendering
│   ├── 📦 cpp-base64          # Base64 Encoding
│   └── 📦 googletest          # Unit Testing Framework
├── 📂 engine/model_files      # Model Weights
│   ├── 📂 server/             # High-Accuracy Models
│   └── 📂 mobile/             # Lightweight Models
├── 📂 server/                 # HTTP Server
│   ├── 📂 benchmark/          # API Benchmark
│   ├── 📂 tests/              # Server Tests
│   └── 📂 webui/              # Web Interface
├── 📂 benchmark/              # Performance Benchmarking
├── 📂 test/                   # Unit & Integration Tests
├── 📂 docs/                   # Documentation
├── 📜 build.sh                # Build Script
├── 📜 run.sh                  # Interactive Runner
├── 📜 setup.sh                # Model Setup Script
└── 📜 set_env.sh              # Environment Setup
```

## 🧪 Testing & Benchmarking

### Interactive Mode

```bash
./run.sh
```

### Manual Execution

```bash
# Pipeline Test
./build_Release/bin/test_pipeline_async

# Module Tests
./build_Release/test_detector                 # Detection
./build_Release/test_recognizer               # Recognition (Server)
./build_Release/test_recognizer_mobile        # Recognition (Mobile)
```

### Benchmarking

```bash
# Run the Python benchmark wrapper
python3 benchmark/run_benchmark.py --model server
python3 benchmark/run_benchmark.py --model mobile
```

## 📊 Benchmark Reports (Summary)

### x86 Platform

Test configuration (from the `docs/results/local/x86/` reports):

- Model: PP-OCR v5 (DEEPX NPU acceleration)
- Dataset Size: 20 images
- Success Rate: 100% (20/20)

**Performance Summary (Server):**

| Setup | Avg Inference Time (ms) | Avg FPS | Avg CPS (chars/s) | Avg Character Accuracy |
|---|---|---|---|---|
| Single Card | 135.06 | 7.40 | 243.22 | 96.93% |
| Dual Cards | 67.89 | 14.73 | 483.88 | 96.93% |
| Three Cards | 45.55 | 21.96 | 721.23 | 96.93% |

**Performance Summary (Mobile):**

| Setup | Avg Inference Time (ms) | Avg FPS | Avg CPS (chars/s) | Avg Character Accuracy |
|---|---|---|---|---|
| Single Card | 82.93 | 12.06 | 378.63 | 89.60% |
| Dual Cards | 44.24 | 22.61 | 709.83 | 89.60% |
| Three Cards | 33.00 | 30.30 | 951.57 | 89.60% |
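As a quick sanity check on these tables, Avg FPS is effectively the reciprocal of the average inference time. A small Python check using the single-card server figures:

```python
# Cross-check: Avg FPS ≈ 1000 / avg inference time in ms.
avg_inference_ms = 135.06          # Single Card, server model (table above)
fps = 1000.0 / avg_inference_ms
print(round(fps, 2))               # ≈ 7.40, matching the reported Avg FPS
```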

**Detailed Reports:**

| Setup | Server | Mobile |
|---|---|---|
| Single Card | Report | Report |
| Dual Cards | Report | Report |
| Three Cards | Report | Report |

### ARM Platform (Rockchip aarch64)

Test configuration (from the `docs/results/local/arm/` reports):

- Model: PP-OCR v5 (DEEPX NPU acceleration)
- Dataset Size: 20 images
- Success Rate: 100% (20/20)

**Performance Summary:**

| Model | Avg Inference Time (ms) | Avg FPS | Avg CPS (chars/s) | Avg Character Accuracy |
|---|---|---|---|---|
| Server | 133.88 | 7.47 | 245.74 | 96.82% |
| Mobile | 60.00 | 16.67 | 524.96 | 89.37% |

**Detailed Reports:**

| Model | Report |
|---|---|
| Server | Report |
| Mobile | Report |
## 🔄 Reproduce Benchmark Results

To reproduce the benchmark results above, run the following commands:

```bash
# 1. Build the project
./build.sh

# 2. Download/set up the models
./setup.sh

# 3. Set DXRT environment variables (example)
source ./set_env.sh 1 2 1 3 2 4

# 4. Run benchmark (server model, 60 runs per image)
python3 benchmark/run_benchmark.py --model server --runs 60 \
    --images_dir test/twocode_images

# 5. Run benchmark (mobile model, 60 runs per image)
python3 benchmark/run_benchmark.py --model mobile --runs 60 \
    --images_dir test/twocode_images
```

**Parameters:**

| Parameter | Description | Default |
|---|---|---|
| `--model` | Model type (`server` / `mobile`) | `server` |
| `--runs` | Number of runs per image | `3` |
| `--images_dir` | Test images directory | `images` |
| `--no-acc` | Skip accuracy calculation | - |
| `--no-cpp` | Skip the C++ benchmark (use existing results) | - |

## 📡 API Server Benchmark

Test configuration (same across all reports):

- Mode: throughput
- Concurrency: 20
- Runs per sample: 20

### x86 Platform

**Server Model:**

| Setup | QPS | Success Rate | CPS (chars/s) | Accuracy | Avg Latency (ms) | P50 (ms) | P99 (ms) |
|---|---|---|---|---|---|---|---|
| Single Card | 7.64 | 100% | 236.88 | 96.93% | 2594.17 | 2618.61 | 3498.46 |
| Dual Cards | 13.62 | 100% | 401.24 | 89.60% | 1423.65 | 1438.99 | 1786.95 |
| Three Cards | 21.50 | 100% | 605.96 | 96.93% | 900.14 | 907.47 | 1517.51 |

**Mobile Model:**

| Setup | QPS | Success Rate | CPS (chars/s) | Accuracy | Avg Latency (ms) | P50 (ms) | P99 (ms) |
|---|---|---|---|---|---|---|---|
| Single Card | 13.62 | 100% | 401.24 | 89.60% | 1423.65 | 1438.99 | 1786.95 |
| Dual Cards | 23.97 | 100% | 692.24 | 89.60% | 788.05 | 763.87 | 1586.34 |
| Three Cards | 28.00 | 100% | 801.66 | 89.60% | 635.59 | 564.74 | 1299.82 |
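Since these runs use a fixed concurrency of 20, QPS and average latency can be cross-checked with Little's Law (QPS ≈ concurrency / average latency). A small Python check using the single-card server row:

```python
# Little's Law cross-check: QPS ≈ concurrency / avg latency (in seconds).
concurrency = 20                       # fixed across all API benchmark runs
avg_latency_s = 2594.17 / 1000.0       # Single Card, server model (table above)
estimated_qps = concurrency / avg_latency_s
print(round(estimated_qps, 2))         # ≈ 7.71, close to the reported 7.64 QPS
```

The small gap between the estimate and the measured QPS is expected: warm-up, request setup, and uneven batching keep the pipeline slightly below full utilization.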

**Detailed reports:**

| Setup | Server | Mobile |
|---|---|---|
| Single Card | Report | Report |
| Dual Cards | Report | Report |
| Three Cards | Report | Report |

### ARM Platform (Rockchip aarch64)

| Model | QPS | Success Rate | CPS (chars/s) | Accuracy | Avg Latency (ms) | P50 (ms) | P99 (ms) |
|---|---|---|---|---|---|---|---|
| Server | 7.45 | 100% | 225.62 | 96.82% | 2635.66 | 2646.28 | 4270.81 |
| Mobile | 16.11 | 100% | 469.57 | 89.37% | 1192.55 | 1200.13 | 1673.76 |

**Detailed reports:**

| Model | Report |
|---|---|
| Server | Report |
| Mobile | Report |
## 🔄 Reproduce API Server Benchmark Results

1. Start the OCR server:

   ```bash
   cd server
   ./run_server.sh
   ```

2. Install benchmark dependencies:

   ```bash
   cd server/benchmark
   pip install -r requirements.txt
   ```

3. Run the throughput test:

   ```bash
   ./quick_start.sh
   # Select option 2 to run the throughput test
   ```
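The throughput script drives the server's HTTP API. A minimal client can be sketched as below; note that the endpoint path (`/ocr`), port (`8080`), and JSON field name (`image`) are assumptions, suggested only by the bundled `crow` and `cpp-base64` dependencies. Check the code under `server/` for the real interface.

```python
# Hypothetical OCR API client sketch. The URL path, port, and JSON field
# names are assumptions, not the documented API; see server/ for the
# actual interface.
import base64
import json
import urllib.request

def build_request(image_bytes, url="http://localhost:8080/ocr"):
    # Encode the raw image bytes as base64 inside a JSON body.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request(b"fake-image-bytes")
# urllib.request.urlopen(req) would send the request once the server is up.
```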

## 🖥️ WebUI Demo

1. Start the OCR server (required for the WebUI backend):

   ```bash
   cd server
   ./run_server.sh
   ```

2. Start the WebUI:

   ```bash
   cd server/webui
   python3 -m venv venv && source venv/bin/activate
   pip install -r requirements.txt
   python app.py
   ```

*WebUI main interface*

Access: http://localhost:7860