PP-OCRv5 on ONNX Runtime + Arm KleidiAI | 100% Accuracy Aligned with PaddleOCR | Apple M4 Benchmark
A production-ready, single-file PP-OCRv5 inference pipeline using ONNX Runtime, delivering 1.72x speedup over PaddleOCR native inference with 100% text-level accuracy alignment — verified on 228 text regions across 7 images with zero mismatch.
- 1.72x faster than Paddle native inference on Apple M4 (KleidiAI auto-enabled)
- 100% accuracy match — 228/228 texts identical, confidence diff < 0.00002
- Single-file deployment: `ppocrv5_onnx.py` (~550 lines), copy-paste into any ARM app
- Reproducible benchmarks: run on your own platform in 3 commands
- Zero accuracy loss from KleidiAI acceleration (ORT 1.21 vs 1.23: 0.000000 confidence diff)
| Backend | Avg Latency | vs Paddle | Text Match | Conf Diff |
|---|---|---|---|---|
| Paddle 3.3.0 | 9,451 ms | 1.00x | baseline | — |
| ORT 1.21.1 (no KleidiAI) | 6,407 ms | 1.48x faster | 228/228 (100%) | 0.000019 |
| ORT 1.23.2 (KleidiAI) | 5,486 ms | 1.72x faster | 228/228 (100%) | 0.000019 |
Measured on Apple M4, macOS ARM64, 8 threads, 7 images, 3 runs/image. To reproduce:

    python benchmarks/benchmark_unified.py --backend ort --num-runs 3

KleidiAI per-model speedup (ORT 1.21.1 → 1.23.2)
| Model | ORT 1.21.1 (ms) | ORT 1.23.2 (ms) | Speedup | Role |
|---|---|---|---|---|
| doc_ori | 2.57 | 1.34 | 1.91x | Document orientation (4-class) |
| textline_ori | 67.88 | 36.51 | 1.86x | Text line orientation (2-class) |
| rec | 1,599.89 | 1,319.33 | 1.21x | Text recognition (CTC) |
| det | 3,779.37 | 3,788.58 | 1.00x | Text detection (DB, large-kernel Conv) |
KleidiAI accelerates GEMM-dominated models (classification, recognition) via Arm I8MM instructions. Detection is dominated by large-kernel convolutions (9x9), which are not GEMM-bound.
Per-image latency breakdown
| Image | Texts | Paddle 3.3.0 | ORT 1.21.1 | ORT 1.23.2 (KleidiAI) |
|---|---|---|---|---|
| ancient_demo.png | 12 | 2,958 ms | 2,379 ms | 2,086 ms |
| handwrite_ch_demo.png | 10 | 1,834 ms | 1,230 ms | 1,064 ms |
| handwrite_en_demo.png | 11 | 2,422 ms | 1,620 ms | 1,395 ms |
| japan_demo.png | 28 | 14,017 ms | 8,606 ms | 7,313 ms |
| magazine.png | 65 | 24,095 ms | 14,625 ms | 12,519 ms |
| magazine_vertical.png | 65 | 17,279 ms | 14,803 ms | 12,803 ms |
| pinyin_demo.png | 37 | 3,553 ms | 1,585 ms | 1,220 ms |
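
The headline averages can be cross-checked against the per-image table. The sketch below assumes that "Avg Latency" is the unweighted mean over the 7 images, which is consistent with the published figures; the function names are illustrative and not part of the repository.

```python
# Per-image latencies in ms, copied from the per-image table.
LATENCY_MS = {
    "paddle_3.3.0": [2958, 1834, 2422, 14017, 24095, 17279, 3553],
    "ort_1.21.1": [2379, 1230, 1620, 8606, 14625, 14803, 1585],
    "ort_1.23.2": [2086, 1064, 1395, 7313, 12519, 12803, 1220],
}


def mean_latency(backend):
    """Unweighted mean latency over all images (assumed aggregation)."""
    values = LATENCY_MS[backend]
    return sum(values) / len(values)


def speedup_vs_paddle(backend):
    """Ratio of Paddle's mean latency to the given backend's mean latency."""
    return mean_latency("paddle_3.3.0") / mean_latency(backend)


if __name__ == "__main__":
    for name in LATENCY_MS:
        print(f"{name}: {mean_latency(name):,.0f} ms, "
              f"{speedup_vs_paddle(name):.2f}x vs Paddle")
```

Because the mean is unweighted, the two magazine images (65 texts each) dominate the average; a per-text or median view would weight the results differently.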
    ┌─────────┐     ┌──────────┐     ┌───────┐     ┌──────────────┐     ┌───────┐
    │  Image  │────▶│ doc_ori  │────▶│  det  │────▶│ textline_ori │────▶│  rec  │────▶ Results
    │  (BGR)  │     │ 4-class  │     │  DB   │     │   2-class    │     │  CTC  │      [{text,
    └─────────┘     │ rotation │     │ boxes │     │   rotation   │     │ decode│        conf,
                    └──────────┘     └───────┘     └──────────────┘     └───────┘        bbox}]
                       LCNet         PP-OCRv5           LCNet            PP-OCRv5
                      224×224      HxW→stride32         160×80             48×W

See docs/PIPELINE_ARCHITECTURE.md for preprocessing parameters and implementation details.
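
The final `rec` stage ends in a CTC decode. As a rough illustration of what that step does, here is a minimal greedy CTC decoder. It is a sketch under assumptions, not the code in `ppocrv5_onnx.py`: the blank class is assumed to be index 0, `vocab` is a hypothetical list indexed by class id, and the confidence is assumed to be the mean probability of the kept timesteps.

```python
import numpy as np


def ctc_greedy_decode(probs, vocab, blank=0):
    """Greedy CTC decode for one text line.

    probs: array of shape (T, C) with per-timestep class scores.
    vocab: list of length C mapping class id to a character;
           vocab[blank] is a placeholder and never emitted.
    """
    best = probs.argmax(axis=1)   # most likely class at each timestep
    best_p = probs.max(axis=1)    # score of that class
    chars, confs = [], []
    prev = blank
    for idx, p in zip(best, best_p):
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding timestep.
        if idx != blank and idx != prev:
            chars.append(vocab[idx])
            confs.append(float(p))
        prev = idx
    confidence = float(np.mean(confs)) if confs else 0.0
    return "".join(chars), confidence


if __name__ == "__main__":
    vocab = ["<blank>", "a", "b"]
    path = [1, 1, 0, 1, 2, 2, 0]              # a a _ a b b _
    probs = np.eye(3)[path] * 0.9 + 0.05      # 0.95 on the chosen class
    print(ctc_greedy_decode(probs, vocab))
```

The blank between the two runs of `a` is what allows a doubled character to survive the repeat-collapsing step.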
    git clone https://github.com/user/ppocrv5-kleidiAI-appleM4.git
    cd ppocrv5-kleidiAI-appleM4
    pip install "onnxruntime>=1.22.0" opencv-python-headless numpy pyclipper

Download the models from Baidu Pan (password: uepw) and place them under `models/`. See `models/README.md` for the expected layout. Then verify that the models are in place:

    python scripts/download_models.py

Run inference from Python:

    from ppocrv5_onnx import PPOCRv5Pipeline
    pipeline = PPOCRv5Pipeline("models", dict_path="data/dict/ppocrv5_dict.txt")
    results = pipeline.predict("image.png")
    for r in results:
        print(f"{r['text']} ({r['confidence']:.4f})")

`ppocrv5_onnx.py` is a single-file module (~550 lines) with minimal dependencies. Copy it directly into your project:

    from ppocrv5_onnx import PPOCRv5Pipeline
    pipeline = PPOCRv5Pipeline(
        model_dir="path/to/onnx/models",
        dict_path="path/to/ppocrv5_dict.txt",
        threads=4,
    )
    results = pipeline.predict(bgr_image_array)  # accepts file path or BGR ndarray
    # [{"text": "...", "confidence": 0.98, "bounding_box": [[x,y], ...]}, ...]

Dependencies: onnxruntime, opencv-python-headless, numpy, pyclipper
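
Downstream code usually wants to drop low-confidence lines and put the rest in a predictable order. The helper below is a hypothetical post-processing sketch that relies only on the result keys documented above (`text`, `confidence`, `bounding_box`); the 0.5 threshold and the top-to-bottom, left-to-right ordering are illustrative choices, not pipeline behavior.

```python
def filter_and_order(results, min_confidence=0.5):
    """Keep confident results and sort them by top-left corner."""
    kept = [r for r in results if r["confidence"] >= min_confidence]

    def top_left(result):
        xs = [point[0] for point in result["bounding_box"]]
        ys = [point[1] for point in result["bounding_box"]]
        # Sort primarily by the top edge, then by the left edge.
        return (min(ys), min(xs))

    return sorted(kept, key=top_left)


if __name__ == "__main__":
    # Stand-in for pipeline.predict(...) output, same dictionary layout.
    sample = [
        {"text": "world", "confidence": 0.97,
         "bounding_box": [[120, 10], [200, 10], [200, 40], [120, 40]]},
        {"text": "noise", "confidence": 0.21,
         "bounding_box": [[5, 5], [20, 5], [20, 15], [5, 15]]},
        {"text": "hello", "confidence": 0.99,
         "bounding_box": [[10, 10], [100, 10], [100, 40], [10, 40]]},
    ]
    print(" ".join(r["text"] for r in filter_and_order(sample)))
```

For multi-column or vertical layouts such as `magazine_vertical.png`, a simple corner sort like this will not reproduce reading order; it is only a starting point.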
    # ORT benchmark (recommended: ORT >= 1.22 for KleidiAI)
    pip install onnxruntime==1.23.2
    python benchmarks/benchmark_unified.py --backend ort --num-runs 3
    # Paddle benchmark (optional)
    pip install paddlepaddle==3.3.0
    python benchmarks/benchmark_unified.py --backend paddle --num-runs 3
    # Compare all results in results/
    python benchmarks/compare_results.py

Results are saved to `results/*.json` and can be compared across platforms.

The ONNX pipeline produces 100% identical text output to PaddleOCR/PaddleX 3.4.x native inference, achieved through 6 rounds of systematic debugging:
| Round | Fix | Match Rate |
|---|---|---|
| 1 | CTC decode, normalize, box sorting, ... | 65.6% → 71.8% |
| 3 | det resize params (Pipeline runtime overrides inference.yml) | → 90.8% |
| 5 | crop coordinate precision (int16 → minAreaRect float32) | → 93.3% |
| 6 | rec batch padding (batch_size=6, ratio sort, per-batch pad) | → 100.0% |
See docs/ACCURACY_ALIGNMENT.md for the full story and key insights.
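
Round 6 (batch size 6, ratio sort, per-batch padding) can be pictured as a batching plan: sort crops by aspect ratio, cut the sorted list into batches, and pad each batch to the width implied by its own widest crop. The sketch below computes only that plan from crop shapes. It is an assumption-laden illustration rather than the repository's code: the function name is hypothetical, the 48-pixel target height is taken from the pipeline diagram, and any minimum or maximum width the real pipeline applies is omitted.

```python
import math


def plan_rec_batches(shapes, batch_size=6, target_h=48):
    """Plan recognition batches from crop shapes.

    shapes: list of (height, width) tuples, one per text crop.
    Returns a list of (crop_indices, padded_width) tuples.
    """
    ratios = [w / h for h, w in shapes]
    # Stable sort by width/height ratio so similar crops share a batch.
    order = sorted(range(len(shapes)), key=lambda i: ratios[i])
    plan = []
    for start in range(0, len(order), batch_size):
        indices = order[start:start + batch_size]
        # Each batch is padded to its own widest crop, not a global maximum.
        max_ratio = max(ratios[i] for i in indices)
        plan.append((indices, math.ceil(target_h * max_ratio)))
    return plan


if __name__ == "__main__":
    crops = [(48, 480), (48, 96), (24, 240), (48, 48)]
    for indices, width in plan_rec_batches(crops, batch_size=2):
        print(indices, width)
```

Since the padded width is part of the model input, changing how crops are grouped can change the recognition scores, which is presumably why this step mattered for exact alignment.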
ppocrv5-kleidiAI-appleM4/
├── ppocrv5_onnx.py # Core: single-file inference pipeline
├── benchmarks/
│ ├── benchmark_unified.py # Unified benchmark (--backend paddle|ort)
│ └── compare_results.py # Multi-backend comparison report
├── results/ # Reference benchmark data (Apple M4)
│ ├── paddle_3.3.0.json
│ ├── ort_1.21.1.json
│ └── ort_1.23.2.json
├── data/
│ ├── dict/ppocrv5_dict.txt # Character dictionary (18,383 chars)
│ └── images/ # 7 test images
├── models/ # ONNX models (download separately, ~180 MB)
├── docs/
│ ├── ACCURACY_ALIGNMENT.md # 6-round alignment process
│ ├── BENCHMARK_RESULTS.md # Full benchmark tables
│ └── PIPELINE_ARCHITECTURE.md # 4-model pipeline details
├── scripts/download_models.py # Model verification tool
└── examples/quickstart.py # Minimal usage example
| Document | Description |
|---|---|
| Pipeline Architecture | 4-model pipeline, preprocessing parameters, batch strategy |
| Accuracy Alignment | 6-round debugging journey from 65.6% to 100% |
| Benchmark Results | Full speed/accuracy tables, per-model KleidiAI analysis |
| Package | Version | Notes |
|---|---|---|
| Python | >= 3.10 | |
| onnxruntime | >= 1.21.0 | >= 1.22.0 for KleidiAI |
| opencv-python-headless | >= 4.8.0 | |
| numpy | >= 1.24.0 | |
| pyclipper | >= 1.3.0 | DB post-processing |
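
A deployment script can check the installed onnxruntime version against the 1.22.0 threshold from the table. The sketch below does only that: it compares version numbers using the standard library and does not detect whether a given onnxruntime build actually contains KleidiAI kernels. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum onnxruntime release the requirements table associates with KleidiAI.
KLEIDIAI_MIN = (1, 22, 0)


def release_tuple(version_string):
    """Parse the leading 'major.minor.patch' digits of a version string."""
    parts = []
    for token in version_string.split(".")[:3]:
        digits = ""
        for ch in token:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or '+cpu'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_kleidiai_minimum(version_string):
    """True if the version is at or above the documented threshold."""
    return release_tuple(version_string) >= KLEIDIAI_MIN


if __name__ == "__main__":
    try:
        installed = version("onnxruntime")
    except PackageNotFoundError:
        print("onnxruntime is not installed")
    else:
        status = "meets" if meets_kleidiai_minimum(installed) else "is below"
        print(f"onnxruntime {installed} {status} the 1.22.0 threshold")
```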
- PaddleOCR — PP-OCRv5 models and the original inference pipeline
- ONNX Runtime — Cross-platform inference engine
- KleidiAI — Arm CPU micro-kernel library for accelerated ML inference
If you use PP-OCRv5 models in your work, please cite the PaddleOCR 3.0 Technical Report:

    @article{cui2025paddleocr,
      title={PaddleOCR 3.0 Technical Report},
      author={Cui, Cheng and Sun, Ting and Lin, Manhui and Gao, Tingquan and Zhang, Yubo and Liu, Jiaxuan and Wang, Xueqing and Zhang, Zelun and Zhou, Changda and Liu, Hongen and Zhang, Yue and Lv, Wenyu and Huang, Kui and Zhang, Yichao and Zhang, Jing and Zhang, Jun and Liu, Yi and Yu, Dianhai and Ma, Yanjun},
      journal={arXiv preprint arXiv:2507.05595},
      year={2025}
    }

- Paper: https://arxiv.org/abs/2507.05595
- Source Code: https://github.com/PaddlePaddle/PaddleOCR
- Document: https://paddlepaddle.github.io/PaddleOCR
- Models & Online Demo: https://huggingface.co/PaddlePaddle
This project is licensed under the Apache License 2.0.