Feature Request: Integrate Rust-native PaddleOCR without Python dependencies #306
Replies: 3 comments
OK, I converted this to a discussion; it's not an issue. So, in a nutshell: the problem with Paddle is that the models are very large and very heavy. We are talking hundreds of MBs to GBs of data. That said, if we can do this as an optional feature, then maybe.
OK, I am going to do a feasibility test. If I see the implementation is straightforward, and given the huge demand for Chinese, I will add it.
v4.3.0 includes PaddleOCR. It will be released today.
Technical Feasibility Study: Integrating Python-Free Rust-Native PaddleOCR into Kreuzberg Framework
1. Overview and Background Analysis
1.1 Research Background and Objectives
Kreuzberg, as an emerging high-performance document intelligence processing framework, derives its core value proposition from leveraging Rust's memory safety, zero-cost abstractions, and exceptional concurrency capabilities to address performance bottlenecks and deployment complexity issues inherent in traditional document processing (especially Python-based ecosystems).
In document intelligence processing pipelines, Optical Character Recognition (OCR) is a critical component that directly determines whether unstructured data (such as scanned documents and images) can be transformed into searchable, analyzable structured information.
Currently, while Kreuzberg has implemented its core logic in Rust, it still partially relies on external ecosystems for OCR capabilities. Traditional integration approaches often invoke Tesseract (C++ library) via FFI or call EasyOCR/PaddleOCR through Python bindings. This "hybrid architecture," while leveraging mature existing models, introduces significant engineering pain points:
Deployment Complexity (Dependency Hell): Production environments must maintain Python runtime, manage pip dependencies, handle virtual environments, and ensure compatibility across different versions of deep learning frameworks (PyTorch/PaddlePaddle). This contradicts Rust's minimalist deployment philosophy of generating single static binaries.
Performance Bottlenecks: Python's Global Interpreter Lock (GIL) limits multi-threaded concurrency, and cross-language data transfer often involves memory copying, increasing latency.
Resource Overhead: Loading the complete Python interpreter and deep learning frameworks introduces substantial memory baseline (typically 500MB+), making it unsuitable for edge devices or high-density container deployments.
This report aims to thoroughly explore the feasibility of an alternative approach: integrating Baidu's PaddleOCR into the Kreuzberg framework using pure Rust or Rust-native bindings without introducing any Python runtime. The goal is to achieve recognition accuracy comparable to the original Python implementation while significantly reducing resource consumption and simplifying deployment.
1.2 PaddleOCR's Technical Standing
PaddleOCR (PP-OCR series) has become an industry de facto standard due to its excellent performance in Chinese and multilingual recognition, lightweight model design, and robustness for complex layouts (tables, distorted text). PP-OCRv4 and v5 versions, in particular, introduced more efficient backbone networks and data augmentation strategies, delivering excellent performance on both server and mobile platforms.
Therefore, whether these high-quality pretrained models can be reused in Rust is key to enhancing Kreuzberg's competitiveness.
2. Core Architecture Analysis and Integration Strategy
2.1 Kreuzberg v4's Plugin Architecture
Kreuzberg v4's architecture is deeply influenced by Rust language features, emphasizing modularity and extensibility. Its core defines interface specifications for various components through the trait system, allowing developers to inject custom implementations as plugins.
For OCR functionality, Kreuzberg defines the `OcrBackend` trait. This is an async trait designed to decouple specific OCR engine implementations. This means the core framework doesn't care whether the underlying layer calls Tesseract's C API, makes network requests to cloud APIs, or runs ONNX inference locally. This loose coupling provides a perfect entry point for integrating Rust-native PaddleOCR.
By implementing the `OcrBackend` trait, we can build an inference path that completely bypasses the Python layer. In this architecture, Kreuzberg's main process directly manages image data in memory and passes it to the Rust-implemented OCR module. This module handles image preprocessing, model inference (via Rust-bound inference engines), and post-processing (decoding), ultimately returning structured text results.
2.2 Interface Definition and Data Flow
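As a rough illustration of what such an interface contract could look like, here is a hedged sketch of an async OCR backend trait. The names (`OcrBackend`, `recognize`, `OcrResult`) and shapes are assumptions for illustration, not Kreuzberg's actual API; a boxed future is used to keep the trait object-safe without extra crates.

```rust
use std::future::Future;
use std::pin::Pin;

/// Structured OCR output: recognized text plus a confidence score.
/// (Field names are illustrative; Kreuzberg's real types will differ.)
#[derive(Debug)]
pub struct OcrResult {
    pub text: String,
    pub confidence: f32,
}

/// Hypothetical sketch of an async OCR backend trait in the spirit of
/// Kreuzberg's `OcrBackend`; the method name is an assumption.
pub trait OcrBackend {
    /// Consume raw image bytes already in memory (no file paths),
    /// returning recognized text. The boxed future keeps this object-safe.
    fn recognize<'a>(
        &'a self,
        image_bytes: &'a [u8],
    ) -> Pin<Box<dyn Future<Output = Result<OcrResult, String>> + Send + 'a>>;
}

/// Dummy backend demonstrating the contract without any real inference.
pub struct EchoBackend;

impl OcrBackend for EchoBackend {
    fn recognize<'a>(
        &'a self,
        image_bytes: &'a [u8],
    ) -> Pin<Box<dyn Future<Output = Result<OcrResult, String>> + Send + 'a>> {
        Box::pin(async move {
            Ok(OcrResult {
                text: format!("{} bytes received", image_bytes.len()),
                confidence: 1.0,
            })
        })
    }
}
```

A real PaddleOCR-backed implementation would replace `EchoBackend`'s body with preprocessing, inference, and decoding, while the framework only ever sees the trait.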
For seamless integration, we must strictly adhere to the `OcrBackend` interface contract. Based on Rust async programming best practices and the Kreuzberg documentation, this interface centers on an async recognition method that accepts in-memory image data and returns structured text results.
When implementing this interface, special attention must be paid to zero-copy data flow. Kreuzberg internally likely uses the `image` crate's `DynamicImage` structure or raw `&[u8]` byte slices for image data transfer. Our Rust PaddleOCR implementation must be able to consume this memory data directly, rather than requiring file paths like some Python scripts, thereby avoiding unnecessary disk I/O overhead.
3. Evaluation of PaddleOCR Implementation Approaches in the Rust Ecosystem
To achieve "no Python dependency," a more pragmatic and efficient approach is to leverage Rust's powerful FFI capabilities to bind high-performance C++ inference engines (such as ONNX Runtime or MNN) while rewriting all preprocessing and post-processing logic in pure Rust.
3.1 Option 1: MNN-based rusto-rs
rusto-rs is currently the most complete PaddleOCR replica in the open-source Rust community.
Its preprocessing and post-processing are implemented in pure Rust on top of the `image` and `imageproc` crates, including complex text box contour detection and perspective transformation. Because its API runs end to end from `image::DynamicImage` to inference results, integration into Kreuzberg requires only a thin wrapper layer.
3.2 Option 2: ONNX Runtime-based oar-ocr / paddle-ocr-rs
This approach leverages the industry-standard model exchange format ONNX.
It uses the `ort` crate (Rust bindings for Microsoft ONNX Runtime) as the inference backend. Projects such as `oar-ocr` have encapsulated the DBNet (detection) and CRNN (recognition) post-processing logic. Compared to MNN, ONNX Runtime has a more mature server-side ecosystem, and the `ort` crate is actively maintained.
3.3 Option 3: Pure Rust Inference Engine ocrs
ocrs represents the Rust community's exploration toward "pure Rust."
3.4 Selection Conclusion
Considering engineering feasibility, maintenance costs, and performance, Option 2 (ONNX Runtime-based) is currently the best choice, with Option 1 (rusto-rs) as a close second.
While both options rely on C++-written inference engines (ORT or MNN) at the bottom, they expose pure Rust interfaces, and compiled artifacts don't depend on Python environments in the system. This fully satisfies the core requirement of "no Python dependency."
4. Deep Technical Implementation Path
4.1 Precise Replication of Preprocessing Pipeline
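Before walking through the individual steps, the resize-target computation and normalization discussed in this subsection can be sketched in self-contained Rust. This is a sketch under assumptions: the exact `ResizeShort` rounding rule varies across PP-OCR versions, and the mean/std constants are the ImageNet values PP-OCR commonly uses; in practice the `image` crate would perform the actual resampling.

```rust
/// ImageNet mean/std commonly used by PP-OCR models (assumption: verify
/// against the specific model's preprocessing config).
const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Scale so the shorter edge reaches `short`, then round both edges to
/// multiples of 32 (one interpretation of `ResizeShort`-style logic).
fn target_dims(w: u32, h: u32, short: u32) -> (u32, u32) {
    let scale = short as f32 / w.min(h) as f32;
    let round32 = |x: f32| (((x / 32.0).round().max(1.0)) * 32.0) as u32;
    (round32(w as f32 * scale), round32(h as f32 * scale))
}

/// Convert interleaved RGB u8 pixels (HWC) into a normalized CHW f32
/// tensor: out[c][i] = (pixel / 255 - mean[c]) / std[c].
fn normalize_chw(rgb: &[u8], w: usize, h: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; 3 * w * h];
    for i in 0..w * h {
        for c in 0..3 {
            let p = rgb[i * 3 + c] as f32 / 255.0;
            out[c * w * h + i] = (p - MEAN[c]) / STD[c];
        }
    }
    out
}
```

The key point is pixel-level determinism: both functions must produce exactly what the Python pipeline produces, or recognition accuracy drifts.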
OCR accuracy is extremely sensitive to image preprocessing. PaddleOCR (Python version) heavily uses OpenCV functionality. In Rust, we must replicate this logic using the `image` crate and ensure pixel-level alignment; otherwise, inference accuracy will degrade.
Resizing: `ResizeShort`-like logic adjusts the image's shortest edge to a multiple of 32 while maintaining the aspect ratio. The `image` crate's resize functions provide multiple interpolation algorithms (Nearest, Triangle, CatmullRom, Gaussian, Lanczos3). The algorithm closest to OpenCV's `INTER_LINEAR` (typically `FilterType::Triangle`) must be selected to ensure input tensor consistency.
Normalization:
Image data must undergo `(pixel - mean) / std` operations. This can be vectorized using Rust's `ndarray` library or even SIMD instruction sets, far more efficiently than Python's NumPy operations.
Geometric Transformation:
For detected tilted text boxes, perspective transformation is needed to "straighten" them. The rusto-rs project contains a pure Rust implementation of the `get_rotate_crop_image` function, a core code snippet well worth reusing during integration.
4.2 Inference Engine Lifecycle and Concurrency Management
Kreuzberg, as a high-performance framework, typically runs in an async environment (Tokio Runtime).
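The session-sharing pattern discussed below can be sketched as follows. `MockSession` is a stand-in for an `ort` Session (which is likewise thread-safe); plain `std::thread` is used here so the sketch is self-contained, whereas in Kreuzberg each worker body would run under `tokio::task::spawn_blocking` to keep the async event loop responsive.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for an inference session: loaded once, then shared read-only.
struct MockSession {
    model_name: String,
}

impl MockSession {
    fn load(name: &str) -> Self {
        // In reality the expensive one-time ONNX model load happens here,
        // during `OcrBackend` initialization.
        MockSession { model_name: name.to_string() }
    }

    fn infer(&self, input_len: usize) -> String {
        format!("{}:{}", self.model_name, input_len)
    }
}

/// Share ONE session across concurrent tasks via `Arc` instead of
/// reloading the model per request; results come back in input order.
fn run_concurrent(session: Arc<MockSession>, inputs: Vec<Vec<u8>>) -> Vec<String> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|buf| {
            let s = Arc::clone(&session);
            // Async analogue: tokio::task::spawn_blocking(move || s.infer(...))
            thread::spawn(move || s.infer(buf.len()))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The `Arc` keeps one copy of the model weights in memory no matter how many OCR tasks run concurrently.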
Model Persistence: Loading ONNX models (det.onnx, rec.onnx) is time-consuming. Loading must complete during `OcrBackend` initialization, with Session objects persistently stored in structs. Since the Session is thread-safe (in `ort`), we can wrap it in `Arc` (atomic reference counting) to share a single model instance across multiple concurrent OCR tasks, greatly saving memory.
Async Computation Isolation: Although `ort` supports multi-threaded inference, inference itself is CPU-intensive. Directly calling inference functions in Tokio async tasks blocks the event loop, slowing server responses. Therefore, `tokio::task::spawn_blocking` must be used to dispatch inference tasks to dedicated blocking thread pools.
4.3 Post-processing and Decoding Algorithms
Model outputs are merely tensors requiring complex post-processing for text conversion.
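As one concrete piece of this post-processing, the greedy CTC decoding discussed below can be implemented in a few lines of pure Rust. The character dictionary here is a toy stand-in for `ppocr_keys_v1.txt`, and index 0 is assumed to be the blank token, as in PP-OCR's convention.

```rust
/// Greedy CTC decoding: collapse consecutive repeated indices, drop the
/// blank token (index 0), then map the surviving indices to characters.
fn ctc_greedy_decode(indices: &[usize], dict: &[char]) -> String {
    let blank = 0usize;
    let mut out = String::new();
    let mut prev = None;
    for &idx in indices {
        // Skip repeats of the previous frame and the blank symbol.
        if Some(idx) != prev && idx != blank {
            // Index 1 maps to dict[0] because index 0 is reserved for blank.
            if let Some(&ch) = dict.get(idx - 1) {
                out.push(ch);
            }
        }
        prev = Some(idx);
    }
    out
}
```

For a real model the `dict` slice would be loaded once from the dictionary file; lookups are then plain `Vec` indexing, which is why this step is so cheap in Rust.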
DBNet Post-processing: Detection model output is a probability map. Rust-implemented bitmap generation and polygon expansion algorithms (Unclip) are needed to extract final text box coordinates.
CTC Decoding: Recognition model output is a character index sequence. CTC (Connectionist Temporal Classification) decoding logic must be implemented: merge consecutive repeated indices, remove blank tokens, and map the remaining indices to characters via the dictionary file (e.g., `ppocr_keys_v1.txt`). This step is extremely efficient in Rust using `HashMap` or `Vec` lookups.
5. Performance and Resource Efficiency Analysis
Migration to Rust-native implementation yields multi-dimensional performance benefits.
5.1 Memory and Startup Overhead
Memory Consumption: Python solutions require loading the Python interpreter, NumPy, PaddlePaddle/PyTorch frameworks, and their dependencies, with a memory baseline typically between 500MB and 1GB. In contrast, `ort`- or MNN-based Rust implementations only need lightweight inference libraries and model weight files. Testing shows rusto-rs runtime peak memory can be kept around 200MB, a huge advantage for resource-constrained container environments.
Cold Start Speed: Removing Python interpreter initialization makes Rust binary startup nearly instantaneous, crucial for Serverless deployment scenarios.
5.2 Computational Throughput
Eliminating GIL Lock: In high-concurrency scenarios, Python's GIL limits CPU-intensive task (like image preprocessing) parallel efficiency. Rust solutions can easily implement data-parallel image preprocessing using the Rayon library, fully utilizing multi-core CPU performance.
Zero-copy Data: Within Kreuzberg, image data flows from file parsers to OCR engines involving only pointer or reference passing in Rust. In Python hybrid architectures, serialization/deserialization between Rust heap memory and Python heap memory (or Buffer Protocol copying) often occurs, producing significant latency in bulk image processing.
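The data-parallel preprocessing mentioned above can be sketched as follows. With Rayon this collapses to a one-liner (`images.par_iter().map(preprocess).collect()`); scoped `std::thread`s are used here so the sketch stays dependency-free, and `preprocess` (a pixel inversion) is a stand-in for real per-image work.

```rust
use std::thread;

/// Stand-in for per-image preprocessing (resize, normalize, etc.).
fn preprocess(img: &[u8]) -> Vec<u8> {
    img.iter().map(|&p| 255 - p).collect()
}

/// Split the batch into chunks and preprocess each chunk on its own core.
/// Rayon's work-stealing pool does this more adaptively; the structure is
/// the same: no GIL, no serialization, just slices of shared memory.
fn preprocess_batch(images: &[Vec<u8>], workers: usize) -> Vec<Vec<u8>> {
    let workers = workers.max(1);
    let mut results: Vec<Vec<u8>> = vec![Vec::new(); images.len()];
    let chunk = images.len().div_ceil(workers).max(1);
    thread::scope(|s| {
        for (imgs, outs) in images.chunks(chunk).zip(results.chunks_mut(chunk)) {
            s.spawn(move || {
                for (img, out) in imgs.iter().zip(outs.iter_mut()) {
                    *out = preprocess(img);
                }
            });
        }
    });
    results
}
```

Each worker writes into its own disjoint slice of `results`, so no locking is needed; this is exactly the pattern the GIL prevents in pure Python.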
6. Implementation Roadmap and Recommendations
Based on the above technical analysis, the following steps are recommended:
1. Proof of Concept (PoC)
- Add `oar-ocr` or `rusto-rs` as dependencies.
2. Build Adapter
- Create a `kreuzberg-paddle` plugin module in the Kreuzberg repository.
- Implement the `OcrBackend` trait.
- In the `scan` method, convert Kreuzberg image data to `ndarray` (for ONNX) or an MNN Tensor.
3. Resolve Build and Distribution Issues
- `ort` dynamically links C++ libraries. For simplified distribution, explore `ort`'s static-linking features or pre-configure `libonnxruntime.so` in Dockerfiles.
- Following the `ocrs` design, check local cache directories for model files during `init`, auto-downloading from HuggingFace or other CDNs if absent.
4. Performance Tuning
- Use the `criterion` crate to benchmark the preprocessing, inference, and post-processing stages separately, identifying performance hotspots for optimization.
Conclusion
Completely feasible with significant benefits.
Integrating pure Rust PaddleOCR into Kreuzberg is not only technically feasible but a critical step toward the framework's "high-performance, easy-deployment" vision. By leveraging mature Rust ecosystem libraries like `ort` or rusto-rs, we can remove the heavyweight Python runtime while retaining PaddleOCR's powerful multilingual recognition capabilities. This resolves the dependency hell plaguing developers and establishes a solid foundation for building high-throughput, low-latency document intelligence processing pipelines.