sequenceDiagram
autonumber
participant Main as Main Thread
participant DetThread as Detection Thread<br/>(1 thread)
participant Detector as TextDetector
participant DXRT as dxrt::InferenceEngine
participant NPU as NPU Hardware
participant CBThread as DXRT Callback Thread
participant StageExec as Stage Executor<br/>(ThreadPool, 8 threads)
participant RecQueue as recQueue
participant RecThread as Recognition Thread<br/>(1 thread)
Note over Main, RecThread: 初始化阶段
Main->>Detector: init()
Detector->>DXRT: RegisterCallback(internalCallback)
Main->>Detector: setCallback(lambda)
Main->>StageExec: Create ThreadPool(8)
Main->>DetThread: Start 1 detection thread
Main->>RecThread: Start 1 recognition thread
Note over Main, RecThread: 运行阶段 (并发非阻塞)
loop Detection Loop
DetThread->>DetThread: Doc Preprocessing
DetThread->>DetThread: Detection Preprocess
DetThread->>Detector: runAsync(image, taskId, ...)
Detector->>Detector: Create DetectionContext
Detector->>DXRT: RunAsync(data, context)
DXRT-->>DetThread: Return Immediately
end
par NPU Inference
DXRT->>NPU: Execute Model
NPU-->>DXRT: Done
and Recognition Thread
loop RecLoop
RecQueue->>RecThread: pop() (Wait for data)
end
end
Note over CBThread: 回调阶段 (由 DXRT 线程触发)
DXRT->>CBThread: Trigger Callback
activate CBThread
CBThread->>Detector: internalCallback(outputs, context)
Detector->>Detector: DBPostProcess (Decode Boxes)
rect rgb(240, 248, 255)
Note right of CBThread: Dispatch to Stage Executor
CBThread->>StageExec: dispatch(sortBoxes + pushQueue)
end
deactivate CBThread
StageExec->>StageExec: Sort Boxes
StageExec->>RecQueue: push(RecognitionTask)
RecQueue-->>RecThread: Task Available
rect rgb(255, 248, 240)
Note over RecThread, StageExec: Interleaved Crop & Submit
loop For each box
RecThread->>RecThread: Crop single box
RecThread->>DXRT: ClassifyAsync / RecognizeAsync
DXRT-->>RecThread: Return Immediately
end
end
Note over CBThread, StageExec: Cls/Rec 回调分派到 Stage Executor
DXRT->>CBThread: Classification Complete
CBThread->>StageExec: dispatch(rotate + submitRecognition)
DXRT->>CBThread: Recognition Complete
CBThread->>StageExec: dispatch(updateResult + finalize)
graph TB
subgraph "DeepX OCR Project"
ROOT["OCR/"]
subgraph "Source Code"
SRC["src/"]
SRC_COMMON["common/<br/>geometry, visualizer, logger,<br/>thread_pool, concurrent_queue"]
SRC_PREPROC["preprocessing/<br/>uvdoc, image_ops"]
SRC_DET["detection/<br/>text_detector, db_postprocess"]
SRC_CLS["classification/<br/>text_classifier"]
SRC_REC["recognition/<br/>text_recognizer, rec_postprocess"]
SRC_PIPE["pipeline/<br/>ocr_pipeline"]
end
subgraph "Third Party Libraries"
THIRDPARTY["3rd-party/"]
TP_JSON["json/ (nlohmann)"]
TP_CLIPPER["clipper2/"]
TP_SPDLOG["spdlog/"]
TP_OPENCV["opencv/"]
TP_CONTRIB["opencv_contrib/<br/>(freetype module)"]
end
subgraph "Models"
ENGINE["engine/model_files/"]
MODEL_SERVER["server/<br/>det_v5_*.dxnn<br/>rec_v5_*.dxnn"]
MODEL_MOBILE["mobile/<br/>det_mobile_*.dxnn<br/>rec_mobile_*.dxnn"]
end
subgraph "Build & Test"
BUILD["build_Release/"]
BUILD_BIN["bin/<br/>benchmark<br/>test_pipeline_async"]
BUILD_ROOT["test_detector<br/>test_recognizer<br/>test_recognizer_mobile"]
end
end
ROOT --> SRC
ROOT --> THIRDPARTY
ROOT --> ENGINE
ROOT --> BUILD
SRC --> SRC_COMMON
SRC --> SRC_PREPROC
SRC --> SRC_DET
SRC --> SRC_CLS
SRC --> SRC_REC
SRC --> SRC_PIPE
THIRDPARTY --> TP_JSON
THIRDPARTY --> TP_CLIPPER
THIRDPARTY --> TP_SPDLOG
THIRDPARTY --> TP_OPENCV
THIRDPARTY --> TP_CONTRIB
ENGINE --> MODEL_SERVER
ENGINE --> MODEL_MOBILE
BUILD --> BUILD_BIN
BUILD --> BUILD_ROOT
flowchart LR
subgraph Input
IMG[("输入图像<br/>cv::Mat")]
end
subgraph DocPreprocessing["文档预处理 (可选)"]
ORI["方向检测<br/>doc_ori_fixed.dxnn"]
UVDOC["文档矫正<br/>UVDoc_pruned_p3.dxnn"]
end
subgraph Detection["文本检测"]
DET640["det_v5_640.dxnn<br/>(小图)"]
DET960["det_v5_960.dxnn<br/>(大图)"]
DBPOST["DBPostProcess<br/>文本框解码"]
end
subgraph Classification["方向分类"]
CLS["textline_ori.dxnn<br/>0°/180° 判断"]
end
subgraph Recognition["文本识别"]
REC3["rec_v5_ratio_3.dxnn"]
REC5["rec_v5_ratio_5.dxnn"]
REC10["rec_v5_ratio_10.dxnn"]
REC15["rec_v5_ratio_15.dxnn"]
REC25["rec_v5_ratio_25.dxnn"]
REC35["rec_v5_ratio_35.dxnn"]
CTCDEC["CTC Decoder<br/>字符解码"]
end
subgraph Output
RESULT[("OCR 结果<br/>boxes + texts")]
end
IMG --> ORI --> UVDOC
UVDOC --> DET640
UVDOC --> DET960
DET640 --> DBPOST
DET960 --> DBPOST
DBPOST --> CLS
CLS --> REC3 & REC5 & REC10 & REC15 & REC25 & REC35
REC3 & REC5 & REC10 & REC15 & REC25 & REC35 --> CTCDEC
CTCDEC --> RESULT
| 模型文件 | 用途 | 输入尺寸 |
|---|---|---|
det_v5_640.dxnn |
文本检测 (小图) | 640×640 |
det_v5_960.dxnn |
文本检测 (大图) | 960×960 |
rec_v5_ratio_3.dxnn |
文本识别 | 48×144 (ratio=3) |
rec_v5_ratio_5.dxnn |
文本识别 | 48×240 (ratio=5) |
rec_v5_ratio_10.dxnn |
文本识别 | 48×480 (ratio=10) |
rec_v5_ratio_15.dxnn |
文本识别 | 48×720 (ratio=15) |
rec_v5_ratio_25.dxnn |
文本识别 | 48×1200 (ratio=25) |
rec_v5_ratio_35.dxnn |
文本识别 | 48×1680 (ratio=35) |
textline_ori.dxnn |
文本方向分类 | 80×160 |
doc_ori_fixed.dxnn |
文档方向检测 | - |
UVDoc_pruned_p3.dxnn |
文档矫正 | 488×712 |
| 模型文件 | 用途 | 说明 |
|---|---|---|
det_mobile_640.dxnn |
文本检测 | 轻量版 |
det_mobile_960.dxnn |
文本检测 | 轻量版 |
rec_mobile_ratio_*.dxnn |
文本识别 | 6个不同宽高比 |