feat: adapt RapidOCR single-character (char-level) recognition

Joker1212 · Joker1212 · commit ed6618224902 · 2024-11-22T22:05:13.000+08:00
diff --git a/README.md b/README.md
@@ -15,16 +15,16 @@
 </div>
 
 ### Recent Updates
-- **2024.10.22**
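-  - Added a multi-table detection and extraction solution for complex backgrounds: [RapidTableDet](https://github.com/RapidAI/RapidTableDetection)
 - **2024.11.12**
-  - Extracted the core thresholds used in model recognition and post-processing so they can be fine-tuned for your own scenarios: [Fine-tuning Input Parameter Reference](#核心参数)
+  - Extracted the core thresholds used in model recognition and post-processing so they can be fine-tuned for your own scenarios: [Input Parameters](#核心参数)
 - **2024.11.16**
-  - Added a document distortion correction solution that can be used as a pre-processing step: [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap)
+  - Added a document distortion correction solution that can be used as a pre-processing step: [RapidUnWrap](https://github.com/Joker1212/RapidUnWrap)
+- **2024.11.22**
+  - Added support for single-character matching (requires RapidOCR>=1.4.0)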
     
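 ### Introduction
 💖This repository is an inference library for structured recognition of tables in documents. It bundles the wired and lineless table recognition models from Alibaba DuGuang, a wired table model contributed by llaipython (WeChat), the table classification model built into NetEase Qanything, and more.\
-[Quick Start](#安装) [Model Evaluation](#指标结果) [Usage Recommendations](#使用建议) [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap) [Table Rotation & Perspective Correction](#表格旋转及透视修正) [Fine-tuning Input Parameter Reference](#核心参数) [FAQ](#FAQ) [Update Plan](#更新计划)
+[Quick Start](#安装) [Model Evaluation](#指标结果) [Usage Recommendations](#使用建议) [Single-Character Matching](#单字ocr匹配) [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap) [Table Rotation & Perspective Correction](#表格旋转及透视修正) [Input Parameters](#核心参数) [FAQ](#FAQ) [Update Plan](#更新计划)
 #### Features
 
 ⚡  **Fast**  Uses ONNXRuntime as the inference engine; single-image inference takes 1-7 s on CPU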
@@ -106,7 +106,6 @@ print(f"elasp: {elasp}")
 # Use a different OCR model
 #ocr_engine =RapidOCR(det_model_dir="xxx/det_server_infer.onnx",rec_model_dir="xxx/rec_server_infer.onnx")
 #ocr_res, _ = ocr_engine(img_path)
-#html, elasp, polygons, logic_points, ocr_res = table_engine(img_path, ocr_result=ocr_res)  
 
 # output_dir = f'outputs'
 # complete_html = format_html(html)
@@ -121,6 +120,17 @@ print(f"elasp: {elasp}")
 # plot_rec_box(img_path, f"{output_dir}/ocr_box.jpg", ocr_res)
 ```
 
+#### 单字ocr匹配
+```python
+# Convert single-character boxes into the same structure as line-level OCR results
+from rapidocr_onnxruntime import RapidOCR
+from wired_table_rec.utils_table_recover import trans_char_ocr_res
+img_path = "tests/test_files/wired/table4.jpg"
+ocr_engine = RapidOCR()
+ocr_res, _ = ocr_engine(img_path, return_word_box=True)
+ocr_res = trans_char_ocr_res(ocr_res)
+```
+
 #### 表格旋转及透视修正
 ##### 1.简单背景，小角度场景
 ```python
@@ -165,19 +175,17 @@ for i, res in enumerate(result):
 ```python
 wired_table_rec = WiredTableRecognition()
 html, elasp, polygons, logic_points, ocr_res = wired_table_rec(
-    img_path,
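+    img,  # Image: Union[str, np.ndarray, bytes, Path, PIL.Image.Image]
+    ocr_result,  # RapidOCR result to use; if omitted, the built-in RapidOCR model is used
     version="v2",  # v2 line model by default; set to v1 to switch to the Alibaba DuGuang model
-    morph_close=True,  # Whether to apply morphological operations to help find more boxes, default True
-    more_h_lines=True,  # Whether to look for additional horizontal lines based on the line detection result, helping find smaller boxes, default True
-    h_lines_threshold = 100,  # Requires more_h_lines; pixel threshold for connecting detected horizontal lines (a new horizontal line is generated below this value), default 100
-    more_v_lines=True,  # Whether to look for additional vertical lines based on the line detection result, helping find smaller boxes, default True
-    v_lines_threshold = 15,  # Requires more_v_lines; pixel threshold for connecting detected vertical lines (a new vertical line is generated below this value), default 15
-    extend_line=True,  # Whether to extend detected line segments to help find more boxes, default True
+    enhance_box_line=True,  # Enhanced box-line splitting (turn off to avoid extra splits, turn on to reduce missed splits), default True
     need_ocr=True,  # Whether to run OCR, default True
     rec_again=True,  # Whether to crop and re-recognize table cells in which no text was detected, default True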
 )
 lineless_table_rec = LinelessTableRecognition()
 html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
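+    img,  # Image: Union[str, np.ndarray, bytes, Path, PIL.Image.Image]
+    ocr_result,  # RapidOCR result to use; if omitted, the built-in RapidOCR model is used
     need_ocr=True,  # Whether to run OCR, default True
     rec_again=True,  # Whether to crop and re-recognize table cells in which no text was detected, default True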
 )
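The char-level path relies on `return_word_box=True`, which the changelog says requires RapidOCR>=1.4.0. Below is a minimal, stdlib-only sketch for guarding that requirement. `parse_version`, `supports_word_box`, and `installed_rapidocr_version` are hypothetical helpers, not part of this repository, and the distribution name `rapidocr_onnxruntime` is an assumption taken from the import name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum version stated in the changelog for word-box output (assumption: 1.4.0 exactly).
MIN_RAPIDOCR: Tuple[int, ...] = (1, 4, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.4.0.post1' -> (1, 4, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def supports_word_box(text: str, minimum: Tuple[int, ...] = MIN_RAPIDOCR) -> bool:
    """True if the version string is at least `minimum` (missing parts count as 0)."""
    parsed = parse_version(text)
    width = max(len(parsed), len(minimum))
    padded = parsed + (0,) * (width - len(parsed))
    needed = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= needed


def installed_rapidocr_version(dist: str = "rapidocr_onnxruntime") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None
```

The hand-rolled parser only compares leading numeric components; `packaging.version.Version` would be the more rigorous choice if that dependency is acceptable.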
diff --git a/README_en.md b/README_en.md
@@ -13,17 +13,16 @@
 </div>
 
 ### Recent Updates
-- **2024.10.22**
-    - Added the complex background multi-table detection and extraction solution [RapidTableDet](https://github.com/RapidAI/RapidTableDetection).
-
 - **2024.11.12**
     - Extracted model recognition and processing core thresholds for easier fine-tuning according to specific scenarios. See [Core Parameters](#core-parameters).
 - **2024.11.16**
-    - Added document distortion correction solution, which can be used as a pre-processing step [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap)
+    - Added document distortion correction solution, which can be used as a pre-processing step [RapidUnWrap](https://github.com/Joker1212/RapidUnWrap)
+- **2024.11.22**
+    - Added support for single-character OCR matching (requires RapidOCR>=1.4.0). See [Single Character OCR Matching](#single-character-ocr-matching).
 ### Introduction
 💖 This repository serves as an inference library for structured recognition of tables within documents, including the wired and lineless table recognition models from Alibaba DuGuang, a wired table model from llaipython (WeChat), and a built-in table classification model from NetEase Qanything.
 
-[Quick Start](#installation) [Model Evaluation](#evaluation-results) [Usage Recommendations](#usage-recommendations) [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap) [Table Rotation & Perspective Correction](#table-rotation-and-perspective-correction) [Fine-tuning Input Parameters Reference](#core-parameters) [Frequently Asked Questions](#faqs) [Update Plan](#update-plan)
+[Quick Start](#installation) [Model Evaluation](#evaluation-results) [Single Character OCR Matching](#single-character-ocr-matching) [Usage Recommendations](#usage-recommendations) [Document Distortion Correction](https://github.com/Joker1212/RapidUnWrap) [Table Rotation & Perspective Correction](#table-rotation-and-perspective-correction) [Input Parameters](#core-parameters) [Frequently Asked Questions](#faqs) [Update Plan](#update-plan)
 #### Features
 
 ⚡ **Fast:** Uses ONNXRuntime as the inference engine, achieving 1-7 seconds per image on CPU.
@@ -121,6 +120,16 @@ print(f"elasp: {elasp}")
 # Visualize OCR recognition boxes
 # plot_rec_box(img_path, f"{output_dir}/ocr_box.jpg", ocr_res)
 ```
+#### Single Character OCR Matching
+```python
+# Convert single character boxes to the same structure as line recognition
+from rapidocr_onnxruntime import RapidOCR
+from wired_table_rec.utils_table_recover import trans_char_ocr_res
+img_path = "tests/test_files/wired/table4.jpg"
+ocr_engine = RapidOCR()
+ocr_res, _ = ocr_engine(img_path, return_word_box=True)
+ocr_res = trans_char_ocr_res(ocr_res)
+```
 
 #### Table Rotation and Perspective Correction
 ##### 1. Simple Background, Small Angle Scene
@@ -166,21 +175,19 @@ for i, res in enumerate(result):
 ```python
 wired_table_rec = WiredTableRecognition()
 html, elasp, polygons, logic_points, ocr_res = wired_table_rec(
-    img_path,
-    version="v2", # Default to use v2 line model, switch to Alibaba ReadLight model by changing to v1
-    morph_close=True,# Whether to perform morphological operations to find more lines, default is True
-    more_h_lines=True, # Whether to check for more horizontal lines based on line detection results to find smaller lines, default is True
-    h_lines_threshold = 100, # Must enable more_h_lines, threshold for connecting horizontal line detection pixels, new horizontal lines will be generated if below this value, default is 100
-    more_v_lines=True, # Whether to check for more vertical lines based on line detection results to find smaller lines, default is True
-    v_lines_threshold = 15, # Must enable more_v_lines, threshold for connecting vertical line detection pixels, new vertical lines will be generated if below this value, default is 15
-    extend_line=True, # Whether to extend line segments based on line detection results to find more lines, default is True
-    need_ocr=True, # Whether to perform OCR recognition, default is True
-    rec_again=True,# Whether to re-recognize table boxes that were not recognized, default is True
+    img,  # Image: Union[str, np.ndarray, bytes, Path, PIL.Image.Image]
+    ocr_result,  # RapidOCR result to use; if omitted, the built-in RapidOCR model is used
+    version="v2",  # v2 line model by default; set to v1 to switch to the Alibaba DuGuang model
+    enhance_box_line=True,  # Enhanced box-line splitting (turn off to avoid extra splits, turn on to reduce missed splits), default is True
+    need_ocr=True,  # Whether to perform OCR recognition, default is True
+    rec_again=True,  # Whether to re-recognize table boxes without detected text by cropping them separately, default is True
 )
 lineless_table_rec = LinelessTableRecognition()
 html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
-    need_ocr=True, # Whether to perform OCR recognition, default is True
-    rec_again=True, # Whether to re-recognize table boxes that were not recognized, default is True
+    img,  # Image: Union[str, np.ndarray, bytes, Path, PIL.Image.Image]
+    ocr_result,  # RapidOCR result to use; if omitted, the built-in RapidOCR model is used
+    need_ocr=True,  # Whether to perform OCR recognition, default is True
+    rec_again=True,  # Whether to re-recognize table boxes without detected text by cropping them separately, default is True
 )
 ```
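One plausible reason character boxes help: a line box that straddles two cells can only be attached to one of them, while per-character boxes can be split between cells. The sketch below is illustrative only. It assumes axis-aligned cells given as `(x_min, y_min, x_max, y_max)` and assigns each character by center-point containment; `assign_chars_to_cells` is hypothetical and is not the matching algorithm used by `wired_table_rec` or `lineless_table_rec`.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
CharResult = Tuple[Sequence[Point], str, float]  # (4-point box, character, score)
Cell = Tuple[float, float, float, float]         # (x_min, y_min, x_max, y_max)


def box_center(box: Sequence[Point]) -> Tuple[float, float]:
    """Mean of the box's corner points."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def assign_chars_to_cells(chars: List[CharResult], cells: List[Cell]) -> Dict[int, str]:
    """Append each character to the first cell whose rectangle contains its center."""
    texts: Dict[int, str] = {}
    for box, char, _score in chars:
        cx, cy = box_center(box)
        for idx, (x0, y0, x1, y1) in enumerate(cells):
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                texts[idx] = texts.get(idx, "") + char
                break
    return texts


# Two side-by-side cells; the text "AB" spans both.
cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
per_char = [
    ([[10, 5], [20, 5], [20, 15], [10, 15]], "A", 0.9),
    ([[60, 5], [70, 5], [70, 15], [60, 15]], "B", 0.9),
]
per_line = [([[10, 5], [70, 5], [70, 15], [10, 15]], "AB", 0.9)]
```

With these inputs, the per-character results land in separate cells, while the single line box (center x = 40) puts all of "AB" into the first cell.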
 
diff --git a/lineless_table_rec/utils_table_recover.py b/lineless_table_rec/utils_table_recover.py
@@ -605,6 +605,19 @@ def format_html(html):
     """
 
 
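+def trans_char_ocr_res(ocr_res):
+    """Flatten per-line OCR results into per-character [box, char, score] entries.
+
+    Each line result is indexed as [line_box, text, score, word_boxes, words];
+    every character inherits the score of the line it came from.
+    """
+    word_result = []
+    for res in ocr_res:
+        score = res[2]
+        for word_box, word in zip(res[3], res[4]):
+            word_result.append([word_box, word, score])
+    return word_result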
+
+
 def get_rotate_crop_image(img: np.ndarray, points: np.ndarray) -> np.ndarray:
     img_crop_width = int(
         max(
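To make the input and output shapes of `trans_char_ocr_res` concrete, here is the same flattening written as a comprehension and run on a hand-built sample. The per-line layout `[line_box, text, score, word_boxes, words]` is inferred from the indices the function reads, not from RapidOCR documentation; the sample values are made up, and `flatten_char_results` is a hypothetical name.

```python
from typing import Any, List, Sequence


def flatten_char_results(ocr_res: Sequence[Sequence[Any]]) -> List[list]:
    """Per-line results -> [char_box, char, line_score] triples."""
    return [
        [word_box, word, res[2]]  # each character inherits its line's score
        for res in ocr_res
        for word_box, word in zip(res[3], res[4])
    ]


# One made-up line result for the two-character text "表格".
sample_line = [
    [[0, 0], [40, 0], [40, 10], [0, 10]],           # line box
    "表格",                                          # line text
    0.98,                                            # line score
    [
        [[0, 0], [20, 0], [20, 10], [0, 10]],        # box for "表"
        [[20, 0], [40, 0], [40, 10], [20, 10]],      # box for "格"
    ],
    ["表", "格"],                                    # characters
]

chars = flatten_char_results([sample_line])
```

Each resulting entry has the same `[box, text, score]` shape as a line-level result, which is why the converted list can be passed wherever line-level OCR output is expected.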
diff --git a/wired_table_rec/table_line_rec_plus.py b/wired_table_rec/table_line_rec_plus.py
@@ -73,17 +73,18 @@ def postprocess(self, img, pred, **kwargs):
         h_lines_threshold = kwargs.get("h_lines_threshold", 100) if kwargs else 100
         v_lines_threshold = kwargs.get("v_lines_threshold", 15) if kwargs else 15
         angle = kwargs.get("angle", 50) if kwargs else 50
+        enhance_box_line = kwargs.get("enhance_box_line", True) if kwargs else True
         morph_close = (
-            kwargs.get("morph_close", True) if kwargs else True
+            kwargs.get("morph_close", enhance_box_line) if kwargs else enhance_box_line
         )  # whether to apply morphological closing to find more small boxes
         more_h_lines = (
-            kwargs.get("more_h_lines", True) if kwargs else True
+            kwargs.get("more_h_lines", enhance_box_line) if kwargs else enhance_box_line
         )  # whether to adjust detection to find more horizontal lines
         more_v_lines = (
-            kwargs.get("more_v_lines", True) if kwargs else True
+            kwargs.get("more_v_lines", enhance_box_line) if kwargs else enhance_box_line
         )  # whether to adjust detection to find more vertical lines
         extend_line = (
-            kwargs.get("extend_line", True) if kwargs else True
+            kwargs.get("extend_line", enhance_box_line) if kwargs else enhance_box_line
         )  # whether to extend line segments so their endpoints connect
 
         ori_shape = img.shape
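The hunk above makes `enhance_box_line` the shared default for four sub-switches, while still letting each one be overridden explicitly. A minimal, hypothetical sketch of that resolution follows (`resolve_line_options` is not part of the library). It also shows why the lookup needs an explicit default: `kwargs.get("enhance_box_line")` yields `None`, which is falsy, whenever other keyword arguments such as `h_lines_threshold` are passed without it, silently disabling all four enhancements.

```python
from typing import Any, Dict

SUB_OPTIONS = ("morph_close", "more_h_lines", "more_v_lines", "extend_line")


def resolve_line_options(**kwargs: Any) -> Dict[str, bool]:
    """Resolve the four line-enhancement switches.

    enhance_box_line (default True) is the fallback for every sub-option;
    any sub-option passed explicitly overrides it.
    """
    enhance = kwargs.get("enhance_box_line", True)  # explicit default avoids None
    return {name: kwargs.get(name, enhance) for name in SUB_OPTIONS}


# The pitfall: without a default, an unrelated kwarg makes the master switch None.
pitfall = {"h_lines_threshold": 80}.get("enhance_box_line")
```

Because `**kwargs` is always a dict, `kwargs.get(name, default)` already covers the empty case; the `if kwargs else ...` guard in the original code is redundant but harmless.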
diff --git a/wired_table_rec/utils_table_recover.py b/wired_table_rec/utils_table_recover.py
@@ -288,6 +288,19 @@ def plot_rec_box_with_logic_info(img_path, output_path, logic_points, sorted_pol
         cv2.imwrite(output_path, img)
 
 
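+def trans_char_ocr_res(ocr_res):
+    """Flatten per-line OCR results into per-character [box, char, score] entries.
+
+    Each line result is indexed as [line_box, text, score, word_boxes, words];
+    every character inherits the score of the line it came from.
+    """
+    word_result = []
+    for res in ocr_res:
+        score = res[2]
+        for word_box, word in zip(res[3], res[4]):
+            word_result.append([word_box, word, score])
+    return word_result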
+
+
 def plot_rec_box(img_path, output_path, sorted_polygons):
     """
     :param img_path