
Commit 52dd5a2

chore: add en readme and optimize visual fun
1 parent fe60246 commit 52dd5a2

File tree: 4 files changed, +271 -12 lines


README.md (5 additions, 2 deletions)

```diff
@@ -10,6 +10,8 @@
 <a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
 <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
 <a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache 2.0-blue"></a>
+
+[English](README_en.md) | 简体中文
 </div>

 ### Recent Updates
```

```diff
@@ -185,8 +187,9 @@ html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
 ## FAQ
 1. **Q: The recognition box loses the text inside it**
    - A: The small RapidOCR model is used by default; for higher accuracy, a higher-precision OCR model can be downloaded from the [model list](https://rapidai.github.io/RapidOCRDocs/model_list/#_1)
-     and passed in as ocr_result at execution time
-
+     and passed in as ocr_result at execution time,
+   - or try tuning RapidOCR's parameters against the online demos: [modelscope](https://www.modelscope.cn/studios/liekkas/RapidOCRDemo/summary) [huggingface](https://huggingface.co/spaces/SWHL/RapidOCRDemo)
+     then pass the result in at inference time
 3. **Q: Does the model support GPU acceleration?**
    - A: Table-model inference is already very fast (wired tables on the order of 100 ms, wireless tables around 500 ms);
      most of the time goes to the OCR stage; see [rapidocr_paddle](https://rapidai.github.io/RapidOCRDocs/install_usage/rapidocr_paddle/usage/#_3)
```

README_en.md (new file, 238 additions)

<div align="center">
  <div align="center">
    <h1><b>📊 Table Structure Recognition</b></h1>
  </div>
  <a href=""><img src="https://img.shields.io/badge/Python->=3.6,&lt;3.12-aff.svg"></a>
  <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Mac%2C%20Win-pink.svg"></a>
  <a href="https://pypi.org/project/lineless-table-rec/"><img alt="PyPI" src="https://img.shields.io/pypi/v/lineless-table-rec"></a>
  <a href="https://pepy.tech/project/lineless-table-rec"><img src="https://static.pepy.tech/personalized-badge/lineless-table-rec?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20Lineless"></a>
  <a href="https://pepy.tech/project/wired-table-rec"><img src="https://static.pepy.tech/personalized-badge/wired-table-rec?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20Wired"></a>
  <a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
  <a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache%202.0-blue"></a>
</div>
### Recent Updates

- **2024.10.22**
  - Added the complex-background, multi-table detection and extraction solution [RapidTableDet](https://github.com/RapidAI/RapidTableDetection).

- **2024.10.29**
  - Retrained the table classifier with YOLO11 to fix a logical-coordinate restoration error in wired_table_rec v2, and updated the evaluations.

- **2024.11.12**
  - Exposed the core thresholds used in model recognition and post-processing so they can be fine-tuned per scenario. See [Core Parameters](#core-parameters).

### Introduction

💖 This repository is an inference library for structured recognition of tables in documents. It includes the wired and wireless table recognition models from Alibaba DulaLight, a wired table model contributed by llaipython (WeChat), and a built-in table classification model from NetEase QAnything.

[Quick Start](#installation) | [Model Evaluation](#evaluation-results) | [Usage Recommendations](#usage-recommendations) | [Table Rotation & Perspective Correction](#table-rotation-and-perspective-correction) | [Fine-tuning Input Parameters Reference](#core-parameters) | [Frequently Asked Questions](#faq) | [Update Plan](#update-plan)

#### Features

**Fast:** Uses ONNXRuntime as the inference engine, achieving 1-7 seconds per image on CPU.

🎯 **Accurate:** Combines a table-type classification model to distinguish wired from wireless tables, splitting the work into finer-grained subtasks for higher accuracy.

🛡️ **Stable:** Does not depend on any third-party training framework; relies only on essential base libraries, avoiding package conflicts.
### Online Demonstrations

[modelscope](https://www.modelscope.cn/studios/jockerK/TableRec) [huggingface](https://huggingface.co/spaces/Joker1212/TableDetAndRec)

### Effect Showcase

<div align="center">
  <img src="https://github.com/RapidAI/TableStructureRec/releases/download/v0.0.0/demo_img_output.gif" alt="Demo" width="100%" height="100%">
</div>

### Evaluation Results

[TableRecognitionMetric Evaluation Tool](https://github.com/SWHL/TableRecognitionMetric)
[huggingface Dataset](https://huggingface.co/datasets/SWHL/table_rec_test_dataset)
[modelscope Dataset](https://www.modelscope.cn/datasets/jockerK/TEDS_TEST/files)
[RapidOCR](https://github.com/RapidAI/RapidOCR)
Test environment: Ubuntu 20.04, Python 3.10.10, opencv-python 4.10.0.84

Notes:
- StructEqTable outputs LaTeX; only outputs that were successfully converted to HTML (with style tags stripped) were scored in the evaluation.
- Surya-Tabled uses its built-in OCR module; it is a row/column recognition model that cannot identify merged cells, which lowers its scores.

| Method                                                                                          |    TEDS     | TEDS-only-structure |
|:------------------------------------------------------------------------------------------------|:-----------:|:-------------------:|
| [surya-tabled(--skip-detect)](https://github.com/VikParuchuri/tabled)                           |   0.33437   |       0.65865       |
| [surya-tabled](https://github.com/VikParuchuri/tabled)                                          |   0.33940   |       0.67103       |
| [deepdoctection(rag-flow)](https://github.com/deepdoctection/deepdoctection?tab=readme-ov-file) |   0.59975   |       0.69918       |
| [ppstructure_table_master](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure)     |   0.61606   |       0.73892       |
| [ppstructure_table_engine](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure)     |   0.67924   |       0.78653       |
| [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)                     |   0.67310   |       0.81210       |
| [RapidTable(SLANet)](https://github.com/RapidAI/RapidTable)                                     |   0.71654   |       0.81067       |
| table_cls + wired_table_rec v1 + lineless_table_rec                                             |   0.75288   |       0.82574       |
| table_cls + wired_table_rec v2 + lineless_table_rec                                             |   0.77676   |       0.84580       |
| [RapidTable(SLANet-plus)](https://github.com/RapidAI/RapidTable)                                | **0.84481** |     **0.91369**     |
### Usage Recommendations

wired_table_rec_v2 (highest precision for wired tables): general wired-table scenes such as papers, magazines, journals, receipts, invoices, and bills.

paddlex-SLANet-plus (highest overall precision): document-scene tables, i.e. tables in papers, magazines, and journals. [Fine-tuning Input Parameters Reference](#core-parameters)
### Installation

```bash
pip install wired_table_rec lineless_table_rec table_cls
```
### Quick Start

```python
import os

from lineless_table_rec import LinelessTableRecognition
from lineless_table_rec.utils_table_recover import format_html, plot_rec_box_with_logic_info, plot_rec_box
from table_cls import TableCls
from wired_table_rec import WiredTableRecognition

lineless_engine = LinelessTableRecognition()
wired_engine = WiredTableRecognition()
# Default is a small YOLO model (~0.1 s); switch to the higher-precision
# YOLOX (~0.25 s) or the faster QAnything (~0.07 s) model if needed.
table_cls = TableCls()  # TableCls(model_type="yolox"), TableCls(model_type="q")
img_path = "images/img14.jpg"

cls, elasp = table_cls(img_path)
if cls == "wired":
    table_engine = wired_engine
else:
    table_engine = lineless_engine

html, elasp, polygons, logic_points, ocr_res = table_engine(img_path)
print(f"elasp: {elasp}")

# Use another OCR model:
# from rapidocr_onnxruntime import RapidOCR
# ocr_engine = RapidOCR(det_model_dir="xxx/det_server_infer.onnx", rec_model_dir="xxx/rec_server_infer.onnx")
# ocr_res, _ = ocr_engine(img_path)
# html, elasp, polygons, logic_points, ocr_res = table_engine(img_path, ocr_result=ocr_res)

# output_dir = "outputs"
# complete_html = format_html(html)
# os.makedirs(os.path.dirname(f"{output_dir}/table.html"), exist_ok=True)
# with open(f"{output_dir}/table.html", "w", encoding="utf-8") as file:
#     file.write(complete_html)
# Visualize table recognition boxes plus logical row/column information
# plot_rec_box_with_logic_info(
#     img_path, f"{output_dir}/table_rec_box.jpg", logic_points, polygons
# )
# Visualize OCR recognition boxes
# plot_rec_box(img_path, f"{output_dir}/ocr_box.jpg", ocr_res)
```
#### Table Rotation and Perspective Correction

##### 1. Simple Background, Small-Angle Scene

```python
import cv2

from wired_table_rec.utils import ImageOrientationCorrector

img_path = "tests/test_files/wired/squeeze_error.jpeg"

img_orientation_corrector = ImageOrientationCorrector()
img = cv2.imread(img_path)
img = img_orientation_corrector(img)
cv2.imwrite("img_rotated.jpg", img)
```
##### 2. Complex Background, Multiple-Tables Scene

For GPU or higher-precision scenarios, see the [RapidTableDet](https://github.com/RapidAI/RapidTableDetection) project.

```bash
pip install rapid-table-det
```

```python
import os

import cv2
from rapid_table_det.inference import TableDetector
from rapid_table_det.utils import img_loader, visuallize, extract_table_img

table_det = TableDetector()
img_path = "tests/test_files/chip.jpg"
result, elapse = table_det(img_path)
img = img_loader(img_path)
extract_img = img.copy()
# There may be multiple tables
for i, res in enumerate(result):
    box = res["box"]
    lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
    # Draw the recognition box and mark the top-left corner
    img = visuallize(img, box, lt, rt, rb, lb)
    # Perspective-transform to extract the table image
    wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
    # cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
    # cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
```
### Core Parameters

```python
wired_table_rec = WiredTableRecognition()
html, elasp, polygons, logic_points, ocr_res = wired_table_rec(
    img_path,
    version="v2",  # "v2" (default) uses the v2 line model; "v1" switches to the Alibaba ReadLight model
    morph_close=True,  # Apply morphological operations to find more lines; default True
    more_h_lines=True,  # Probe the line-detection results for additional smaller horizontal lines; default True
    h_lines_threshold=100,  # Requires more_h_lines; pixel threshold for connecting horizontal line detections, a new horizontal line is generated below this value; default 100
    more_v_lines=True,  # Probe the line-detection results for additional smaller vertical lines; default True
    v_lines_threshold=15,  # Requires more_v_lines; pixel threshold for connecting vertical line detections, a new vertical line is generated below this value; default 15
    extend_line=True,  # Extend detected line segments to find more lines; default True
    need_ocr=True,  # Run OCR recognition; default True
    rec_again=True,  # Re-recognize table cells that were missed the first time; default True
)
lineless_table_rec = LinelessTableRecognition()
html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
    img_path,
    need_ocr=True,  # Run OCR recognition; default True
    rec_again=True,  # Re-recognize table cells that were missed the first time; default True
)
```
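As a rough intuition for the two pixel thresholds, here is a toy, pure-Python sketch of how a gap threshold can decide whether nearby line detections are treated as one grid line. This is only one plausible reading of the parameter's description; the function name and logic are illustrative, not the library's internal code.

```python
# Toy illustration of a pixel-gap threshold in the spirit of
# h_lines_threshold / v_lines_threshold: detections whose gap falls
# below the threshold are connected into a single line, so a larger
# threshold yields a coarser grid. Illustrative only.

def merge_lines(coords, threshold):
    """Merge sorted 1-D line coordinates whose gap is below `threshold` px."""
    merged = []
    for c in sorted(coords):
        if merged and c - merged[-1] < threshold:
            # Close enough to the previous line: treat as the same line.
            merged[-1] = (merged[-1] + c) // 2
        else:
            merged.append(c)
    return merged

ys = [10, 14, 120, 123, 300]
print(merge_lines(ys, threshold=15))  # → [12, 121, 300]
print(merge_lines(ys, threshold=2))   # → [10, 14, 120, 123, 300]
```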
## FAQ

1. **Q: The recognition box loses the text inside it**
   - **A:** The small RapidOCR model is used by default. For higher precision, download a higher-precision OCR model from the [model list](https://rapidai.github.io/RapidOCRDocs/model_list/#_1) and pass it in via `ocr_result` during execution, or try tuning RapidOCR's parameters against the online demos: [modelscope](https://www.modelscope.cn/studios/liekkas/RapidOCRDemo/summary) [huggingface](https://huggingface.co/spaces/SWHL/RapidOCRDemo)
2. **Q: Does the model support GPU acceleration?**
   - **A:** Table-model inference is already very fast (wired tables on the order of 100 ms, wireless tables around 500 ms); most of the time goes to the OCR stage. See [rapidocr_paddle](https://rapidai.github.io/RapidOCRDocs/install_usage/rapidocr_paddle/usage/#_3) to accelerate the OCR recognition step.
### Update Plan

- [x] Add methods for correcting small-angle image skew
- [x] Increase dataset size and add more evaluation comparisons
- [x] Add complex-scene table detection and extraction to address low recognition rates caused by rotation and perspective distortion
- [x] Optimize the table classifier
- [ ] Optimize the wireless table model
### Processing Workflow

```mermaid
flowchart TD
    A[/Table Image/] --> B([Table Classification table_cls])
    B --> C([Wired Table Recognition wired_table_rec]) & D([Wireless Table Recognition lineless_table_rec]) --> E([Text Recognition rapidocr_onnxruntime])
    E --> F[/HTML Structured Output/]
```
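The routing in the diagram can also be sketched as plain control flow. The stubs below only stand in for the real TableCls, WiredTableRecognition, and LinelessTableRecognition classes; only the dispatch decision is meaningful.

```python
# Control-flow sketch of the pipeline above, with stub engines.
# Each stub stands in for a model; only the routing logic is real.

def classify(img):
    # Stand-in for TableCls: returns "wired" or "wireless".
    return "wired" if img.get("has_visible_lines") else "wireless"

def wired_rec(img):
    # Stand-in for WiredTableRecognition (OCR happens inside the engine).
    return "<html from wired engine>"

def lineless_rec(img):
    # Stand-in for LinelessTableRecognition.
    return "<html from lineless engine>"

def recognize(img):
    engine = wired_rec if classify(img) == "wired" else lineless_rec
    return engine(img)

print(recognize({"has_visible_lines": True}))  # → <html from wired engine>
```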
### Acknowledgments

[PaddleX Table Recognition](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md)

[PaddleOCR Table Recognition](https://github.com/PaddlePaddle/PaddleOCR/blob/4b17511491adcfd0f3e2970895d06814d1ce56cc/ppstructure/table/README_ch.md)

[Damo Academy - Table Structure Recognition - Wired Table](https://www.modelscope.cn/models/damo/cv_dla34_table-structure-recognition_cycle-centernet/summary)

[Damo Academy - Table Structure Recognition - Wireless Table](https://www.modelscope.cn/models/damo/cv_resnet-transformer_table-structure-recognition_lore/summary)

[QAnything-RAG](https://github.com/netease-youdao/QAnything)

Special thanks to llaipython (WeChat; offers a full suite of high-precision table extraction services) for providing the high-precision wired table model.

Special thanks to [MajexH](https://github.com/MajexH) for completing the table recognition test with deepdoctection (rag-flow).

### Contribution Guidelines

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

### [Sponsor](https://rapidai.github.io/Knowledge-QA-LLM/docs/sponsor/)

If you would like to sponsor this project, click the Sponsor button at the top of the page and leave a note with **your GitHub account name** so you can be added to the sponsor list.

### Open Source License

This project is licensed under the [Apache 2.0](https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE) open source license.

lineless_table_rec/utils_table_recover.py (14 additions, 5 deletions)

```diff
@@ -512,13 +512,22 @@ def plot_rec_box_with_logic_info(img_path, output_path, logic_points, sorted_pol
         y1 = round(y1)
         cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 255), 1)
         # Increase font size and line width
-        font_scale = 1.0  # originally 0.5
-        thickness = 2  # originally 1
-
+        font_scale = 0.7  # originally 0.5
+        thickness = 1  # originally 1
+        logic_point = logic_points[idx]
         cv2.putText(
             img,
-            f"{idx}-{logic_points[idx]}",
-            (x1, y1),
+            f"row: {logic_point[0]}-{logic_point[1]}",
+            (x1 + 3, y0 + 8),
+            cv2.FONT_HERSHEY_PLAIN,
+            font_scale,
+            (0, 0, 255),
+            thickness,
+        )
+        cv2.putText(
+            img,
+            f"col: {logic_point[2]}-{logic_point[3]}",
+            (x1 + 3, y0 + 18),
             cv2.FONT_HERSHEY_PLAIN,
             font_scale,
             (0, 0, 255),
```
wired_table_rec/utils_table_recover.py (14 additions, 5 deletions)

```diff
@@ -262,13 +262,22 @@ def plot_rec_box_with_logic_info(img_path, output_path, logic_points, sorted_pol
         y1 = round(y1)
         cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 255), 1)
         # Increase font size and line width
-        font_scale = 1.0  # originally 0.5
-        thickness = 2  # originally 1
-
+        font_scale = 0.7  # originally 0.5
+        thickness = 1  # originally 1
+        logic_point = logic_points[idx]
         cv2.putText(
             img,
-            f"{idx}-{logic_points[idx]}",
-            (x1, y1),
+            f"row: {logic_point[0]}-{logic_point[1]}",
+            (x0 + 3, y0 + 8),
+            cv2.FONT_HERSHEY_PLAIN,
+            font_scale,
+            (0, 0, 255),
+            thickness,
+        )
+        cv2.putText(
+            img,
+            f"col: {logic_point[2]}-{logic_point[3]}",
+            (x0 + 3, y0 + 18),
             cv2.FONT_HERSHEY_PLAIN,
             font_scale,
             (0, 0, 255),
```
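Both files receive the same label change: instead of dumping the raw cell index and logic point, each cell now gets two compact overlays. Assuming a logic point is ordered [row_start, row_end, col_start, col_end], as the indexing in the diffs suggests, the labels are formed like this (helper name is illustrative):

```python
def cell_labels(logic_point):
    # Build the two overlay strings drawn next to each cell box,
    # assuming logic_point = [row_start, row_end, col_start, col_end].
    row_label = f"row: {logic_point[0]}-{logic_point[1]}"
    col_label = f"col: {logic_point[2]}-{logic_point[3]}"
    return row_label, col_label

# A cell occupying row 0 and spanning columns 1-2:
print(cell_labels([0, 0, 1, 2]))  # → ('row: 0-0', 'col: 1-2')
```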
