
Commit 52dd5a2

chore: add en readme and optimize visual fun
1 parent fe60246 commit 52dd5a2

File tree: 4 files changed, +271 -12 lines


README.md (5 additions, 2 deletions)

```diff
@@ -10,6 +10,8 @@
 <a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
 <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
 <a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache 2.0-blue"></a>
+
+[English](README_en.md) | 简体中文
 </div>

 ### Recent Updates
```

```diff
@@ -185,8 +187,9 @@ html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
 ## FAQ
 1. **Q: The recognition box loses the text inside it**
    - A: The small RapidOCR model is used by default; for higher accuracy, a higher-precision OCR model can be downloaded from the [model list](https://rapidai.github.io/RapidOCRDocs/model_list/#_1)
-     and passed in as ocr_result at execution time
-
+     and passed in as ocr_result at execution time,
+   - or try tuning RapidOCR's parameters against the online demos: [modelscope](https://www.modelscope.cn/studios/liekkas/RapidOCRDemo/summary) [huggingface](https://huggingface.co/spaces/SWHL/RapidOCRDemo)
+     then pass the result in at inference time
 3. **Q: Does the model support GPU acceleration?**
    - A: Table-model inference is already very fast (wired tables on the order of 100 ms, wireless tables around 500 ms);
      most of the time goes to the OCR stage; see [rapidocr_paddle](https://rapidai.github.io/RapidOCRDocs/install_usage/rapidocr_paddle/usage/#_3)
```

README_en.md (new file, 238 additions)

<div align="center">
  <div align="center">
    <h1><b>📊 Table Structure Recognition</b></h1>
  </div>
  <a href=""><img src="https://img.shields.io/badge/Python->=3.6,&lt;3.12-aff.svg"></a>
  <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Mac%2C%20Win-pink.svg"></a>
  <a href="https://pypi.org/project/lineless-table-rec/"><img alt="PyPI" src="https://img.shields.io/pypi/v/lineless-table-rec"></a>
  <a href="https://pepy.tech/project/lineless-table-rec"><img src="https://static.pepy.tech/personalized-badge/lineless-table-rec?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20Lineless"></a>
  <a href="https://pepy.tech/project/wired-table-rec"><img src="https://static.pepy.tech/personalized-badge/wired-table-rec?period=total&units=abbreviation&left_color=grey&right_color=blue&left_text=Downloads%20Wired"></a>
  <a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
  <a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache%202.0-blue"></a>
</div>
### Recent Updates

- **2024.10.22**
  - Added the complex-background, multi-table detection and extraction solution [RapidTableDet](https://github.com/RapidAI/RapidTableDetection).

- **2024.10.29**
  - Retrained the table classifier with YOLO11 to fix a logical-coordinate restoration error in wired_table_rec v2, and updated the evaluations.

- **2024.11.12**
  - Exposed the core thresholds used in model recognition and post-processing so they can be fine-tuned per scenario. See [Core Parameters](#core-parameters).

### Introduction

💖 This repository is an inference library for structured recognition of tables in documents. It includes the wired and wireless table recognition models from Alibaba DulaLight, a wired table model contributed by llaipython (WeChat), and a built-in table classification model from NetEase QAnything.

[Quick Start](#installation) | [Model Evaluation](#evaluation-results) | [Usage Recommendations](#usage-recommendations) | [Table Rotation & Perspective Correction](#table-rotation-and-perspective-correction) | [Fine-tuning Input Parameters Reference](#core-parameters) | [Frequently Asked Questions](#faq) | [Update Plan](#update-plan)

#### Features

**Fast:** Uses ONNXRuntime as the inference engine, achieving 1-7 seconds per image on CPU.

🎯 **Accurate:** Combines a table-type classification model to distinguish wired from wireless tables, splitting the work into finer-grained subtasks for higher accuracy.

🛡️ **Stable:** Does not depend on any third-party training framework; relies only on essential base libraries, avoiding package conflicts.
### Online Demonstrations

[modelscope](https://www.modelscope.cn/studios/jockerK/TableRec) [huggingface](https://huggingface.co/spaces/Joker1212/TableDetAndRec)

### Effect Showcase

<div align="center">
  <img src="https://github.com/RapidAI/TableStructureRec/releases/download/v0.0.0/demo_img_output.gif" alt="Demo" width="100%" height="100%">
</div>

### Evaluation Results

[TableRecognitionMetric Evaluation Tool](https://github.com/SWHL/TableRecognitionMetric)
[huggingface Dataset](https://huggingface.co/datasets/SWHL/table_rec_test_dataset)
[modelscope Dataset](https://www.modelscope.cn/datasets/jockerK/TEDS_TEST/files)
[RapidOCR](https://github.com/RapidAI/RapidOCR)
Test environment: Ubuntu 20.04, Python 3.10.10, opencv-python 4.10.0.84

Notes:
- StructEqTable outputs LaTeX; only outputs that were successfully converted to HTML (with style tags stripped) were scored in the evaluation.
- Surya-Tabled uses its built-in OCR module; it is a row/column recognition model that cannot identify merged cells, which lowers its scores.

| Method                                                                                          |    TEDS     | TEDS-only-structure |
|:------------------------------------------------------------------------------------------------|:-----------:|:-------------------:|
| [surya-tabled(--skip-detect)](https://github.com/VikParuchuri/tabled)                           |   0.33437   |       0.65865       |
| [surya-tabled](https://github.com/VikParuchuri/tabled)                                          |   0.33940   |       0.67103       |
| [deepdoctection(rag-flow)](https://github.com/deepdoctection/deepdoctection?tab=readme-ov-file) |   0.59975   |       0.69918       |
| [ppstructure_table_master](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure)     |   0.61606   |       0.73892       |
| [ppstructure_table_engine](https://github.com/PaddlePaddle/PaddleOCR/tree/main/ppstructure)     |   0.67924   |       0.78653       |
| [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy)                     |   0.67310   |       0.81210       |
| [RapidTable(SLANet)](https://github.com/RapidAI/RapidTable)                                     |   0.71654   |       0.81067       |
| table_cls + wired_table_rec v1 + lineless_table_rec                                             |   0.75288   |       0.82574       |
| table_cls + wired_table_rec v2 + lineless_table_rec                                             |   0.77676   |       0.84580       |
| [RapidTable(SLANet-plus)](https://github.com/RapidAI/RapidTable)                                | **0.84481** |     **0.91369**     |
### Usage Recommendations

wired_table_rec_v2 (highest precision for wired tables): general wired-table scenes such as papers, magazines, journals, receipts, invoices, and bills.

paddlex-SLANet-plus (highest overall precision): document-scene tables, i.e. tables in papers, magazines, and journals. [Fine-tuning Input Parameters Reference](#core-parameters)
### Installation

```bash
pip install wired_table_rec lineless_table_rec table_cls
```
### Quick Start

```python
import os

from lineless_table_rec import LinelessTableRecognition
from lineless_table_rec.utils_table_recover import format_html, plot_rec_box_with_logic_info, plot_rec_box
from table_cls import TableCls
from wired_table_rec import WiredTableRecognition

lineless_engine = LinelessTableRecognition()
wired_engine = WiredTableRecognition()
# Default is a small YOLO model (~0.1 s); switch to the higher-precision
# YOLOX (~0.25 s) or the faster QAnything (~0.07 s) model if needed.
table_cls = TableCls()  # TableCls(model_type="yolox"), TableCls(model_type="q")
img_path = "images/img14.jpg"

cls, elasp = table_cls(img_path)
if cls == "wired":
    table_engine = wired_engine
else:
    table_engine = lineless_engine

html, elasp, polygons, logic_points, ocr_res = table_engine(img_path)
print(f"elasp: {elasp}")

# Use another OCR model:
# from rapidocr_onnxruntime import RapidOCR
# ocr_engine = RapidOCR(det_model_dir="xxx/det_server_infer.onnx", rec_model_dir="xxx/rec_server_infer.onnx")
# ocr_res, _ = ocr_engine(img_path)
# html, elasp, polygons, logic_points, ocr_res = table_engine(img_path, ocr_result=ocr_res)

# output_dir = "outputs"
# complete_html = format_html(html)
# os.makedirs(os.path.dirname(f"{output_dir}/table.html"), exist_ok=True)
# with open(f"{output_dir}/table.html", "w", encoding="utf-8") as file:
#     file.write(complete_html)
# Visualize table recognition boxes plus logical row/column information
# plot_rec_box_with_logic_info(
#     img_path, f"{output_dir}/table_rec_box.jpg", logic_points, polygons
# )
# Visualize OCR recognition boxes
# plot_rec_box(img_path, f"{output_dir}/ocr_box.jpg", ocr_res)
```
#### Table Rotation and Perspective Correction

##### 1. Simple Background, Small-Angle Scene

```python
import cv2

from wired_table_rec.utils import ImageOrientationCorrector

img_path = "tests/test_files/wired/squeeze_error.jpeg"

img_orientation_corrector = ImageOrientationCorrector()
img = cv2.imread(img_path)
img = img_orientation_corrector(img)
cv2.imwrite("img_rotated.jpg", img)
```
##### 2. Complex Background, Multiple-Tables Scene

For GPU or higher-precision scenarios, see the [RapidTableDet](https://github.com/RapidAI/RapidTableDetection) project.

```bash
pip install rapid-table-det
```

```python
import os

import cv2
from rapid_table_det.inference import TableDetector
from rapid_table_det.utils import img_loader, visuallize, extract_table_img

table_det = TableDetector()
img_path = "tests/test_files/chip.jpg"
result, elapse = table_det(img_path)
img = img_loader(img_path)
extract_img = img.copy()
# There may be multiple tables
for i, res in enumerate(result):
    box = res["box"]
    lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
    # Draw the recognition box and mark the top-left corner
    img = visuallize(img, box, lt, rt, rb, lb)
    # Perspective-transform to extract the table image
    wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
    # cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
    # cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
```
### Core Parameters

```python
wired_table_rec = WiredTableRecognition()
html, elasp, polygons, logic_points, ocr_res = wired_table_rec(
    img_path,
    version="v2",  # "v2" (default) uses the v2 line model; "v1" switches to the Alibaba ReadLight model
    morph_close=True,  # Apply morphological operations to find more lines; default True
    more_h_lines=True,  # Probe the line-detection results for additional smaller horizontal lines; default True
    h_lines_threshold=100,  # Requires more_h_lines; pixel threshold for connecting horizontal line detections, a new horizontal line is generated below this value; default 100
    more_v_lines=True,  # Probe the line-detection results for additional smaller vertical lines; default True
    v_lines_threshold=15,  # Requires more_v_lines; pixel threshold for connecting vertical line detections, a new vertical line is generated below this value; default 15
    extend_line=True,  # Extend detected line segments to find more lines; default True
    need_ocr=True,  # Run OCR recognition; default True
    rec_again=True,  # Re-recognize table cells that were missed the first time; default True
)
lineless_table_rec = LinelessTableRecognition()
html, elasp, polygons, logic_points, ocr_res = lineless_table_rec(
    img_path,
    need_ocr=True,  # Run OCR recognition; default True
    rec_again=True,  # Re-recognize table cells that were missed the first time; default True
)
```
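As a rough intuition for the two pixel thresholds, here is a toy, pure-Python sketch of how a gap threshold can decide whether nearby line detections are treated as one grid line. This is only one plausible reading of the parameter's description; the function name and logic are illustrative, not the library's internal code.

```python
# Toy illustration of a pixel-gap threshold in the spirit of
# h_lines_threshold / v_lines_threshold: detections whose gap falls
# below the threshold are connected into a single line, so a larger
# threshold yields a coarser grid. Illustrative only.

def merge_lines(coords, threshold):
    """Merge sorted 1-D line coordinates whose gap is below `threshold` px."""
    merged = []
    for c in sorted(coords):
        if merged and c - merged[-1] < threshold:
            # Close enough to the previous line: treat as the same line.
            merged[-1] = (merged[-1] + c) // 2
        else:
            merged.append(c)
    return merged

ys = [10, 14, 120, 123, 300]
print(merge_lines(ys, threshold=15))  # → [12, 121, 300]
print(merge_lines(ys, threshold=2))   # → [10, 14, 120, 123, 300]
```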
## FAQ

1. **Q: The recognition box loses the text inside it**
   - **A:** The small RapidOCR model is used by default. For higher precision, download a higher-precision OCR model from the [model list](https://rapidai.github.io/RapidOCRDocs/model_list/#_1) and pass it in via `ocr_result` during execution, or try tuning RapidOCR's parameters against the online demos: [modelscope](https://www.modelscope.cn/studios/liekkas/RapidOCRDemo/summary) [huggingface](https://huggingface.co/spaces/SWHL/RapidOCRDemo)
2. **Q: Does the model support GPU acceleration?**
   - **A:** Table-model inference is already very fast (wired tables on the order of 100 ms, wireless tables around 500 ms); most of the time goes to the OCR stage. See [rapidocr_paddle](https://rapidai.github.io/RapidOCRDocs/install_usage/rapidocr_paddle/usage/#_3) to accelerate the OCR recognition step.
### Update Plan

- [x] Add methods for correcting small-angle image skew
- [x] Increase dataset size and add more evaluation comparisons
- [x] Add complex-scene table detection and extraction to address low recognition rates caused by rotation and perspective distortion
- [x] Optimize the table classifier
- [ ] Optimize the wireless table model
### Processing Workflow

```mermaid
flowchart TD
    A[/Table Image/] --> B([Table Classification table_cls])
    B --> C([Wired Table Recognition wired_table_rec]) & D([Wireless Table Recognition lineless_table_rec]) --> E([Text Recognition rapidocr_onnxruntime])
    E --> F[/HTML Structured Output/]
```
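The routing in the diagram can also be sketched as plain control flow. The stubs below only stand in for the real TableCls, WiredTableRecognition, and LinelessTableRecognition classes; only the dispatch decision is meaningful.

```python
# Control-flow sketch of the pipeline above, with stub engines.
# Each stub stands in for a model; only the routing logic is real.

def classify(img):
    # Stand-in for TableCls: returns "wired" or "wireless".
    return "wired" if img.get("has_visible_lines") else "wireless"

def wired_rec(img):
    # Stand-in for WiredTableRecognition (OCR happens inside the engine).
    return "<html from wired engine>"

def lineless_rec(img):
    # Stand-in for LinelessTableRecognition.
    return "<html from lineless engine>"

def recognize(img):
    engine = wired_rec if classify(img) == "wired" else lineless_rec
    return engine(img)

print(recognize({"has_visible_lines": True}))  # → <html from wired engine>
```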
### Acknowledgments

[PaddleX Table Recognition](https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md)

[PaddleOCR Table Recognition](https://github.com/PaddlePaddle/PaddleOCR/blob/4b17511491adcfd0f3e2970895d06814d1ce56cc/ppstructure/table/README_ch.md)

[Damo Academy - Table Structure Recognition - Wired Table](https://www.modelscope.cn/models/damo/cv_dla34_table-structure-recognition_cycle-centernet/summary)

[Damo Academy - Table Structure Recognition - Wireless Table](https://www.modelscope.cn/models/damo/cv_resnet-transformer_table-structure-recognition_lore/summary)

[QAnything-RAG](https://github.com/netease-youdao/QAnything)

Special thanks to llaipython (WeChat; offers a full suite of high-precision table extraction services) for providing the high-precision wired table model.

Special thanks to [MajexH](https://github.com/MajexH) for completing the table recognition test with deepdoctection (rag-flow).

### Contribution Guidelines

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

### [Sponsor](https://rapidai.github.io/Knowledge-QA-LLM/docs/sponsor/)

If you would like to sponsor this project, click the Sponsor button at the top of the page and leave a note with **your GitHub account name** so you can be added to the sponsor list.

### Open Source License

This project is licensed under the [Apache 2.0](https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE) open source license.

lineless_table_rec/utils_table_recover.py (14 additions, 5 deletions)

```diff
@@ -512,13 +512,22 @@ def plot_rec_box_with_logic_info(img_path, output_path, logic_points, sorted_pol
         y1 = round(y1)
         cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 255), 1)
         # Increase font size and line width
-        font_scale = 1.0  # originally 0.5
-        thickness = 2  # originally 1
-
+        font_scale = 0.7  # originally 0.5
+        thickness = 1  # originally 1
+        logic_point = logic_points[idx]
         cv2.putText(
             img,
-            f"{idx}-{logic_points[idx]}",
-            (x1, y1),
+            f"row: {logic_point[0]}-{logic_point[1]}",
+            (x1 + 3, y0 + 8),
+            cv2.FONT_HERSHEY_PLAIN,
+            font_scale,
+            (0, 0, 255),
+            thickness,
+        )
+        cv2.putText(
+            img,
+            f"col: {logic_point[2]}-{logic_point[3]}",
+            (x1 + 3, y0 + 18),
             cv2.FONT_HERSHEY_PLAIN,
             font_scale,
             (0, 0, 255),
```
wired_table_rec/utils_table_recover.py (14 additions, 5 deletions)

```diff
@@ -262,13 +262,22 @@ def plot_rec_box_with_logic_info(img_path, output_path, logic_points, sorted_pol
         y1 = round(y1)
         cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 255), 1)
         # Increase font size and line width
-        font_scale = 1.0  # originally 0.5
-        thickness = 2  # originally 1
-
+        font_scale = 0.7  # originally 0.5
+        thickness = 1  # originally 1
+        logic_point = logic_points[idx]
         cv2.putText(
             img,
-            f"{idx}-{logic_points[idx]}",
-            (x1, y1),
+            f"row: {logic_point[0]}-{logic_point[1]}",
+            (x0 + 3, y0 + 8),
+            cv2.FONT_HERSHEY_PLAIN,
+            font_scale,
+            (0, 0, 255),
+            thickness,
+        )
+        cv2.putText(
+            img,
+            f"col: {logic_point[2]}-{logic_point[3]}",
+            (x0 + 3, y0 + 18),
             cv2.FONT_HERSHEY_PLAIN,
             font_scale,
             (0, 0, 255),
```
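Both files receive the same label change: instead of dumping the raw cell index and logic point, each cell now gets two compact overlays. Assuming a logic point is ordered [row_start, row_end, col_start, col_end], as the indexing in the diffs suggests, the labels are formed like this (helper name is illustrative):

```python
def cell_labels(logic_point):
    # Build the two overlay strings drawn next to each cell box,
    # assuming logic_point = [row_start, row_end, col_start, col_end].
    row_label = f"row: {logic_point[0]}-{logic_point[1]}"
    col_label = f"col: {logic_point[2]}-{logic_point[3]}"
    return row_label, col_label

# A cell occupying row 0 and spanning columns 1-2:
print(cell_labels([0, 0, 1, 2]))  # → ('row: 0-0', 'col: 1-2')
```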
