
Commit 133e0b4

feat: Support DocLayout YOLO model
1 parent 359f72f commit 133e0b4

7 files changed: +134 −37 lines


README.md

Lines changed: 38 additions & 23 deletions

````diff
@@ -15,11 +15,12 @@
 </div>

 ### Introduction
-This project performs layout analysis of document images: given a document image (paper screenshot, research report, etc.), it locates the category and position of each part, such as titles, paragraphs, tables, and figures.
+
+This project collects open-source layout analysis projects from across the web. Specifically, given a document image (paper screenshot, research report, etc.), it locates the category and position of each part, such as titles, paragraphs, tables, and figures.

 ⚠️ Note: Because layouts differ widely across scenarios, no single model can currently handle every case. If the models below do not work well for your actual business, we recommend building your own training set and fine-tuning.

-Layout analysis is currently supported for the following scenarios
+The currently supported layout analysis models are as follows

 |`model_type`| Layout type | Model name | Supported classes |
 | :------ | :----- | :------ | :----- |
@@ -30,72 +31,90 @@
 | `yolov8n_layout_report`| Research report | `yolov8n_layout_report.onnx` | `['Text', 'Title', 'Header', 'Footer', 'Figure', 'Table', 'Toc', 'Figure caption', 'Table caption']` |
 | `yolov8n_layout_publaynet`| English | `yolov8n_layout_publaynet.onnx` | `["Text", "Title", "List", "Table", "Figure"]` |
 | `yolov8n_layout_general6`| General | `yolov8n_layout_general6.onnx` | `["Text", "Title", "Figure", "Table", "Caption", "Equation"]` |
+| 🔥`doclayout_yolo`| General | `doclayout_yolo_docstructbench_imgsz1024.onnx` | `['title', 'text', 'abandon', 'figure', 'figure_caption', 'table', 'table_caption', 'table_footnote', 'isolate_formula', 'formula_caption']` |

 PP model source: [PaddleOCR layout analysis](https://github.com/PaddlePaddle/PaddleOCR/blob/133d67f27dc8a241d6b2e30a9f047a0fb75bebbe/ppstructure/layout/README_ch.md)

 yolov8n series source: [360LayoutAnalysis](https://github.com/360AILAB-NLP/360LayoutAnalysis)

+(Recommended) 🔥 doclayout_yolo model source: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO). This is currently the strongest open-source model, supporting layout detection for 7 document types: academic papers, textbooks, financial documents, exam papers, fuzzy scans, PPT slides, and posters. Notably, its classes include `abandon`, which mainly covers page headers and footers so they can be discarded quickly downstream.
+
 Model download: [link](https://github.com/RapidAI/RapidLayout/releases/tag/v0.0.0)

 ### Installation
+
 Since the model is small, the Chinese layout analysis model (`layout_cdla.onnx`) is bundled into the whl package; for Chinese layout analysis you can install and use it directly.

 ```bash
-$ pip install rapid-layout
+pip install rapid-layout
 ```

 ### Usage
+
 #### Run as a Python script
+
 ```python
 import cv2
+
 from rapid_layout import RapidLayout, VisLayout

 # See the table above for model_type values. Specifying a different model_type automatically downloads the corresponding model into the install directory.
-layout_engine = RapidLayout(conf_thres=0.5, model_type="pp_layout_cdla")
+layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.2)

-img = cv2.imread('test_images/layout.png')
+img_path = "tests/test_files/financial.jpg"
+img = cv2.imread(img_path)

 boxes, scores, class_names, elapse = layout_engine(img)
 ploted_img = VisLayout.draw_detections(img, boxes, scores, class_names)
 if ploted_img is not None:
     cv2.imwrite("layout_res.png", ploted_img)
 ```

+### Visualization result
+
+<div align="center">
+<img src="https://github.com/RapidAI/RapidLayout/releases/download/v0.0.0/layout_res.png" width="80%" height="80%">
+</div>
+
 #### Run from the command line
+
 ```bash
 $ rapid_layout -h
 usage: rapid_layout [-h] -img IMG_PATH
-[-m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-[--conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-[--iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-[--use_cuda] [--use_dml] [-v]
+[-m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+[--conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+[--iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+[--use_cuda] [--use_dml] [-v]

 options:
 -h, --help show this help message and exit
 -img IMG_PATH, --img_path IMG_PATH
 Path to image for layout.
--m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}, --model_type {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+-m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}, --model_type {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
 Support model type
---conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+--conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
 Box threshold, the range is [0, 1]
---iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+--iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
 IoU threshold, the range is [0, 1]
 --use_cuda Whether to use cuda.
 --use_dml Whether to use DirectML, which only works in Windows10+.
 -v, --vis Wheter to visualize the layout results.
 ```
+
 - Example:
+
 ```bash
-$ rapid_layout -v -img test_images/layout.png
+rapid_layout -v -img test_images/layout.png
 ```

-
 ### GPU inference
+
 - Because the layout analysis models use a fixed input image size, `onnxruntime-gpu` can be used to speed up inference.
 - Because the `rapid_layout` package depends on the CPU build of `onnxruntime` by default, you need to install `onnxruntime-gpu` manually to use GPU inference.
 - For detailed usage and benchmarks, see [AI Studio](https://aistudio.baidu.com/projectdetail/8094594)

 #### Installation
+
 ```bash
 pip install rapid_layout
 pip uninstall onnxruntime
@@ -106,13 +125,14 @@ pip install onnxruntime-gpu
 ```

 #### Usage
+
 ```python
 import cv2
 from rapid_layout import RapidLayout
 from pathlib import Path

 # Note: the use_cuda parameter must be specified here
-layout_engine = RapidLayout(conf_thres=0.5, model_type="pp_layout_cdla", use_cuda=True)
+layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.2, use_cuda=True)

 # warm up
 layout_engine("images/12027_5.png")
@@ -128,15 +148,10 @@ avg_elapse = sum(elapses) / len(elapses)
 print(f'avg elapse: {avg_elapse:.4f}')
 ```

-### Visualization result
-
-<div align="center">
-<img src="https://github.com/RapidAI/RapidLayout/releases/download/v0.0.0/layout_res.png" width="80%" height="80%">
-</div>
-
-
 ### Reference projects
+
+- [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
 - [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/133d67f27dc8a241d6b2e30a9f047a0fb75bebbe/ppstructure/layout/README_ch.md)
 - [360LayoutAnalysis](https://github.com/360AILAB-NLP/360LayoutAnalysis)
 - [ONNX-YOLOv8-Object-Detection](https://github.com/ibaiGorordo/ONNX-YOLOv8-Object-Detection)
-- [ChineseDocumentPDF](https://github.com/SWHL/ChineseDocumentPDF)
+- [ChineseDocumentPDF](https://github.com/SWHL/ChineseDocumentPDF)
````
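The `abandon` class highlighted in the README addition marks page headers and footers that downstream code will usually want to drop. A minimal sketch of that filtering step, written for illustration only (not part of the package), assuming `boxes` and `scores` come back as NumPy arrays and `class_names` as a list, which matches the post-processing added in this commit:

```python
import cv2
from rapid_layout import RapidLayout, VisLayout

layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.2)

img = cv2.imread("tests/test_files/financial.jpg")
boxes, scores, class_names, elapse = layout_engine(img)

# Drop `abandon` regions (page headers/footers) before visualizing.
keep = [i for i, name in enumerate(class_names) if name != "abandon"]
boxes, scores = boxes[keep], scores[keep]
class_names = [class_names[i] for i in keep]

ploted_img = VisLayout.draw_detections(img, boxes, scores, class_names)
if ploted_img is not None:
    cv2.imwrite("layout_res_without_headers.png", ploted_img)
```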

demo.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -5,9 +5,9 @@

 from rapid_layout import RapidLayout, VisLayout

-layout_engine = RapidLayout(model_type="yolov8n_layout_paper")
+layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.1)

-img_path = "tests/test_files/layout.png"
+img_path = "tests/test_files/PMC3576793_00004.jpg"
 img = cv2.imread(img_path)

 boxes, scores, class_names, elapse = layout_engine(img)
```
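The diff ends at the engine call; a hedged sketch of inspecting what `doclayout_yolo` returns for the same test image (a continuation written for illustration, not code from this commit):

```python
import cv2
from rapid_layout import RapidLayout

engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.1)
img = cv2.imread("tests/test_files/PMC3576793_00004.jpg")
boxes, scores, class_names, elapse = engine(img)

print(f"{len(boxes)} regions detected in {elapse:.3f}s")
for box, score, name in zip(boxes, scores, class_names):
    # Boxes are [x1, y1, x2, y2] rescaled to the original image size.
    print(f"{name:<16} {score:.2f} {box.astype(int).tolist()}")
```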

rapid_layout/main.py

Lines changed: 25 additions & 0 deletions

```diff
@@ -10,6 +10,8 @@
 import numpy as np

 from .utils import (
+    DocLayoutPostProcess,
+    DocLayoutPreProcess,
     DownloadModel,
     LoadImage,
     OrtInferSession,
@@ -33,6 +35,7 @@
     "yolov8n_layout_report": f"{ROOT_URL}/yolov8n_layout_report.onnx",
     "yolov8n_layout_publaynet": f"{ROOT_URL}/yolov8n_layout_publaynet.onnx",
     "yolov8n_layout_general6": f"{ROOT_URL}/yolov8n_layout_general6.onnx",
+    "doclayout_yolo": f"{ROOT_URL}/doclayout_yolo_docstructbench_imgsz1024.onnx",
 }
 DEFAULT_MODEL_PATH = str(ROOT_DIR / "models" / "layout_cdla.onnx")

@@ -72,12 +75,20 @@ def __init__(
         self.yolov8_preprocess = YOLOv8PreProcess(img_size=self.yolov8_input_shape)
         self.yolov8_postprocess = YOLOv8PostProcess(labels, conf_thres, iou_thres)

+        # doclayout
+        self.doclayout_input_shape = (1024, 1024)
+        self.doclayout_preprocess = DocLayoutPreProcess(
+            img_size=self.doclayout_input_shape
+        )
+        self.doclayout_postprocess = DocLayoutPostProcess(labels, conf_thres, iou_thres)
+
         self.load_img = LoadImage()

         self.pp_layout_type = [k for k in KEY_TO_MODEL_URL if k.startswith("pp")]
         self.yolov8_layout_type = [
             k for k in KEY_TO_MODEL_URL if k.startswith("yolov8n")
         ]
+        self.doclayout_type = [k for k in KEY_TO_MODEL_URL if k.startswith("doclayout")]

     def __call__(
         self, img_content: Union[str, np.ndarray, bytes, Path]
@@ -91,6 +102,9 @@ def __call__(
         if self.model_type in self.yolov8_layout_type:
             return self.yolov8_layout(img, ori_img_shape)

+        if self.model_type in self.doclayout_type:
+            return self.doclayout_layout(img, ori_img_shape)
+
         raise ValueError(f"{self.model_type} is not supported.")

     def pp_layout(self, img: np.ndarray, ori_img_shape: Tuple[int, int]):
@@ -114,6 +128,17 @@ def yolov8_layout(self, img: np.ndarray, ori_img_shape: Tuple[int, int]):
         elapse = time.time() - s_time
         return boxes, scores, class_names, elapse

+    def doclayout_layout(self, img: np.ndarray, ori_img_shape: Tuple[int, int]):
+        s_time = time.time()
+
+        input_tensor = self.doclayout_preprocess(img)
+        outputs = self.session(input_tensor)
+        boxes, scores, class_names = self.doclayout_postprocess(
+            outputs, ori_img_shape, self.doclayout_input_shape
+        )
+        elapse = time.time() - s_time
+        return boxes, scores, class_names, elapse
+
     @staticmethod
     def get_model_path(model_type: str, model_path: Union[str, Path, None]) -> str:
         if model_path is not None:
```
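Routing keys off the `model_type` prefix: any key in `KEY_TO_MODEL_URL` that starts with `doclayout` is sent to the new `doclayout_layout` method. A standalone sketch of that dispatch logic (names mirror `main.py`, but the URL values are placeholders and the snippet is illustrative, not the library itself):

```python
# Prefix-based routing, extracted for illustration.
KEY_TO_MODEL_URL = {
    "pp_layout_cdla": "<url>",
    "yolov8n_layout_general6": "<url>",
    "doclayout_yolo": "<url>",
}

pp_layout_type = [k for k in KEY_TO_MODEL_URL if k.startswith("pp")]
yolov8_layout_type = [k for k in KEY_TO_MODEL_URL if k.startswith("yolov8n")]
doclayout_type = [k for k in KEY_TO_MODEL_URL if k.startswith("doclayout")]


def route(model_type: str) -> str:
    """Return the name of the pipeline a given model_type would use."""
    if model_type in pp_layout_type:
        return "pp_layout"
    if model_type in yolov8_layout_type:
        return "yolov8_layout"
    if model_type in doclayout_type:
        return "doclayout_layout"
    raise ValueError(f"{model_type} is not supported.")


print(route("doclayout_yolo"))  # doclayout_layout
```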

rapid_layout/utils/__init__.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -5,6 +5,6 @@
 from .infer_engine import OrtInferSession
 from .load_image import LoadImage, LoadImageError
 from .logger import get_logger
-from .post_prepross import PPPostProcess, YOLOv8PostProcess
-from .pre_procss import PPPreProcess, YOLOv8PreProcess
+from .post_prepross import DocLayoutPostProcess, PPPostProcess, YOLOv8PostProcess
+from .pre_procss import DocLayoutPreProcess, PPPreProcess, YOLOv8PreProcess
 from .vis_res import VisLayout
```

rapid_layout/utils/post_prepross.py

Lines changed: 47 additions & 9 deletions

```diff
@@ -288,23 +288,61 @@ def extract_boxes(self, predictions):
         boxes = predictions[:, :4]

         # Scale boxes to original image dimensions
-        boxes = self.rescale_boxes(boxes)
+        boxes = rescale_boxes(
+            boxes, self.input_width, self.input_height, self.img_width, self.img_height
+        )

         # Convert boxes to xyxy format
         boxes = xywh2xyxy(boxes)

         return boxes

-    def rescale_boxes(self, boxes):
+
+class DocLayoutPostProcess:
+    def __init__(self, labels: List[str], conf_thres=0.7, iou_thres=0.5):
+        self.labels = labels
+        self.conf_threshold = conf_thres
+        self.iou_threshold = iou_thres
+        self.input_width, self.input_height = None, None
+        self.img_width, self.img_height = None, None
+
+    def __call__(
+        self,
+        output,
+        ori_img_shape: Tuple[int, int],
+        img_shape: Tuple[int, int] = (1024, 1024),
+    ):
+        self.img_height, self.img_width = ori_img_shape
+        self.input_height, self.input_width = img_shape
+
+        output = output[0].squeeze()
+        boxes = output[:, :-2]
+        confidences = output[:, -2]
+        class_ids = output[:, -1].astype(int)
+
+        mask = confidences > self.conf_threshold
+        boxes = boxes[mask, :]
+        confidences = confidences[mask]
+        class_ids = class_ids[mask]
+
         # Rescale boxes to original image dimensions
-        input_shape = np.array(
-            [self.input_width, self.input_height, self.input_width, self.input_height]
-        )
-        boxes = np.divide(boxes, input_shape, dtype=np.float32)
-        boxes *= np.array(
-            [self.img_width, self.img_height, self.img_width, self.img_height]
+        boxes = rescale_boxes(
+            boxes,
+            self.input_width,
+            self.input_height,
+            self.img_width,
+            self.img_height,
         )
-        return boxes
+        labels = [self.labels[i] for i in class_ids]
+        return boxes, confidences, labels
+
+
+def rescale_boxes(boxes, input_width, input_height, img_width, img_height):
+    # Rescale boxes to original image dimensions
+    input_shape = np.array([input_width, input_height, input_width, input_height])
+    boxes = np.divide(boxes, input_shape, dtype=np.float32)
+    boxes *= np.array([img_width, img_height, img_width, img_height])
+    return boxes


 def nms(boxes, scores, iou_threshold):
```
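`DocLayoutPostProcess` assumes each row of the first model output is `[x1, y1, x2, y2, confidence, class_id]` in model-input coordinates: the last two columns are split off, low-confidence rows are masked out, and the remaining boxes are rescaled to the original image. A small synthetic sketch of that flow (trimmed label list and made-up boxes, purely illustrative):

```python
import numpy as np
from rapid_layout.utils import DocLayoutPostProcess

labels = ["title", "text", "abandon"]  # trimmed label list, for illustration only
postprocess = DocLayoutPostProcess(labels, conf_thres=0.5, iou_thres=0.5)

# One batch of rows [x1, y1, x2, y2, confidence, class_id] in 1024x1024 coordinates.
raw_output = np.array(
    [[
        [100.0, 50.0, 900.0, 120.0, 0.92, 0.0],    # confident title
        [100.0, 150.0, 900.0, 800.0, 0.88, 1.0],   # confident text block
        [100.0, 950.0, 900.0, 1000.0, 0.30, 2.0],  # low-confidence footer, filtered out
    ]],
    dtype=np.float32,
)

# ori_img_shape is (height, width) of the original image; boxes come back in that space.
boxes, scores, class_names = postprocess(
    [raw_output], ori_img_shape=(2048, 1536), img_shape=(1024, 1024)
)
print(class_names)        # ['title', 'text']
print(boxes.astype(int))  # x coords scaled by 1536/1024, y coords by 2048/1024
```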

rapid_layout/utils/pre_procss.py

Lines changed: 14 additions & 0 deletions

```diff
@@ -51,3 +51,17 @@ def __call__(self, image: np.ndarray) -> np.ndarray:
         input_img = input_img.transpose(2, 0, 1)
         input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
         return input_tensor
+
+
+class DocLayoutPreProcess:
+
+    def __init__(self, img_size: Tuple[int, int]):
+        self.img_size = img_size
+
+    def __call__(self, image: np.ndarray) -> np.ndarray:
+        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+        input_img = cv2.resize(input_img, self.img_size)
+        input_img = input_img / 255.0
+        input_img = input_img.transpose(2, 0, 1)
+        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
+        return input_tensor
```
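The new preprocessing mirrors the existing YOLOv8 path: convert BGR to RGB, resize to the fixed 1024x1024 input, scale pixel values to [0, 1], and reorder to a batched CHW float32 tensor. A quick sketch on a dummy page image (the input shape is arbitrary):

```python
import numpy as np
from rapid_layout.utils import DocLayoutPreProcess

# Dummy BGR page image standing in for a real scan.
dummy_page = np.random.randint(0, 256, size=(1754, 1240, 3), dtype=np.uint8)

preprocess = DocLayoutPreProcess(img_size=(1024, 1024))
input_tensor = preprocess(dummy_page)

print(input_tensor.shape)  # (1, 3, 1024, 1024)
print(input_tensor.dtype)  # float32
```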

tests/test_layout.py

Lines changed: 6 additions & 1 deletion

```diff
@@ -22,7 +22,12 @@


 @pytest.mark.parametrize(
-    "model_type,gt", [("yolov8n_layout_publaynet", 12), ("yolov8n_layout_general6", 13)]
+    "model_type,gt",
+    [
+        ("yolov8n_layout_publaynet", 12),
+        ("yolov8n_layout_general6", 13),
+        ("doclayout_yolo", 14),
+    ],
 )
 def test_yolov8n_layout(model_type, gt):
     img_path = test_file_dir / "PMC3576793_00004.jpg"
```
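Only the first line of the test body appears in the diff; the `gt` values presumably count the regions detected on this page. A hedged, self-contained reading of what each parametrized case checks (an assumption about the test body; the exact count also depends on the default thresholds):

```python
from pathlib import Path

from rapid_layout import RapidLayout

test_file_dir = Path("tests") / "test_files"


def check_region_count(model_type: str, gt: int) -> None:
    # Mirrors the parametrized test: run layout analysis and count detections.
    engine = RapidLayout(model_type=model_type)
    boxes, scores, class_names, elapse = engine(str(test_file_dir / "PMC3576793_00004.jpg"))
    assert len(boxes) == gt


# e.g. check_region_count("doclayout_yolo", 14)
```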
