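  </div>

  ### Introduction
- Mainly performs layout analysis of document images. Specifically, it analyzes a given document image (paper screenshot, research report, etc.) and locates the category and position of each region, such as titles, paragraphs, tables, and figures.
+
+ This project collects open-source layout analysis projects from across the web. Specifically, it analyzes a given document image (paper screenshot, research report, etc.) and locates the category and position of each region, such as titles, paragraphs, tables, and figures.

  ⚠️ Note: Because layouts differ widely across scenarios, no single model currently handles all of them. If the models below perform poorly for your use case, consider building your own training set and fine-tuning.

- The following layout analysis scenarios are currently supported:
+ The layout analysis models currently supported are as follows:
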
  | `model_type` | Layout type | Model name | Supported classes |
  | :------ | :----- | :------ | :----- |
  | `yolov8n_layout_report` | Research report | `yolov8n_layout_report.onnx` | `['Text', 'Title', 'Header', 'Footer', 'Figure', 'Table', 'Toc', 'Figure caption', 'Table caption']` |
  | `yolov8n_layout_publaynet` | English | `yolov8n_layout_publaynet.onnx` | `["Text", "Title", "List", "Table", "Figure"]` |
  | `yolov8n_layout_general6` | General | `yolov8n_layout_general6.onnx` | `["Text", "Title", "Figure", "Table", "Caption", "Equation"]` |
+ | 🔥`doclayout_yolo` | General | `doclayout_yolo_docstructbench_imgsz1024.onnx` | `['title', 'text', 'abandon', 'figure', 'figure_caption', 'table', 'table_caption', 'table_footnote', 'isolate_formula', 'formula_caption']` |

  PP models come from: [PaddleOCR layout analysis](https://github.com/PaddlePaddle/PaddleOCR/blob/133d67f27dc8a241d6b2e30a9f047a0fb75bebbe/ppstructure/layout/README_ch.md)

  The yolov8n series comes from: [360LayoutAnalysis](https://github.com/360AILAB-NLP/360LayoutAnalysis)

+ (Recommended) 🔥 The doclayout_yolo model comes from [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO). It is currently the best open-source model and supports layout detection for seven document types: academic papers, textbooks, financial documents, exam papers, fuzzy scans, PPT slides, and posters. Notably, its classes include `abandon`, which mainly covers page headers and footers, so those regions can be discarded quickly in later steps.
+
  Model download: [link](https://github.com/RapidAI/RapidLayout/releases/tag/v0.0.0)

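  ### Installation
+
  Because the model is small, the Chinese layout analysis model (`layout_cdla.onnx`) is bundled in the whl package. For Chinese layout analysis, you can install the package and use it directly.

  ```bash
- $ pip install rapid-layout
+ pip install rapid-layout
  ```
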
  ### Usage
+
  #### Run as a Python script
+
  ```python
  import cv2
+
  from rapid_layout import RapidLayout, VisLayout

  # See the table above for model_type values. When a different model_type is specified, the corresponding model is downloaded automatically into the installation directory.
- layout_engine = RapidLayout(conf_thres=0.5, model_type="pp_layout_cdla")
+ layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.2)

- img = cv2.imread('test_images/layout.png')
+ img_path = "tests/test_files/financial.jpg"
+ img = cv2.imread(img_path)

  boxes, scores, class_names, elapse = layout_engine(img)
  ploted_img = VisLayout.draw_detections(img, boxes, scores, class_names)
  if ploted_img is not None:
      cv2.imwrite("layout_res.png", ploted_img)
  ```
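
The `abandon` class mentioned above is meant to be dropped before later processing. A minimal sketch of that filtering step follows. It assumes `boxes`, `scores`, and `class_names` are parallel sequences of equal length, as their use in `draw_detections` suggests. The helper `drop_classes` and the sample detections are hypothetical and not part of `rapid_layout`.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]


def drop_classes(
    boxes: Sequence[Box],
    scores: Sequence[float],
    class_names: Sequence[str],
    unwanted: Tuple[str, ...] = ("abandon",),
) -> Tuple[List[Box], List[float], List[str]]:
    """Return only the detections whose class name is not in `unwanted`.

    Hypothetical helper: assumes the three inputs are index-aligned.
    """
    kept = [
        (box, score, name)
        for box, score, name in zip(boxes, scores, class_names)
        if name not in unwanted
    ]
    if not kept:
        return [], [], []
    kept_boxes, kept_scores, kept_names = zip(*kept)
    return list(kept_boxes), list(kept_scores), list(kept_names)


# Made-up detections standing in for the output of layout_engine(img).
boxes = [[10, 10, 200, 40], [10, 60, 200, 300], [10, 780, 200, 800]]
scores = [0.95, 0.90, 0.40]
class_names = ["title", "text", "abandon"]

kept_boxes, kept_scores, kept_names = drop_classes(boxes, scores, class_names)
print(kept_names)  # ['title', 'text']
```

With real output, the filtered lists could be passed to `VisLayout.draw_detections` in place of the originals, provided the engine returns index-aligned results as assumed here.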

+ ### Visualization results
+
+ <div align="center">
+ <img src="https://github.com/RapidAI/RapidLayout/releases/download/v0.0.0/layout_res.png" width="80%" height="80%">
+ </div>
+
  #### Run from the terminal
+
  ```bash
  $ rapid_layout -h
  usage: rapid_layout [-h] -img IMG_PATH
-                     [-m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-                     [--conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-                     [--iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}]
-                     [--use_cuda] [--use_dml] [-v]
+                     [-m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+                     [--conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+                     [--iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}]
+                     [--use_cuda] [--use_dml] [-v]

  options:
    -h, --help            show this help message and exit
    -img IMG_PATH, --img_path IMG_PATH
                          Path to image for layout.
-   -m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}, --model_type {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+   -m {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}, --model_type {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
                          Support model type
-   --conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+   --conf_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
                          Box threshold, the range is [0, 1]
-   --iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6}
+   --iou_thres {pp_layout_cdla,pp_layout_publaynet,pp_layout_table,yolov8n_layout_paper,yolov8n_layout_report,yolov8n_layout_publaynet,yolov8n_layout_general6,doclayout_yolo}
                          IoU threshold, the range is [0, 1]
    --use_cuda            Whether to use cuda.
    --use_dml             Whether to use DirectML, which only works in Windows10+.
    -v, --vis             Wheter to visualize the layout results.
  ```
+
  - Example:
+
    ```bash
-   $ rapid_layout -v -img test_images/layout.png
+   rapid_layout -v -img test_images/layout.png
    ```

-
  ### GPU inference
+
  - Because the layout analysis models take a fixed input image size, `onnxruntime-gpu` can be used to speed up inference.
  - The `rapid_layout` package depends on the CPU build of `onnxruntime` by default. To run inference on a GPU, install `onnxruntime-gpu` manually.
  - For detailed usage and benchmarks, see [AI Studio](https://aistudio.baidu.com/projectdetail/8094594).

  #### Installation
+
  ```bash
  pip install rapid_layout
  pip uninstall onnxruntime
@@ -106,13 +125,14 @@ pip install onnxruntime-gpu
  ```

  #### Usage
+
  ```python
  import cv2
  from rapid_layout import RapidLayout
  from pathlib import Path

  # Note: pass use_cuda=True here to enable GPU inference
- layout_engine = RapidLayout(conf_thres=0.5, model_type="pp_layout_cdla", use_cuda=True)
+ layout_engine = RapidLayout(model_type="doclayout_yolo", conf_thres=0.2, use_cuda=True)

  # warm up
  layout_engine("images/12027_5.png")
@@ -128,15 +148,10 @@ avg_elapse = sum(elapses) / len(elapses)
  print(f'avg elapse: {avg_elapse:.4f}')
  ```
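
The diff hides the lines between the warm-up call and the averaging step, so the measurement loop is not visible here. One way to write the warm-up-then-average pattern is sketched below as a generic helper. The names `average_elapse` and `fake_engine` are hypothetical, and `fake_engine` is a stand-in so the sketch runs without a GPU or the `rapid_layout` package.

```python
import time
from typing import Any, Callable, List


def average_elapse(
    fn: Callable[[Any], Any], arg: Any, warmup: int = 1, repeats: int = 10
) -> float:
    """Call fn(arg) `warmup` times untimed, then return the mean time of `repeats` timed calls."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    # Warm-up calls absorb one-off costs (model loading, memory allocation).
    for _ in range(warmup):
        fn(arg)
    elapses: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        elapses.append(time.perf_counter() - start)
    return sum(elapses) / len(elapses)


def fake_engine(path: str) -> str:
    """Stand-in for a layout engine call; sleeps briefly instead of running a model."""
    time.sleep(0.01)
    return path


avg_elapse = average_elapse(fake_engine, "images/12027_5.png", warmup=1, repeats=5)
print(f"avg elapse: {avg_elapse:.4f}")
```

Passing the real `layout_engine` as `fn` would time end-to-end calls, including image loading when a path is given. That may differ from the `elapse` value the engine itself returns.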

- ### Visualization results
-
- <div align="center">
- <img src="https://github.com/RapidAI/RapidLayout/releases/download/v0.0.0/layout_res.png" width="80%" height="80%">
- </div>
-
-
  ### Reference projects
+
+ - [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
  - [PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/133d67f27dc8a241d6b2e30a9f047a0fb75bebbe/ppstructure/layout/README_ch.md)
  - [360LayoutAnalysis](https://github.com/360AILAB-NLP/360LayoutAnalysis)
  - [ONNX-YOLOv8-Object-Detection](https://github.com/ibaiGorordo/ONNX-YOLOv8-Object-Detection)
- - [ChineseDocumentPDF](https://github.com/SWHL/ChineseDocumentPDF)
+ - [ChineseDocumentPDF](https://github.com/SWHL/ChineseDocumentPDF)