Commit f08b7b9

v0.6.0
1 parent 2c9db83 commit f08b7b9

7 files changed: +302 -16 lines changed

README.md

Lines changed: 294 additions & 8 deletions
@@ -2,7 +2,7 @@
22

33
## 😺 Project Introduction
44

5-
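**RapidDoc is a lightweight open-source framework focused on document parsing, supporting **OCR, layout analysis, formula recognition, table recognition, and reading-order recovery** among other features**
5+
**RapidDoc is a lightweight open-source framework focused on document parsing, supporting **OCR, layout analysis, formula recognition, table recognition, and reading-order recovery** among other features, and converting complex PDF documents to Markdown, JSON, Word, and HTML**
66

77
**The framework is a derivative of [Mineru](https://github.com/opendatalab/MinerU) with the VLM removed. It focuses on efficient document parsing in the Pipeline mode and keeps a reasonable parsing speed even on CPU.**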
88

@@ -24,8 +24,9 @@
2424
- Uses OpenVINO by default on CPU and torch by default on GPU
2525

2626
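- **Layout detection**
27-
- Uses the `PP-DocLayout` series of ONNX models (plus-L, L, M, S)
28-
- **PP-DocLayout_plus-L**: best accuracy, slightly slower; used by default
27+
- Uses the `PP-DocLayout` series of ONNX models (v2, plus-L, L, M, S)
28+
- **PP-DocLayoutV2**: the layout model used by PaddleOCR-VL; includes reading-order prediction
29+
- **PP-DocLayout_plus-L**: good accuracy and stable; used by default
2930
- **PP-DocLayout-L**: fast, with good accuracy
3031
- **PP-DocLayout-S**: extremely fast, but misses some detections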
3132

@@ -51,6 +52,291 @@
5152
- With models other than OCR and PP-DocLayout-M/S, OpenVINO inference raises errors; this is hard to resolve for now. See [PaddleOCR/issues/16277](https://github.com/PaddlePaddle/PaddleOCR/issues/16277)
5253
---
5354

55+
## Benchmark Results
56+
57+
### 1. OmniDocBench
58+
59+
The following are RapidDoc's evaluation results on OmniDocBench. The Pipeline uses the PP-DocLayout_plus-L, PP-OCRv5-mobile, PP-FormulaNet_plus-M, and UNET_SLANET_PLUS models.
60+
<table style="width:100%; border-collapse: collapse;">
61+
<caption>Comprehensive evaluation of document parsing on OmniDocBench (v1.5)</caption>
62+
<thead>
63+
<tr>
64+
<th>Model Type</th>
65+
<th>Methods</th>
66+
<th>Size</th>
67+
<th>Overall&#x2191;</th>
68+
<th>Text<sup>Edit</sup>&#x2193;</th>
69+
<th>Formula<sup>CDM</sup>&#x2191;</th>
70+
<th>Table<sup>TEDS</sup>&#x2191;</th>
71+
<th>Table<sup>TEDS-S</sup>&#x2191;</th>
72+
<th>Read Order<sup>Edit</sup>&#x2193;</th>
73+
</tr>
74+
</thead>
75+
<tbody>
76+
<tr>
77+
<td rowspan="16"><strong>Specialized</strong><br><strong>VLMs</strong></td>
78+
<td>PaddleOCR-VL</td>
79+
<td>0.9B</td>
80+
<td><strong>92.86</strong></td>
81+
<td><strong>0.035</strong></td>
82+
<td><strong>91.22</strong></td>
83+
<td><strong>90.89</strong></td>
84+
<td><strong>94.76</strong></td>
85+
<td><strong>0.043</strong></td>
86+
</tr>
87+
<tr>
<td>MinerU2.5</td>
88+
<td>1.2B</td>
89+
<td><ins>90.67</ins></td>
90+
<td><ins>0.047</ins></td>
91+
<td><ins>88.46</ins></td>
92+
<td><ins>88.22</ins></td>
93+
<td><ins>92.38</ins></td>
94+
<td><ins>0.044</ins></td>
95+
</tr>
96+
<tr>
97+
<td>MonkeyOCR-pro-3B</td>
98+
<td>3B</td>
99+
<td>88.85</td>
100+
<td>0.075</td>
101+
<td>87.25</td>
102+
<td>86.78</td>
103+
<td>90.63</td>
104+
<td>0.128</td>
105+
</tr>
106+
<tr>
107+
<td>OCRVerse</td>
108+
<td>4B</td>
109+
<td>88.56</td>
110+
<td>0.058</td>
111+
<td>86.91</td>
112+
<td>84.55</td>
113+
<td>88.45</td>
114+
<td>0.071</td>
115+
</tr>
116+
<tr>
117+
<td>dots.ocr</td>
118+
<td>3B</td>
119+
<td>88.41</td>
120+
<td>0.048</td>
121+
<td>83.22</td>
122+
<td>86.78</td>
123+
<td>90.62</td>
124+
<td>0.053</td>
125+
</tr>
126+
<tr>
127+
<td>MonkeyOCR-3B</td>
128+
<td>3B</td>
129+
<td>87.13</td>
130+
<td>0.075</td>
131+
<td>87.45</td>
132+
<td>81.39</td>
133+
<td>85.92</td>
134+
<td>0.129</td>
135+
</tr>
136+
<tr>
137+
<td>Deepseek-OCR</td>
138+
<td>3B</td>
139+
<td>87.01</td>
140+
<td>0.073</td>
141+
<td>83.37</td>
142+
<td>84.97</td>
143+
<td>88.80</td>
144+
<td>0.086</td>
145+
</tr>
146+
<tr>
147+
<td>MonkeyOCR-pro-1.2B</td>
148+
<td>1.2B</td>
149+
<td>86.96</td>
150+
<td>0.084</td>
151+
<td>85.02</td>
152+
<td>84.24</td>
153+
<td>89.02</td>
154+
<td>0.130</td>
155+
</tr>
156+
<tr>
157+
<td>Nanonets-OCR-s</td>
158+
<td>3B</td>
159+
<td>85.59</td>
160+
<td>0.093</td>
161+
<td>85.90</td>
162+
<td>80.14</td>
163+
<td>85.57</td>
164+
<td>0.108</td>
165+
</tr>
166+
<tr>
167+
<td>MinerU2-VLM</td>
168+
<td>0.9B</td>
169+
<td>85.56</td>
170+
<td>0.078</td>
171+
<td>80.95</td>
172+
<td>83.54</td>
173+
<td>87.66</td>
174+
<td>0.086</td>
175+
</tr>
176+
<tr>
177+
<td>olmOCR</td>
178+
<td>7B</td>
179+
<td>81.79</td>
180+
<td>0.096</td>
181+
<td>86.04</td>
182+
<td>68.92</td>
183+
<td>74.77</td>
184+
<td>0.121</td>
185+
</tr>
186+
<tr>
187+
<td>Dolphin-1.5</td>
188+
<td>0.3B</td>
189+
<td>83.21</td>
190+
<td>0.092</td>
191+
<td>80.78</td>
192+
<td>78.06</td>
193+
<td>84.10</td>
194+
<td>0.080</td>
195+
</tr>
196+
<tr>
197+
<td>POINTS-Reader</td>
198+
<td>3B</td>
199+
<td>80.98</td>
200+
<td>0.134</td>
201+
<td>79.20</td>
202+
<td>77.13</td>
203+
<td>81.66</td>
204+
<td>0.145</td>
205+
</tr>
206+
<tr>
207+
<td>Mistral OCR</td>
208+
<td>-</td>
209+
<td>78.83</td>
210+
<td>0.164</td>
211+
<td>82.84</td>
212+
<td>70.03</td>
213+
<td>78.04</td>
214+
<td>0.144</td>
215+
</tr>
216+
<tr>
217+
<td>OCRFlux</td>
218+
<td>3B</td>
219+
<td>74.82</td>
220+
<td>0.193</td>
221+
<td>68.03</td>
222+
<td>75.75</td>
223+
<td>80.23</td>
224+
<td>0.202</td>
225+
</tr>
226+
<tr>
227+
<td>Dolphin</td>
228+
<td>0.3B</td>
229+
<td>74.67</td>
230+
<td>0.125</td>
231+
<td>67.85</td>
232+
<td>68.70</td>
233+
<td>77.77</td>
234+
<td>0.124</td>
235+
</tr>
236+
<tr>
237+
<td rowspan="6"><strong>General</strong><br><strong>VLMs</strong></td>
238+
<td>Qwen3-VL-235B-A22B-Instruct</td>
239+
<td>235B</td>
240+
<td>89.15</td>
241+
<td>0.069</td>
242+
<td>88.14</td>
243+
<td>86.21</td>
244+
<td>90.55</td>
245+
<td>0.068</td>
246+
</tr>
247+
<tr>
<td>Gemini-2.5 Pro</td>
248+
<td>-</td>
249+
<td>88.03</td>
250+
<td>0.075</td>
251+
<td>85.82</td>
252+
<td>85.71</td>
253+
<td>90.29</td>
254+
<td>0.097</td>
255+
</tr>
256+
<tr>
257+
<td>Qwen2.5-VL</td>
258+
<td>72B</td>
259+
<td>87.02</td>
260+
<td>0.094</td>
261+
<td>88.27</td>
262+
<td>82.15</td>
263+
<td>86.22</td>
264+
<td>0.102</td>
265+
</tr>
266+
<tr>
267+
<td>InternVL3.5</td>
268+
<td>241B</td>
269+
<td>82.67</td>
270+
<td>0.142</td>
271+
<td>87.23</td>
272+
<td>75.00</td>
273+
<td>81.28</td>
274+
<td>0.125</td>
275+
</tr>
276+
<tr>
277+
<td>InternVL3</td>
278+
<td>78B</td>
279+
<td>80.33</td>
280+
<td>0.131</td>
281+
<td>83.42</td>
282+
<td>70.64</td>
283+
<td>77.74</td>
284+
<td>0.113</td>
285+
</tr>
286+
<tr>
287+
<td>GPT-4o</td>
288+
<td>-</td>
289+
<td>75.02</td>
290+
<td>0.217</td>
291+
<td>79.70</td>
292+
<td>67.07</td>
293+
<td>76.09</td>
294+
<td>0.148</td>
295+
</tr>
296+
<tr>
297+
<td rowspan="4"><strong>Pipeline</strong><br><strong>Tools</strong></td>
298+
<td>PP-StructureV3</td>
299+
<td>-</td>
300+
<td>86.73</td>
301+
<td>0.073</td>
302+
<td>85.79</td>
303+
<td>81.68</td>
304+
<td>89.48</td>
305+
<td>0.073</td>
306+
</tr>
307+
<tr>
308+
<td><strong>RapidDoc</strong></td>
309+
<td>-</td>
310+
<td>85.25</td>
311+
<td>0.085</td>
312+
<td>85.19</td>
313+
<td>79.07</td>
314+
<td>86.35</td>
315+
<td>0.114</td>
316+
</tr>
317+
<tr>
318+
<td>Mineru2-pipeline</td>
319+
<td>-</td>
320+
<td>75.51</td>
321+
<td>0.209</td>
322+
<td>76.55</td>
323+
<td>70.90</td>
324+
<td>79.11</td>
325+
<td>0.225</td>
326+
</tr>
327+
<tr>
328+
<td>Marker-1.8.2</td>
329+
<td>-</td>
330+
<td>71.30</td>
331+
<td>0.206</td>
332+
<td>76.66</td>
333+
<td>57.88</td>
334+
<td>71.17</td>
335+
<td>0.250</td>
336+
</tr>
337+
</tbody>
338+
</table>
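The Overall column in the table above appears to be derived from three of the other columns. The following is a minimal sketch, assuming Overall is the mean of `(1 - Text Edit) * 100`, Formula CDM, and Table TEDS (the definition we understand OmniDocBench v1.5 to use). The function name `overall_score` is hypothetical and is not part of RapidDoc or OmniDocBench.

```python
# Sketch: reproduce the "Overall" column from three component columns.
# Assumption: Overall = mean of (1 - Text Edit) * 100, Formula CDM, Table TEDS.

def overall_score(text_edit: float, formula_cdm: float, table_teds: float) -> float:
    """Average the three component scores on a 0-100 scale."""
    # Text Edit is a lower-is-better distance in [0, 1]; flip and rescale it.
    text_score = (1.0 - text_edit) * 100.0
    return (text_score + formula_cdm + table_teds) / 3.0


# (Text Edit, Formula CDM, Table TEDS) copied from the Pipeline Tools rows.
rows = {
    "PP-StructureV3": (0.073, 85.79, 81.68),
    "RapidDoc": (0.085, 85.19, 79.07),
    "Mineru2-pipeline": (0.209, 76.55, 70.90),
}

for name, (edit, cdm, teds) in rows.items():
    print(f"{name}: {overall_score(edit, cdm, teds):.2f}")
```

Because the table inputs are already rounded, the recomputed values can differ from the listed Overall scores in the last digit.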
339+
54340
## 🛠️ Installing RapidDoc
55341

56342
#### Install with pip
@@ -126,14 +412,14 @@ RapidDoc provides a convenient Docker deployment option, which helps to quickly set up the environment and
126412
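- [x] Text-based PDFs: extract text-box bboxes with pypdfium2
127413
- [x] Text-based PDFs: support table parsing at 0/90/270 degree orientations
128414
- [x] Text-based PDFs: extract original images with pypdfium2 (the default screenshot approach reduces sharpness and may lose part of the image boundary)
129-
- [x] Formula extraction inside tables
130-
- [x] Image extraction inside tables
415+
- [x] Formula extraction and image extraction inside tables
131416
- [x] Improved reading order, with recovery of complex layouts such as multi-column and vertical text
132417
- [x] Formula model supports torch inference, with GPU acceleration available
133-
- [x] OpenVINO support for table models
134-
- [ ] OpenVINO support for layout models
418+
- [x] OpenVINO support for layout and table models
419+
- [x] Markdown to docx and HTML conversion
420+
- [x] Support PP-DocLayoutV2 layout detection + reading order
421+
- [x] OmniDocBench evaluation
135422
- [ ] OpenVINO support for formula models
136-
- [ ] Support PP-DocLayoutV2 layout detection + reading order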
137423

138424

139425
## 🙏 Acknowledgements

docker/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ RUN apt-get update && \
2121
WORKDIR /app
2222

2323
# Install Python dependencies
24-
RUN python3 -m pip install 'rapid-doc[cpu]==0.5.1' -i https://pypi.org/simple --break-system-packages && \
25-
python3 -m pip install 'rapid-doc[api]==0.5.1' -i https://pypi.org/simple --break-system-packages && \
24+
RUN python3 -m pip install 'rapid-doc[cpu]==0.6.0' -i https://pypi.org/simple --break-system-packages && \
25+
python3 -m pip install 'rapid-doc[api]==0.6.0' -i https://pypi.org/simple --break-system-packages && \
2626
python3 -m pip cache purge
2727

2828
# Copy config files and scripts (copied first to take advantage of the Docker cache)

docker/DockerfileGPU

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ WORKDIR /app
2929

3030
# Install Python dependencies
3131
RUN python3 -m pip install --upgrade pip setuptools wheel && \
32-
python3 -m pip install 'rapid-doc[gpu]==0.5.1' 'rapid-doc[api]==0.5.1' -i https://pypi.org/simple && \
32+
python3 -m pip install 'rapid-doc[gpu]==0.6.0' 'rapid-doc[api]==0.6.0' -i https://pypi.org/simple && \
3333
python3 -m pip install 'onnxruntime-gpu==1.23.0' -i https://pypi.org/simple && \
3434
python3 -m pip cache purge
3535

docker/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,10 +12,10 @@
1212
cd docker
1313

1414
# 1. CPU mode
15-
docker build -f Dockerfile -t hzkitty/rapid-doc:0.5.1 .
15+
docker build -f Dockerfile -t hzkitty/rapid-doc:0.6.0 .
1616

1717
# 2. GPU mode
18-
docker build -f DockerfileGPU -t hzkitty/rapid-doc:0.5.1-gpu .
18+
docker build -f DockerfileGPU -t hzkitty/rapid-doc:0.6.0-gpu .
1919
```
2020

2121

docker/docker-compose-gpu.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
services:
22
rapid-doc-server:
33
container_name: rapid-doc-server
4-
image: hzkitty/rapid-doc:0.5.1-gpu
4+
image: hzkitty/rapid-doc:0.6.0-gpu
55
deploy:
66
resources:
77
reservations:

docker/docker-compose.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
services:
22
rapid-doc-server:
33
container_name: rapid-doc-server
4-
image: hzkitty/rapid-doc:0.5.1
4+
image: hzkitty/rapid-doc:0.6.0
55
ports:
66
- "8888:8888"
77
environment:

rapid_doc/version.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,2 @@
1-
__version__ = "0.5.1"
1+
__version__ = "0.6.0"
22
__mineru_version__ = "2.6.4"
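This release bump repeats the same version string across the Dockerfiles, compose files, docker README, and `version.py`. The sketch below shows one way such pins could be checked for consistency. It is an illustrative helper, not part of the repository: `find_versions` and `PIN_PATTERNS` are hypothetical names, and the patterns cover only the three pin styles visible in this diff.

```python
import re

# Hypothetical patterns for the three pin styles seen in this commit.
PIN_PATTERNS = [
    re.compile(r"rapid-doc\[[a-z]+\]==(\d+(?:\.\d+)*)"),           # pip extras pins
    re.compile(r"hzkitty/rapid-doc:(\d+(?:\.\d+)*)"),              # Docker image tags
    re.compile(r'^__version__\s*=\s*"(\d+(?:\.\d+)*)"', re.M),     # version.py
]


def find_versions(text: str) -> set[str]:
    """Return every distinct rapid-doc version pinned in the given text."""
    found: set[str] = set()
    for pattern in PIN_PATTERNS:
        found.update(pattern.findall(text))
    return found


# Lines adapted from the changed files; __mineru_version__ is deliberately ignored.
sample = """
RUN python3 -m pip install 'rapid-doc[cpu]==0.6.0' -i https://pypi.org/simple
image: hzkitty/rapid-doc:0.6.0-gpu
__version__ = "0.6.0"
__mineru_version__ = "2.6.4"
"""

versions = find_versions(sample)
if len(versions) == 1:
    print("consistent:", versions)
else:
    print("mismatched pins:", sorted(versions))
```

A result with more than one element would indicate that at least one file still carries an older pin.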
