## 😺 Introduction

**RapidDoc is a lightweight open-source framework focused on document parsing. It supports OCR, layout analysis, formula recognition, table recognition, and reading-order recovery, and converts complex PDF documents into Markdown, JSON, Word, and HTML.**

**The framework is built on [MinerU](https://github.com/opendatalab/MinerU) with the VLM removed. It focuses on efficient document parsing with the pipeline backend and keeps a decent parsing speed even on CPU.**

- OpenVINO is used by default on CPU, and torch by default on GPU

- **Layout detection**
  - Uses the `PP-DocLayout` series of ONNX models (V2, plus-L, L, M, S)
  - **PP-DocLayoutV2**: the layout model used by PaddleOCR-VL, with built-in reading order
  - **PP-DocLayout_plus-L**: good accuracy and stable; used by default
  - **PP-DocLayout-L**: fast, with good accuracy
  - **PP-DocLayout-S**: very fast, but misses some elements

- Apart from the OCR and PP-DocLayout-M/S models, OpenVINO inference raises errors, which is hard to resolve for now. [PaddleOCR/issues/16277](https://github.com/PaddlePaddle/PaddleOCR/issues/16277)

---

## Benchmark Results

### 1. OmniDocBench

The table below shows RapidDoc's evaluation results on OmniDocBench (v1.5). The pipeline uses PP-DocLayout_plus-L for layout detection, PP-OCRv5-mobile for OCR, PP-FormulaNet_plus-M for formula recognition, and UNET_SLANET_PLUS for table recognition.
<table style="width:100%; border-collapse: collapse;">
  <caption>Comprehensive evaluation of document parsing on OmniDocBench (v1.5)</caption>
  <thead>
    <tr>
      <th>Model Type</th>
      <th>Methods</th>
      <th>Size</th>
      <th>Overall↑</th>
      <th>Text<sup>Edit</sup>↓</th>
      <th>Formula<sup>CDM</sup>↑</th>
      <th>Table<sup>TEDS</sup>↑</th>
      <th>Table<sup>TEDS-S</sup>↑</th>
      <th>Read Order<sup>Edit</sup>↓</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="16"><strong>Specialized</strong><br><strong>VLMs</strong></td>
      <td>PaddleOCR-VL</td>
      <td>0.9B</td>
      <td><strong>92.86</strong></td>
      <td><strong>0.035</strong></td>
      <td><strong>91.22</strong></td>
      <td><strong>90.89</strong></td>
      <td><strong>94.76</strong></td>
      <td><strong>0.043</strong></td>
    </tr>
    <tr>
      <td>MinerU2.5</td>
      <td>1.2B</td>
      <td><ins>90.67</ins></td>
      <td><ins>0.047</ins></td>
      <td><ins>88.46</ins></td>
      <td><ins>88.22</ins></td>
      <td><ins>92.38</ins></td>
      <td><ins>0.044</ins></td>
    </tr>
    <tr>
      <td>MonkeyOCR-pro-3B</td>
      <td>3B</td>
      <td>88.85</td>
      <td>0.075</td>
      <td>87.25</td>
      <td>86.78</td>
      <td>90.63</td>
      <td>0.128</td>
    </tr>
    <tr>
      <td>OCRVerse</td>
      <td>4B</td>
      <td>88.56</td>
      <td>0.058</td>
      <td>86.91</td>
      <td>84.55</td>
      <td>88.45</td>
      <td>0.071</td>
    </tr>
    <tr>
      <td>dots.ocr</td>
      <td>3B</td>
      <td>88.41</td>
      <td>0.048</td>
      <td>83.22</td>
      <td>86.78</td>
      <td>90.62</td>
      <td>0.053</td>
    </tr>
    <tr>
      <td>MonkeyOCR-3B</td>
      <td>3B</td>
      <td>87.13</td>
      <td>0.075</td>
      <td>87.45</td>
      <td>81.39</td>
      <td>85.92</td>
      <td>0.129</td>
    </tr>
    <tr>
      <td>Deepseek-OCR</td>
      <td>3B</td>
      <td>87.01</td>
      <td>0.073</td>
      <td>83.37</td>
      <td>84.97</td>
      <td>88.80</td>
      <td>0.086</td>
    </tr>
    <tr>
      <td>MonkeyOCR-pro-1.2B</td>
      <td>1.2B</td>
      <td>86.96</td>
      <td>0.084</td>
      <td>85.02</td>
      <td>84.24</td>
      <td>89.02</td>
      <td>0.130</td>
    </tr>
    <tr>
      <td>Nanonets-OCR-s</td>
      <td>3B</td>
      <td>85.59</td>
      <td>0.093</td>
      <td>85.90</td>
      <td>80.14</td>
      <td>85.57</td>
      <td>0.108</td>
    </tr>
    <tr>
      <td>MinerU2-VLM</td>
      <td>0.9B</td>
      <td>85.56</td>
      <td>0.078</td>
      <td>80.95</td>
      <td>83.54</td>
      <td>87.66</td>
      <td>0.086</td>
    </tr>
    <tr>
      <td>olmOCR</td>
      <td>7B</td>
      <td>81.79</td>
      <td>0.096</td>
      <td>86.04</td>
      <td>68.92</td>
      <td>74.77</td>
      <td>0.121</td>
    </tr>
    <tr>
      <td>Dolphin-1.5</td>
      <td>0.3B</td>
      <td>83.21</td>
      <td>0.092</td>
      <td>80.78</td>
      <td>78.06</td>
      <td>84.10</td>
      <td>0.080</td>
    </tr>
    <tr>
      <td>POINTS-Reader</td>
      <td>3B</td>
      <td>80.98</td>
      <td>0.134</td>
      <td>79.20</td>
      <td>77.13</td>
      <td>81.66</td>
      <td>0.145</td>
    </tr>
    <tr>
      <td>Mistral OCR</td>
      <td>-</td>
      <td>78.83</td>
      <td>0.164</td>
      <td>82.84</td>
      <td>70.03</td>
      <td>78.04</td>
      <td>0.144</td>
    </tr>
    <tr>
      <td>OCRFlux</td>
      <td>3B</td>
      <td>74.82</td>
      <td>0.193</td>
      <td>68.03</td>
      <td>75.75</td>
      <td>80.23</td>
      <td>0.202</td>
    </tr>
    <tr>
      <td>Dolphin</td>
      <td>0.3B</td>
      <td>74.67</td>
      <td>0.125</td>
      <td>67.85</td>
      <td>68.70</td>
      <td>77.77</td>
      <td>0.124</td>
    </tr>
    <tr>
      <td rowspan="6"><strong>General</strong><br><strong>VLMs</strong></td>
      <td>Qwen3-VL-235B-A22B-Instruct</td>
      <td>235B</td>
      <td>89.15</td>
      <td>0.069</td>
      <td>88.14</td>
      <td>86.21</td>
      <td>90.55</td>
      <td>0.068</td>
    </tr>
    <tr>
      <td>Gemini-2.5 Pro</td>
      <td>-</td>
      <td>88.03</td>
      <td>0.075</td>
      <td>85.82</td>
      <td>85.71</td>
      <td>90.29</td>
      <td>0.097</td>
    </tr>
    <tr>
      <td>Qwen2.5-VL</td>
      <td>72B</td>
      <td>87.02</td>
      <td>0.094</td>
      <td>88.27</td>
      <td>82.15</td>
      <td>86.22</td>
      <td>0.102</td>
    </tr>
    <tr>
      <td>InternVL3.5</td>
      <td>241B</td>
      <td>82.67</td>
      <td>0.142</td>
      <td>87.23</td>
      <td>75.00</td>
      <td>81.28</td>
      <td>0.125</td>
    </tr>
    <tr>
      <td>InternVL3</td>
      <td>78B</td>
      <td>80.33</td>
      <td>0.131</td>
      <td>83.42</td>
      <td>70.64</td>
      <td>77.74</td>
      <td>0.113</td>
    </tr>
    <tr>
      <td>GPT-4o</td>
      <td>-</td>
      <td>75.02</td>
      <td>0.217</td>
      <td>79.70</td>
      <td>67.07</td>
      <td>76.09</td>
      <td>0.148</td>
    </tr>
    <tr>
      <td rowspan="4"><strong>Pipeline</strong><br><strong>Tools</strong></td>
      <td>PP-StructureV3</td>
      <td>-</td>
      <td>86.73</td>
      <td>0.073</td>
      <td>85.79</td>
      <td>81.68</td>
      <td>89.48</td>
      <td>0.073</td>
    </tr>
    <tr>
      <td><strong>RapidDoc</strong></td>
      <td>-</td>
      <td>85.25</td>
      <td>0.085</td>
      <td>85.19</td>
      <td>79.07</td>
      <td>86.35</td>
      <td>0.114</td>
    </tr>
    <tr>
      <td>Mineru2-pipeline</td>
      <td>-</td>
      <td>75.51</td>
      <td>0.209</td>
      <td>76.55</td>
      <td>70.90</td>
      <td>79.11</td>
      <td>0.225</td>
    </tr>
    <tr>
      <td>Marker-1.8.2</td>
      <td>-</td>
      <td>71.30</td>
      <td>0.206</td>
      <td>76.66</td>
      <td>57.88</td>
      <td>71.17</td>
      <td>0.250</td>
    </tr>
  </tbody>
</table>
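
The Overall column appears to follow OmniDocBench v1.5's composite score: the mean of (1 - Text Edit) × 100, Formula CDM, and Table TEDS, with TEDS-S and Read Order reported separately. That definition is an assumption here, checked only against the rows above, not something this README states. The minimal Python sketch below recomputes the RapidDoc row from its displayed sub-scores; the helper name `omnidocbench_overall` is hypothetical.

```python
def omnidocbench_overall(text_edit: float, formula_cdm: float, table_teds: float) -> float:
    """Assumed OmniDocBench v1.5 Overall score (not an official API).

    text_edit is a normalized edit distance in [0, 1] where lower is better,
    so it is turned into a 0-100 similarity before averaging it with the
    formula CDM and table TEDS scores, which are already on a 0-100 scale.
    """
    text_score = (1.0 - text_edit) * 100.0
    return (text_score + formula_cdm + table_teds) / 3.0


# Sub-scores copied from the RapidDoc row of the table
rapiddoc = omnidocbench_overall(text_edit=0.085, formula_cdm=85.19, table_teds=79.07)
print(f"RapidDoc Overall: {rapiddoc:.2f}")  # prints "RapidDoc Overall: 85.25"
```

Recomputed values for other rows can differ from the table by about 0.01, because the sub-scores shown are already rounded.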

## 🛠️ Install RapidDoc

#### Install with pip

RapidDoc provides a convenient Docker deployment option, which helps quickly set up the environment and …

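- [x] Text-based PDFs: extract text-box bboxes with pypdfium2
- [x] Text-based PDFs: table parsing in three orientations (0°, 90°, 270°)
- [x] Text-based PDFs: extract the original embedded images with pypdfium2 (the default screenshot approach lowers sharpness and may cut off part of the image border)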
- [x] Extract formulas and images inside tables
- [x] Improved reading order, including recovery of complex layouts such as multi-column and vertical text
- [x] Formula model supports torch inference, with GPU acceleration
- [x] OpenVINO support for the layout and table models
- [x] Markdown to DOCX and HTML conversion
- [x] PP-DocLayoutV2 layout detection with reading order
- [x] OmniDocBench evaluation
- [ ] OpenVINO support for the formula model
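
Parsing tables in three orientations implies mapping boxes between a rotated page image and the original page. RapidDoc's own code for this is not shown here, so the following is a hypothetical Python sketch of that coordinate mapping only. It assumes a top-left origin with y pointing down, and that `angle` is the counter-clockwise rotation (0, 90 or 270 degrees) applied to the original `width` × `height` image before detection; all function names are illustrative.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin top-left, y down
PointFn = Callable[[float, float, int, float, float], Tuple[float, float]]


def _point_forward(x: float, y: float, angle: int, width: float, height: float) -> Tuple[float, float]:
    """Original-image point -> point on the image rotated counter-clockwise by `angle`."""
    if angle == 0:
        return x, y
    if angle == 90:   # the rotated image is `height` wide and `width` tall
        return y, width - x
    if angle == 270:  # same as 90 degrees clockwise
        return height - y, x
    raise ValueError(f"unsupported angle: {angle}")


def _point_back(x: float, y: float, angle: int, width: float, height: float) -> Tuple[float, float]:
    """Inverse of _point_forward: rotated-image point -> original-image point."""
    if angle == 0:
        return x, y
    if angle == 90:
        return width - y, x
    if angle == 270:
        return y, height - x
    raise ValueError(f"unsupported angle: {angle}")


def _map_box(box: Box, point_fn: PointFn, angle: int, width: float, height: float) -> Box:
    ax, ay = point_fn(box[0], box[1], angle, width, height)
    bx, by = point_fn(box[2], box[3], angle, width, height)
    # Rotation changes which corner is top-left, so re-normalize with min/max.
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def box_forward(box: Box, angle: int, width: float, height: float) -> Box:
    return _map_box(box, _point_forward, angle, width, height)


def box_back(box: Box, angle: int, width: float, height: float) -> Box:
    return _map_box(box, _point_back, angle, width, height)


# Table box detected on a 600 x 800 page that was rotated 90 degrees counter-clockwise
print(box_back((200, 300, 500, 500), angle=90, width=600, height=800))  # (100, 200, 300, 500)
```

Keeping the forward and inverse mappings side by side makes it easy to check one against the other with a round trip.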
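
The reading-order item covers multi-column layouts, but the README does not describe RapidDoc's ordering algorithm. As an illustration of one classic technique, here is a simplified recursive XY-cut in Python: it splits the blocks at a vertical whitespace gap (columns) when one exists, otherwise at a horizontal gap (rows), recurses into each part, and falls back to a top-to-bottom, left-to-right sort. The box format `(x0, y0, x1, y1)` with y pointing down, the column-first preference, and the `min_gap` threshold are all assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def _split_points(intervals: Sequence[Tuple[float, float]], min_gap: float) -> List[float]:
    """Midpoints of every gap of at least `min_gap` in the union of 1-D intervals."""
    ordered = sorted(intervals)
    splits: List[float] = []
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        if start - covered_to >= min_gap:
            splits.append((covered_to + start) / 2)
        covered_to = max(covered_to, end)
    return splits


def xy_cut(boxes: Sequence[Box], min_gap: float = 5.0) -> List[Box]:
    """Order layout blocks with a simplified recursive XY-cut."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try a column cut (gaps along x) before a row cut (gaps along y).
    for lo, hi in ((0, 2), (1, 3)):
        splits = _split_points([(b[lo], b[hi]) for b in boxes], min_gap)
        if not splits:
            continue
        groups: List[List[Box]] = [[] for _ in range(len(splits) + 1)]
        for b in boxes:
            center = (b[lo] + b[hi]) / 2
            # splits are sorted, so counting those below the center gives the group index
            groups[sum(center > s for s in splits)].append(b)
        result: List[Box] = []
        for group in groups:
            result.extend(xy_cut(group, min_gap))
        return result
    # No whitespace gap on either axis: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


title = (50, 20, 550, 60)
left1, left2 = (50, 80, 290, 200), (50, 210, 290, 400)
right1, right2 = (310, 80, 550, 250), (310, 260, 550, 400)
print(xy_cut([right2, left1, title, right1, left2]) == [title, left1, left2, right1, right2])  # True
```

Preferring column cuts stops aligned paragraph gaps in both columns from interleaving them, while a full-width title still forces a row cut first. Stacked multi-column sections with no full-width block between them would need extra handling.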

## 🙏 Acknowledgements