Table recognition on scanned PDFs is inaccurate. How can it be improved? #12957
Replies: 7 comments
-
@PureWaterCatt Could you provide the original image?
-
This has been fixed on the main branch and will be released in version 2.8.0.
-
@GreatV Has the main branch already been updated for this? I pulled the latest code and tested it, but the result seems unchanged.
-
This is how I tested it:
-
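Yes, the table result was not recognized correctly; I had only checked that the text recognition succeeded. Padding the image outward a little might give better results.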
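The padding suggestion above could be tried with a small preprocessing step before running table recognition. The following is only a minimal sketch, not code from this thread or from PaddleOCR: the helper name `pad_image`, the 20-pixel margin, and the white fill value are all assumptions chosen for illustration, and whether padding actually recovers the last row has not been verified here.

```python
import numpy as np


def pad_image(img: np.ndarray, pad: int = 20, value: int = 255) -> np.ndarray:
    """Return a copy of `img` with a constant border of `pad` pixels on every side.

    `pad_image` is a hypothetical helper; the margin and fill value are guesses.
    Works for grayscale (H, W) and color (H, W, C) arrays.
    """
    if img.ndim == 2:
        widths = ((pad, pad), (pad, pad))
    else:
        # Pad height and width only; leave the channel axis untouched.
        widths = ((pad, pad), (pad, pad), (0, 0))
    return np.pad(img, widths, mode="constant", constant_values=value)


if __name__ == "__main__":
    # Stand-in for a cropped table image (black, 100 x 200, 3 channels).
    crop = np.zeros((100, 200, 3), dtype=np.uint8)
    padded = pad_image(crop, pad=20)
    print(padded.shape)
```

The padded array can then be saved and passed to `predict_table.py` through `--image_dir` in place of the original crop. If OpenCV is already in use, `cv2.copyMakeBorder` with `cv2.BORDER_CONSTANT` is an equivalent way to add the border.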
-
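Problem Description
With layout analysis plus table recognition, and even when the table is cropped out and recognized on its own, the data in the last row is never recognized.
During layout analysis, the table is also fully enclosed by the rectangular bounding box labeled table.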
Runtime Environment
Paddle:develop
PaddleOCR:develop
OS: ubuntu 20.04
GCC version: (Ubuntu/Linaro 8.4.0-3ubuntu2) 8.4.0
Clang version: N/A
CMake version: version 3.27.7
Libc version: glibc 2.31
Python version: 3.10.14
Reproduction Code
python predict_table.py --image_dir=../../output/screenshot-20240528-164403.png --det_model_dir=../inference/ch_PP-OCRv4_det_infer --rec_model_dir=../inference/ch_PP-OCRv4_rec_infer --rec_char_dict_path=../../ppocr/utils/ppocr_keys_v1.txt --table_model_dir=../inference/ch_ppstructure_mobile_v2.0_SLANet_infer --table_char_dict_path=../../ppocr/utils/dict/table_structure_dict_ch.txt --output=../../output/table
Complete Error Message
Possible Solutions
Appendix
test.zip