PDF table recognition is inaccurate
#4551
Replies: 1 comment 14 replies
Hello! Based on an analysis of the code, I suggest checking the following factors that could cause the results to differ:
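In addition, if your tables are complex (for example, with merged cells or complex headers), the VLM backend has some known limitations in spatial correction. You could consider using [...]
Could you provide a concrete example of the table parsing error (for example, a comparison of the outputs from the two approaches)? That would make the problem easier to pinpoint.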
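One way to put together such a comparison, as a minimal sketch: diff the two outputs with Python's standard `difflib`. The helper name `diff_outputs` and the sample strings are hypothetical and only for illustration; substitute the actual table output produced by each run.

```python
import difflib


def diff_outputs(cli_output: str, integrated_output: str) -> str:
    """Return a unified diff between two parser outputs (hypothetical helper)."""
    diff = difflib.unified_diff(
        cli_output.splitlines(),
        integrated_output.splitlines(),
        fromfile="cli",
        tofile="integrated",
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Made-up sample data standing in for the two real outputs.
    cli = "<table>\n<tr><td>A</td><td>B</td></tr>\n</table>"
    integrated = "<table>\n<tr><td>A B</td></tr>\n</table>"
    print(diff_outputs(cli, integrated))
```

An empty result means the two outputs are identical line by line; otherwise the `-` and `+` lines show where the table structure diverges.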
Using the official example and calling it directly with the python command, the result is correct:
if backend.startswith("vlm-"):
    backend = backend[4:]
Then I trimmed it down and integrated it into my own program, which uses this code:
file_name = str(Path(path).stem)
pdf_bytes = read_fn(path)
new_pdf_bytes = convert_pdf_bytes_to_bytes_by_pypdfium2(pdf_bytes, start_page_id, end_page_id)
backend = backend[4:]
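It turns out that when run from the command line, the table is parsed correctly, whereas the code integrated into my program produces an incorrect table structure, even though both use the same model image.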
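One visible difference between the two snippets as posted: the first strips the `vlm-` prefix only after a `startswith` check, while the second slices `backend[4:]` without the guard. The sketch below only illustrates how the two behave on sample strings; the function names are hypothetical, the backend values are examples, and whether this accounts for the parsing difference is not established (the second snippet may simply be abridged).

```python
def strip_vlm_prefix(backend: str) -> str:
    """Guarded strip, mirroring the startswith check in the first snippet."""
    if backend.startswith("vlm-"):
        return backend[4:]
    return backend


def strip_unconditionally(backend: str) -> str:
    """Unguarded slice, as written in the second snippet."""
    return backend[4:]


if __name__ == "__main__":
    # Sample values for illustration only.
    for name in ("vlm-transformers", "pipeline"):
        guarded = strip_vlm_prefix(name)
        unguarded = strip_unconditionally(name)
        print(f"{name!r}: guarded={guarded!r}, unguarded={unguarded!r}")
```

With a `vlm-` prefixed value both versions agree; with any other value the unguarded slice silently drops the first four characters, so it is worth printing `backend` in the integrated program right before it is passed on.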