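In VLM and Hybrid modes, table recognition currently cannot be turned off. This is a design limitation.

Workarounds:

  1. Use Pipeline mode (recommended) - Pipeline is the only mode that supports fully disabling table recognition: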

    export MINERU_TABLE_ENABLE=false
    mineru input.pdf --backend pipeline

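    Note: the environment variable must be set before any MinerU module is imported.

  2. Filter out tables in post-processing - if you have to use VLM/Hybrid mode, remove the table blocks after you get the result: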

    for page in middle_json.get("pdf_info", []):
        page["preproc_blocks"] = [
            block for block in page.get("preproc_blocks", [])
            if block.get("type") != "table"
        ]
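
The filter above assumes `middle_json` is already loaded. A self-contained sketch of the same idea follows. The `drop_table_blocks` helper, its `keys` parameter, and the sample data are hypothetical names made up for illustration, not part of MinerU; only `pdf_info`, `preproc_blocks`, and the `"table"` type come from the snippet above.

```python
import copy


def drop_table_blocks(middle_json, keys=("preproc_blocks",)):
    """Return a deep copy of middle_json without blocks of type "table".

    `keys` lists the per-page block-list keys to filter (hypothetical
    parameter; only "preproc_blocks" is taken from the answer above).
    """
    result = copy.deepcopy(middle_json)
    for page in result.get("pdf_info", []):
        for key in keys:
            if key in page:
                page[key] = [
                    block for block in page[key]
                    if block.get("type") != "table"
                ]
    return result


# Made-up sample shaped like the structure used in the snippet above.
sample = {
    "pdf_info": [
        {"preproc_blocks": [{"type": "text"}, {"type": "table"}, {"type": "title"}]},
        {"preproc_blocks": []},
        {},
    ]
}

filtered = drop_table_blocks(sample)
print([block["type"] for block in filtered["pdf_info"][0]["preproc_blocks"]])
```

Working on a copy leaves the original result untouched, which helps if you want to compare the output before and after filtering. If your output keeps blocks under additional per-page keys, pass them through `keys`.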

As for "there is only html and no content": this is by design. Table content is stored as HTML.
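
If you need plain cell text rather than HTML, one option is to parse the stored HTML yourself. The sketch below uses only the Python standard library. The `TableTextExtractor` class, the `table_html_to_rows` helper, and the sample string are hypothetical, and where exactly the HTML string sits inside your output is not shown here, so treat this as a starting point rather than a MinerU API.

```python
from html.parser import HTMLParser


class TableTextExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_rows(html):
    """Convert a table's HTML string into a list of rows of cell text."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up HTML standing in for a table's stored "html" value.
sample_html = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Apple</td><td>3</td></tr></table>"
)
rows = table_html_to_rows(sample_html)
print(rows)
```

This does not expand `rowspan`/`colspan` and does not handle nested tables, so merged cells appear once, in the row where they are declared.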

Answer selected by yinghao-xue