How do I convert the bbox produced by minerU.do_parse into a fitz bbox? #4624

1571859588 · 2026-03-16T05:23:17Z

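I found that MinerU's workflow is: PDF → pypdfium2 rendering → full-page image → YOLO detection → image coordinates.
As a result, the final bbox is completely different from the bbox obtained by running fitz directly on the PDF.
Is there a way to map the final bbox back to the fitz bbox of the original PDF, so that the bbox output by MinerU corresponds to the original fitz coordinates?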

@1571859588 · 2026-03-16T05:34:02Z

dosubot[bot]
bot Mar 16, 2026

Hi @1571859588! I'm Dosu and I’m helping the MinerU team.

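MinerU's model-stage bbox is in image pixel coordinates (origin at the top-left). fitz (PyMuPDF) works in PDF points, and its page coordinates also place the origin at the top-left with the y axis pointing down, so converting a pixel bbox only requires DPI scaling. A y-axis flip is needed only if you target raw PDF user space, whose origin is at the bottom-left.

Conversion formula:

def mineru_bbox_to_fitz(mineru_bbox, dpi=200):
    """
    Convert a MinerU pixel bbox to a fitz (PyMuPDF) bbox.

    Args:
        mineru_bbox: (x0, y0, x1, y1) pixel coordinates output by MinerU
        dpi: DPI MinerU used for rendering (200 assumed as the default)
    """
    scale = 72 / dpi  # pixels -> PDF points

    # DPI scaling only: both the rendered image and fitz use a top-left
    # origin with y increasing downward, so no y-axis flip is applied.
    x0, y0, x1, y1 = (v * scale for v in mineru_bbox)

    return (x0, y0, x1, y1)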
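To sanity-check the scaling step on concrete numbers, here is a minimal sketch that converts a pixel bbox to points and back. The helper names `px_to_pt` and `pt_to_px` are hypothetical, and the 200 DPI value is the default assumed in this answer, not something read from MinerU at runtime.

```python
def px_to_pt(bbox, dpi=200):
    """Scale a pixel bbox (x0, y0, x1, y1) to PDF points (1 pt = 1/72 inch)."""
    scale = 72 / dpi  # points per pixel
    return tuple(v * scale for v in bbox)


def pt_to_px(bbox, dpi=200):
    """Inverse of px_to_pt: scale a bbox in PDF points back to pixels."""
    scale = dpi / 72  # pixels per point
    return tuple(v * scale for v in bbox)


pixel_bbox = (200, 400, 1000, 600)   # hypothetical detection on a 200 DPI render
point_bbox = px_to_pt(pixel_bbox)    # same region expressed in PDF points
round_trip = pt_to_px(point_bbox)    # should match pixel_bbox up to float error
print(point_bbox, round_trip)
```

If the round trip does not reproduce the pixel bbox, the DPI passed in is probably not the one the page was rendered at.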

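Key points:

MinerU renders at dpi=200 by default, so scale = 72/200 = 0.36
If you used a different DPI setting, adjust the scale accordingly
The actual scale factor can be taken from the scale value returned by MinerU's internal page_to_image (check whether it is expressed as pixels per point or points per pixel before applying it)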
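When the render DPI is unknown or may have been adjusted, one option is to derive the factor from sizes instead of assuming a DPI. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the image size is whatever the rendered page image reports, and the page size in points is assumed to come from something like `page.rect.width` and `page.rect.height` in fitz.

```python
def scale_from_sizes(image_size_px, page_size_pt):
    """Return (sx, sy), the points-per-pixel factors along x and y."""
    img_w, img_h = image_size_px
    page_w, page_h = page_size_pt
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image size must be positive")
    return page_w / img_w, page_h / img_h


def px_bbox_to_pt(bbox, image_size_px, page_size_pt):
    """Convert a pixel bbox to points using the measured scale factors."""
    sx, sy = scale_from_sizes(image_size_px, page_size_pt)
    x0, y0, x1, y1 = bbox
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# Hypothetical A4 page (595 x 842 pt) rendered to a 1653 x 2339 px image.
full_page = px_bbox_to_pt((0, 0, 1653, 2339), (1653, 2339), (595, 842))
print(full_page)
```

Using separate x and y factors keeps the mapping correct even if the renderer rounded the two image dimensions slightly differently.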

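Note:
MinerU's __fix_axis method already divides the coordinates by scale, converting them back into PDF point space. So if you are using the coordinates from middle.json, they may already be in PDF points; because fitz also measures from the top-left corner, such coordinates can be passed to fitz as they are, without a y-axis flip. Check the value range of the bboxes you obtained first to determine which coordinate system they are in.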
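The range check suggested above can be sketched as a simple heuristic: if every bbox fits inside the page size in points, the values are probably already points; otherwise they are probably pixels. The function name and the 2% tolerance are assumptions made for illustration, and the heuristic can misclassify a page whose pixel-space boxes all happen to sit near the top-left corner.

```python
def guess_bbox_space(bboxes, page_size_pt, tolerance=1.02):
    """Guess whether (x0, y0, x1, y1) bboxes are in 'points' or 'pixels'."""
    if not bboxes:
        raise ValueError("need at least one bbox to inspect")
    page_w, page_h = page_size_pt
    max_x = max(b[2] for b in bboxes)  # right-most edge seen
    max_y = max(b[3] for b in bboxes)  # bottom-most edge seen
    # Allow a small margin for rounding before declaring an overflow.
    if max_x <= page_w * tolerance and max_y <= page_h * tolerance:
        return "points"
    return "pixels"


a4 = (595, 842)  # A4 page size in points
print(guess_bbox_space([(72, 144, 360, 216)], a4))
print(guess_bbox_space([(200, 400, 1000, 600)], a4))
```

Running the check over all bboxes on a page, rather than a single one, makes the guess more reliable.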
