OCR Coordinates do not match #4068

@xiaolibuzai-ovo

Description of the bug

I used another OCR engine to recognize the text in the PDF along with its coordinates. I then used the PyMuPDF library, hoping to extract the coordinates of a specified area, but the two sets of coordinates differ significantly.
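A minimal sketch of the "specified area" step follows. It assumes the words come from PyMuPDF's `page.get_text("words")` output, where each entry is a tuple `(x0, y0, x1, y1, word, block_no, line_no, word_no)` in PDF points. The helper keeps words whose box lies mostly inside a rectangle given in those same coordinates. The name `words_in_area` and the 50% overlap threshold are illustrative choices, not PyMuPDF API. In practice, `Page.get_text()` also accepts a `clip` rectangle, which may be the simpler route.

```python
from typing import List, Tuple

# One entry of page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no), coordinates in PDF points.
Word = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]


def words_in_area(words: List[Word], area: Rect, min_overlap: float = 0.5) -> List[Word]:
    """Return words whose bbox overlaps `area` by at least `min_overlap`
    of the word's own bbox area (hypothetical helper, not part of PyMuPDF)."""
    ax0, ay0, ax1, ay1 = area
    selected = []
    for w in words:
        x0, y0, x1, y1 = w[:4]
        # Width and height of the intersection; non-positive means no overlap.
        iw = min(x1, ax1) - max(x0, ax0)
        ih = min(y1, ay1) - max(y0, ay0)
        if iw <= 0 or ih <= 0:
            continue
        word_area = (x1 - x0) * (y1 - y0)
        if word_area > 0 and (iw * ih) / word_area >= min_overlap:
            selected.append(w)
    return selected
```

Applied to the two PyMuPDF tuples quoted below, an area of `(80, 20, 120, 50)` would keep only `'Vue'`, while `(80, 20, 200, 50)` would keep both words.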

These are the coordinates returned by the other OCR engine:
{
"text": "Vue Mastery",
"bbox": [
586.0,
178.0,
1250.0,
296.0
],
"type": "ocr",
"score": 1
}
These are the coordinates PyMuPDF returns for the same text:
(88.85449981689453, 23.943227767944336, 117.37201690673828, 44.796356201171875, 'Vue', 0, 0, 0), (121.81803131103516, 23.943227767944336, 183.36544799804688, 44.796356201171875, 'Mastery', 0, 0, 1)
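One possible source of the mismatch, offered as an assumption rather than a confirmed diagnosis: PyMuPDF reports positions in PDF points (1/72 inch) measured from the top-left corner of the page, whereas OCR engines usually report pixel positions in the image they were given. If that is the case here, the two sets should differ mainly by a scale factor. Dividing the OCR values above by the matching PyMuPDF values gives ratios of roughly 6.6 to 7.4, which could fit one scale factor plus some padding around the OCR box, though that is only an inference. The sketch below converts a pixel bbox to points, either from the image and page sizes or from the render DPI. Both helper names are hypothetical, and the sketch assumes the OCR ran on an unrotated, uncropped image of the whole page.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def pixel_bbox_to_points(bbox: BBox, image_size: Tuple[int, int],
                         page_size: Tuple[float, float]) -> BBox:
    """Scale an OCR bbox from image pixels to PDF points.

    Assumes the image shows the whole page, unrotated, so each axis is
    scaled independently by (page size in points) / (image size in pixels).
    """
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx = page_w / img_w  # points per pixel, horizontal
    sy = page_h / img_h  # points per pixel, vertical
    x0, y0, x1, y1 = bbox
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


def pixel_bbox_to_points_dpi(bbox: BBox, dpi: float) -> BBox:
    """Same conversion when only the render DPI is known (1 point = 1/72 inch)."""
    scale = 72.0 / dpi
    x0, y0, x1, y1 = bbox
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)
```

For example, if the page image was produced with `page.get_pixmap(dpi=...)`, passing that same DPI to `pixel_bbox_to_points_dpi` should bring the OCR box onto PyMuPDF's scale. If the image came from elsewhere, `page.rect.width` and `page.rect.height` give the page size in points for `pixel_bbox_to_points`.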

Here is the PDF file:
Nuxtjs-Cheat-Sheet.pdf

How to reproduce the bug

See above.

I hope someone can answer this.

PyMuPDF version

1.24.14

Operating system

macOS

Python version

3.10

Metadata
Labels: not a bug / user error / unable to reproduce