1.25.3 regression: Generates bbox with negative values #4316

@EdmundsEcho

Description of the bug

I'm getting a bbox whose entries include negative values. This seems wrong (and is breaking my logic):

{
    "number": 4,
    "type": 0,
    "bbox": [
        2147483520.0,
        2147483520.0,
        -2147483648.0,
        -2147483648.0
    ],
    "lines": []
}

from the following snippet where I'm calling get_text on pymupdf.Page:

        text_blocks = list(filter(
            lambda block: block.get("type") == 0,  # type 0 is a text block, type 1 an image block
            self._page.get_text("rawdict").get("blocks", [])
        ))
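
The reported coordinates are not arbitrary numbers: -2147483648 is -2^31, and 2147483520 is 2^31 - 128 (0x7FFFFF80), the largest single-precision float below 2^31. Together with the empty "lines" list, this suggests the block carries no text and its bbox is an "empty rectangle" sentinel rather than a real position. That reading is an assumption and is not confirmed anywhere in this report. Under that assumption, a defensive filter might look like the sketch below. The helper names `is_usable_text_block` and `usable_text_blocks` are hypothetical, and the code works on plain dicts shaped like the output above rather than calling PyMuPDF itself.

```python
def is_usable_text_block(block):
    """Return True for a text block with a non-inverted bbox and at least one line."""
    if block.get("type") != 0:  # keep text blocks only
        return False
    x0, y0, x1, y1 = block.get("bbox", (0.0, 0.0, 0.0, 0.0))
    # An inverted rectangle (left > right or top > bottom) cannot be a real position.
    if x0 > x1 or y0 > y1:
        return False
    # Assumption: a block without lines has nothing worth processing.
    return bool(block.get("lines"))


def usable_text_blocks(page_dict):
    """Filter the "blocks" list of a get_text("rawdict")-style dictionary."""
    return [b for b in page_dict.get("blocks", []) if is_usable_text_block(b)]
```

With this, the filter in the snippet above could be replaced by `usable_text_blocks(self._page.get_text("rawdict"))`. This is a workaround sketch only; it does not explain why the behavior differs between versions.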

How to reproduce the bug

If I extract a single page from the PDF, the parser works. When I parse all 8 or so pages, I get the negative values.

Finally, if I downgrade to v1.25.2, I get the expected positive-only values.
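
To narrow down which pages and blocks are affected, something like the following could be run under both versions and the results compared. It is a sketch: `find_inverted_bboxes` is a hypothetical helper, and it assumes only that each page object exposes `get_text("rawdict")` as in the snippet above. The file name in the usage comment is a placeholder.

```python
def find_inverted_bboxes(pages):
    """Return (page_index, block_number, bbox) for every block whose bbox is inverted."""
    hits = []
    for page_index, page in enumerate(pages):
        blocks = page.get_text("rawdict").get("blocks", [])
        for block in blocks:
            x0, y0, x1, y1 = block["bbox"]
            if x0 > x1 or y0 > y1:  # left > right or top > bottom
                hits.append((page_index, block.get("number"), tuple(block["bbox"])))
    return hits


# Usage with PyMuPDF ("input.pdf" is a placeholder for the affected document):
#   import pymupdf
#   with pymupdf.open("input.pdf") as doc:
#       for hit in find_inverted_bboxes(doc):
#           print(hit)
```

An empty result under v1.25.2 and a non-empty one under v1.25.3 for the same file would pin down the pages needed for a minimal reproduction.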

PyMuPDF version

1.25.3

Operating system

macOS

Python version

3.11


Labels

not a bug (not a bug / user error / unable to reproduce)
