Your last comment sheds some more light on the problem.

  1. You are determining bounding boxes (bboxes) for text on the page.
  2. Your downstream tools (such as OpenCV) convert the page to an image at a certain resolution. This introduces the following complications:
    • An image has integer dimensions, so everything on the image is addressed by integer coordinates. A PDF page has float dimensions: width and height need not be integers, and the text bboxes have float coordinates.
    • The chosen image resolution (DPI) also changes the dimensions: an A4 PDF page (width 595.0, height 842.0) becomes a 1240x1755 image when rendered at 150 DPI.
    • You therefore need to convert all…
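
To illustrate the scaling described above, here is a minimal sketch in plain Python. It assumes that PDF coordinates are in points (72 per inch), that page and image share a top-left origin, and that rounding outward (floor for the top-left corner, ceil for the bottom-right) is acceptable for your use case. The helper names `image_size` and `bbox_to_pixels` are hypothetical and not part of any library.

```python
import math

POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch


def image_size(page_width, page_height, dpi):
    """Pixel size of a page rendered at `dpi`, rounded up (assumption)."""
    return (
        math.ceil(page_width * dpi / POINTS_PER_INCH),
        math.ceil(page_height * dpi / POINTS_PER_INCH),
    )


def bbox_to_pixels(bbox, dpi):
    """Scale a float PDF bbox (x0, y0, x1, y1) to integer pixel coordinates.

    Rounds outward so the pixel box fully covers the original bbox
    (an assumption; your pipeline may prefer plain rounding).
    """
    x0, y0, x1, y1 = bbox
    return (
        math.floor(x0 * dpi / POINTS_PER_INCH),
        math.floor(y0 * dpi / POINTS_PER_INCH),
        math.ceil(x1 * dpi / POINTS_PER_INCH),
        math.ceil(y1 * dpi / POINTS_PER_INCH),
    )


# The A4 example from above: 595.0 x 842.0 points at 150 DPI.
print(image_size(595.0, 842.0, 150))  # (1240, 1755)

# A made-up text bbox, for illustration only.
print(bbox_to_pixels((100.5, 200.25, 300.75, 400.0), 150))  # (209, 417, 627, 834)
```

The multiplication is done before the division by 72 so that values which scale to whole pixels (for example 72 points at 150 DPI) are not pushed over an integer boundary by floating-point error.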

Answer selected by rudra0713

This discussion was converted from issue #2866 on December 05, 2023 09:17.