No problem!
Here is the background:
Fonts have a built-in, "natural" line height, which is usually larger than the font size of the text. For Helvetica, for example, the line height is 37.4% larger than the font size.
If the PDF creator has written the lines with a smaller spacing than that, the bboxes and hit rectangles of lines (words, ...) will overlap parts of the preceding or following lines.
And because the redaction logic removes every character that overlaps the redact rectangle, you see the undesired effect.
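
To make the overlap concrete, here is a small arithmetic sketch in plain Python (it does not call PyMuPDF). The 1.374 factor is the Helvetica figure mentioned above; the function name `vertical_overlap`, the 10 pt font size and the 12 pt baseline distance are made-up values for illustration only.

```python
def vertical_overlap(fontsize, pitch, height_factor):
    """Return how far one line's bbox reaches into the next line's bbox.

    fontsize:      font size in points
    pitch:         distance between consecutive baselines in points (assumed value)
    height_factor: bbox height as a multiple of the font size
    """
    bbox_height = fontsize * height_factor
    # Consecutive bboxes have the same height and are shifted by `pitch`,
    # so they overlap by whatever part of the height exceeds the pitch.
    return max(0.0, bbox_height - pitch)


fontsize = 10.0  # hypothetical example value
pitch = 12.0     # hypothetical example value, tighter than the natural line height

natural = vertical_overlap(fontsize, pitch, 1.374)  # natural Helvetica line height
tight = vertical_overlap(fontsize, pitch, 1.0)      # bbox height equal to the font size

print(f"overlap with natural height: {natural:.2f} pt")
print(f"overlap with height == fontsize: {tight:.2f} pt")
```

With these assumed numbers, the natural-height boxes reach into the neighboring line, while boxes whose height equals the font size do not. A redact rectangle built from the larger box would therefore also touch characters of the adjacent line.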

Here is how to solve it:
Before searching or extracting text, set a global PyMuPDF parameter so that the generated text (line) heights are equal to the font size: fit…

Answer selected by luclemot
Category
Q&A
Labels
not a bug not a bug / user error / unable to reproduce
2 participants
Converted from issue

This discussion was converted from issue #2470 on June 14, 2023 15:14.