Bug description
Hi!
How it happened
I simply ran the following lines for each rectangle I wanted redacted:
From my attempt at debugging this issue, it seems that add_redact_annot is working fine (it crosses off the right rectangle when I comment out the next line) and that the issue comes from the apply_redactions method.
Outputs
Below is the resulting redaction. The text in red is replacement text I added. On the overlined screenshot, you can see the unwanted white rectangle.
Your configuration
This is not a bug, but an often seen complication. Let me transfer this to "Discussions" first.
No problem!
Here is the background:
Fonts have a built-in, "natural" line height. More often than not, it is larger than the font size of the text. For Helvetica, for example, the line height is 37.4% larger than the font size.
If the PDF creator has written lines with a smaller distance between them than that, the bboxes of lines (words, ...) and search hit rectangles will overlap parts of the preceding or following lines.
And because the redaction logic removes every character that overlaps the redact rectangle, you see this undesired effect.
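To make the overlap concrete, here is a small sketch in plain Python (no PyMuPDF calls). It assumes Helvetica's ascender and descender are 1.075 and -0.299 of the font size, which adds up to the 37.4% figure above. The names `line_box` and `vertical_overlap`, the baseline positions, and the way the "small" box is scaled are illustrative assumptions, not the library's actual implementation.

```python
from dataclasses import dataclass

# Assumed Helvetica metrics as fractions of the font size:
# ascender - descender = 1.374, i.e. 37.4% larger than the font size.
ASCENDER = 1.075
DESCENDER = -0.299


@dataclass
class Box:
    top: float     # y grows downward, as in page coordinates
    bottom: float


def line_box(baseline_y: float, fontsize: float, small_heights: bool = False) -> Box:
    """Vertical extent of a text line (hypothetical helper).

    With small_heights=True, ascender and descender are scaled so that
    the box height equals the font size (a simplifying assumption).
    """
    asc, desc = ASCENDER, DESCENDER
    if small_heights:
        scale = 1.0 / (asc - desc)
        asc, desc = asc * scale, desc * scale
    return Box(top=baseline_y - asc * fontsize, bottom=baseline_y - desc * fontsize)


def vertical_overlap(a: Box, b: Box) -> float:
    """Height of the vertical intersection of two boxes (0.0 if disjoint)."""
    return max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))


# Two 10 pt lines whose baselines are only 11 pt apart (less than 13.74 pt).
fontsize = 10.0
natural = vertical_overlap(line_box(100, fontsize), line_box(111, fontsize))
small = vertical_overlap(
    line_box(100, fontsize, small_heights=True),
    line_box(111, fontsize, small_heights=True),
)

print(f"overlap with natural line height: {natural:.2f} pt")
print(f"overlap with height = font size:  {small:.2f} pt")
```

With the natural height, a redact rectangle built from the first line's box reaches into the second line, so characters there count as overlapping. With the box height limited to the font size, the two boxes no longer touch in this example.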
This can be done to solve it:
Before searching or extracting text, set a global PyMuPDF parameter such that only text (line) heights equal to the font size will be generated:
fit…