Replacing text in pdf using regex #2808
-
I want to substitute text in the pdf files using the following regex:
I know that there exists
How can I get the same functionality as replace, but with the regex provided above? If that is not possible, what workaround can you suggest?
Replies: 3 comments 3 replies
-
This will not work as straightforwardly as you hope, but there is something close:
rl = page.search_for("password")  # case-insensitive, potentially multiple occurrences
for bbox in rl:
    page.add_redact_annot(bbox, "!!!")  # mark this bbox to be erased; note the replacement text
page.apply_redactions()  # this "executes" all redaction annots
I observed that you also want to include the German versions for "password".
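Since `search_for` takes a literal string rather than a regex, one possible workaround is to run the regex over the page's word list and redact the boxes of the matching words. The following is only a sketch under assumptions: the helper names `regex_word_boxes` and `redact_regex` are hypothetical, the pattern in the usage note is a stand-in (the regex from the question is not shown here), and it relies on `page.get_text("words")` returning tuples of the form `(x0, y0, x1, y1, text, block_no, line_no, word_no)`. Because matching happens word by word, a pattern cannot span spaces.

```python
import re

def regex_word_boxes(words, pattern):
    """Return the (x0, y0, x1, y1) boxes of all words matching the regex.

    `words` is expected in the layout produced by page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [tuple(w[:4]) for w in words if rx.search(w[4])]

def redact_regex(page, pattern, replacement="!!!"):
    """Hypothetical helper: redact every word on `page` that matches `pattern`.

    `page` is assumed to be a PyMuPDF page object.
    Returns the number of redaction annotations that were added.
    """
    boxes = regex_word_boxes(page.get_text("words"), pattern)
    for box in boxes:
        page.add_redact_annot(box, replacement)  # mark the box, set replacement text
    if boxes:
        page.apply_redactions()  # execute all redaction annotations at once
    return len(boxes)
```

With a document opened via `fitz.open(...)`, a call could look like `redact_regex(page, r"^(password|passwort|kennwort)\b")`, where the alternation is an assumed example covering English and German spellings.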
-
@JorjMcKie Thank you for your answer! I tried your solution and it works OK, but unfortunately it overlaps other rectangles as well (and hence changes neighbouring text). My idea was to use redaction annotations and extend them a little along the x-axis (without touching other rectangles), so that they delete the word "password" and the text after it (the actual password). Something like this:
But unfortunately it overlaps other rectangles and changes them as well (with and without bbox[2] += 5).
-
Yes - with some more effort.
BTW finding the font of the original text usually is a pain in the neck:
To also erase the word following one of the found keywords, you could take a hit rect, enlarge it to the right page border, and extract the words inside the result.
This should give you a list of words, where the first is the "password" literal and the second is (hopefully) the password itself.
Then take the rectangle of that second item and join it with the hit rectangle to make a common redaction for both, or simply create another redact annot for the second item.
Like this (r being a hit rect of the search):
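A minimal sketch of these steps, not the author's original snippet, might look like the following. The helper names `join_hit_with_next_word` and `redact_keyword_and_next` are hypothetical. The sketch assumes the word tuples come from `page.get_text("words")`, and it filters them in plain Python against a strip that runs from the hit's left edge to the right page border, rather than relying on a clip rectangle.

```python
def join_hit_with_next_word(words, hit, page_right):
    """Join a search hit with the word that follows it on the same line.

    words:      tuples as produced by page.get_text("words"):
                (x0, y0, x1, y1, text, block_no, line_no, word_no)
    hit:        (x0, y0, x1, y1) of one search hit
    page_right: x-coordinate of the right page border
    Returns the joined rectangle, or the hit itself if no following word is found.
    """
    hx0, hy0, hx1, hy1 = hit
    # Keep words whose vertical centre lies inside the hit's height and which
    # sit between the hit's left edge and the right page border (1 pt tolerance).
    in_strip = [
        w for w in words
        if hy0 <= (w[1] + w[3]) / 2 <= hy1
        and w[0] >= hx0 - 1
        and w[2] <= page_right + 1
    ]
    in_strip.sort(key=lambda w: w[0])  # left to right
    if len(in_strip) < 2:
        return hit  # only the keyword itself was found
    nxt = in_strip[1]  # first item is the keyword, second is (hopefully) the password
    return (min(hx0, nxt[0]), min(hy0, nxt[1]), max(hx1, nxt[2]), max(hy1, nxt[3]))

def redact_keyword_and_next(page, keyword, replacement=""):
    """Hypothetical helper: redact each keyword hit together with the next word.

    `page` is assumed to be a PyMuPDF page object.
    """
    words = page.get_text("words")
    for r in page.search_for(keyword):
        box = join_hit_with_next_word(words, tuple(r), page.rect.x1)
        page.add_redact_annot(box, replacement)
    page.apply_redactions()
```

Joining the two rectangles yields a single redaction per hit. Creating a second annotation for the following word, as mentioned above, would work equally well and leaves any gap between the two words untouched.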