Text above the match is removed from PDF after search and redaction #2457
Unanswered
ashifaliclientpoint asked this question in Q&A
Replies: 1 comment
This is not a bug, so let me first transfer this to the "discussions" tab.
-
Hello,
I am using PyMuPDF (1.19.6) to search for strings in a PDF file and then redact them. However, the redaction also removes text that sits just above each match.
Please help me with this strange issue.
I am also attaching the original file and the converted file.
DRAFT_Executive.pdf
highlighted_file.pdf
Steps to reproduce
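import json
import re
import sys

import fitz  # PyMuPDF

file_path = "DRAFT_Executive.pdf"
# Replace with your desired regex pattern.
# Assumption: the outer brackets are escaped here; as posted, the unescaped
# pattern does not compile (re.error: unbalanced parenthesis).
pattern = r'\[\s*([s|c|d|i|t]):([a-z]):([o|r])\s*\]'
doc = fitz.open(file_path)
resultOutput = []
tagsPerPage = {}
addedTags = set()

# Pass 1: find every tag on every page and look up its rectangles.
for page in doc:
    text = page.get_text()
    tagsPerPage[page.number] = []
    matches = re.finditer(pattern, text, re.IGNORECASE | re.MULTILINE | re.DOTALL)
    if matches:
        for match in matches:
            start, end = match.span()
            coordinates = page.search_for(match.group())
            tempDict = {}
            firstCoordStr = ""
            singleTagArr = []
            needleStarted = 0
            for rect in coordinates:
                # A fitz.Rect unpacks as (x0, y0, x1, y1): left, top, right, bottom.
                x1, y2, x2, y1 = rect
                # (The posted snippet does not show how tagsPerPage is filled.)

# Pass 2: walk the collected tags and flip their y coordinates.
if tagsPerPage:
    for page in doc:
        if tagsPerPage[page.number]:
            for item in tagsPerPage[page.number]:
                currPage = page.number + 1
                if item['page'] == currPage:
                    y1 = page.rect.height - item['y1']
                    y2 = page.rect.height - item['y2']
                    # (The posted snippet does not show the redaction calls.)

doc.save("highlighted_file.pdf", garbage=3, deflate=True)
doc.close()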
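A possible explanation, offered as an assumption rather than a confirmed diagnosis of the attached file: applying a redaction removes text whose bounding box overlaps the redaction rectangle, and a rectangle returned by search_for spans the full line height, so it can touch the glyph boxes of the line above. The sketch below only illustrates the geometry with plain tuples. The helper names shrink_vertically and overlaps, the keep factor of 0.6, and the sample coordinates are all hypothetical and not part of PyMuPDF.

```python
from typing import Tuple

# (x0, y0, x1, y1) with a top-left origin, the same order a fitz.Rect unpacks to.
Box = Tuple[float, float, float, float]


def shrink_vertically(box: Box, keep: float = 0.6) -> Box:
    """Return a box with the same width and vertical midline,
    but only the fraction `keep` of the original height."""
    if not 0 < keep <= 1:
        raise ValueError("keep must be in the interval (0, 1]")
    x0, y0, x1, y1 = box
    mid = (y0 + y1) / 2
    half = (y1 - y0) * keep / 2
    return (x0, mid - half, x1, mid + half)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Hypothetical coordinates: a search hit and the text line just above it.
hit = (100.0, 200.0, 180.0, 214.0)
line_above = (100.0, 188.0, 300.0, 201.0)

touches_before = overlaps(hit, line_above)
touches_after = overlaps(shrink_vertically(hit), line_above)
```

With these sample numbers the full-height hit overlaps the line above by one point, while the shrunken box does not. In the script above, the shrunken tuple could then be wrapped as fitz.Rect(*shrink_vertically(tuple(rect))), passed to page.add_redact_annot(...), and followed by page.apply_redactions(). Whether 0.6 is a suitable factor depends on the fonts and line spacing of the document.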
Configuration
OS: Ubuntu
Python: 3.8
PyMuPDF: 1.19.6