losing some content in the pdf #1634
kvrameshreddy asked this question in Looking for help. Answered by JorjMcKie.
Hi @JorjMcKie, some of the content on a PDF page goes missing when I draw rectangles over it, whereas add_highlight_annot gives the correct result. Could you please have a look and help me?
The above code results in some information missing from the page.
This has all the information of the original PDF, but the file size increases. For your reference: original PDF
Answered by JorjMcKie, Mar 10, 2022
That PDF contains sloppy handling of geometry changes. Cleaning it first thing after reading it will solve the problem:

import fitz

pdf1 = fitz.open("8.pdf")
for page1 in pdf1:
    page1.clean_contents()  # <=== do this before anything else with the page
    shape = page1.new_shape()
    words = page1.get_text("words")  # each item starts with x0, y0, x1, y1
    for w in words:
        shape.draw_rect(w[:4])  # one rectangle per word bounding box
    shape.finish(fill=(1, 1, 0), fill_opacity=0.3)  # semi-transparent yellow fill
    shape.commit()  # write the drawings to the page
    print(f"Added {len(words)} rectangles on page {page1.number}.")
pdf1.save("8_out1.pdf", garbage=3, deflate=True)
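
To illustrate why the order matters, here is a toy model of one plausible reading of "sloppy handling of geometry changes": a page content stream that changes the transformation matrix (`cm`) without enclosing the change in a `q` ... `Q` (save / restore graphics state) pair. Anything appended to such a stream afterwards, like the rectangles above, would be drawn in the modified coordinate system. This sketch is plain Python and does not use PyMuPDF. The function name `ctm_left_modified` and the sample streams are made up for illustration, and the whitespace tokenizer is an assumption that ignores string literals and inline images.

```python
def ctm_left_modified(content: str) -> bool:
    """Return True if drawings appended to this stream would inherit a changed matrix.

    Toy model only: tokens are split on whitespace, so PDF strings that
    contain spaces are not handled. It tracks, per saved graphics state
    level, whether a `cm` operator has modified the matrix.
    """
    modified = [False]  # one flag per graphics state level; bottom = page level
    for token in content.split():
        if token == "q":  # save state: the new level starts as a copy of the current one
            modified.append(modified[-1])
        elif token == "Q" and len(modified) > 1:  # restore state: drop the current level
            modified.pop()
        elif token == "cm":  # concatenate a matrix: current level is now changed
            modified[-1] = True
    return modified[-1]


# Hypothetical content streams, not taken from the PDF in this thread.
tidy = "q 1 0 0 -1 0 792 cm BT /F1 12 Tf (Hi) Tj ET Q"
sloppy = "1 0 0 -1 0 792 cm BT /F1 12 Tf (Hi) Tj ET"
wrapped = "q " + sloppy + " Q"  # enclosing the old content keeps its changes local

print(ctm_left_modified(tidy))     # False
print(ctm_left_modified(sloppy))   # True
print(ctm_left_modified(wrapped))  # False
```

In this model, isolating the existing content before drawing is what makes the appended rectangles land where `get_text("words")` reported the words. It is only a conceptual sketch and makes no claim about how `clean_contents()` works internally.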
Answer selected by kvrameshreddy