PDF file size is increasing after adding highlight annot drastically #1623
-
Hi @JorjMcKie, I have a 12 KB PDF file containing a table, and I need to add a highlight annotation to every word in it. After saving, the file grows to 288 KB (24 times the original size). Is there a way to reduce the file size? (A 400 KB file grows to 10 MB.) The PDF file and the code I used are attached.
Thank you.
-
I see no problem, given that you are adding massive amounts of annotations. Here is your script, slightly modified to report the number of annots. The file size increased from 12 KB to 288 KB. With 475 annotations in total, this means less than 600 bytes per annot.

```python
import fitz

pdf1 = fitz.open("3.pdf")
for page1 in pdf1:
    words = page1.get_text("words")
    for word in words:
        # one highlight annotation per word (word[:4] is its rectangle)
        page1.add_highlight_annot(word[:4])
    print(f"Added {len(words)} annots on page {page1.number}.")
pdf1.save("3-out.pdf", garbage=3, deflate=True)
```

Log:

```
py test.py
Added 352 annots on page 0.
Added 123 annots on page 1.
```
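The per-annotation figure above can be checked with a few lines of arithmetic. This is only a sketch: the function name `overhead_per_annot` is made up for illustration, and it assumes the reported sizes are binary kilobytes (1024 bytes).

```python
KB = 1024  # assumption: the sizes in the thread are binary kilobytes


def overhead_per_annot(size_before_kb, size_after_kb, annot_count):
    """Average number of bytes each annotation added to the file."""
    return (size_after_kb - size_before_kb) * KB / annot_count


# 352 annots on page 0 plus 123 on page 1, as printed in the log
per_annot = overhead_per_annot(12, 288, 352 + 123)
print(f"{per_annot:.0f} bytes per annot")  # prints "595 bytes per annot"
```

So the growth is roughly proportional to the number of annotations, which is why a larger document with many more words grows by megabytes.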
-
The only alternative is to not add annots at all, but to draw semi-transparent rectangles over the text (or underneath it) instead.

```python
import fitz

pdf1 = fitz.open("3.pdf")
for page1 in pdf1:
    shape = page1.new_shape()
    words = page1.get_text("words")
    for word in words:
        shape.draw_rect(word[:4])  # one rectangle per word
    # fill all rectangles of this page in yellow with 30% opacity
    shape.finish(fill=(1, 1, 0), fill_opacity=0.3)
    shape.commit()
    print(f"Added {len(words)} rectangles on page {page1.number}.")
pdf1.save("3-out2.pdf", garbage=3, deflate=True)
```
-
Ah, I should have thought of this earlier 😒
You can add one highlight for any number of rectangles like this:

```python
import fitz

pdf1 = fitz.open("3.pdf")
for page1 in pdf1:
    words = page1.get_text("words")
    word_rects = []
    for word in words:
        word_rects.append(fitz.Rect(word[:4]))
    if word_rects:
        # a single annotation that covers all word rectangles of the page
        page1.add_highlight_annot(word_rects)
    print(f"Added {len(words)} annots on page {page1.number}.")
pdf1.save("3-out.pdf", garbage=3, deflate=True)
```

This makes a 60 KB file from the original 12 KB. Each page has only one annot, with as many quads as there are words on the page.
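If highlighting whole lines instead of individual words is acceptable (this is an assumption, the thread does not discuss it), the number of quads could probably be reduced further by merging the word rectangles of each line before creating the annotation. The helper `line_rects` below is hypothetical and not part of PyMuPDF. It only relies on the tuple layout returned by `get_text("words")`, which is `(x0, y0, x1, y1, text, block_no, line_no, word_no)`.

```python
from collections import defaultdict


def line_rects(words):
    """Merge word boxes into one bounding box per text line.

    `words` is expected in the layout of page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    lines = defaultdict(list)
    for x0, y0, x1, y1, _text, block_no, line_no, _word_no in words:
        lines[(block_no, line_no)].append((x0, y0, x1, y1))

    merged = []
    for key in sorted(lines):  # keep block / line order
        boxes = lines[key]
        merged.append((
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        ))
    return merged


# made-up sample data: two words on one line, one word on the next
sample = [
    (10, 10, 40, 20, "Hello", 0, 0, 0),
    (45, 10, 80, 20, "world", 0, 0, 1),
    (10, 25, 30, 35, "again", 0, 1, 0),
]
print(line_rects(sample))  # [(10, 10, 80, 20), (10, 25, 30, 35)]
```

The result could then be passed on in the same way as in the script above, for example as `[fitz.Rect(r) for r in line_rects(words)]`. Note that the gaps between words would be highlighted too, so the output will not look identical to per-word highlights, and the actual saving has not been measured here.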