Why did the file size become huge after I wrote OCR text to a PDF? #2984
I rendered the PDF pages at dpi=300, ran OCR to get the text, and wrote that text back into the original PDF, but the saved file is far too large (1 MB -> 13 MB). tw = fitz.TextWriter(page.rect)
Replies: 10 comments 21 replies
Thank you. It works.
As you can see in the image above, I need an effect that makes it easy to compare the OCR result with the original text. When too much text is mixed in one area, it is hard to read and compare. Is there any other option for a clearer visual comparison?
Is there an example of such an effect?
How do I convert an RGB color to a PyMuPDF color code?
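For the RGB question, a minimal sketch may help. As far as I know, PyMuPDF expects colors as tuples of floats in the range 0 to 1, so an 8-bit RGB triple only needs to be divided by 255. The helper name rgb_to_pdf_color is hypothetical and not part of PyMuPDF.

```python
def rgb_to_pdf_color(r, g, b):
    """Convert 8-bit RGB components (0-255) to a tuple of floats in 0..1.

    Hypothetical helper, not a PyMuPDF function. The result can be passed
    wherever PyMuPDF accepts a color, e.g. the color= or fill= arguments.
    """
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError(f"RGB component out of range: {component}")
    return (r / 255, g / 255, b / 255)


# Light blue (173, 216, 230) as a float triple.
light_blue = rgb_to_pdf_color(173, 216, 230)
```

The resulting tuple can then be used as the color argument when writing text or drawing shapes.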
How do I add a light blue, 50% transparent background behind the text I wrote to the PDF?
Is there no fill option for TextWriter? tw.append((int(left), int(top)), text, fontsize=(int(bottom) - int(top))*0.7, font=font, small_caps=True)
Adobe Acrobat will reduce image quality after saving its result.
TypeError: draw_rect() got an unexpected keyword argument 'opacity'
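The TypeError is consistent with draw_rect() having no parameter named opacity. To my knowledge, Page.draw_rect() takes stroke_opacity and fill_opacity instead. The sketch below draws a light blue, half-transparent rectangle that can sit behind text. The helper name draw_text_background and its default values are assumptions for illustration; page is expected to be a PyMuPDF Page and rect a fitz.Rect (or a 4-number sequence).

```python
def draw_text_background(page, rect, rgb=(173, 216, 230), opacity=0.5):
    """Draw a filled, borderless, semi-transparent rectangle on `page`.

    Hypothetical helper. `rgb` is given as 8-bit components and converted
    to floats in 0..1 before being passed on as the fill color.
    """
    fill = tuple(component / 255 for component in rgb)
    page.draw_rect(
        rect,
        color=None,            # no border stroke
        fill=fill,             # fill color as floats in 0..1
        fill_opacity=opacity,  # 0 = fully transparent, 1 = opaque
        overlay=True,          # draw on top of the existing page content
    )
    return fill
```

Calling this before tw.write_text(page) should keep the text on top of the rectangle, since later drawing operations are painted over earlier ones.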
I used morph and multiple colors. The effect seems a little better.
Method Document.ez_save already has the defaults garbage=3 and deflate=True.

With TextWriter class, all fonts are file-based and hence embedded into the file. As Asian fonts are large (and automatically pulled in when necessary), the file size increases - sometimes dramatically as in your case.

Please use doc.subset_fonts() right before saving. This method builds subsets for all eligible fonts: each font is replaced with a version that only contains the glyphs that are actually in use in the PDF.
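The advice above can be sketched as a small wrapper that subsets the fonts and then saves. The function name save_with_subset_fonts and the file names in the usage comment are hypothetical; doc is assumed to be an open PyMuPDF Document. As far as I know, subset_fonts() also needs the fontTools package to be installed.

```python
def save_with_subset_fonts(doc, out_path):
    """Subset embedded fonts, then save with ez_save().

    Hypothetical wrapper. ez_save() already applies garbage=3 and
    deflate=True, so no extra save options are passed here.
    """
    doc.subset_fonts()     # keep only the glyphs actually used in the PDF
    doc.ez_save(out_path)  # compressed save with garbage collection


# Usage sketch (requires PyMuPDF; file names are placeholders):
#   import fitz
#   doc = fitz.open("input.pdf")
#   ... write the OCR text with fitz.TextWriter ...
#   save_with_subset_fonts(doc, "output.pdf")
```

Comparing the output size with and without the subset_fonts() call is a quick way to check how much of the growth came from embedded fonts.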