This is a thin wrapper around an original MuPDF function, so there is no way for me to influence the output, sorry.
The quality of the HTML output extracted with PyMuPDF is far better than what I was getting from some of the other libraries, such as the PDFBox wrapper for Python. However, one concern I have is the output file size, which is considerably larger (1.5 MB) than with the other option (400 KB). I am using the flag `not fitz.TEXT_PRESERVE_IMAGES` to skip images. Apart from this, how can I further reduce the size of the output HTML file? I'm looking for a minified version of the HTML. I'd like to preserve whitespace if possible, since the PDF also contains a few tables. Thanks.