performance of "mutool -sg " and "page.clean_contents()" #3161
Replies: 8 comments 1 reply
-
All your problems should go away with the following recipe(s):
-
Neither of your suggestions improves the situation:
-
If it helps, the problem with this pdf, as far as I see, is something like this:
The pdf literally has the above
-
I finally understood your problem: Page method
-
Yes, I have since looked at about 100 PDFs from the same-ish source. Only one has this problem (so far), and it is also by far the oldest according to its creation/modification date.
-
I looked at the output of
-
I think However, I'll tidy up the code and post it as a repo at some point. It is not pretty: it dumps all the masks, wraps them into a new PDF, runs mutool to get at the front and back covers, and joins the two with qpdf. Much of it could be neater, but 3 to 4 minutes for a one-off conversion is not bad.
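For readers following along, here is a minimal sketch of what the final "join with qpdf" step could look like when driven from Python. It is not the poster's actual script. The file names `covers.pdf`, `masks.pdf` and `joined.pdf` are hypothetical, and it assumes the covers file holds exactly the front and back pages (as produced by the `mutool clean` call on the first and last page) and that `qpdf` is on the PATH.

```python
import subprocess


def qpdf_join_cmd(covers, masks, out):
    """Build the qpdf argv: front cover, then all mask pages, then back cover.

    In qpdf page ranges, "z" stands for the last page of a file, and the
    same input file may be listed more than once.
    """
    return [
        "qpdf", "--empty", "--pages",
        covers, "1",     # front cover = first page of the covers file
        masks, "1-z",    # every page of the mask-only PDF
        covers, "z",     # back cover = last page of the covers file
        "--", out,
    ]


def join(covers, masks, out):
    """Run qpdf; raises CalledProcessError if qpdf exits with an error."""
    subprocess.run(qpdf_join_cmd(covers, masks, out), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(qpdf_join_cmd("covers.pdf", "masks.pdf", "joined.pdf")))
```

Keeping the argv construction separate from the `subprocess.run` call makes it easy to print and inspect the command before running it on a large document.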
-
The code is posted at The code is mainly just for your amusement :-), to see what some want to use PyMuPDF for - definitely the lossless-extraction/conversion part. It is doing a fair number of
-
This is sort of a follow-up to my discussion item. While `Page.clean_contents()` does the job, adding it makes the script many times slower. Fortunately it is essentially a one-off operation, so I'll just let it run for however long it takes.

I think `mutool clean -sg ...` is roughly equivalent to what I intend to do, in one simple command line. But I found that it is even slower and also takes up so much more memory along the way that I have not even let it work on the complete doc as is. Since I only want the front and back of the original as they are, I only do `mutool clean -sg original.pdf new.pdf 1,last`. (#3160 (comment) - trying to use front and back of original as they are, but only the mask of pages 2 to N-1.)

So here are 3 bugs/questions/feature requests:

1. `page.clean_contents()`
2. `mutool clean -sg ...` seems to be a lot worse than iterating `page.clean_contents()` over individual pages?
3. `doc.insert_page(page_object)` (with a cleaned page_object) would be nice. I see most of the `doc.*page(pno)` methods take page numbers (insert, copy, move) - wouldn't it be nice to have a `doc.insert_page(page_object)` method?
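To make the comparison in the questions above concrete, here is a rough sketch (not code from this thread) that times `Page.clean_contents()` page by page, builds the equivalent `mutool clean -sg` command, and shows one possible workaround for the missing `doc.insert_page(page_object)`: copying an already cleaned page into another document with `Document.insert_pdf()`. The function names and file names are made up for illustration. The page spec uses `N` for the last page, which is what the mutool documentation describes. The sketch also assumes that `insert_pdf()` picks up the cleaned, in-memory state of the source page.

```python
import time


def mutool_clean_cmd(src, dst, pages="1,N"):
    """Build the argv for `mutool clean -sg SRC DST PAGES` (N = last page)."""
    return ["mutool", "clean", "-sg", src, dst, pages]


def time_clean_per_page(src):
    """Clean every page with PyMuPDF; return (document, seconds spent per page)."""
    import fitz  # PyMuPDF, imported lazily so mutool_clean_cmd works without it

    doc = fitz.open(src)
    timings = []
    for page in doc:
        start = time.perf_counter()
        page.clean_contents()  # sanitize=True is the default
        timings.append(time.perf_counter() - start)
    return doc, timings


def copy_cleaned_page(src_doc, pno, dst_doc):
    """Workaround sketch: append page `pno` of src_doc to dst_doc via insert_pdf.

    Assumption: the copy reflects the cleaned content stream of the source page.
    """
    src_doc[pno].clean_contents()
    dst_doc.insert_pdf(src_doc, from_page=pno, to_page=pno)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(mutool_clean_cmd("original.pdf", "new.pdf")))
```

Looking at the per-page timings would show whether a few pathological pages dominate the run time or the cost is spread evenly across the document.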