-
I can't share this input file, but I can probably prepare a similar one: it's just output from LaTeX. Suppose I copy the pages of a PDF into a new document, one page at a time:

    import fitz  # PyMuPDF

    src = fitz.open("version1.pdf")
    dest = fitz.open()
    # Copy the pages one by one into the new, empty document.
    for n in range(len(src)):
        dest.insert_pdf(src, from_page=n, to_page=n)
    dest.ez_save("foo1.pdf")

Background

By default this makes quite large output:
(my file has 24 pages; the size likely scales with the number of pages). But this is easy to fix with `dest.ez_save("foo2.pdf", clean=True, garbage=4)`.

Today's issue

    src = fitz.open("version1.pdf")
    dest = fitz.open()
    for n in range(len(src)):
        dest.insert_pdf(src, from_page=n, to_page=n)
    # Save the same in-memory document four times with identical options.
    dest.ez_save("foo2-1.pdf", clean=True, garbage=4)
    dest.ez_save("foo2-2.pdf", clean=True, garbage=4)
    dest.ez_save("foo2-3.pdf", clean=True, garbage=4)
    dest.ez_save("foo2-4.pdf", clean=True, garbage=4)

Here I saved the file four times and expected the same output each time. Instead, I get decreasing file sizes:
(I think sometimes I've observed larger files too, not sure).

Question 1

Why does this happen? On one hand, it's useful. But OTOH, saving each file "a couple of times" to get a maybe 50% smaller file seems like a poor style choice...

Question 2

Saving seems stateful. Is this desirable? It's unexpected to me: my naive intuition is that saving the file would not change the state of the document in memory. So the minimum fix is probably to document this behaviour: "Note saving a full … can change the in-memory data as well."
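To put numbers on the effect described above, here is a minimal sketch that saves the same document several times and collects the resulting file sizes. The helper names (`repeated_save_sizes`, `is_stable`, `sizes_for_copied_pdf`) are made up for this illustration and are not part of PyMuPDF; only `fitz.open`, `insert_pdf` and `ez_save` come from the snippets in this post.

```python
import os
import tempfile


def repeated_save_sizes(save, n=4, suffix=".pdf"):
    """Call save(path) n times on fresh temp files; return the sizes in bytes."""
    sizes = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n):
            path = os.path.join(tmp, f"save-{i + 1}{suffix}")
            save(path)
            sizes.append(os.path.getsize(path))
    return sizes


def is_stable(sizes):
    """True if every save produced the same number of bytes."""
    return len(set(sizes)) <= 1


def sizes_for_copied_pdf(src_path, n=4):
    """Rebuild the page-by-page copy from the post and save it n times."""
    import fitz  # PyMuPDF, imported lazily so the helpers above run without it

    src = fitz.open(src_path)
    dest = fitz.open()
    for page_no in range(len(src)):
        dest.insert_pdf(src, from_page=page_no, to_page=page_no)
    # Same options as in the post; every call writes the same in-memory document.
    return repeated_save_sizes(
        lambda p: dest.ez_save(p, clean=True, garbage=4), n
    )
```

Calling `sizes_for_copied_pdf("version1.pdf")` (with any PDF in place of that placeholder name) and passing the result to `is_stable` should show whether the sizes drift from one save to the next for a given input.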
Replies: 3 comments
-
I spent a few minutes trying to build a shareable MWE. So far, the effect is less dramatic:
(but still noticeable). It seems necessary to have … So if needed, I will later create an MWE that is shareable; for now I'm nervous about messing up and sharing the wrong thing.
-
If you use `clean=True` then this will always change appearance streams, because MuPDF cannot / does not check whether or not an appearance stream (like a `/Contents` object) has already been "cleaned" before.

So in order to do an analysis we would need an experiment with `clean=False, deflate=True`. Only that would indeed be a worthwhile investigation.
-
Overall I think we have no issue here. I have made several experiments without using the …
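The experiment proposed in the reply above (`clean=False, deflate=True`) could be sketched roughly as follows. This is an illustration under assumptions, not a definitive test: the function names are hypothetical, `garbage=4` is carried over from the original post rather than from the reply, and `Document.tobytes` is used instead of `ez_save` only to avoid writing files.

```python
def tobytes_sizes(doc, n=4, **save_options):
    """Serialize doc n times via doc.tobytes(**save_options); return byte counts."""
    return [len(doc.tobytes(**save_options)) for _ in range(n)]


def compare_clean_settings(doc, n=4):
    """Sizes over n saves without cleaning, then with cleaning.

    The clean=True run comes last: if it modifies the in-memory document,
    running it first would contaminate the control measurements.
    """
    control = tobytes_sizes(doc, n, clean=False, deflate=True, garbage=4)
    cleaned = tobytes_sizes(doc, n, clean=True, deflate=True, garbage=4)
    return {"clean=False": control, "clean=True": cleaned}


# Usage (needs PyMuPDF and a real file; "version1.pdf" is the placeholder
# name from the post):
#   import fitz
#   print(compare_clean_settings(fitz.open("version1.pdf")))
```

If the `clean=False` sizes are identical across saves while the `clean=True` sizes keep changing, that would be consistent with the explanation that content streams are re-cleaned on every save.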