Replaced all deflated images (of a certain type) with jpeg equivalent #3464

HinTak · 2024-05-10T13:59:54Z

I feel this could be an FAQ, but I couldn't find a relevant switch or example. What is the mutool convert / PyMuPDF equivalent of these Ghostscript pdfwrite options (or combinations of them)?
-dDownsampleColorImages=true
-dDownsampleGrayImages=true
-dDownsampleMonoImages=true

That is, basically replacing zlib-deflated (FlateDecode) images with their JPEG / CCITT fax (as used in TIFF) equivalents.
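To make the request concrete, here is a minimal sketch of how one might pick out the candidate images with PyMuPDF. It assumes `page.get_images(full=True)` entries carry the bits per component at index 4, the colour space name at index 5 and the filter name (without a leading slash) at index 8, and that a recent release is installed where the module imports as `pymupdf` (older releases: `import fitz`). The helpers `pick_target_filter()` and `list_candidates()` are hypothetical, their mapping rules are illustrative assumptions, and actually producing CCITT data is not shown.

```python
from typing import Iterator, Optional, Tuple


def pick_target_filter(filter_name: str, colorspace: str, bpc: int) -> Optional[str]:
    """Suggest a replacement filter for one image, or None to leave it alone."""
    if filter_name != "FlateDecode":
        return None  # only zlib-deflated images are candidates
    if bpc == 1 and colorspace == "DeviceGray":
        return "CCITTFaxDecode"  # bilevel image: CCITT fax (e.g. Group 4)
    if bpc == 8 and colorspace in ("DeviceRGB", "DeviceGray"):
        return "DCTDecode"  # 8-bit colour or grey: JPEG
    return None  # CMYK, indexed, 16-bit, ...: out of scope for this sketch


def list_candidates(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (xref, suggested_filter) once per distinct candidate image."""
    import pymupdf  # lazy import: pick_target_filter() works without PyMuPDF

    seen = set()
    with pymupdf.open(path) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref, bpc, colorspace, filter_name = img[0], img[4], img[5], img[8]
                if xref in seen:
                    continue  # the same image object can appear on many pages
                seen.add(xref)
                target = pick_target_filter(filter_name, colorspace, bpc)
                if target is not None:
                    yield xref, target
```

Images are de-duplicated by xref because a single image object may be shown on many pages, and converting it once is enough.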

Answered by HinTak · May 11, 2024

HinTak · 2024-05-10T14:10:36Z · Author

Actually, downsampling isn't exactly what I want: I want to convert all images from zlib deflate (FlateDecode) to DCTDecode/CCITTFaxDecode at their original resolution. This seems like it would be a useful, frequently requested feature?

JorjMcKie · 2024-05-10T15:34:22Z · Maintainer

We currently have a (possibly meagre) version of that: page.replace_image(xref, <new image spec>) re-inserts an arbitrary new image under the xref of the existing one.
So you could extract a given image, process it as desired, and insert it again as if it were new.
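A rough sketch of that extract, re-encode, re-insert loop, assuming a PyMuPDF version that has `Page.replace_image(xref, stream=...)` and whose `Pixmap.tobytes()` accepts `jpg_quality`. The helper names are hypothetical; images with a soft mask (`smask != 0`) are skipped because JPEG has no alpha channel, only 8-bit RGB is handled (as in the script linked later in the thread), and quality 75 is an arbitrary choice.

```python
def is_convertible(img: tuple) -> bool:
    """img is one entry of page.get_images(full=True)."""
    smask, bpc, colorspace, filter_name = img[1], img[4], img[5], img[8]
    return (
        filter_name == "FlateDecode"
        and smask == 0  # no soft mask: nothing to lose by dropping alpha
        and bpc == 8
        and colorspace == "DeviceRGB"
    )


def deflate_to_jpeg(src: str, dst: str, quality: int = 75) -> int:
    """Re-encode convertible images as JPEG; return how many were replaced."""
    import pymupdf  # lazy import: is_convertible() works without PyMuPDF

    done = set()
    with pymupdf.open(src) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref = img[0]
                if xref in done or not is_convertible(img):
                    continue
                pix = pymupdf.Pixmap(doc, xref)  # decoded pixels of the image
                jpeg = pix.tobytes("jpeg", jpg_quality=quality)
                # The xref is shared, so every page showing it sees the new image.
                page.replace_image(xref, stream=jpeg)
                done.add(xref)
        doc.save(dst, garbage=3, deflate=True)
    return len(done)
```

Since `replace_image` works on the shared xref, each image only needs converting once, however many pages display it; `garbage=3` is a precaution against leftover unreferenced objects.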

HinTak · 2024-05-11T22:52:23Z · Author

Thanks for the tips. Here is what I have come up with so far: https://github.com/HinTak/pymupdf-jbig2-extract/blob/main/lossy-convert.py

It mostly does what I wanted: it converts RGB FlateDecode images to JPEG. I have two questions, though:

Going via PIL.Image.save rather than fitz.Pixmap.save gives much better compression by default, even without PIL's optimize=True option. (You can roll back a commit or two to see the difference: I started off with PIL, then wanted to remove that dependency and found the resulting files to be larger. The public repo has a simplified/reverted history compared to my private one.) In one file I use for such tests, the original is 400MB. Going via PIL gives a 75MB output (72MB with optimize=True), while fitz.Pixmap.save gives a 200MB PDF. I am quite surprised by this, since both paths should be compressing the same large raw bitmap, presumably with libjpeg's defaults, so the results might even be identical (except for padding at image edges). Why is Pixmap.save so much worse? Or, which settings differ from the defaults? I seem to remember Adobe has different ideas about JPEG quality settings than libjpeg.

The other thing I'd like to ask: what is MuPDF's equivalent of not replacing the second field of the trailer /ID with a random number? qpdf offers --deterministic-id. This should probably be a doc.save option.

HinTak · 2024-05-11T23:05:22Z · Author

Argh, I have the answer to my second question: no_new_id=True.
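For reference, a minimal sketch of a save call that keeps the existing /ID, plus a hypothetical `sha256_file()` helper for checking whether two saves produce the same bytes. Only `no_new_id=True` comes from the thread; `garbage=3` and `deflate=True` are ordinary `Document.save()` options added here as assumptions, and this sketch does not check what else, if anything, could vary between runs.

```python
import hashlib


def sha256_file(path: str) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def save_keeping_id(doc, path: str) -> None:
    """Save a PyMuPDF Document without updating the trailer /ID."""
    doc.save(path, garbage=3, deflate=True, no_new_id=True)


# Example (assumes PyMuPDF is installed):
# import pymupdf
# doc = pymupdf.open("input.pdf")
# save_keeping_id(doc, "a.pdf")
# save_keeping_id(doc, "b.pdf")
# print(sha256_file("a.pdf") == sha256_file("b.pdf"))
```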

HinTak · 2024-05-12T01:00:13Z · Author

Argh, I have the answer to my first question too, after extracting the corresponding objects. MuPDF sets progressive=True, sets dpi=96, disables chroma subsampling, and sets quality to 95. Doing all of these in PIL, I can get bitwise-identical output. I found all of them in the upstream MuPDF code too, except the last one: upstream defaults to 90. Is 95 a PyMuPDF setting?
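The settings listed above can be written as Pillow `save()` keywords. This is a sketch: `quality`, `progressive`, `subsampling` and `dpi` are Pillow JPEG writer options, the helper names are hypothetical, and the bitwise match with MuPDF is the thread author's observation, not something verified here. For comparison, Pillow's own JPEG defaults are quality 75 with (normally) 4:2:0 chroma subsampling, which likely accounts for most of the size gap reported earlier.

```python
import io


def mupdf_like_jpeg_options(quality: int = 95) -> dict:
    """Pillow save() keywords mirroring the settings found in the thread.

    quality=95 is PyMuPDF's default; upstream MuPDF reportedly uses 90.
    """
    return {
        "quality": quality,
        "progressive": True,
        "subsampling": 0,  # 4:4:4, i.e. no chroma subsampling
        "dpi": (96, 96),
    }


def encode_jpeg(mode: str, size: tuple, samples: bytes, **options) -> bytes:
    """Encode raw pixel bytes (e.g. Pixmap.samples) as JPEG with Pillow."""
    from PIL import Image  # lazy import: the options helper needs no Pillow

    img = Image.frombytes(mode, size, samples)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", **options)
    return buf.getvalue()
```

For an RGB pixmap without alpha, `encode_jpeg("RGB", (pix.width, pix.height), pix.samples, **mupdf_like_jpeg_options())` would follow MuPDF's settings, while `encode_jpeg("RGB", (pix.width, pix.height), pix.samples, quality=75, optimize=True)` keeps Pillow's smaller default trade-off.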

HinTak · 2024-05-12T01:05:11Z · Author

Yes, 95 is a PyMuPDF setting, as documented in PyMuPDF/docs/pixmap.rst (line 326 in d464133):

.. method:: save(filename, output=None, jpg_quality=95)
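Since `jpg_quality` is a parameter, PyMuPDF's output size can be traded against quality directly. A small sketch with a hypothetical `jpeg_sizes()` helper that encodes one pixmap at several qualities via `Pixmap.tobytes()` (assumed to accept `jpg_quality`, like `save()`). Lowering the quality alone will probably not match Pillow's default output, since per the thread MuPDF also disables chroma subsampling and writes progressive JPEGs.

```python
def jpeg_sizes(pix, qualities=(95, 90, 75)) -> dict:
    """Map each jpg_quality value to the size in bytes of the JPEG it produces.

    pix is assumed to be a pymupdf.Pixmap (gray, RGB or CMYK, no alpha).
    """
    return {q: len(pix.tobytes("jpeg", jpg_quality=q)) for q in qualities}


# Example (assumes PyMuPDF is installed and xref refers to an image):
# import pymupdf
# doc = pymupdf.open("input.pdf")
# pix = pymupdf.Pixmap(doc, xref)
# print(jpeg_sizes(pix))
# pix.save("out.jpg", jpg_quality=75)  # the same knob on Pixmap.save()
```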

That answers everything I wanted to know for now. I'll add these as comments, but keep the PIL code.
