Well, here is a hacky approach that does the job in 0.1 seconds.
Your suspicion that the image binary in the PDF is a substring of the TIFF file content is confirmed:
we need to strip off the first 8 and the last 215 bytes of the file content to get the data stored in the PDF.
The basic idea is to anticipate a later MuPDF optimization by preparing the image xrefs as required, using PyMuPDF's low-level code. Then use the insert_image() form that refers to an existing image xref instead of a new image:

import fitz
import time
import os, pathlib

# image object definition without /Filter etc.
xref_templ = """<</Type /XObject /Subtype /Image /Width &width /Height &height /BitsPerComponent 1 
/ColorSpace /De…
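
Since the code above is cut off, here is a minimal, stdlib-only sketch of the two preparation steps described in the text: slicing the TIFF file content down to the embedded image data, and filling the `&width` / `&height` placeholders of an object template. The helper names `strip_wrapper` and `fill_template` are hypothetical, the shortened template only repeats the keys visible above, and the 8 / 215 byte offsets are the ones reported for this particular set of files, so they may not hold for other TIFFs. Creating the xref and attaching the stream with PyMuPDF is part of the truncated code and is not reproduced here.

```python
def strip_wrapper(data: bytes, head: int = 8, tail: int = 215) -> bytes:
    """Return `data` without its first `head` and last `tail` bytes.

    The defaults are the offsets quoted in this answer; they are an
    assumption for any other file and should be verified first.
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    if len(data) < head + tail:
        raise ValueError("data is shorter than head + tail")
    # Slice with an explicit end index so that tail == 0 also works.
    return data[head:len(data) - tail]


def fill_template(template: str, width: int, height: int) -> str:
    """Replace the &width and &height placeholders with actual values."""
    return template.replace("&width", str(width)).replace("&height", str(height))


# Hypothetical, shortened template: only the keys visible in the post.
short_templ = (
    "<</Type /XObject /Subtype /Image "
    "/Width &width /Height &height /BitsPerComponent 1>>"
)

# Synthetic stand-in for a TIFF file: 8 header bytes, payload, 215 trailer bytes.
fake_tiff = b"H" * 8 + b"payload" + b"T" * 215
image_stream = strip_wrapper(fake_tiff)
object_source = fill_template(short_templ, width=1728, height=2200)
```

With real files, `fake_tiff` would be something like `pathlib.Path(name).read_bytes()`, and the width and height would have to come from the image itself.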

Answer selected by jlb6907