-
Here's how I do it:

    import re

    # Match each revision: from an `xref` keyword (but not `startxref`) up to the next `EOF`
    pattern = re.compile(b'[^t]xref.*?EOF', flags=re.M | re.S)
    # `inc_nb` is the number of incremental updates contained in the binary
    matches = list(pattern.finditer(doc.data))
    inc_nb = len(matches)
    if not inc_nb > 1:
        return result
    # Loop over the incremental updates in the binary
    # to retrieve each version of the file.
    # The current version (the last match) is skipped.
    for sub in matches[:-1]:
        stream = doc.data[:sub.span()[1]]  # bytes of one version
        if not stream:
            continue
        previous = Doc(stream).as_fitz()  # one of the previous versions of the file
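
For readers who want to try the idea without the poster's own `Doc` wrapper, here is a minimal standard-library sketch of the same prefix-slicing approach. It is an illustration under stated assumptions, not the method used by PyMuPDF or pdfresurrect: the helper name `split_revisions` is hypothetical, and it keys on the `%%EOF` marker rather than the `xref` keyword, because files that use cross-reference streams contain no `xref` table keyword. It can report false positives when `%%EOF` occurs inside stream data, and a linearized file may carry two markers for a single revision.

```python
import re
from typing import List

# Assumption: every revision of an incrementally updated PDF ends with
# an end-of-file marker, optionally followed by one line break.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?")


def split_revisions(data: bytes) -> List[bytes]:
    """Return one bytes object per %%EOF marker, oldest revision first.

    Each item is the prefix of the file that ends at a marker, so the
    last item corresponds to the current version of the document.
    (Hypothetical helper, written for illustration only.)
    """
    return [data[: match.end()] for match in EOF_MARKER.finditer(data)]
```

Each returned prefix should be openable on its own, for example with `fitz.open(stream=prefix, filetype="pdf")`, provided the prefix really is a complete earlier revision.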
-
This feature would have to be implemented by our base library, MuPDF, to make any sense. Please talk to the MuPDF colleagues, e.g. on their Discord channel, about the options they may see.
-
I've observed that PyMuPDF provides functionality to determine the number of versions present in a document.
Currently, I'm utilizing pdfresurrect, a C tool/library designed to "extract older 'hidden' versions of a PDF from the current PDF."
While my larger application is built using PyMuPDF, I have to call the compiled C code through Python for version retrieval, which is not an ideal way of building applications.
I propose adding the ability to retrieve every earlier version of a PDF directly within PyMuPDF.