pyPDF2 to PyMuPDF #2327
-
Hi everyone, can someone please explain or show me how I can rewrite the following code to work in PyMuPDF? I have tried with widgets but am not fully happy.
Thanks in advance
Replies: 2 comments
-
The issue here is that in PyMuPDF, fields / widgets are kids of pages. To simulate PyPDF2's behavior, one must use PyMuPDF's low-level functions - which still look similar enough to PyPDF2.
With the low-level functions, you can access all of a PDF's objects directly in a syntax close to PDF source code. So your above code would look like this:

import fitz, sys

def sort_fields(doc, field_xref):  # invoke with a field's cross ref number
    kids = doc.xref_get_key(field_xref, "Kids")  # are there kids?
    if kids[0] == "array":  # extract xref numbers from the kids array
        xrefs = list(map(int, kids[1][1:-1].replace("0 R", "").split()))
        return [sort_fields(doc, i) for i in xrefs]  # return list of kid field values
    return doc.xref_get_key(field_xref, "V")[1]  # return the field value

def main():
    doc = fitz.open("input.pdf")  # open PDF
    root = doc.pdf_catalog()  # access its catalog
    field_xrefs = doc.xref_get_key(root, "AcroForm/Fields")[1]  # extract array of field xref numbers
    if field_xrefs == "null":
        sys.exit("Document has no fields")
    # the array looks like "[4711 0 R 4712 0 R 123 0 R ...]"
    # xref numbers of all fields
    field_xrefs = list(map(int, field_xrefs[1:-1].replace("0 R", "").split()))
    for xref in field_xrefs:
        value = sort_fields(doc, xref)
        print(f"Field {xref} has value '{value}'.")

if __name__ == "__main__":  # run only when executed as a script
    main()

Note that in PDF, the string "null" represents what is called
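As a side note on the array parsing in the code above: stripping "0 R" assumes every reference has generation number 0, which is the usual case but not guaranteed. Below is a minimal sketch of a more tolerant parser, in plain Python with no PyMuPDF calls. The helper name `parse_xref_array` is hypothetical and not part of PyMuPDF; it only assumes the array string has the "[4711 0 R 4712 0 R ...]" shape shown in the comment above.

```python
import re

# One indirect reference: "<object number> <generation number> R"
_REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")

def parse_xref_array(text):
    """Return the object (xref) numbers found in a PDF array string.

    Hypothetical helper: tolerates non-zero generation numbers and
    returns an empty list for strings without references, e.g. "null".
    """
    return [int(num) for num, _gen in _REF.findall(text)]

print(parse_xref_array("[4711 0 R 4712 0 R 123 0 R]"))  # → [4711, 4712, 123]
```

If this were used, both `list(map(int, ...))` lines in the code above could be swapped for a call to this helper, with the rest of the logic unchanged.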
-
Thank you @JorjMcKie! Your code does exactly what I want :) How can I buy you a beer?