Description
Is your feature request related to a problem? Please describe.
When merging multiple PDF files, the StructTreeRoot object is not kept if a new PDF file is created and all other PDF files are appended to it using insert_pdf().
import fitz  # PyMuPDF

result = fitz.open()  # new, empty PDF
for pdf in ['tagged1.pdf', 'tagged2.pdf', 'tagged3.pdf']:
    with fitz.open(pdf) as mfile:
        result.insert_pdf(mfile)
result.save("outfile.pdf")
Alternatively, only the StructTreeRoot object of the very first file is kept if the remaining PDF files are appended to that file.
import fitz  # PyMuPDF

taggedresult = fitz.open('tagged1.pdf')  # start from the first tagged file
for pdf in ['tagged2.pdf', 'tagged3.pdf']:
    with fitz.open(pdf) as mfile:
        taggedresult.insert_pdf(mfile)
taggedresult.save("outfile.pdf")
Describe the solution you'd like
If a StructTreeRoot object is present in all (or only some) of the input files, join the StructTreeRoot objects from the individual files so that the structure information is preserved.
Describe alternatives you've considered
I'm not aware of any Python library that offers this functionality (iText seems to offer it, though: https://stackoverflow.com/questions/19839445/merging-tagged-pdf-without-ruining-the-tags). I have perused section 10.6 of the PDF Reference, version 1.7, and implementing it on my own is beyond my capabilities. I found a discussion in the qpdf project with some details about the PDF specification that might help you assess the feasibility: qpdf/qpdf#490 (comment).