Text extraction works for all document types. What happens if you extract the text of the XML page?
What would be the best way to insert data from an XML file into a new PDF file?
Up until now I have parsed the XML file with the xmltodict library, flattened the nested dictionary, iterated through it, and written each entry with the insert_text function. This works, but I am wondering whether there is a more appropriate method.
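For reference, the flattening step described above could look roughly like the sketch below. It is only an illustration: the `flatten` helper and the `sample` dictionary are made up here, with `sample` standing in for the kind of nested structure that `xmltodict.parse` returns.

```python
from typing import Any, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) pairs for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Join nested keys with dots: "order.id"
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # Repeated XML elements show up as lists; index them: "item[0]"
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Leaf value; empty elements are parsed as None
        yield prefix, "" if node is None else str(node)


# Hypothetical stand-in for the result of xmltodict.parse(...)
sample = {
    "order": {
        "id": "42",
        "items": {"item": [{"name": "りんご"}, {"name": "みかん"}]},
    }
}

# One printable line per leaf value
lines = [f"{path}: {value}" for path, value in flatten(sample)]
```

Each entry in `lines` is then one string that can be passed to insert_text at an increasing vertical offset.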
I also tried:
import fitz  # PyMuPDF

doc = fitz.open(xml)  # xml is the path to the XML file
pdf_bytes = doc.convert_to_pdf()  # returns the PDF as bytes
pdf = fitz.open("pdf", pdf_bytes)
for page in pdf:
    text = page.get_text()
This would work, but the XML file is in Shift-JIS encoding and I don't get the correct output.
Does anyone have any suggestions?
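One thing that might be worth trying is re-encoding the file to UTF-8 before handing it to anything else. The sketch below uses only the Python standard library. The helper name `shift_jis_xml_to_utf8` is made up for illustration, and whether this actually fixes the extracted text is an assumption that has not been verified.

```python
import re


def shift_jis_xml_to_utf8(raw: bytes) -> bytes:
    """Decode Shift-JIS XML bytes and return the same document as UTF-8."""
    # Files from Windows tools are often really cp932; swap the codec if
    # decoding with "shift_jis" raises UnicodeDecodeError.
    text = raw.decode("shift_jis")
    # Update the encoding named in the XML declaration, if there is one,
    # so it matches the new byte encoding.
    text = re.sub(
        r"""(<\?xml[^>]*encoding=["'])[^"']+(["'])""",
        r"\g<1>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


# Small in-memory example instead of a real file
sample = '<?xml version="1.0" encoding="Shift_JIS"?><a>日本語</a>'.encode("shift_jis")
converted = shift_jis_xml_to_utf8(sample)
```

The converted bytes can be written to a new file and used in place of the original XML in either of the approaches above.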