How to create pdf from xml #2461

igivucko · 2023-06-08T12:42:05Z

igivucko
Jun 8, 2023

What would be the best way to insert data from the xml to the new pdf file?

Up until now I parsed the xml file with xmltodict library. Then flattened the nested dictionary, iterated through it and used insert_text function. This works but I am wondering is there some more appropriate method.

I also tried
doc = fitz.open(xml)
xml = doc.convert_to_pdf()
pdf = fitz.open("pdf", xml)
for page in pdf:
text = page.get_text()

this would work but xml file is in shift-JIS encoding and I don't get the correct output.
Does anyone have any suggestions

JorjMcKie · 2023-06-09T10:10:51Z

JorjMcKie
Jun 9, 2023
Maintainer

Text extraction works for all document types. That happens if you extract the text of the XML page?

1 reply

igivucko Jun 9, 2023
Author

Yes, the xml is in the shift-js encoding.
When I convert xml to dict I use shift js encoding. I add japanese font in pymupdf and insert_text works fine.
But when I use the other method, I am able to open xml, convert it but when I use get_text method the text is not in the proper encoding.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to create pdf from xml #2461

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

How to create pdf from xml #2461

Uh oh!

igivucko Jun 8, 2023

Replies: 1 comment · 1 reply

Uh oh!

JorjMcKie Jun 9, 2023 Maintainer

Uh oh!

igivucko Jun 9, 2023 Author

igivucko
Jun 8, 2023

Replies: 1 comment 1 reply

JorjMcKie
Jun 9, 2023
Maintainer

igivucko Jun 9, 2023
Author