Is it possible to access the font xref of the text returned by page.get_texttrace()?

No, this is a low-level, high-speed method for easy access to single characters and their glyphs. It must not be overloaded with access to other information. Also remember that all text extractions and searches work for all document types, not only for PDFs. So PDF xrefs either wouldn't make sense within these methods at all, or would require code that significantly slows things down, just for corner-case purposes.
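
For readers who still need an xref per span, one possible workaround is to join the two results by font name outside of get_texttrace(). The following is only a minimal sketch, not something recommended in this thread. It assumes that each span dict carries a "font" key, that page.get_fonts() items are tuples starting with (xref, ext, type, basefont, ...), and that names may or may not carry the six-letter subset prefix such as "ABCDEF+". The helper names are hypothetical, and name matching can be ambiguous when several font objects share one name.

```python
import re

# Subset fonts in PDF carry a tag of six uppercase letters followed by "+".
_SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(name):
    """Return the font name without a leading subset tag, if present."""
    return _SUBSET_PREFIX.sub("", name)


def font_xrefs_by_name(font_items):
    """Map normalized base font names to the list of xrefs using them.

    font_items: tuples shaped like page.get_fonts() entries,
    assumed here to be (xref, ext, type, basefont, ...).
    """
    mapping = {}
    for item in font_items:
        xref, basefont = item[0], item[3]
        mapping.setdefault(strip_subset_prefix(basefont), []).append(xref)
    return mapping


def annotate_spans(spans, font_items):
    """Pair each span's font name with candidate xrefs (possibly empty)."""
    lookup = font_xrefs_by_name(font_items)
    return [
        (span["font"], lookup.get(strip_subset_prefix(span["font"]), []))
        for span in spans
    ]


# Hypothetical usage with a PDF page object (not executed here):
#   pairs = annotate_spans(page.get_texttrace(), page.get_fonts())
```

An empty list means no font on the page matched by name, and a list with more than one xref means the name alone cannot identify the font object.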

The Chinese font names in page.get_fonts() have an encoding issue.

No, the font names in get_fonts() are directly taken from the PDF object definition, where each non-Latin character is encoded in PDF…
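
If the names come back in PDF name syntax, where a byte is written as "#" followed by two hex digits, they can be converted to readable text after extraction. The sketch below assumes exactly that form and guesses the byte encoding by trying UTF-8 first and then GB18030. Both the helper name and the encoding list are assumptions, not part of the library, and the right encoding depends on how the PDF was produced.

```python
import re

# A "#xx" escape in a PDF name object stands for the byte with hex value xx.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name, encodings=("utf-8", "gb18030")):
    """Turn '#xx' escapes in a font name into text, if possible.

    Returns the name unchanged when it has no escapes or when none of
    the candidate encodings can decode the resulting bytes.
    """
    if "#" not in name:
        return name
    try:
        raw = name.encode("latin-1")
    except UnicodeEncodeError:
        return name  # already contains non-Latin characters; leave it alone
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return name


# "宋体" (SimSun) stored as GBK bytes CB CE CC E5:
print(decode_pdf_name("#CB#CE#CC#E5"))
```

Trying UTF-8 first is a deliberate choice: it rejects most byte sequences that are not valid UTF-8, whereas GB18030 accepts far more input and would otherwise mask a wrong guess.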

@yufc2002
Answer selected by JorjMcKie
Category: Q&A
Labels: not a bug (user error / unable to reproduce)

This discussion was converted from issue #1933 on September 21, 2022 10:49.