Describe the bug
I want to match the fonts of text captured by page.get_texttrace() with the fonts reported by page.get_fonts(). This is the sample PDF
Embedded Fonts are:
Fonts in the page.get_texttrace() are:
The Chinese font names in page.get_fonts() have an encoding issue. Is it possible to access the font xref of the text in page.get_texttrace()? Thank you!
My configuration (mandatory)
Python 3.8.8
No, this is a low-level, high-speed method for easy access to single characters and their glyphs. It must not be overloaded with access to other information. Also remember that all text extractions and searches work for all document types - not only for PDFs. So PDF xrefs either wouldn't make sense within these methods at all, or require code that significantly slows down things - just for corner-case purposes.
You can request the full fontname to be returned in the text extraction functions via …
No, the font names in get_fonts() are directly taken from the PDF object definition, where each non-Latin character is encoded in PDF…
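Since the text trace does not expose xrefs, one possible workaround is to match on font names instead. The following is only a minimal sketch, not something from the reply: it assumes get_fonts() items are tuples of the form (xref, ext, type, basefont, name, encoding) and that get_texttrace() items are dicts with a "font" key. The helper names strip_subset_prefix and match_trace_fonts, and the sample data, are hypothetical. It will not repair names that are encoded differently on the two sides; those simply end up with no match.

```python
import re

# Subset fonts carry a six-uppercase-letter tag plus "+", e.g. "ABCDEF+SimSun".
_SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def strip_subset_prefix(fontname):
    """Drop a leading subset tag such as 'ABCDEF+' if present."""
    return _SUBSET_PREFIX.sub("", fontname)


def match_trace_fonts(trace_spans, font_entries):
    """Map each font name seen in the text trace to candidate xrefs.

    trace_spans:  dicts with a "font" key (assumed shape of
                  page.get_texttrace() items).
    font_entries: tuples (xref, ext, type, basefont, name, encoding)
                  (assumed shape of page.get_fonts() items).
    """
    # Index the page's fonts by basefont name, ignoring any subset tag.
    by_name = {}
    for entry in font_entries:
        xref, basefont = entry[0], entry[3]
        by_name.setdefault(strip_subset_prefix(basefont), []).append(xref)

    # Look up each traced font name; unmatched names map to an empty list.
    matches = {}
    for span in trace_spans:
        name = strip_subset_prefix(span["font"])
        matches[span["font"]] = by_name.get(name, [])
    return matches


# Hypothetical sample data standing in for real PyMuPDF output.
sample_fonts = [
    (12, "ttf", "TrueType", "ABCDEF+Arial", "F1", "WinAnsiEncoding"),
    (15, "ttf", "Type0", "GHIJKL+SimSun", "F2", "Identity-H"),
]
sample_trace = [{"font": "Arial"}, {"font": "SimSun"}, {"font": "Courier"}]

print(match_trace_fonts(sample_trace, sample_fonts))
# → {'Arial': [12], 'SimSun': [15], 'Courier': []}
```

With a real document, the two arguments would be page.get_texttrace() and page.get_fonts(). A name can map to several xrefs when the same base font is embedded more than once, so the result is a list of candidates rather than a single xref.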