Scaled font size calculation in page.get_texttrace() #2645
-
I'm using texttrace to extract individual characters, their formatting, origin points and bounding boxes. This has been working well, but I've come across a problem with a particular PDF. Texttrace reports a size of 17.33 for the text in this PDF, but Acrobat displays the text at 12.99.

Inspecting the text with PDFXplorer shows a size of 17.33, matching texttrace, but also a scaling transformation of 0.75 (the CTM actually shows 0.75 0 0 -0.75). This probably explains the difference, since 17.33 * 0.75 ≈ 12.99. Extracting text from the same PDF with get_text("rawdict") gives a size of 12.99.

Is there a way, using PyMuPDF, to extract the CTM applied to this text so that I can recalculate 17.33 as 12.99? Or is there some other way of getting from the 17.33 that texttrace returns to 12.99? I would prefer to use texttrace rather than get_text("rawdict") because it's faster and it gives a spacewidth value, which might help me calculate character spacing.

PyMuPDF is excellent. Many thanks for developing such a great product.
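To make the arithmetic concrete, here is a minimal sketch in plain Python of the scaling described in this post. It does not use PyMuPDF, and it does not show how to obtain the CTM from a page. The helper name effective_font_size is hypothetical, and taking the vertical scale from the c and d components of the matrix is an assumption about how the displayed size is derived.

```python
import math


def effective_font_size(nominal_size, matrix):
    """Scale a nominal font size by the vertical scale of a matrix (a, b, c, d).

    Hypothetical helper for illustration only; not part of PyMuPDF.
    """
    a, b, c, d = matrix
    # In PDF's row-vector convention the unit y-vector (0, 1) maps to (c, d).
    # Its length is the vertical scale; a negative d (y-axis flip) has no effect.
    vertical_scale = math.hypot(c, d)
    return nominal_size * vertical_scale


# Values reported in this post: size 17.33 and a CTM of 0.75 0 0 -0.75.
ctm = (0.75, 0.0, 0.0, -0.75)
scaled = effective_font_size(17.33, ctm)
print(f"{scaled:.4f}")
```

With the values above, the result is about 12.9975, which is in line with the 12.99 that Acrobat and get_text("rawdict") report.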
Replies: 2 comments 4 replies
-
Thank you for your post and your nice appreciation of PyMuPDF! Inspiring! Let me check what is happening there. My goal is to return the same font size value in
-
texttrace example.pdf
Great, thanks for your help.