Scaled font size calculation in page.get_texttrace() #2645

sky884 · 2023-09-07T07:48:02Z


I'm using texttrace to extract individual characters, their formatting, origin points and bounding boxes. This has been working well, but I've come across a problem with a particular PDF. Texttrace shows a size value for the text in this PDF of 17.33, but Acrobat displays the text at 12.99.

Inspecting the text with PDFXplorer shows a size value of 17.33, matching texttrace, but also a scaling transformation of 0.75 (the CTM is actually 0.75 0 0 -0.75). This may explain the difference between 17.33 and 12.99, since 17.33 × 0.75 ≈ 12.99.
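A minimal sketch of that arithmetic, assuming the effective size is the nominal size multiplied by the matrix's linear scale factor sqrt(|a·d − b·c|). As far as I know, this is what MuPDF's `fz_matrix_expansion()` computes, but treat that as an assumption. The helpers `matrix_expansion` and `effective_font_size` are hypothetical and not part of PyMuPDF. The matrix is written as the 6-tuple `(a, b, c, d, e, f)` used in PDF content streams.

```python
import math


def matrix_expansion(m):
    """Linear scale factor of a PDF matrix given as (a, b, c, d, e, f).

    Computed as sqrt(|a*d - b*c|); the translation part (e, f) is ignored.
    The absolute value means a y-axis flip (negative d) does not change it.
    """
    a, b, c, d, _e, _f = m
    return math.sqrt(abs(a * d - b * c))


def effective_font_size(nominal_size, m):
    """Hypothetical helper: scale a nominal font size by matrix m."""
    return nominal_size * matrix_expansion(m)


# CTM reported by PDFXplorer for the text in question.
ctm = (0.75, 0.0, 0.0, -0.75, 0.0, 0.0)
print(f"{effective_font_size(17.33, ctm):.4f}")
```

This only accounts for the CTM. If the text matrix or horizontal scaling also scales the text, those would have to be multiplied in first.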

Extracting text from the same PDF with get_text("rawdict") gives a size value of 12.99.

Is there a way, using PyMuPDF, to extract the CTM applied to this text and so recalculate 17.33 as 12.99? Or some other method of getting from the 17.33 that texttrace returns to the 12.99 value? I would prefer to use texttrace rather than get_text("rawdict"), as it's faster and it gives a spacewidth value which might help me calculate character spacing.

PyMuPDF is excellent, many thanks for developing such a great product.

Answered by sky884

Sep 28, 2023

Great, thanks for your help.


JorjMcKie · 2023-09-21T09:03:55Z

Maintainer

Thank you for your post and your nice appreciation of PyMuPDF! Inspiring!

Let me check what is happening there. My goal is to return the same font size value in get_texttrace() as in the get_text() method (which is a direct result of MuPDF logic).
Inside get_texttrace() all information is available, including the page CTM and the text TRM. Unfortunately there is no way to hand these matrices to the caller, so I am bound to get it right inside the method ... 😎.
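For reference, the PDF specification (ISO 32000-1, section 9.4.4) defines the text rendering matrix as the product below. Here $T_{fs}$ is the font size set by the `Tf` operator, $T_h$ the horizontal scaling, $T_{rise}$ the text rise, $T_m$ the text matrix and $\mathrm{CTM}$ the current transformation matrix. A size measured from $T_{rm}$ therefore includes the CTM scaling, while the raw `Tf` operand does not. Which of the two each PyMuPDF method reports is an assumption here, based only on the numbers in this thread.

```latex
T_{rm} =
\begin{bmatrix} T_{fs} \times T_h & 0 & 0 \\ 0 & T_{fs} & 0 \\ 0 & T_{rise} & 1 \end{bmatrix}
\times T_m \times \mathrm{CTM}

% Assuming T_h = 1, T_{rise} = 0, T_m = I and CTM = [0.75 0 0 -0.75 0 0]:
T_{rm} = \begin{bmatrix} 17.33 \times 0.75 & 0 & 0 \\ 0 & -17.33 \times 0.75 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\sqrt{\lvert a d - b c \rvert} = 17.33 \times 0.75 = 12.9975
```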

2 replies

sky884 Sep 21, 2023
Author

Thanks for taking the time to look into this for me.

JorjMcKie Sep 24, 2023
Maintainer

My code still looks harmless to me - can you please share that PDF example?

sky884 · 2023-09-26T13:48:25Z

Author

Sure, here's the file: texttrace example.pdf

2 replies

JorjMcKie Sep 28, 2023
Maintainer

I have found the place where this happens, inside the C code of course.
I am preparing a fix for an upcoming version.

sky884 Sep 28, 2023
Author

Great, thanks for your help.

Answer selected by sky884
