How to extract the font properties of specific text? #1512

meghanaviyyapu · 2022-01-03T15:46:27Z

meghanaviyyapu
Jan 3, 2022

Can you let me know how to extract the font size and font name of only a specific part of the text in a PDF?


JorjMcKie · 2022-01-03T15:59:19Z

JorjMcKie
Jan 3, 2022
Maintainer

If you do page.get_text("dict")["blocks"], then each text block (one with block["type"] == 0) is a dictionary containing a list of line dictionaries, each of which in turn contains a list of span dictionaries.
This hierarchy of dictionaries is described in the PyMuPDF documentation.
The span dictionaries contain the font name and size of the respective text portion, along with the rectangle containing that text.

So you can either select spans falling inside your area, or you can let PyMuPDF select only that part of the output intersecting your area: page.get_text("dict", clip=area)....
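
Both options can be sketched roughly as follows. This is only an illustration, not an official recipe: the helper names font_spans and font_spans_in_area, the file name, and the rectangle coordinates are made up for the example, and it assumes the span keys "text", "font", "size" and "bbox" produced by the "dict" output.

```python
def font_spans(blocks, area=None):
    """Yield (text, font, size, bbox) for each span in the given blocks.

    `blocks` is the list from page.get_text("dict")["blocks"].
    `area` is an optional (x0, y0, x1, y1) tuple; if given, only spans
    whose rectangle intersects it are yielded (manual selection).
    """
    for block in blocks:
        if block.get("type") != 0:  # skip image blocks
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                bbox = tuple(span["bbox"])
                if area is not None and not _intersects(bbox, area):
                    continue
                yield span["text"], span["font"], span["size"], bbox


def _intersects(a, b):
    """True if rectangles a and b (x0, y0, x1, y1) overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def font_spans_in_area(pdf_path, page_number, area):
    """Let PyMuPDF restrict the extraction to `area` via the clip argument."""
    import fitz  # PyMuPDF; imported here so font_spans works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        blocks = page.get_text("dict", clip=fitz.Rect(area))["blocks"]
        return list(font_spans(blocks))
```

For example, font_spans_in_area("example.pdf", 0, (50, 100, 300, 150)) would return the text, font name, font size and rectangle of every span PyMuPDF reports for that region on the first page; the file name and coordinates here are placeholders.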

1 reply

meghanaviyyapu Jan 3, 2022
Author

Thank you.
