If you do page.get_text("dict")["blocks"], then each text block (one with block["type"] == 0) is a dictionary containing a list of line dictionaries, each of which in turn contains a list of span dictionaries.
This hierarchy of dictionaries is described in the PyMuPDF documentation.
The span dictionaries contain the font name and size of the respective text portion - along with the rectangle containing that text.
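
As a rough illustration of walking that hierarchy, here is a minimal sketch. The helper name iter_spans is made up for this example; it only assumes the documented "type", "lines", "spans", "font", "size", "bbox" and "text" keys, and it takes the blocks list as input so it can be tried without a PDF at hand.

```python
def iter_spans(blocks):
    """Yield (font, size, bbox, text) for every span in the text blocks.

    `blocks` is expected to look like page.get_text("dict")["blocks"].
    iter_spans is a hypothetical helper, not part of PyMuPDF.
    """
    for block in blocks:
        if block.get("type") != 0:  # skip non-text (e.g. image) blocks
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                yield span["font"], span["size"], tuple(span["bbox"]), span["text"]


# Possible use with PyMuPDF (not executed here; "input.pdf" is a placeholder):
#   import fitz  # PyMuPDF
#   page = fitz.open("input.pdf")[0]
#   for font, size, bbox, text in iter_spans(page.get_text("dict")["blocks"]):
#       print(font, size, bbox, text)
```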

So you can either select spans falling inside your area, or you can let PyMuPDF select only that part of the output intersecting your area: page.get_text("dict", clip=area)....
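
A minimal sketch of the first option (selecting spans yourself) might look like the following. The function name spans_in_area and the plain (x0, y0, x1, y1) tuple for the area are assumptions made for this example, and "inside" is interpreted here as the span rectangle being fully contained in the area.

```python
def spans_in_area(blocks, area):
    """Return the span dictionaries whose bbox lies fully inside `area`.

    `blocks` is expected to look like page.get_text("dict")["blocks"];
    `area` is an (x0, y0, x1, y1) tuple. Hypothetical helper, not a PyMuPDF API.
    """
    ax0, ay0, ax1, ay1 = area
    hits = []
    for block in blocks:
        if block.get("type") != 0:  # only text blocks carry lines and spans
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                x0, y0, x1, y1 = span["bbox"]
                if x0 >= ax0 and y0 >= ay0 and x1 <= ax1 and y1 <= ay1:
                    hits.append(span)
    return hits


# The second option leaves the selection to PyMuPDF (not executed here;
# the file name and coordinates are placeholders):
#   import fitz  # PyMuPDF
#   page = fitz.open("input.pdf")[0]
#   area = fitz.Rect(50, 100, 300, 200)
#   blocks = page.get_text("dict", clip=area)["blocks"]
```

Filtering by hand gives you control over what counts as "inside" (full containment here, but an overlap test would work just as well), while the clip argument keeps the code shorter.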

Answer selected by meghanaviyyapu