
Yes, when I call page.get_text(clip=widget.rect) I get the text associated with the widget, but I want the text to be extracted the same way get_text works with clip=None, i.e. in the same order as it appears in the PDF, with the widget text placed beside the normal PDF text it sits next to.

This cannot work!
All new stuff in a PDF can only be appended to old content - not inserted in the middle of things by some miracle. This is a PDF peculiarity - not a PyMuPDF restriction.
Your only option is to sort the extracted text in a suitable way.
There is a sort parameter in get_text() which behaves slightly differently depending on the output option.
In your case however - as…
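
The sorting idea can be sketched roughly as below. This is only an illustration, not necessarily what the truncated answer goes on to recommend. The helper names sort_reading_order and page_text_with_widgets and the line_tol parameter are made up for this example. It assumes that widget values are not already part of the page.get_text("words") output; if they are, they would show up twice and the widget loop should be dropped. The tolerance will likely need tuning per document, since a widget rectangle is usually a bit taller than the words beside it.

```python
def sort_reading_order(items, line_tol=3.0):
    """Sort (x0, y0, x1, y1, text) tuples top-to-bottom, then left-to-right.

    Items whose top edge (y0) lies within line_tol points of the first item
    of the current line are treated as belonging to the same line.
    line_tol is a hypothetical tuning knob, not a PyMuPDF parameter.
    """
    by_position = sorted(items, key=lambda it: (it[1], it[0]))
    lines = []
    for it in by_position:
        if lines and abs(it[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(it)      # same visual line
        else:
            lines.append([it])        # start a new line
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda it: it[0]))
    return ordered


def page_text_with_widgets(page, line_tol=3.0):
    """Merge normal page words and widget values into one reading-order string.

    `page` is expected to be a PyMuPDF Page object.
    """
    # "words" yields (x0, y0, x1, y1, word, block_no, line_no, word_no)
    items = [tuple(w[:5]) for w in page.get_text("words")]
    # Assumption: widget values are NOT already in the list above.
    for widget in page.widgets():
        value = widget.field_value
        if value:
            r = widget.rect
            items.append((r.x0, r.y0, r.x1, r.y1, str(value)))
    return " ".join(it[4] for it in sort_reading_order(items, line_tol))
```

With a document opened through PyMuPDF, page_text_with_widgets(doc[0]) would then return the words of the first page with each filled-in field value placed next to the text on the same line.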

Answer selected by 808Code