Get text from pdf page excluding page number #3992
Unanswered
vignesh0710 asked this question in Looking for help
Replies: 1 comment
-
No one can know where the PDF creator decided to put headers, footers, and similar elements, including page numbers. From the PDF's perspective, all of this is just text. One heuristic is to check the last text block and drop it if it looks like a page number:

    blocks = page.get_text("blocks", sort=True)
    if blocks and "Page" in blocks[-1][4]:  # text of the last block, adjust as needed
        blocks = blocks[:-1]  # ignore the last block
    text = "\n".join([b[4] for b in blocks])
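A substring check on the last block can drop real content whenever that block merely contains the marker word. A stricter variant, sketched below, removes a block only if it sits in the bottom-right region of the page and its entire text looks like a page number. The helper names `drop_page_number` and `page_text_without_number`, the regular expression, and the `x_frac`/`y_frac` thresholds are assumptions for illustration, not part of PyMuPDF. The only PyMuPDF features relied on are `page.get_text("blocks", sort=True)`, `page.rect`, and the block tuple layout `(x0, y0, x1, y1, text, block_no, block_type)`.

```python
import re

# Assumed page-number shapes: "12", "Page 12", "12 / 40", "Page 12 of 40".
# Adjust the pattern to the numbering style of your documents.
PAGE_NUMBER_RE = re.compile(r"^(?:page\s+)?\d+(?:\s*(?:/|of)\s*\d+)?$", re.IGNORECASE)


def drop_page_number(blocks, page_width, page_height, x_frac=0.5, y_frac=0.9):
    """Return the blocks that do not look like a bottom-right page number.

    A block is dropped only if BOTH conditions hold:
      * it starts in the bottom-right region (thresholds are guesses), and
      * its whole stripped text matches PAGE_NUMBER_RE.
    """
    kept = []
    for block in blocks:
        x0, y0, _x1, _y1, text = block[:5]
        in_corner = x0 >= page_width * x_frac and y0 >= page_height * y_frac
        looks_like_number = PAGE_NUMBER_RE.match(text.strip()) is not None
        if in_corner and looks_like_number:
            continue  # skip the suspected page number
        kept.append(block)
    return kept


def page_text_without_number(page):
    """Hypothetical wrapper: `page` is expected to be a PyMuPDF page object."""
    blocks = page.get_text("blocks", sort=True)
    blocks = drop_page_number(blocks, page.rect.width, page.rect.height)
    return "\n".join(b[4] for b in blocks)
```

Because the position test and the text test must both pass, a body paragraph that happens to mention "Page 12" is kept, as is a standalone number elsewhere on the page. The thresholds and the pattern will likely need tuning per document.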
-
I am trying to get the text from a PDF page, excluding the page number in the bottom-right corner.
code:
This works for some cases, but often it removes more text than the page number.
Is there a better way to remove the page number when getting the text from a page?