Link text from page.get_textbox #2287
Replies: 2 comments
-
Yes, this is possible, provided the link rectangle covers an area that actually contains text. However, the link's "from" rectangle may be defined too tightly, so that it does not fully cover the text, which is the case in your example:
for i, link in enumerate(page.get_links()):  # here we take the 'from' rect unmodified
    print(f"Link {i}: '{page.get_textbox(link['from'])}'")
Link 0: ''
Link 1: ''
Link 2: ''
Link 3: ''
Link 4: ''
Link 5: ''
d = (-5, -5, 5, 5)  # use this delta to enlarge the link rect by 5 points in every direction
for i, link in enumerate(page.get_links()):
    print(f"Link {i}: '{page.get_textbox(link['from'] + d)}'")
Link 0: 'Antenna House, Inc.'
Link 1: 'Linking to an external file (attachment-sample-1.pdf).'
Link 2: 'Linking to a website (https://www.antennahouse.com/)'
Link 3: 'Linking to an ID'
Link 4: 'Linking to a page number (page 2)'
Link 5: 'Linking to a page number (page 2) and setting the display ratio (200%)'
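If you need this in more than one place, the enlargement can be wrapped in a small helper. This is only a sketch: `enlarge` and `link_texts` are hypothetical names, not part of PyMuPDF, and the default margin of 5 is simply the value that worked for this file, so other documents may need a different one. The only PyMuPDF calls it relies on are `page.get_links()` and `page.get_textbox()`, and it assumes the "from" rectangle can be unpacked into four coordinates.

```python
def enlarge(rect, margin):
    """Return rect grown by `margin` on every side, as a plain 4-tuple."""
    x0, y0, x1, y1 = rect  # assumes rect unpacks into four coordinates
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def link_texts(page, margin=5):
    """Return the text found under each link of `page`.

    `margin` is an assumed padding around the link's 'from' rectangle;
    0 reproduces the unmodified behaviour from the first loop above.
    """
    texts = []
    for link in page.get_links():
        box = enlarge(link["from"], margin)
        texts.append(page.get_textbox(box).strip())
    return texts


# Possible usage (requires PyMuPDF and the sample file from the question):
#   import fitz
#   page = fitz.open("basic-link-1.pdf")[0]
#   for i, text in enumerate(link_texts(page)):
#       print(f"Link {i}: '{text}'")
```

Keep the margin small: a large value can also pick up text from neighbouring lines that does not belong to the link.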
-
Thank you
-
page.get_textbox(link["from"]) is not extracting the link text from the PDF basic-link-1.pdf. Can you let me know whether link text can be extracted using the rect values of the links returned by page.get_links()?