Get text from coordinates #2128
-
Hi guys, I'm new to this package, and what I'm trying to achieve seems basic, but I'm struggling.
What I have done so far:
So even when I force a rectangle, for example `rect = fitz.Rect(250, 116, 292, 130)`, it can't find anything. Am I missing something on the
-
I do not understand why you modify that rectangle before you try a text extraction. Doing this will, of course, not let you find anything. For example:

```python
import fitz  # PyMuPDF

doc = fitz.open("some.pdf")
page = doc[0]
rlist = page.search_for("pixmap")  # list of fitz.Rect, one per hit
for rect in rlist:
    print(page.get_textbox(rect))  # text inside each unmodified hit rectangle
```
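A simplified way to see why a hand-made rectangle can come back empty: the sketch below models region-based extraction in plain Python by keeping only the words whose bounding box lies entirely inside the rectangle. `words_inside` is a hypothetical helper, not part of PyMuPDF, and PyMuPDF's actual clip handling is not reproduced exactly. The tuples follow the layout of `page.get_text("words")`, but the coordinates are made up. Also note that PyMuPDF measures y downward from the top-left page corner, so coordinates copied from a tool that uses the PDF default bottom-left origin may point at a different area.

```python
from typing import List, Tuple

# One entry in the layout returned by PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def words_inside(words: List[Word], box: Box) -> List[str]:
    """Return the text of every word whose bbox lies completely inside box.

    Hypothetical, simplified model; not PyMuPDF's exact clip rule.
    """
    bx0, by0, bx1, by1 = box
    return [
        w[4]
        for w in words
        if w[0] >= bx0 and w[1] >= by0 and w[2] <= bx1 and w[3] <= by1
    ]


# Made-up sample data, not taken from the original PDF.
sample = [
    (250.0, 118.0, 290.0, 128.0, "Invoice", 0, 0, 0),
    (293.0, 118.0, 335.0, 128.0, "Number", 0, 0, 1),
]

print(words_inside(sample, (250, 116, 292, 130)))  # ['Invoice']
print(words_inside(sample, (250, 116, 340, 130)))  # ['Invoice', 'Number']
print(words_inside(sample, (250, 100, 292, 110)))  # [] (box is above the text)
```

A rectangle that is slightly too narrow silently drops a word, and one placed a few points off returns nothing at all, which matches the symptom in the question.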
-
Ok, got you.
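Why don't you enlarge your hit rect of "invoice number" by some amount? Assuming `rect` is that hit rect, do something like this:

```python
import fitz  # PyMuPDF

page = fitz.open("some.pdf")[0]
rect = page.search_for("invoice number")[0]  # the hit rect (assumes at least one hit)
rect.y0 -= 5  # make the stripe higher
rect.y1 += 5
rect.x1 = page.rect.width  # go until the right page border
words = page.get_text("words", clip=rect, sort=True)  # ensure left-to-right sorting order
# now items 0 and 1 should reflect "Invoice" and "Number", and the rest should be your actual invoice number
print(" ".join(w[4] for w in words[2:]))  # item 4 of each word tuple is its text
```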
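The stripe idea can be tried without a PDF at hand. This is a minimal sketch on word tuples in the `page.get_text("words")` layout: it widens the hit rectangle by 5 points vertically, extends it to the right page edge, keeps the overlapping words in left-to-right order, and drops the label. `widen_stripe`, `intersects` and `value_after_label` are hypothetical helpers, the overlap test is only a stand-in for `clip=`, the label is assumed to be exactly two words, and the sample coordinates and page width are made up.

```python
from typing import List, Tuple

# Same tuple layout as PyMuPDF's page.get_text("words"):
# (x0, y0, x1, y1, word, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def widen_stripe(hit: Box, page_width: float, pad: float = 5.0) -> Box:
    """Grow a search hit vertically by pad and extend it to the right page edge."""
    x0, y0, x1, y1 = hit
    return (x0, y0 - pad, page_width, y1 + pad)


def intersects(w: Word, box: Box) -> bool:
    """True if the word's bbox overlaps box (simplified stand-in for clip)."""
    return w[0] < box[2] and w[2] > box[0] and w[1] < box[3] and w[3] > box[1]


def value_after_label(words: List[Word], hit: Box, page_width: float,
                      label_len: int = 2) -> str:
    """Join the words that follow the label inside the widened stripe."""
    stripe = widen_stripe(hit, page_width)
    inside = sorted((w for w in words if intersects(w, stripe)), key=lambda w: w[0])
    return " ".join(w[4] for w in inside[label_len:])


# Made-up sample page content.
words = [
    (50, 100, 95, 112, "Invoice", 0, 0, 0),
    (98, 100, 145, 112, "Number", 0, 0, 1),
    (300, 101, 360, 113, "INV-0042", 1, 0, 0),
    (50, 140, 90, 152, "Date", 2, 0, 0),  # next line, outside the stripe
]
hit = (50, 100, 145, 112)  # where "Invoice Number" was found

print(value_after_label(words, hit, page_width=595))  # INV-0042
```

Sorting by `x0` alone is enough here because the stripe covers a single text line; the `sort=True` option in the reply covers the more general ordering.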
You may have a couple of problems though: