How to link to pages with keywords #1923
-
Hi there! This is a follow-up question to a response you shared a few years ago: #693. I'm wondering if there's a way to add a page to the PDF with links to every page where the keyword exists. For example, if I searched for the word "rbg" and got the response that it was found on pages 1, 5, and 8, I'd like to add a page to the front of the PDF that says "rbg: 1, 5, and 8" with links to pages 1, 5, and 8. Is there a way to do this? Hope this makes sense! Thanks for your help! Rebecca
-
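First of all, you need to design the layout of that "index page", e.g. one line per keyword followed by the numbers of the pages where it occurs.

Then make a new page via `page = doc.new_page()`, insert those text lines, and add a link for each page number:

```python
for pno in page_numbers:
    pno_rects = page.search_for(str(pno))  # where the number appears on the index page
    for rect in pno_rects:
        link = {"kind": fitz.LINK_GOTO, "page": pno, "from": rect, "to": fitz.Point(50, 50)}
        page.insert_link(link)
```

The above will point to position (50, 50) on a page where the keyword can be found. Probably not very satisfying. It is better to restrict the search for the page numbers to the keyword's own line, and to point to the keyword itself on the target page:

```python
kw_rect = page.search_for("keyword1")[0]  # position on the index page
# extend that rect to include the full line:
kw_rect.x1 = page.rect.width
for pno in page_numbers:
    pno_rect = page.search_for(str(pno), clip=kw_rect)[0]
    # look up the rectangle of keyword1 on page pno and use its top-left as "to" in the link
    target = doc[pno].search_for("keyword1")[0]
    link = {"kind": fitz.LINK_GOTO, "page": pno, "from": pno_rect, "to": target.tl}
    page.insert_link(link)
```

I hope you are still with me at this point ... Once the index page is finished, move it to the desired position inside the document using `doc.move_page()`.

Please do not hesitate to ask if something is unclear.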
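A small detail worth checking when building the index lines: PyMuPDF page numbers (`page.number`, and the `"page"` key of a link) are 0-based, while a reader presumably expects the 1-based numbers a PDF viewer shows. A minimal sketch of a hypothetical helper `index_line()` that formats one line in the style asked for in the question ("rbg: 1, 5, and 8"), assuming it receives 0-based page numbers:

```python
def index_line(keyword, page_numbers):
    """Format one index line, e.g. 'rbg: 1, 5, and 8'.

    Assumption: page_numbers are 0-based PyMuPDF page numbers; they are
    de-duplicated, sorted and shown 1-based.
    """
    labels = [str(pno + 1) for pno in sorted(set(page_numbers))]
    if len(labels) > 1:
        labels[-1] = "and " + labels[-1]  # "1 and 5" / "1, 5, and 8"
    sep = ", " if len(labels) > 2 else " "
    return f"{keyword}: {sep.join(labels)}"


print(index_line("rbg", [0, 4, 7]))
```

If the index shows these 1-based labels, the link search on the index page would look for `str(pno + 1)` while the link dictionary keeps `"page": pno`.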
-
In principle this is the approach.
```python
import fitz

# assumes: doc is the open fitz.Document, and kw_dict maps each keyword
# to a list of (pno, rect) pairs, rect being the keyword's position on page pno
page = doc.new_page()  # this will be our index page
insertion_point = fitz.Point(100, 72)  # start inserting lines here
for kw in kw_dict.keys():
    pno_string = ", ".join([str(v[0]) for v in kw_dict[kw]])  # looks like "1, 5, ..."
    page.insert_text(insertion_point, f"{kw} {pno_string}")
    insertion_point += (0, 20)  # insertion point of the next line
# text lines inserted, now search for the page numbers of each keyword
for kw in kw_dict.keys():
    kw_rect = page.search_for(kw)[0]  # kw text is within here
    kw_rect.x1 = page.rect.width  # all respective page numbers are within this extended rect!
    for pno, rect in kw_dict[kw]:
        pno_rect = page.search_for(str(pno), clip=kw_rect)[0]  # the rect for "1", "5", etc.
        link = {"kind": fitz.LINK_GOTO, "page": pno, "from": pno_rect, "to": rect.tl}
        page.insert_link(link)
# index page finished with all links etc.
doc.move_page(page.number, 0)  # move index page to front of document
```
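The code above takes `kw_dict` as given. Assuming the shape it uses (keyword → list of `(pno, rect)` pairs), here is a minimal sketch of filling it automatically with `Page.search_for()`; the function name `build_kw_dict` is hypothetical. It keeps only the first hit per page, so each page number appears once on the index line, and it should run before `doc.new_page()` so the index page itself is not searched.

```python
def build_kw_dict(doc, keywords):
    """Map each keyword to [(page number, rect of first hit on that page), ...].

    Only relies on iterating `doc` and on `page.number` / `page.search_for()`,
    which a PyMuPDF Document and its Pages provide.
    """
    kw_dict = {}
    for page in doc:
        for kw in keywords:
            hits = page.search_for(kw)  # list of rects, empty if not found
            if hits:  # keep the first hit only: one entry per page
                kw_dict.setdefault(kw, []).append((page.number, hits[0]))
    return kw_dict
```

Keywords that occur nowhere simply do not appear in the result, so no empty index lines are produced.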
-
Ok! That makes a lot of sense! Is there a way to automatically create my keyword dictionary from the results of my keyword search? I tried piecing it together below. I see the list of keywords and pages in my output, but they are not saved in the PDF. Any suggestions? Thanks again for all your help!
-
Method
You forgot to save the document 😎?
-
OMG another misconception!
-
I have taken the liberty to update your script. Seemed more efficient than the other way. Good luck!
-
You can check how much room a text will take when inserted. Instead of just inserting a text line, you can take a rectangle and use `page.insert_textbox()`. This method automatically breaks the text into lines (at word boundaries). Its return code is a float which informs about the success: a negative number means there was not enough room, any other value is the unused space at the rectangle's bottom.
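Building on that, one way to handle a long keyword line is to retry with a smaller font size while the return code is negative. A minimal sketch; the helper name `insert_fitting_text` and the font-size bounds are assumptions, and it relies on the PyMuPDF documentation's statement that nothing is written when the return code is negative, so retrying is safe.

```python
def insert_fitting_text(page, rect, text, fontsize=11, min_fontsize=6):
    """Insert text into rect via page.insert_textbox(), shrinking the font until it fits.

    Returns the font size used, or None if even min_fontsize does not fit.
    """
    while fontsize >= min_fontsize:
        rc = page.insert_textbox(rect, text, fontsize=fontsize)
        if rc >= 0:  # success: rc is the unused height at the bottom of rect
            return fontsize
        fontsize -= 1  # rc < 0: not enough room (nothing inserted), try smaller
    return None
```

For example, `insert_fitting_text(page, fitz.Rect(72, 72, 540, 120), line)` would place one index line in that rectangle; a `None` result tells you to enlarge the rectangle instead.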