Yes, this can be done:
Extract the text with sufficient detail using page.get_text("dict", ...). The result is a dictionary of nested dictionaries, described in the PyMuPDF documentation.
The dictionaries at the lowest level of the hierarchy (the spans) contain the font size, the text itself and some other font attributes.
If you know the desired font size, take each matching span's text, page number and position and store this information in a list.
When you are done iterating over the applicable PDFs, you can make a new PDF with a page onto which you write one text line for each item of the previously created list.
Each of the written lines can be overlaid with a hyperlink pointing to the respective PDF + page from where the information was previously extra…
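
The steps above could be sketched roughly as follows. This is a minimal illustration and not a tested solution for any particular set of files: the names `Hit`, `spans_with_size`, `collect_hits` and `write_index`, the size tolerance, and the layout values (margin, line height) are all assumptions made up for this example. It also assumes PyMuPDF is installed and importable as `fitz`, and that a link of kind `LINK_GOTOR` is the right way to point at a page in another file. How exactly a viewer positions the target page from the `to` point is not guaranteed.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One matching piece of text (hypothetical record type for this sketch)."""
    pdf_path: str
    page_number: int  # 0-based, as PyMuPDF counts pages
    x: float          # top-left corner of the span's bbox
    y: float
    text: str


def spans_with_size(page_dict, wanted_size, tol=0.1):
    """Yield (text, bbox) for every span whose font size is close to wanted_size.

    page_dict is the result of page.get_text("dict"): blocks -> lines -> spans.
    Sizes are floats, so they are compared with a tolerance (an assumption).
    """
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                text = span["text"].strip()
                if text and abs(span["size"] - wanted_size) <= tol:
                    yield text, span["bbox"]


def collect_hits(pdf_paths, wanted_size):
    """Iterate over the PDFs and store text, page number and position in a list."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    hits = []
    for path in pdf_paths:
        with fitz.open(path) as doc:
            for page in doc:
                page_dict = page.get_text("dict")
                for text, bbox in spans_with_size(page_dict, wanted_size):
                    hits.append(Hit(path, page.number, bbox[0], bbox[1], text))
    return hits


def write_index(hits, out_path, fontsize=11, margin=50, line_height=16):
    """Write one line per hit into a new PDF and overlay each line with a link."""
    import fitz  # PyMuPDF

    if not hits:
        raise ValueError("nothing to write: no text matched the font size")

    index = fitz.open()  # new, empty PDF
    page = None
    y = 0.0
    for hit in hits:
        # Start a new page at the beginning and whenever the current one is full.
        if page is None or y > page.rect.height - margin:
            page = index.new_page()
            y = margin + fontsize
        page.insert_text((margin, y), hit.text, fontsize=fontsize)
        # Rectangle roughly covering the line just written (default font "helv").
        width = fitz.get_text_length(hit.text, fontsize=fontsize)
        link_rect = fitz.Rect(margin, y - fontsize, margin + width, y + 0.3 * fontsize)
        page.insert_link({
            "kind": fitz.LINK_GOTOR,            # link into another PDF file
            "from": link_rect,                  # clickable area on the index page
            "file": hit.pdf_path,               # target file
            "page": hit.page_number,            # 0-based target page
            "to": fitz.Point(hit.x, hit.y),     # target point, passed as stored
        })
        y += line_height
    index.save(out_path)
    index.close()
```

A call might look like `write_index(collect_hits(["a.pdf", "b.pdf"], 18), "index.pdf")`, where the file names and the font size 18 are placeholders. Keeping the span filter separate from the file handling makes it possible to check the size matching on a hand-built dictionary before running it over real PDFs.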

Answer selected by Shohreh