Python Text annotation with PyMuPDF #872

@testingdlrna

I'm using PyMuPDF to annotate some text in a .pdf document with the following code:

```python
import fitz  # PyMuPDF
import re


def data_(text):
    # Yield the matched word from every line that contains it.
    annotation_text = r"(amet)"
    for line in text:
        search = re.search(annotation_text, line, re.IGNORECASE)
        if search:
            yield search.group(1)


def includeannotation(path_included):
    document = fitz.open(path_included)

    for page in document:
        page.wrap_contents()
        obs = data_(page.getText("text").split('\n'))
        for data in obs:
            catchs = page.searchFor(data)
            # Add one redaction annotation per hit rectangle.
            for catch in catchs:
                page.addRedactAnnot(catch, fontsize=11, fill=(0, 0, 0))
        page.apply_redactions()
    document.save('annotation.pdf')
    print("end - created")


path_included = '/content/document.pdf'

includeannotation(path_included)
```

The source .pdf document contains the text:
[Image: screenshot of the text in the source PDF]

By applying the code above, I can add the annotation for the text "amet" and obtain the following result:

[Image: screenshot of the resulting PDF]

The result is mostly in line with expectations: the library has covered "amet" with the black annotation. However, it has also deleted the word on the line below, without drawing a black box over it. It looks like a restyling problem.

How can I avoid this problem?
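One possible explanation, which is an assumption and not confirmed in this thread, is that the rectangle returned by the search is tall enough to overlap the bounding box of the word on the next line, so applying the redaction removes that word as well. The sketch below only illustrates that idea with plain tuples and made-up coordinates; `shrink_vertically`, `overlaps`, and the `fraction` value are hypothetical helpers, not part of PyMuPDF.

```python
def shrink_vertically(rect, fraction=0.2):
    """Return rect (x0, y0, x1, y1) with its top and bottom moved inward.

    `fraction` is the share of the height trimmed from each edge
    (hypothetical default, tune per document).
    """
    x0, y0, x1, y1 = rect
    dy = (y1 - y0) * fraction
    return (x0, y0 + dy, x1, y1 - dy)


def overlaps(a, b):
    """True if rectangles a and b share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Made-up coordinates: a search hit on one line and a word on the line below.
hit = (100.0, 200.0, 130.0, 214.0)
word_below = (100.0, 212.0, 140.0, 226.0)

print(overlaps(hit, word_below))                     # True
print(overlaps(shrink_vertically(hit), word_below))  # False
```

If the assumption holds, the shrunken rectangle could be passed to `page.addRedactAnnot` in place of `catch` in the loop above, wrapped in `fitz.Rect` if the call does not accept a plain 4-tuple.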
