Indeed, if you do this:

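import fitz  # PyMuPDF

# `doc` and `paths` come from the earlier steps of this thread;
# `paths` is assumed to hold the drawings extracted from this page.
doc = fitz.open(doc.name)  # reopen the file to get an unmodified copy
page = doc[26]
for p in paths:
    for item in p["items"]:
        if item[0] == "re":  # rectangle item
            r = item[1]
            # skip very thin rectangles
            if r.width > 2 and r.height > 2:
                page.draw_rect(r, color=(1, 0, 0), width=0.3)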

You get this:

So, except for the header, no text can be extracted based on the drawing rectangles.
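As a rough sketch of the filtering step above, the rectangle selection can be pulled into a small helper and checked without a PDF at hand. The name `large_rects`, the `min_size` parameter, and the stand-in sample data are hypothetical and only serve the illustration; the helper merely assumes path dicts shaped like the `paths` used above, whose rectangle objects expose `width` and `height`.

```python
from types import SimpleNamespace


def large_rects(paths, min_size=2.0):
    """Return rectangles from drawing paths that exceed min_size in both dimensions.

    `paths` is assumed to be a list of dicts with an "items" list, where a
    rectangle item looks like ("re", rect, ...) and rect has .width/.height.
    """
    rects = []
    for path in paths:
        for item in path.get("items", []):
            if item[0] != "re":  # ignore lines, curves, quads
                continue
            r = item[1]
            if r.width > min_size and r.height > min_size:
                rects.append(r)
    return rects


# Stand-in data (hypothetical): one box-like rectangle, one hairline, one line item.
sample_paths = [
    {"items": [("re", SimpleNamespace(width=200, height=30))]},
    {"items": [("re", SimpleNamespace(width=200, height=0.5)),
               ("l", (0, 0), (10, 0))]},
]

kept = large_rects(sample_paths)
```

With PyMuPDF, the same helper could presumably be fed `page.get_drawings()`, and each returned rectangle passed to `page.get_text("text", clip=r)` to see which text, if any, falls inside it. On the page discussed here that would be expected to yield only the header.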

Answer selected by JorjMcKie
This discussion was converted from issue #1656 on March 30, 2022 09:32.