I want to export each line of text in pdf to both png and txt files separately, any code/method to do this? [email protected] #2026
-
Use a variant of text extraction that delivers text at line level together with position information. Then make a pixmap of each line's bounding box to output as PNG:

    import fitz  # PyMuPDF

    # page is a fitz.Page of an already opened document
    for block in page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]:
        for line in block["lines"]:
            bbox = line["bbox"]  # the line bbox
            text = " ".join([span["text"] for span in line["spans"]])  # text in line
            pix = page.get_pixmap(clip=bbox)  # pixmap of line bbox
            pix.save(...)
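The question also asks for a TXT file per line, which the snippet in this reply does not write. Here is a minimal sketch of one way to produce both files, assuming `page` is a PyMuPDF page object. The names `line_text`, `export_lines`, `out_dir` and `prefix` are hypothetical and not part of PyMuPDF. Instead of passing `flags`, this version skips any block that has no `"lines"` key (image blocks).

```python
from pathlib import Path


def line_text(line):
    """Join the span texts of one line taken from page.get_text("dict")."""
    return " ".join(span["text"] for span in line["spans"])


def export_lines(page, out_dir, prefix="line"):
    """Write one PNG and one TXT file per text line of a page.

    `page` is assumed to be a PyMuPDF page; `out_dir` and `prefix`
    are illustrative names chosen for this sketch.
    Returns the list of file stems (paths without extension) written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for block in page.get_text("dict")["blocks"]:
        # Image blocks have no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            text = line_text(line)
            if not text.strip():
                continue  # ignore whitespace-only lines
            stem = out / f"{prefix}_{len(written):04d}"
            pix = page.get_pixmap(clip=line["bbox"])  # image of the line bbox
            pix.save(f"{stem}.png")
            Path(f"{stem}.txt").write_text(text, encoding="utf-8")
            written.append(stem)
    return written
```

Calling `export_lines(page, "out")` would then be expected to create `out/line_0000.png`, `out/line_0000.txt`, and so on, one pair per non-empty line.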
-
I can save to PNG, but the text in the image is very small. How can I enlarge it, e.g. to a height of 32 pixels?
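One possible approach, not confirmed anywhere in this thread, is to render the clip with a zoom factor derived from the line height. The bbox is measured in points, and `get_pixmap` renders one pixel per point by default, so a zoom of `32 / bbox_height` should give an image about 32 pixels high (rounding of the clip can add a pixel). `zoom_for_height` and `render_line` are hypothetical helper names used only for this sketch.

```python
def zoom_for_height(bbox, target_height_px):
    """Return the zoom factor that makes `bbox` about target_height_px tall."""
    x0, y0, x1, y1 = bbox
    height_pt = y1 - y0  # line height in points
    if height_pt <= 0:
        raise ValueError("bbox has no height")
    return target_height_px / height_pt


def render_line(page, bbox, target_height_px=32):
    """Render one line bbox of a PyMuPDF page at roughly the given pixel height."""
    # Imported here so zoom_for_height stays usable without PyMuPDF installed.
    import fitz  # PyMuPDF

    zoom = zoom_for_height(bbox, target_height_px)
    # The same zoom in x and y keeps the aspect ratio of the line.
    return page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=bbox)
```

For a line whose bbox is 16 points high, the zoom factor would be 2, and `render_line(page, line["bbox"]).save(...)` would store the enlarged image.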
-
It works.