How can I export each line of text in a PDF to separate PNG and TXT files? Is there any code or method to do this? #2026

nissansz · 2022-11-07T01:57:14Z

nissansz
Nov 7, 2022

I want to export each line of text in a PDF to both a PNG file and a TXT file, one pair of files per line. Is there any code or method to do this?

Answered by JorjMcKie

JorjMcKie · 2022-11-07T07:14:46Z

JorjMcKie
Nov 7, 2022
Maintainer

Use a text extraction variant that returns the text at line level together with position information.
Then render a pixmap of each line's bounding box and save it as a PNG:

for block in page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)["blocks"]:
    for line in block["lines"]:
        bbox = line["bbox"]  # the line bbox
        text = " ".join([span["text"] for span in line["spans"]])  # text in line
        pix = page.get_pixmap(clip=bbox)  # pixmap of line bbox
        pix.save(...)
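
For anyone who wants the TXT half as well, here is a minimal sketch that builds on the loop above. The helper names (`iter_lines`, `export_lines`), the output file naming scheme, and the choice to skip blank lines are my own assumptions for illustration, not part of PyMuPDF; only `fitz.open`, `page.get_text`, `page.get_pixmap`, `fitz.Rect` and `pix.save` are library calls.

```python
from pathlib import Path


def iter_lines(page_dict):
    """Yield (bbox, text) for every text line in a get_text("dict") result."""
    for block in page_dict.get("blocks", []):
        # Non-text blocks have no "lines" key, so fall back to an empty list.
        for line in block.get("lines", []):
            text = " ".join(span["text"] for span in line["spans"])
            yield tuple(line["bbox"]), text


def export_lines(pdf_path, out_dir, dpi=300):
    """Write one PNG and one TXT file per text line; return the number of lines written."""
    import fitz  # PyMuPDF, imported here so iter_lines() is usable without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with fitz.open(pdf_path) as doc:
        for page in doc:
            page_dict = page.get_text("dict", flags=fitz.TEXTFLAGS_TEXT)
            for bbox, text in iter_lines(page_dict):
                if not text.strip() or fitz.Rect(bbox).is_empty:
                    continue  # nothing visible to render
                # Hypothetical naming scheme: page number plus running line index.
                stem = out / f"p{page.number + 1:03d}-l{count:05d}"
                page.get_pixmap(clip=bbox, dpi=dpi).save(f"{stem}.png")
                Path(f"{stem}.txt").write_text(text, encoding="utf-8")
                count += 1
    return count
```

Calling `export_lines("input.pdf", "lines")` (both names are placeholders) should leave matching `.png` / `.txt` pairs in the `lines` folder. Joining spans with a space follows the snippet above; if a line is split into spans in the middle of a word, joining with an empty string may give cleaner text.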

0 replies

nissansz · 2022-11-07T07:24:44Z

nissansz
Nov 7, 2022
Author

I can save to PNG, but the text in the image is very small. How can I enlarge it, for example to a height of 32 pixels?

1 reply

JorjMcKie Nov 7, 2022
Maintainer

Use the dpi parameter or the matrix parameter of the get_pixmap() method.
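
To illustrate the matrix route for a fixed target height such as 32 pixels, here is a rough sketch. It assumes the default rendering scale of one pixel per PDF point (72 dpi), so the zoom factor is simply the target height divided by the line height in points. `zoom_for_height` and `line_pixmap` are hypothetical helper names, not PyMuPDF functions.

```python
def zoom_for_height(bbox, target_px=32):
    """Return the zoom factor that scales a line bbox (in points) to target_px pixels high."""
    x0, y0, x1, y1 = bbox
    height_pt = y1 - y0
    if height_pt <= 0:
        raise ValueError("bbox has no height")
    return target_px / height_pt


def line_pixmap(page, bbox, target_px=32):
    """Render the line bbox so that its image is roughly target_px pixels high."""
    import fitz  # PyMuPDF, imported here so zoom_for_height() is usable without it

    zoom = zoom_for_height(bbox, target_px)
    # Same factor in x and y keeps the aspect ratio of the line.
    return page.get_pixmap(clip=bbox, matrix=fitz.Matrix(zoom, zoom))
```

The resulting height can differ from the target by a pixel because the clip area is rounded to whole pixels. With `dpi=300` every line is instead scaled by the same factor of 300/72, so taller lines give taller images.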

nissansz · 2022-11-07T07:52:10Z

nissansz
Nov 7, 2022
Author

        pix = page.get_pixmap(clip=bbox, dpi=300)  # pixmap of line bbox

It works.

0 replies
