Which is the fastest way to transfer the pdf pages into separate jpeg images? #1648

YushuoGuan · 2022-03-21T09:42:10Z

Dear developers,

  I use PyMuPDF to convert PDF pages into separate images (`1 page -> 1 image`), and I would like to know whether my code is appropriate.
  I have two requirements for the images: 1. the image bytes are output directly; 2. the format is `JPEG` instead of `PNG` (I heard that OpenCV reads JPEG faster than PNG).
  The code is something like:

doc = fitz.open('xxx.pdf') 
p0 = doc.load_page(0)
p0.save('page_0.jpg')
with open('page_0.jpg', 'rb') as infile:
	img_bytes = infile.read()

   I know I could use `p0.tobytes()` to get PNG image bytes, but I don't know how to generate `JPEG` image bytes directly.

Answered by JorjMcKie

JorjMcKie · 2022-03-21T10:42:56Z

Maintainer

Welcome and let me first transform this issue to a Discussions item.

0 replies

JorjMcKie · 2022-03-21T10:57:55Z

Maintainer

To make an image of a page, you must first take a "picture" of it, which is called a "pixmap": `pix = page.get_pixmap()`. You can influence many properties of that pixmap, such as resolution, gray versus RGB, rotation, or flipping.
Once you have that, use either

`pix.save(...)`, which can only produce PNG, PNM and a handful of other formats, or
`pix.pil_save()`, which uses Pillow with an additional internal step. This allows JPEG output, among others.

With `.pil_save()` you have the full output capabilities of Pillow available: the method arguments are those of Pillow, which you therefore must look up in the Pillow documentation. Simple example:

page = doc[0]
pix = page.get_pixmap(dpi=300)  # RGB with a high resolution
pix.pil_save("page-0.jpg")

1 reply

JorjMcKie Mar 21, 2022
Maintainer

Please also note that the above is not restricted to PDF files: it works the same way for all supported document types like XPS, EPUB, FB2, CBZ, ...

YushuoGuan · 2022-03-21T11:07:53Z

Author

Thanks for the quick reply. Is there a way to get the image bytes directly? With your solution, I have to 1. save to disk; 2. read the bytes back from disk, and I think that I/O round trip is redundant.

2 replies

JorjMcKie Mar 21, 2022
Maintainer

Of course there is 😁!
Instead of `pix.save` use `pix.tobytes`, and instead of `pix.pil_save` use `pix.pil_tobytes`.
Both return the image as a bytes object.

YushuoGuan Mar 21, 2022
Author

Got it, thanks for your help~
