Replacing bitmaps with PDF drawings/shapes #2155

sfadschm · 2023-01-02T22:30:04Z

sfadschm
Jan 2, 2023

Hi,
I am new to PyMuPDF and I have a specific problem which I believe can be solved by PyMuPDF.

Situation

I have a pdf file (a thesis) that contains text and some bitmap images.
For all of these bitmaps, vector images exist which are saved in individual pdf files.
The page size of the "pdf images" is exactly the dimensions of the respective bitmaps in the original file.

Goal

Replace the bitmaps with the respective vectors.
Keep the "pdf image" at the same positions as the respective bitmap.

From what I read in other discussions so far, one way might be to extract all individual shapes and texts from the "pdf images" and then commit and position them in the original file.

However, I am wondering if there might be a more elegant way of inserting the "pdf image" as a whole, e.g. to avoid accidentally overlooking some shapes while extracting.

Any help is much appreciated :-)
Cheers
Alex

JorjMcKie · 2023-01-03T13:05:50Z

JorjMcKie
Jan 3, 2023
Maintainer

Let me see if I got you.

On page n of your thesis, there is a bitmap image, shown in some bbox (bounding box) of that page.
For that bitmap you have a page in a separate PDF, which shows that same picture, but in a vector graphic (SVG?) version.
You want to get rid of the bitmap and have the vector image displayed instead ... in that same bbox.

So far correct?

If yes:

1. Determine the xref of the bitmap in the thesis file
2. Determine the bbox of the bitmap
3. Delete the bitmap
4. Insert the vector graphic PDF page (or its vector graphic sub-rectangle) in the bbox of the bitmap

Ad 1. & 2.
Make a list of images of page n of thesis:

from pprint import pprint
# page is page n of the thesis document
imglist = page.get_images()
pprint(imglist)
# output: [(1114, 1126, 1200, 1200, 8, 'DeviceRGB', '', 'Im1', 'FlateDecode')]
# the first item, 1114, is the xref
# get the bbox of that image on the page:
rects = page.get_image_rects(1114)
pprint(rects)
# output: [Rect(240.00100708007812, 88.93600463867188, 540.0009765625, 388.9360046386719)]
# this is a list because the same image may appear multiple times on a page
bbox = rects[0]  # fitz.Rect(240.00100708007812, 88.93600463867188, 540.0009765625, 388.9360046386719)
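
If the thesis contains many bitmaps, the same two calls can be run over every page. The following is only a sketch of that idea: the helper name `list_image_placements` is made up for illustration, and it assumes `doc` is an opened PyMuPDF document (for example `fitz.open("thesis.pdf")`, where the file name is a placeholder).

```python
def list_image_placements(doc):
    """Collect (page number, xref, bbox) for every image shown in the document.

    Hypothetical helper: `doc` is assumed to be a PyMuPDF Document, so each
    page offers get_images() and get_image_rects().
    """
    placements = []
    for page in doc:
        for img in page.get_images():
            xref = img[0]  # the first item of each entry is the xref
            # one rect per place where this image is shown on the page
            for rect in page.get_image_rects(xref):
                placements.append((page.number, xref, rect))
    return placements
```

Each returned tuple tells you which page to edit, which xref to delete and which bbox to pass to show_pdf_page.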

Ad 3. Delete bitmap:
Consult this script. In the next PyMuPDF version, this will be easier, just a page method page.delete_image(xref).

Ad 4. Insert the vector PDF page. Assuming page 0 of that PDF is the vector image:

import fitz  # PyMuPDF
src = fitz.open("vector.pdf")
# show page 0 of the vector PDF inside the bbox of the former bitmap
page.show_pdf_page(bbox, src, 0)
# that's it!

If the vector image in vector.pdf does not cover the full page, but a sub-rectangle, say subrect, do this:

page.show_pdf_page(bbox, src, 0, clip=subrect)
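
Steps 2 to 4 could be combined into one small function, roughly as sketched below. The name `replace_bitmap` and its parameters are made up for illustration. The sketch assumes a PyMuPDF version that already provides page.delete_image(xref), and that `src` is a separately opened PDF containing the vector graphic. The bboxes are read before the bitmap is deleted so that the positions are certainly still available.

```python
def replace_bitmap(page, xref, src, src_pno=0, clip=None):
    """Replace every occurrence of image `xref` on `page` by a page of `src`.

    Hypothetical helper: `page` is assumed to be a PyMuPDF Page and `src`
    an opened PDF Document that is not the document `page` belongs to.
    """
    # step 2: remember where the bitmap is shown (may be several places)
    rects = page.get_image_rects(xref)
    if not rects:
        raise ValueError(f"image xref {xref} is not shown on this page")
    # step 3: delete the bitmap (assumes Page.delete_image is available)
    page.delete_image(xref)
    # step 4: show the vector page in each former bitmap bbox
    for rect in rects:
        page.show_pdf_page(rect, src, src_pno, clip=clip)
    return rects
```

With the values from above, this would be called as replace_bitmap(page, 1114, src), followed by saving the thesis document under a new name.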

2 replies

sfadschm Jan 4, 2023
Author

This is exactly what I tried to describe.

The missing piece was the show_pdf_page function, which I did not find for some reason.

Many thanks for the quick and detailed response and lots of respect for how you maintain this project and all the discussions!

JorjMcKie Jan 4, 2023
Maintainer

Thanks for the feedback!
