Is it possible to replace an image changing the stream? #924

Yiftach-Yaakov · 2021-02-27T09:49:46Z

Yiftach-Yaakov
Feb 27, 2021

Hi, this is my new post in GitHub.

I hope my question isn't too silly.

I have thousands of PDF files in the same format, all with exactly the same image embedded. My goal is to replace this image with another one, chosen by a parameter stored in a separate file with the same name. The replacement images have different characteristics: some are PNG, others are JPG, and their sizes differ. My first obstacle is that I cannot replace the stream of the image object. I have tried several things, such as directly changing the bytes in the stream section of the PDF, but all I get is a black, pixelated image instead of the image I tried to insert. On top of that I still need to adjust the sizes, and I can't even get this first problem fixed.

Today I found this project and realized that my problem doesn't come from my method, but from some sort of conversion between the object streams. But I really don't have any idea where to go from here; I feel lost.

These are my objects, starting with the page:


<<
  /Type /Page
  /MediaBox [ 0 0 612 792 ]
  /Resources <<
    /Font <<
      /F1 1 0 R
      /F2 2 0 R
      /F3 3 0 R
    >>
    /XObject <<
      /img0 4 0 R
      /img1 5 0 R
    >>
  >>
  /Contents 6 0 R
  /Parent 7 0 R
>>

These are the fonts; I suppose they somehow matter for the stream encoding:

<</Type/Font/Subtype/Type1/BaseFont/Helvetica/Encoding/WinAnsiEncoding>>
<</Type/Font/Subtype/Type1/BaseFont/Helvetica-BoldOblique/Encoding/WinAnsiEncoding>>
<</Type/Font/Subtype/Type1/BaseFont/Helvetica-Bold/Encoding/WinAnsiEncoding>>

The image I want to replace is object 5, img1:

Type = ('name', '/XObject')
Subtype = ('name', '/Image')
Width = ('int', '628')
Height = ('int', '182')
Length = ('int', '12078')
ColorSpace = ('name', '/DeviceRGB')
BitsPerComponent = ('int', '8')
Filter = ('name', '/FlateDecode')

This is the code I am currently using to try to replace the image. It does replace it, but as I said, I just get a black, pixelated image over and over again:

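import fitz  # PyMuPDF; doc is the already opened PDF

pix = fitz.Pixmap("Image.jpg")  # load the replacement image
pix = fitz.Pixmap(pix, 0)  # copy without an alpha channel
pix = fitz.Pixmap(fitz.Colorspace(fitz.CS_RGB), pix)  # convert to RGB
pix = fitz.Pixmap(pix, 628, 182, fitz.IRect(0, 0, 628, 182))  # scale to the old image's size
doc.update_stream(5, pix.getImageData())  # overwrite the stream of object 5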

If you could point me in the right direction I would be really relieved. I am tired of walking without a map. Thank you so much.
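
A likely explanation for the black, pixelated result, offered as a sketch rather than a confirmed diagnosis: the dictionary of object 5 describes raw samples (628 x 182 pixels, DeviceRGB, 8 bits per component) compressed with FlateDecode, while getImageData() returns an encoded image file (PNG by default). A viewer that inflates the stream and reads it as raw RGB samples then finds the wrong number of bytes, and they are not pixel values. The standard-library snippet below only does the arithmetic; the helper names are made up for illustration and it does not touch PyMuPDF.

```python
import zlib


def expected_sample_bytes(width, height, components=3, bits_per_component=8):
    """Bytes the decoded image stream must hold for the given dictionary values."""
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # each row is padded to whole bytes
    return bytes_per_row * height


def stream_fits_dictionary(compressed, width, height,
                           components=3, bits_per_component=8):
    """True if a FlateDecode stream inflates to exactly the expected sample count."""
    decoded = zlib.decompress(compressed)
    expected = expected_sample_bytes(width, height, components, bits_per_component)
    return len(decoded) == expected


# Object 5 from the question: 628 x 182, DeviceRGB (3 components), 8 bits each.
needed = expected_sample_bytes(628, 182)

# Stand-in for real pixel data: the right number of raw sample bytes (all black).
raw_samples = bytes(needed)

# Stand-in for an encoded image file: a PNG signature plus some arbitrary bytes.
encoded_file = b"\x89PNG\r\n\x1a\n" + bytes(5000)

fits_raw = stream_fits_dictionary(zlib.compress(raw_samples), 628, 182)
fits_file = stream_fits_dictionary(zlib.compress(encoded_file), 628, 182)
```

For the object above, the decoded stream has to hold 342888 bytes; the /Length of 12078 is only the compressed size. Only the raw-sample stand-in passes the check, which is why writing an encoded file into the stream cannot match the existing dictionary.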

Answered by JorjMcKie

Feb 27, 2021

JorjMcKie · 2021-02-27T11:18:31Z

JorjMcKie
Feb 27, 2021
Maintainer

Presumably, the easiest way for you to walk is using redaction annotations. You did not mention whether the to-be-replaced image lives only on one page or on several simultaneously. The latter would be a slight complication, so let's assume for the moment that the image lives on one page, which you already know.
The principle is this:

1. Determine the rectangle occupied by the image on that page.
2. Add a redaction annotation with the image's rectangle.
3. "Apply" (i.e. execute) the redaction, which effectively removes the old image.
4. Insert your new image in the old image's rectangle.

The technical details of the above steps are then no longer your business. Here is a snippet:

import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # placeholder: your PDF filename
pno = 0  # placeholder: number of the page holding the image
page = doc[pno]  # read the page at page number pno
img_list = page.get_images(full=True)  # a list of all images on that page
# Select the item referencing the old image (hope you know how to identify it!).
# Each item looks like: (1315, 0, 1945, 1004, 8, 'DeviceRGB', '', 'Im1', 'DCTDecode', 0)
# The first entry is the xref, etc.
item = img_list[0]  # placeholder: pick the entry of the old image
bbox = page.get_image_bbox(item)  # where the old image lives
page.add_redact_annot(bbox)  # mark that rectangle as to-be-deleted (addRedactAnnot in older releases)
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)  # delete old image
page.insert_image(bbox, filename="new_image.jpg")  # insert new image (placeholder filename)

That is about it, given the above assumptions.
If the old image lives on multiple pages, a few issues need to be sorted out ...
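
One way to pick the right entry, sketched under the assumption that you know the old image's xref, its resource name, or its pixel size: each entry of page.get_images(full=True) is a tuple laid out as (xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter, referencer). The helper name find_image_items is hypothetical, and the second sample tuple is assembled from the values in the question (its smask and referencer entries are assumed to be 0).

```python
def find_image_items(img_list, xref=None, name=None, size=None):
    """Return the entries of page.get_images(full=True) matching all given criteria.

    Criteria left as None are ignored; size is a (width, height) pair in pixels.
    """
    matches = []
    for item in img_list:
        item_xref, _smask, width, height = item[:4]
        item_name = item[7]
        if xref is not None and item_xref != xref:
            continue
        if name is not None and item_name != name:
            continue
        if size is not None and (width, height) != tuple(size):
            continue
        matches.append(item)
    return matches


# Sample data: the first tuple is the example from the snippet above, the second
# is built from the question's object 5 (smask and referencer assumed to be 0).
sample_list = [
    (1315, 0, 1945, 1004, 8, "DeviceRGB", "", "Im1", "DCTDecode", 0),
    (5, 0, 628, 182, 8, "DeviceRGB", "", "img1", "FlateDecode", 0),
]

by_name = find_image_items(sample_list, name="img1")
by_size = find_image_items(sample_list, size=(628, 182))
```

If exactly one entry comes back, it can be passed to page.get_image_bbox() as the item. Matching by name or size is usually more robust across many files than relying on the position in the list, since xref numbers and ordering may differ between documents.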

1 reply

JorjMcKie Feb 27, 2021
Maintainer

After finishing the image exchange, the PDF should be saved with certain options that also cause the physical removal of all traces of the old image ...

Yiftach-Yaakov · 2021-02-27T20:50:24Z

Yiftach-Yaakov
Feb 27, 2021
Author

Oh god, that was great. I didn't even know what an annotation was. You made my day, man. I am so happy.

BTW, yes, the image always lives on the same page, so this is going to be super easy for now, or so I expect. I'll post any further questions here. I owe you one.

Thank you very much!!!

I'm posting the code here for anyone who has the same trouble:

import fitz  # PyMuPDF

doc = fitz.open("old archive.pdf")  # this creates the Document object doc
factura = doc[0]  # my page; "factura" is "invoice" in Spanish
img_list = factura.get_images(full=True)
bbox = factura.get_image_bbox(img_list[1])  # where the old image lives; my image is the second one
factura.add_redact_annot(bbox)  # mark that rectangle as to-be-deleted (addRedactAnnot in older releases)
factura.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)  # delete old image
factura.insert_image(bbox, filename="Citibanamex.jpg")  # insert new image
doc.save("modified archive.pdf")

2 replies

JorjMcKie Feb 27, 2021
Maintainer

Thanks for the compliments!
As I wrote: you should save the document with options that also physically get rid of the deleted image and compress the newly inserted one: doc.save("...", garbage=3, deflate=True).

Another aspect to watch out for in future uses: redaction annotations do not care what their rectangle covers. Whatever overlaps the rectangle (text, hyperlinks) will also be deleted. Other images that overlap it will even be deleted completely, because of the images=... argument you are using.
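
To see in advance what a redaction would take with it, one could compare the image's rectangle against the rectangles of other page content before applying it. The sketch below is plain Python on (x0, y0, x1, y1) tuples; the function names and the sample coordinates are made up for illustration. In PyMuPDF, such rectangles could for instance be taken from the first four values of each page.get_text("blocks") entry or from page.get_image_bbox() of the other images.

```python
def rects_overlap(a, b):
    """True if rectangles given as (x0, y0, x1, y1) share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlapping(redact_rect, other_rects):
    """Return the rectangles that the redaction rectangle would touch."""
    return [r for r in other_rects if rects_overlap(redact_rect, r)]


# Hypothetical layout: the image to be redacted and two text blocks.
image_rect = (400, 40, 580, 92)
text_rects = [
    (72, 100, 300, 120),   # well away from the image
    (390, 80, 450, 110),   # cuts into the image's lower left corner
]

at_risk = overlapping(image_rect, text_rects)
```

An empty result suggests the redaction only removes the intended image; anything listed would be deleted along with it. Rectangles that merely share an edge are not counted as overlapping in this sketch.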

Yiftach-Yaakov Feb 27, 2021
Author

Thank you, I will take it into account, but I think it will not be a problem for me: there is a lot of space around the image I want to replace.

robinp · 2023-10-11T08:01:29Z

robinp
Oct 11, 2023

Hey, dropping my approach here. The redaction approach above worked, but it also cleared the clipping shape of the image. The one below keeps it (note that I had images in the CMYK colorspace and wanted to keep it that way so the printing studio is happy, hence the conversions).

Loading the doc and page is as above, then:

p = fitz.Pixmap(doc, 6) # or whatever xref id
q = fitz.Pixmap(fitz.Colorspace(fitz.CS_RGB), p)  # can save jpg only in RGB format, this was DeviceCMYK
q.save("6-rgb.jpg")

Now make whatever modifications you want in GIMP, then load the modified image back:

r = fitz.Pixmap("6-rgb-mod.jpg")
s = fitz.Pixmap(fitz.Colorspace(fitz.CS_CMYK), r)

And now, allegedly, it would be as simple as

page.replace_image(6, pixmap=s)

but maybe I have an older PyMuPDF, which threw an exception about a missing doc.is_image (in newer source it is doc.xref_is_image, so this is probably fixed), so I followed the implementation of replace_image:

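new_xref = page.insert_image(page.rect, pixmap=s)  # store the new image; this also draws it on the page
doc.xref_copy(new_xref, 6)  # copy the new image object over the old one
last_contents_xref = page.get_contents()[-1]  # contents stream added by insert_image
doc.update_stream(last_contents_xref, b" ")  # blank it so the extra drawing disappears

And finally save: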

doc.save("output.pdf", garbage=3, deflate=True)

Inspecting with mutool, the old image is still in place, but not used. So if you want to save space, this is probably not the best or complete way. But if you want to replace an image quickly while leaving other visuals as-is (say, for printing), it can be fine.

Great tool, thanks a lot! Docs are great too.
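
The two routes above could be combined into one helper that tries replace_image first and only falls back to the manual steps when an AttributeError is raised, as happened with the missing doc.is_image. This is a sketch, not part of PyMuPDF: the function name and the returned labels are invented for illustration, and it assumes the failing call has not modified the page before raising.

```python
def replace_image_compat(doc, page, xref, pixmap):
    """Replace the image at xref, using manual steps if replace_image fails.

    Returns a label saying which route was taken (illustrative only).
    """
    try:
        page.replace_image(xref, pixmap=pixmap)
        return "replace_image"
    except AttributeError:
        # Method missing, or it relies on a helper this build does not have.
        new_xref = page.insert_image(page.rect, pixmap=pixmap)
        doc.xref_copy(new_xref, xref)  # overwrite the old image object
        last_contents_xref = page.get_contents()[-1]
        doc.update_stream(last_contents_xref, b" ")  # undo the extra drawing
        return "manual"
```

With the objects from the post it would be called as replace_image_compat(doc, page, 6, s), followed by the same doc.save(...) call.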

0 replies
