Is it possible to replace an image changing the stream? #924
-
Hi, this is my new post on GitHub. I hope my question isn't too silly. I have thousands of PDF files in the same format, all with exactly the same image embedded. My goal is to replace this image with another one, chosen by a parameter I have in another file with the same name. The replacement images have different characteristics: some are PNG, others are JPG, and their sizes differ. My first hurdle is that I cannot replace the content of the stream object. I have tried several things, such as overwriting the bytes in the stream section of the PDF directly, but all I get is a black pixelated image instead of the image I tried to put in. On top of that I still need to adjust the sizes, and I can't even get this first problem fixed. Today I found this project and realized that my problem doesn't come from my method, but from some sort of conversion between the object streams. I really have no idea where to go from here, and I feel lost. These are my objects, as you can see:
These are the fonts; I suppose they somehow matter for the stream encoding:
The image I want to replace is object 5: img01
I am currently using the following code to try to replace the image. It does replace it, but as I said, I just get a black pixelated image over and over again:
If you could point me in the right direction I would be really relieved. I am tired of walking without a map. Thank you so much.
Replies: 3 comments 3 replies
-
Presumably, the easiest path for you is to use redaction annotations. You did not mention whether the image to be replaced lives on one page only or on several at once. The latter would be a slight complication, so let's assume for the moment that the image lives on one page, which you already know.
The technical details of the above steps would no longer be your business. Here is a snippet:
import fitz  # PyMuPDF
doc = fitz.open(<your pdf filename>)
page = doc[pno]  # read the page at page number pno
img_list = page.get_images(full=True)  # a list of all images on that page
# select the item referencing the old image (hope you know how to identify it!)
# Each item looks like: (1315, 0, 1945, 1004, 8, 'DeviceRGB', '', 'Im1', 'DCTDecode', 0)
# first entry is xref, etc.
item = img_list[0]  # assumption for illustration: the old image is the first entry
bbox = page.get_image_bbox(item)  # where the old image lives
ra = page.add_redact_annot(bbox)  # mark that rectangle as to-be-deleted (addRedactAnnot in older versions)
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)  # delete old image
page.insert_image(bbox, filename=<imagefile>)  # insert new image
That is about it - with the above assumptions.
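For anyone who wants the redaction steps in one place, here is a minimal sketch, not a definitive implementation. The function names `find_image_item` and `replace_image_by_redaction` and all parameters are made up for illustration. It assumes the old image can be identified by its symbolic name (index 7 of each `get_images(full=True)` item, e.g. `'Im1'` in the sample tuple above) and that it lives on one known page.

```python
def find_image_item(img_list, name):
    """Return the first image item whose symbolic name (index 7) equals `name`, else None."""
    for item in img_list:
        if item[7] == name:
            return item
    return None


def replace_image_by_redaction(pdf_in, pdf_out, pno, name, image_file):
    """Remove the image called `name` on page `pno` and put `image_file` in its place."""
    import fitz  # PyMuPDF; imported here so find_image_item stays usable on its own

    doc = fitz.open(pdf_in)
    page = doc[pno]
    item = find_image_item(page.get_images(full=True), name)
    if item is None:
        raise ValueError(f"no image named {name!r} on page {pno}")
    bbox = page.get_image_bbox(item)      # rectangle occupied by the old image
    page.add_redact_annot(bbox)           # mark that rectangle for removal
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)  # delete the old image
    page.insert_image(bbox, filename=image_file)                # place the new image
    doc.save(pdf_out)
    doc.close()
```

Because `insert_image` keeps the aspect ratio by default, a PNG or JPG of a different size should be fitted inside the old rectangle rather than stretched. For thousands of files, this function could simply be called in a loop over the PDF filenames.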
-
Oh god, that was great, I didn't even know what an annotation was. You made my day, man. I am so happy. BTW, yes, the image always lives on the same page, so this is going to be super easy for now, or so I expect. I'll post any questions here. I owe you one. Thank you very much!!! I am posting the code here for anyone who has the same trouble:
-
Hey, dropping my approach here: the redaction approach above worked, but it also cleared the clipping shape of the image. The one below keeps it (note that I had CMYK colorspace images and wanted to keep them that way so the printing studio is happy, hence the conversions). Loading the doc and page is as above, then:
Now make whatever modifications you want with Gimp, then load the modified image back:
And now, supposedly, it would be as simple as:
but maybe I have an older PyMuPDF, which threw an exception on a missing doc.is_image (in newer source it is doc.xref_is_image, so probably fixed), so I followed the implementation of replace_image:
And finally save:
Inspecting with mutool, the old image is still in the file, but not used. So if you want to save space, this is probably not the best / complete way. But if you want to replace an image quickly, leaving other visuals as-is (say, for printing), then it can be fine. Great tool, thanks a lot! Docs are great too.
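A rough sketch of that export / edit / swap round trip, in case it helps. It assumes a newer PyMuPDF where `Page.replace_image` works out of the box (to my knowledge it is available from v1.21.0 on). The helper names `export_image` and `swap_image` and the `img-<xref>` filename pattern are invented for this example, and it does not do any of the CMYK conversions mentioned above.

```python
from pathlib import Path


def export_image(doc, xref, out_dir):
    """Write the embedded image at `xref` into out_dir and return the file path."""
    info = doc.extract_image(xref)  # dict with the image bytes and a file extension
    if not info:
        raise ValueError(f"xref {xref} is not an image")
    path = Path(out_dir) / f"img-{xref}.{info['ext']}"  # hypothetical naming scheme
    path.write_bytes(info["image"])
    return path


def swap_image(page, xref, edited_file):
    """Replace the image at `xref` with the edited file, leaving the page content untouched."""
    page.replace_image(xref, filename=str(edited_file))
```

Typical use would be: open the document with `fitz.open(...)`, call `export_image(doc, xref, "some_dir")`, edit the exported file in Gimp, then call `swap_image(page, xref, edited_path)` and save the document. Since only the image object changes, clipping and placement on the page should stay as they were; whether the colorspace survives the round trip depends on the image and is not checked here.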