Why editing content is harder than inserting new content in a PDF #2685
I am trying to edit the content of an existing PDF.

### 1. Replacing existing text or injecting text into an existing sentence using these methods

`add_redact_annot()`
`apply_redactions()`

Q: I am not sure if that is because the characters of the font are not embedded in the PDF (the saved PDF compressed the font, so any unused characters are removed).

### 2. Replacing an image

Q: Is there any way to remove it completely? Or am I missing some method to run after `delete_image`?

### 3. Relocating text and images

Q: Is it still hard to edit the text position nowadays?

### 4. Adding custom metadata to a PDF

`set_metadata({"producer":"test","demo":"test"})`
it shows:
`bad dict key(s): {'demo'}`

Q: Is this a limitation of the standard PDF format?

### 5. About comments in binary format

Q: Is it possible to add a comment/heading in binary format?
Replies: 1 comment 5 replies
I understand your frustration. What you are trying to do is ... sorry ... impossible. All your issues go back to this:
The above has led to converters, for example PDF -> Word. This approach forgets the PDF, then uses an office product to work with the result, then converts back to PDF. In the end, this is the only viable solution.

Other questions:
A page can have multiple `/Contents` objects. So the definition in the page object looks something like either `/Contents 7 0 R` or `/Contents [7 0 R 8 0 R ...]`. If you execute `page.clean_contents()`, multiple objects will be concatenated, the result replacing the previous lot. So much for some background.
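
To make the two forms of the `/Contents` entry concrete, here is a minimal sketch in plain Python that extracts the referenced xref numbers from a page object's source text. The helper name `contents_xrefs` and the sample page definitions are made up for illustration; this is not part of PyMuPDF, and it does not try to handle every legal PDF syntax variation.

```python
import re

# Matches one indirect reference such as "7 0 R" and captures the xref number.
_REF = re.compile(r"(\d+)\s+(\d+)\s+R")


def contents_xrefs(page_object_source: str) -> list[int]:
    """Return the xref numbers listed under /Contents in a page object's source.

    Handles both forms described above: a single reference and an array of
    references. Returns an empty list if there is no /Contents entry.
    (Hypothetical helper, for illustration only.)
    """
    m = re.search(r"/Contents\s*(\[[^\]]*\]|\d+\s+\d+\s+R)", page_object_source)
    if m is None:
        return []
    return [int(ref.group(1)) for ref in _REF.finditer(m.group(1))]


# Hypothetical page definitions, one per form.
single = "<< /Type /Page /Contents 7 0 R /Resources 5 0 R >>"
multi = "<< /Type /Page /Contents [7 0 R 8 0 R 12 0 R] /Resources 5 0 R >>"

print(contents_xrefs(single))
print(contents_xrefs(multi))
```

In PyMuPDF itself, the page object source should be available via `doc.xref_object(page.xref)`, and `page.get_contents()` returns the list of contents xrefs directly, so a helper like this is only needed to see what the raw definition looks like.
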

`page.clean_contents()` does very much more than this concatenation: it also cleans and standardizes the syntax, plus it makes sure that all resources used in the contents are in 1:1 correspondence with the objects named in the page's definition under `/Resources`. This applies to images, fonts and several more.

I recommend you first execute
`page.clean_contents()`. Then `page.get_contents()` will deli…