As pypdf is written, this will only work for filters which are not image-only and thus do not rely on external libraries like Pillow or jbig2dec. If you do not mind relying on internal APIs, something like this works:

```python
from pypdf import PdfWriter
from pypdf.generic import DecodedStreamObject, EncodedStreamObject

writer = PdfWriter(clone_from="resources/crazyones.pdf")

# writer._objects is an internal list holding every object of the writer.
for index, obj in enumerate(writer._objects):
    if not isinstance(obj, EncodedStreamObject):
        continue
    # get_data() on an encoded stream returns the decoded content.
    new_stream = DecodedStreamObject()
    new_stream.set_data(obj.get_data())
    # Copy the stream dictionary, dropping the entries that describe the encoding.
    for key, value in dict(obj).items():
        if key not in {"/Filter", "/DecodeParms"}:
            new_stream[key] = value
    writer._objects[index] = new_stream

writer.write("out.pdf")
```

I have not tested this with inline images or similar, though, and relying on internal APIs is not recommended.
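Because the loop strips `/Filter` from every encoded stream, a stream whose filter is image-specific (for example `/DCTDecode`, where `get_data()` may hand back bytes that are still JPEG-encoded) could end up unreadable. One way to guard against that is to only rewrite streams whose filters are all general-purpose. The helper `can_strip_filters` below is hypothetical and not part of pypdf; it could be called as `can_strip_filters(obj.get("/Filter"))` before replacing a stream, relying on pypdf's `NameObject` and `ArrayObject` subclassing `str` and `list`.

```python
# General-purpose (non-image) filters from the PDF specification,
# together with the abbreviated names used in inline-image dictionaries.
GENERAL_FILTERS = {
    "/FlateDecode", "/Fl",
    "/LZWDecode", "/LZW",
    "/ASCIIHexDecode", "/AHx",
    "/ASCII85Decode", "/A85",
    "/RunLengthDecode", "/RL",
}


def can_strip_filters(filter_value) -> bool:
    """Return True if every filter in a /Filter entry is general-purpose.

    filter_value may be None (no filter), a single name, or a list of names.
    """
    if filter_value is None:
        return False  # nothing to strip
    names = [filter_value] if isinstance(filter_value, str) else list(filter_value)
    # An empty list or any image-specific filter (/DCTDecode, /JPXDecode,
    # /JBIG2Decode, /CCITTFaxDecode) or /Crypt means "leave this stream alone".
    return bool(names) and all(str(name) in GENERAL_FILTERS for name in names)
```

Streams that fail the check keep their original `/Filter`, so images stay valid at the cost of not being fully decoded.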
It should not be necessary to call that function explicitly, since `get_data()` in the snippet above already decodes the stream.
As for test files: I am not aware of one, and it is rather uncommon to have such a file, except for explicit testing purposes.
What is a good way to do the following:
1. Iterate over all streams (including unreferenced ones) and decode each one using its filters to recover the original, unencoded data.
2. Once the streams are decoded, also decode any inline images. (This part is more difficult.)
3. Save the result as a new PDF.
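For the decoding steps, the core idea is to run the stream bytes through each filter in `/Filter` order. A minimal stdlib-only sketch of that follows; `decode_chain` and its helpers are hypothetical names, it handles only four general-purpose filters, also accepts the abbreviated filter names used in inline-image dictionaries, and ignores `/DecodeParms` (e.g. PNG predictors). pypdf's own filters module covers more cases, such as LZW.

```python
import base64
import binascii
import zlib

# Inline images may use abbreviated filter names; map them to the full names.
ABBREVIATIONS = {
    "/Fl": "/FlateDecode",
    "/AHx": "/ASCIIHexDecode",
    "/A85": "/ASCII85Decode",
    "/RL": "/RunLengthDecode",
    "/LZW": "/LZWDecode",
    "/DCT": "/DCTDecode",
    "/CCF": "/CCITTFaxDecode",
}


def ascii_hex_decode(data: bytes) -> bytes:
    digits = bytes(data).split(b">", 1)[0]  # ">" marks the end of the data
    digits = b"".join(digits.split())  # whitespace is ignored
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    body = bytes(data).split(b"~>", 1)[0]  # "~>" marks the end of the data
    if body.startswith(b"<~"):
        body = body[2:]
    return base64.a85decode(body)


def run_length_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:  # end-of-data marker
            break
        if length < 128:  # copy the next length + 1 bytes literally
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:  # repeat the next byte 257 - length times
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


def decode_chain(data: bytes, filters) -> bytes:
    """Apply each filter in order, as listed in a /Filter entry."""
    if isinstance(filters, str):
        filters = [filters]
    for name in filters:
        name = ABBREVIATIONS.get(name, name)
        if name == "/FlateDecode":
            data = zlib.decompress(data)
        elif name == "/ASCIIHexDecode":
            data = ascii_hex_decode(data)
        elif name == "/ASCII85Decode":
            data = ascii85_decode(data)
        elif name == "/RunLengthDecode":
            data = run_length_decode(data)
        else:
            # LZW, image filters and /Crypt are out of scope for this sketch.
            raise NotImplementedError(f"unsupported filter {name}")
    return data
```

Locating inline images still requires parsing the content stream for the `BI ... ID ... EI` operators, which this sketch does not attempt.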
Python code in this discussion would be ideal, preferably using this existing function: `pypdf/pypdf/filters.py`, line 766 in 219153e.
Optionally, are there any files that contain all or most of the filter types and inline images to test this with?
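Since such files are rare, one option is to generate a small fixture. The rough sketch below (hypothetical `build_fixture`, stdlib only) writes a one-page PDF whose content stream is Flate-encoded, plus unreferenced stream objects for four general-purpose filters and one chained `[/ASCIIHexDecode /FlateDecode]` stream. It does not cover image filters or inline images.

```python
import base64
import binascii
import zlib


def run_length_encode(data: bytes) -> bytes:
    # Literal runs only (length byte n copies the next n + 1 bytes), then EOD 128.
    out = bytearray()
    for i in range(0, len(data), 128):
        chunk = data[i:i + 128]
        out.append(len(chunk) - 1)
        out += chunk
    out.append(128)
    return bytes(out)


# Encoders keyed by the filter name that decodes their output.
ENCODERS = {
    "/FlateDecode": zlib.compress,
    "/ASCIIHexDecode": lambda d: binascii.hexlify(d) + b">",
    "/ASCII85Decode": lambda d: base64.a85encode(d) + b"~>",
    "/RunLengthDecode": run_length_encode,
}


def stream_object(data: bytes, filters: list) -> bytes:
    encoded = data
    # Decoding applies filters left to right, so encode right to left.
    for name in reversed(filters):
        encoded = ENCODERS[name](encoded)
    filter_value = filters[0] if len(filters) == 1 else "[" + " ".join(filters) + "]"
    header = f"<< /Length {len(encoded)} /Filter {filter_value} >>\nstream\n".encode()
    return header + encoded + b"\nendstream"


def build_fixture(payload: bytes) -> bytes:
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Contents 4 0 R >>",
        stream_object(b"0 0 m 100 100 l S", ["/FlateDecode"]),  # draws one line
    ]
    # Unreferenced streams: one per filter, plus one chained example.
    objects += [stream_object(payload, [name]) for name in ENCODERS]
    objects.append(stream_object(payload, ["/ASCIIHexDecode", "/FlateDecode"]))

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    # Cross-reference table: fixed 20-byte entries, object 0 heads the free list.
    xref_offset = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_offset}\n%%EOF\n".encode()
    return bytes(out)
```

Writing `build_fixture(b"...")` to a file opened in `"wb"` mode gives an input for the decoding loop; for image filters and inline images, real-world samples are still needed.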