Description
I'm creating PDF documents that must not exceed a maximum size. I add pages to a PdfWriter object, and when the output goes over the maximum size I remove the last added page and write the final document. I noticed that my documents were still over the maximum, which was causing issues in my system. The examples here show only a small increase, but with more complex pages containing images, megabytes of data can be left in the final output.
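For reference, the workflow described above can also be written so that no page is ever removed: each candidate set of pages is rendered with a fresh writer and measured. This is only a sketch of a possible workaround, not a fix for the behaviour reported here. The names `render_with_pypdf` and `take_pages_within_limit` are hypothetical helpers, and the sketch assumes `pages` is a plain list of pypdf page objects (for example `list(reader.pages)`).

```python
import io
from typing import Callable, Sequence, Tuple


def render_with_pypdf(pages: Sequence) -> bytes:
    """Write the given pages with a fresh PdfWriter and return the bytes."""
    # Imported lazily so the size-limit logic below can be exercised
    # without pypdf installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for page in pages:
        writer.add_page(page)
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def take_pages_within_limit(
    pages: Sequence,
    max_bytes: int,
    render: Callable[[Sequence], bytes] = render_with_pypdf,
) -> Tuple[int, bytes]:
    """Return (page_count, document_bytes) for the longest prefix of
    `pages` whose rendered size does not exceed `max_bytes`."""
    best = b""
    count = 0
    for end in range(1, len(pages) + 1):
        candidate = render(pages[:end])
        if len(candidate) > max_bytes:
            break  # adding this page would exceed the limit
        best, count = candidate, end
    return count, best
```

This re-renders the whole prefix for every added page, so it trades speed for a size measurement that never depends on `remove_page`.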
Environment
Which environment were you using when you encountered the problem? This occurs on both Windows and Linux, and with both pypdf 5.7.0 and 5.9.0.
$ python -m platform
Linux-5.10.0-23-cloud-amd64-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.9.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none
Code + PDF
This is a minimal, complete example that shows the issue:
import io

from pypdf import PdfReader, PdfWriter

writer = PdfWriter()

# Write one page and record the baseline size.
stream = io.BytesIO()
page1 = PdfReader("minimal-document.pdf")
writer.append(page1)
writer.write(stream)
print(f"1 page:\t\t\t{stream.tell()} bytes")

# Append the same document again and write two pages.
stream = io.BytesIO()
page2 = PdfReader("minimal-document.pdf")
writer.append(page2)
writer.write(stream)
print(f"2 pages:\t\t{stream.tell()} bytes")

# Remove the last page with clean=True and write again.
# Expected: the size returns to the one-page baseline.
writer.remove_page(len(writer.pages) - 1, clean=True)
writer.compress_identical_objects()
stream = io.BytesIO()
writer.write(stream)
print(f"1 page (again):\t{stream.tell()} bytes")
First run with: https://github.com/py-pdf/sample-files/blob/main/001-trivial/minimal-document.pdf
Second run with: https://github.com/py-pdf/sample-files/blob/main/011-google-doc-document/google-doc-document.pdf
Traceback
First run:
1 page: 17387 bytes
2 pages: 34412 bytes
1 page (again): 18077 bytes
Second run:
1 page: 80356 bytes
2 pages: 160178 bytes
1 page (again): 94343 bytes
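To make the leftover data explicit, the following short calculation compares the "1 page" and "1 page (again)" sizes from the two runs above. The helper name `leftover_bytes` is only illustrative; the figures are copied from the output.

```python
def leftover_bytes(one_page: int, one_page_again: int) -> int:
    """Bytes remaining in the output after the added page was removed."""
    return one_page_again - one_page


# Sizes in bytes taken from the two runs: (1 page, 1 page (again)).
runs = {
    "minimal-document.pdf": (17387, 18077),
    "google-doc-document.pdf": (80356, 94343),
}

for name, (before, after) in runs.items():
    extra = leftover_bytes(before, after)
    print(f"{name}: {extra} bytes left over ({extra / before:.1%} larger)")
```

The first run leaves 690 bytes behind (about 4% larger than the original one-page output) and the second leaves 13987 bytes (about 17% larger).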