Commit 01ffe8d

PI: Don't load entire file into memory when passed a file name
This halves the memory allocated by a simple `PdfWriter(clone_from=«str»)`. I can't just close `self.stream` in `__del__`: for some reason, the unit tests still flag the file as unclosed after the test block ends. This seems to be because `__del__` finalizers run on a second pass, while `weakref.finalize()` callbacks run on the first.
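A minimal sketch of the `weakref.finalize()` pattern the message describes, using only the standard library. `Reader` and `demo()` are hypothetical stand-ins, not pypdf's classes. The key detail is that the callback is `stream.close`, a bound method of the stream rather than of the reader, so the finalizer holds no reference that would keep the reader alive.

```python
import gc
import os
import tempfile
import weakref
from io import FileIO


class Reader:
    """Hypothetical stand-in for PdfReader: owns a FileIO opened from a path."""

    def __init__(self, path: str) -> None:
        self.stream = FileIO(path, "rb")
        # The callback is bound to the stream, not to self, so registering it
        # does not create a reference cycle that keeps the reader alive.
        weakref.finalize(self, self.stream.close)


def demo() -> bool:
    """Return True if the file handle is closed once its owner is collected."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.write(fd, b"%PDF-1.7\n%%EOF\n")
    os.close(fd)
    try:
        reader = Reader(path)
        stream = reader.stream  # keep a handle so we can inspect it afterwards
        del reader              # in CPython, refcounting runs the finalizer here
        gc.collect()            # belt and braces for other interpreters
        return stream.closed
    finally:
        os.remove(path)


print(demo())  # True
```

No `__del__` is involved: the file is closed as soon as the reader becomes unreachable, which is the behaviour the commit relies on.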
1 parent 18bd9ec commit 01ffe8d

1 file changed (+10, -3 lines)

pypdf/_reader.py

Lines changed: 10 additions & 3 deletions
```diff
@@ -29,7 +29,8 @@

 import os
 import re
-from io import BytesIO, UnsupportedOperation
+import weakref
+from io import BytesIO, FileIO, UnsupportedOperation
 from pathlib import Path
 from typing import (
     Any,
```

```diff
@@ -121,9 +122,11 @@ def __init__(
                 "It may not be read correctly.",
                 __name__,
             )
+
         if isinstance(stream, (str, Path)):
-            with open(stream, "rb") as fh:
-                stream = BytesIO(fh.read())
+            stream = FileIO(stream, "rb")
+            weakref.finalize(self, stream.close)
+
         self.read(stream)
         self.stream = stream

```

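To see roughly where the saving in this hunk comes from, the sketch below uses `tracemalloc` to compare peak allocations for the old path (`BytesIO(fh.read())`) and the new one (`FileIO`). The 8 MiB zero-filled file, the helper names (`peak_bytes`, `old_way`, `new_way`, `demo`), and reading only the last 1024 bytes (as a PDF parser does when it looks for the trailer) are assumptions for illustration; this does not reproduce the commit's "halves" measurement.

```python
import os
import tempfile
import tracemalloc
from io import BytesIO, FileIO

SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB input file


def old_way(path):
    # Pre-commit behaviour: copy the whole file into an in-memory buffer.
    with open(path, "rb") as fh:
        return BytesIO(fh.read())


def new_way(path):
    # Post-commit behaviour: keep an unbuffered OS handle, read on demand.
    return FileIO(path, "rb")


def peak_bytes(open_stream, path):
    """Peak traced allocation while opening `path` and reading its tail."""
    tracemalloc.start()
    try:
        stream = open_stream(path)
        stream.seek(-1024, os.SEEK_END)  # jump near the end, like a trailer scan
        stream.read()
        stream.close()
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


def demo():
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as fh:
        fh.write(b"\0" * SIZE)
    try:
        return peak_bytes(old_way, path), peak_bytes(new_way, path)
    finally:
        os.remove(path)


old_peak, new_peak = demo()
print(f"BytesIO copy: {old_peak:,} B peak, FileIO: {new_peak:,} B peak")
```

The old path's peak is at least the file size, because the whole file is materialized as one `bytes` object; the `FileIO` path only allocates what it actually reads.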
```diff
@@ -153,6 +156,10 @@ def __init__(
         elif password is not None:
             raise PdfReadError("Not encrypted file")

+    def close(self) -> None:
+        """Close the underlying file handle"""
+        self.stream.close()
+
     @property
     def root_object(self) -> DictionaryObject:
         """Provide access to "/Root". standardized with PdfWriter."""
```
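The new `close()` lets callers release the handle deterministically instead of waiting for garbage collection. One possible variant, not what the commit does, keeps the `weakref.finalize` object and calls it from `close()`, so explicit closing and GC cleanup share one code path and the callback runs at most once. `Reader` here is again a hypothetical stand-in for `PdfReader`.

```python
import os
import tempfile
import weakref
from io import FileIO
from pathlib import Path
from typing import BinaryIO, Optional, Union


class Reader:
    """Hypothetical reader mirroring the diff's str/Path handling."""

    def __init__(self, stream: Union[str, Path, BinaryIO]) -> None:
        self._finalizer: Optional[weakref.finalize] = None
        if isinstance(stream, (str, Path)):
            stream = FileIO(stream, "rb")
            # Keep the finalizer object so close() can trigger it explicitly.
            self._finalizer = weakref.finalize(self, stream.close)
        self.stream = stream

    def close(self) -> None:
        """Close the underlying file handle."""
        if self._finalizer is not None:
            # Runs stream.close() once and marks the finalizer dead;
            # later calls (or garbage collection) do nothing.
            self._finalizer()
        else:
            # Like the commit's close(), this also closes a caller-owned stream.
            self.stream.close()


fd, path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)
try:
    reader = Reader(path)
    reader.close()
    print(reader.stream.closed, reader._finalizer.alive)  # True False
    reader.close()  # second call is a no-op: the finalizer is already dead
finally:
    os.remove(path)
```

With the commit's simpler `close()`, an explicit close followed later by the finalizer just closes the `FileIO` twice, which is harmless because closing an already-closed file object is a no-op; the variant only makes the "at most once" behaviour explicit.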