
Commit d57627d

DOC: Document new attachment functionality and allow updating content (#3379)
1 parent 9dcf60f commit d57627d

5 files changed

+93
-31
lines changed

docs/index.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -28,7 +28,7 @@ You can contribute to `pypdf on GitHub <https://github.com/py-pdf/pypdf>`_.
    user/extract-text
    user/post-processing-in-text-extraction
    user/extract-images
-   user/extract-attachments
+   user/handle-attachments
    user/encryption-decryption
    user/merging-pdfs
    user/cropping-and-transforming
```

docs/user/extract-attachments.md

Lines changed: 0 additions & 30 deletions
This file was deleted.

docs/user/handle-attachments.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
# Handle Attachments

PDF documents can contain attachments, sometimes also called embedded files.

## Retrieve Attachments

Attachments have a name, but it might not be unique. For this reason, the value of
`reader.attachments["attachment_name"]` is a list of `bytes` objects, one per attachment with that name.

You can extract all attachments like this:

```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")

for name, content_list in reader.attachments.items():
    for i, content in enumerate(content_list):
        with open(f"{name}-{i}", "wb") as fp:
            fp.write(content)
```

Alternatively, you can retrieve them in an object-oriented fashion if you need
further details for these files:

```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")

for attachment in reader.attachment_list:
    print(attachment.name, attachment.alternative_name, attachment.content)
```

## Add Attachments

To add a new attachment, use the following code:

```python
from pypdf import PdfWriter

writer = PdfWriter(clone_from="example.pdf")
writer.add_attachment(filename="test.txt", data=b"Hello World!")
writer.write("output.pdf")  # save the result, including the new attachment
```

As you can see, the basic attachment properties are its name and content. If you
want to modify further properties, the object returned by `add_attachment` provides
corresponding setters:

```python
import datetime
import hashlib

from pypdf import PdfWriter
from pypdf.generic import create_string_object, ByteStringObject, NameObject, NumberObject

writer = PdfWriter(clone_from="example.pdf")
embedded_file = writer.add_attachment(filename="test.txt", data=b"Hello World!")

embedded_file.size = NumberObject(len(b"Hello World!"))
embedded_file.alternative_name = create_string_object("test1.txt")
embedded_file.description = create_string_object("My test file")
embedded_file.subtype = NameObject("/text/plain")
embedded_file.checksum = ByteStringObject(hashlib.md5(b"Hello World!").digest())
embedded_file.modification_date = datetime.datetime.now(tz=datetime.timezone.utc)
# Replacing the data is also possible; size and checksum would then need updating too:
# embedded_file.content = "My new content."

writer.write("output.pdf")
```

The same functionality is available if you iterate over the attachments of a writer
using `writer.attachment_list`.

pypdf/generic/_files.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -204,6 +204,13 @@ def content(self) -> bytes:
         """Retrieve the actual file content."""
         return self._embedded_file.get_data()
 
+    @content.setter
+    def content(self, value: str | bytes) -> None:
+        """Set the file content."""
+        if isinstance(value, str):
+            value = value.encode("latin-1")
+        self._embedded_file.set_data(value)
+
     @property
     def size(self) -> int | None:
         """Retrieve the size of the uncompressed file in bytes."""
```

tests/generic/test_files.py

Lines changed: 12 additions & 0 deletions
```diff
@@ -263,6 +263,18 @@ def test_embedded_file_subtype_setter():
     assert embedded_file.subtype == "/application#2Fjson"
 
 
+def test_embedded_file_content_setter():
+    writer = PdfWriter()
+    embedded_file = writer.add_attachment("test.txt", b"content")
+    assert embedded_file.content == b"content"
+
+    embedded_file.content = b"Hello World!"
+    assert embedded_file.content == b"Hello World!"
+
+    embedded_file.content = "Lorem ipsum dolor sit amet"
+    assert embedded_file.content == b"Lorem ipsum dolor sit amet"
+
+
 def test_embedded_file_size_setter():
     writer = PdfWriter()
     embedded_file = writer.add_attachment("test.txt", b"content")
```
