| 1 | +# Handle Attachments |
| 2 | + |
| 3 | +PDF documents can contain attachments, sometimes also called embedded files. |
| 4 | + |
| 5 | +## Retrieve Attachments |
| 6 | + |
| 7 | +Every attachment has a name, but the name is not guaranteed to be unique. For this reason, the value of `reader.attachments["attachment_name"]` |
| 8 | +is a list holding the content of every attachment with that name. |
| 9 | + |
| 10 | +You can extract all attachments like this: |
| 11 | + |
| 12 | +```python |
| 13 | +from pypdf import PdfReader |
| 14 | + |
| 15 | +reader = PdfReader("example.pdf") |
| 16 | + |
| 17 | +for name, content_list in reader.attachments.items(): |
| 18 | +    for i, content in enumerate(content_list): |
| 19 | +        with open(f"{name}-{i}", "wb") as fp: |
| 20 | +            fp.write(content) |
| 21 | +``` |
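Attachment names come from the PDF itself, so they may contain directory components such as `../`. The following is a minimal sketch, not part of pypdf, of how an output file name could be derived before writing. The helper `safe_output_name` and the sample dictionary are hypothetical; the dictionary only stands in for `reader.attachments`.

```python
from pathlib import PurePosixPath, PureWindowsPath


def safe_output_name(name: str, index: int) -> str:
    """Hypothetical helper: reduce an attachment name to a bare file name."""
    # Drop directory components written in POSIX or Windows notation.
    base = PureWindowsPath(PurePosixPath(name).name).name
    # Fall back to a generic name if nothing usable remains.
    if base in ("", ".", ".."):
        base = "attachment"
    # Prefix the index so that duplicate names stay distinct
    # and the file extension is preserved.
    return f"{index}-{base}"


# Stand-in for reader.attachments: name -> list of contents (assumed data).
attachments = {
    "report.pdf": [b"first", b"second"],
    "../../etc/passwd": [b"x"],
}

planned = [
    safe_output_name(name, i)
    for name, content_list in attachments.items()
    for i, _ in enumerate(content_list)
]
print(planned)
```

Placing the index in front of the name, rather than after it as in the loop above, is a design choice that keeps the original extension intact.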
| 22 | + |
| 23 | +Alternatively, you can retrieve them in an object-oriented fashion if you need |
| 24 | +further details for these files: |
| 25 | + |
| 26 | +```python |
| 27 | +from pypdf import PdfReader |
| 28 | + |
| 29 | +reader = PdfReader("example.pdf") |
| 30 | + |
| 31 | +for attachment in reader.attachment_list: |
| 32 | +    print(attachment.name, attachment.alternative_name, attachment.content) |
| 33 | +``` |
| 34 | + |
| 35 | +## Add Attachments |
| 36 | + |
| 37 | +To add a new attachment, use the following code: |
| 38 | + |
| 39 | +```python |
| 40 | +from pypdf import PdfWriter |
| 41 | + |
| 42 | +writer = PdfWriter(clone_from="example.pdf") |
| 43 | +writer.add_attachment(filename="test.txt", data=b"Hello World!") |
| 44 | +``` |
| 45 | + |
| 46 | +As you can see, the basic properties of an attachment are its name and content. If you |
| 47 | +want to set further properties, the returned object provides the corresponding |
| 48 | +setters: |
| 49 | + |
| 50 | +```python |
| 51 | +import datetime |
| 52 | +import hashlib |
| 53 | + |
| 54 | +from pypdf import PdfWriter |
| 55 | +from pypdf.generic import create_string_object, ByteStringObject, NameObject, NumberObject |
| 56 | + |
| 57 | + |
| 58 | +writer = PdfWriter(clone_from="example.pdf") |
| 59 | +embedded_file = writer.add_attachment(filename="test.txt", data=b"Hello World!") |
| 60 | + |
| 61 | +embedded_file.size = NumberObject(len(b"Hello World!")) |
| 62 | +embedded_file.alternative_name = create_string_object("test1.txt") |
| 63 | +embedded_file.description = create_string_object("My test file") |
| 64 | +embedded_file.subtype = NameObject("/text/plain") |
| 65 | +embedded_file.checksum = ByteStringObject(hashlib.md5(b"Hello World!").digest()) |
| 66 | +embedded_file.modification_date = datetime.datetime.now(tz=datetime.timezone.utc) |
| 67 | +# embedded_file.content = "My new content." |
| 68 | + |
| 69 | +writer.write("output.pdf") |
| 70 | +``` |
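The size and checksum in the example above describe the same bytes that were passed as `data`. As a minimal sketch, assuming a hypothetical helper `payload_metadata` that is not part of pypdf, both values can be derived from a single byte string so that they cannot drift apart:

```python
import hashlib


def payload_metadata(data: bytes) -> dict:
    """Hypothetical helper: size and MD5 digest for one attachment payload."""
    return {
        # Number of bytes in the payload.
        "size": len(data),
        # Raw 16-byte MD5 digest, as used for the checksum above.
        "checksum": hashlib.md5(data).digest(),
    }


metadata = payload_metadata(b"Hello World!")
print(metadata["size"], metadata["checksum"].hex())
```

The resulting values could then be wrapped in `NumberObject` and `ByteStringObject` before being assigned to `embedded_file.size` and `embedded_file.checksum`.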
| 71 | + |
| 72 | +The same functionality is available if you iterate over the attachments of a writer |
| 73 | +using `writer.attachment_list`. |