|
1 | 1 | # Metadata
|
2 | 2 |
|
| 3 | +PDF files can carry two kinds of metadata: the "regular" document information dictionary and XMP metadata. Both can be present in the same file. |
| 4 | + |
3 | 5 | ## Reading metadata
|
4 | 6 |
|
5 | 7 | ```python
|
@@ -104,3 +106,60 @@ writer.metadata = None
|
104 | 106 | with open("meta-pdf.pdf", "wb") as f:
|
105 | 107 | writer.write(f)
|
106 | 108 | ```
|
| 109 | + |
| 110 | +## Reading XMP metadata |
| 111 | + |
| 112 | +```python |
| 113 | +from pypdf import PdfReader |
| 114 | + |
| 115 | +reader = PdfReader("example.pdf") |
| 116 | + |
| 117 | +meta = reader.xmp_metadata |
| 118 | +if meta: |
| 119 | + print(meta.dc_title) |
| 120 | + print(meta.dc_description) |
| 121 | + print(meta.xmp_create_date) |
| 122 | +``` |
| 123 | + |
| 124 | +## Modifying XMP metadata |
| 125 | + |
| 126 | +Modifying XMP metadata is more involved than modifying the regular metadata, because it means editing the underlying XML document directly. |
| 127 | + |
| 128 | +As an example, we want to add the following PDF/UA identifier section to the XMP metadata: |
| 129 | + |
| 130 | +```xml |
| 131 | +<rdf:Description rdf:about="" xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/"> |
| 132 | + <pdfuaid:part>1</pdfuaid:part> |
| 133 | +</rdf:Description> |
| 134 | +``` |
| 135 | + |
| 136 | +One way to do this is: |
| 137 | + |
| 138 | +```python |
| 139 | +from pypdf import PdfWriter |
| 140 | + |
| 141 | +writer = PdfWriter(clone_from="example.pdf") |
| 142 | + |
| 143 | +metadata = writer.xmp_metadata |
| 144 | +assert metadata # Ensure that it is not `None`. |
| 145 | +rdf_root = metadata.rdf_root |
| 146 | +xmp_meta = rdf_root.parentNode |
| 147 | +xmp_document = xmp_meta.parentNode |
| 148 | + |
| 149 | +# Please note that without a text node, the corresponding elements might |
| 150 | +# be omitted completely. |
| 151 | +pdfuaid_description = xmp_document.createElement("rdf:Description") |
| 152 | +pdfuaid_description.setAttribute("rdf:about", "") |
| 153 | +pdfuaid_description.setAttribute("xmlns:pdfuaid", "http://www.aiim.org/pdfua/ns/id/") |
| 154 | +pdfuaid_part = xmp_document.createElement("pdfuaid:part") |
| 155 | +pdfuaid_part_text = xmp_document.createTextNode("1") |
| 156 | +pdfuaid_part.appendChild(pdfuaid_part_text) |
| 157 | +pdfuaid_description.appendChild(pdfuaid_part) |
| 158 | +rdf_root.appendChild(pdfuaid_description) |
| 159 | + |
| 160 | +metadata.stream.set_data(xmp_document.toxml().encode("utf-8")) |
| 161 | + |
| 162 | +writer.write("output.pdf") |
| 163 | +``` |
| 164 | + |
| 165 | +For further details on modifying the structure, please refer to {py:mod}`xml.dom.minidom`. |
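
To try out the DOM calls without a PDF at hand, the following sketch applies the same steps to a hypothetical, minimal XMP packet using only the standard library. `XMP_PACKET` and `add_pdfua_part` are illustrative names and not part of pypdf; real XMP packets usually contain more wrapper content than shown here.

```python
from xml.dom import minidom

# Hypothetical minimal XMP packet, for illustration only.
XMP_PACKET = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    "</rdf:RDF>"
    "</x:xmpmeta>"
)


def add_pdfua_part(xmp_xml: str, part: str = "1") -> str:
    """Append a PDF/UA identifier section to the rdf:RDF element (illustrative helper)."""
    document = minidom.parseString(xmp_xml)
    rdf_root = document.getElementsByTagName("rdf:RDF")[0]

    # Build <rdf:Description rdf:about="" xmlns:pdfuaid="...">.
    description = document.createElement("rdf:Description")
    description.setAttribute("rdf:about", "")
    description.setAttribute("xmlns:pdfuaid", "http://www.aiim.org/pdfua/ns/id/")

    # Build <pdfuaid:part>1</pdfuaid:part> with an explicit text node.
    part_element = document.createElement("pdfuaid:part")
    part_element.appendChild(document.createTextNode(part))

    description.appendChild(part_element)
    rdf_root.appendChild(description)
    return document.toxml()


result = add_pdfua_part(XMP_PACKET)
print(result)
```

In the pypdf example above, the serialized string plays the role of `xmp_document.toxml()` before it is encoded and passed to `metadata.stream.set_data`.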