Commit 5252c2f

DOC: Document how to read and modify XMP metadata (#3383)
Closes #3325.
1 parent d5d1964 commit 5252c2f

1 file changed: +59 -0 lines changed

docs/user/metadata.md

Lines changed: 59 additions & 0 deletions
@@ -1,5 +1,7 @@
# Metadata

PDF files can carry two types of metadata: "regular" metadata, stored in the document information dictionary, and XMP metadata. Both can be present in the same file.

## Reading metadata

```python
@@ -104,3 +106,60 @@ writer.metadata = None
with open("meta-pdf.pdf", "wb") as f:
    writer.write(f)
```

## Reading XMP metadata

```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")

meta = reader.xmp_metadata
if meta:
    print(meta.dc_title)
    print(meta.dc_description)
    print(meta.xmp_create_date)
```
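
Not all of these properties are plain strings. As far as we are aware, pypdf returns language-alternative fields such as `dc_title` and `dc_description` as a dictionary keyed by language code, with `"x-default"` as the usual fallback key, while `xmp_create_date` is a `datetime` (or `None`). The sketch below shows one way to pick a single string from such a dictionary. `pick_language` is a hypothetical helper and not part of pypdf, and `sample_title` is a stand-in value so that the snippet runs without a PDF file.

```python
from typing import Mapping, Optional


def pick_language(
    values: Optional[Mapping[str, str]], language: str = "x-default"
) -> Optional[str]:
    """Pick one entry from a language-keyed mapping (hypothetical helper)."""
    if not values:
        return None
    if language in values:
        return values[language]
    # Fall back to the default entry, then to any available entry.
    return values.get("x-default", next(iter(values.values())))


# Stand-in for `meta.dc_title`; a real value would come from `reader.xmp_metadata`.
sample_title = {"x-default": "Annual Report", "de": "Jahresbericht"}

print(pick_language(sample_title))        # Annual Report
print(pick_language(sample_title, "de"))  # Jahresbericht
print(pick_language(sample_title, "fr"))  # Annual Report
print(pick_language(None))                # None
```

With a real document, `pick_language(meta.dc_title)` would be called inside the `if meta:` branch shown above.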

## Modifying XMP metadata

Modifying XMP metadata is more involved than reading it, because the underlying XML document has to be edited directly.

As an example, we want to add the following PDF/UA identifier section to the XMP metadata:

```xml
<rdf:Description rdf:about="" xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
    <pdfuaid:part>1</pdfuaid:part>
</rdf:Description>
```

This can be done as follows:

138+
```python
139+
from pypdf import PdfWriter
140+
141+
writer = PdfWriter(clone_from="example.pdf")
142+
143+
metadata = writer.xmp_metadata
144+
assert metadata # Ensure that it is not `None`.
145+
rdf_root = metadata.rdf_root
146+
xmp_meta = rdf_root.parentNode
147+
xmp_document = xmp_meta.parentNode
148+
149+
# Please note that without a text node, the corresponding elements might
150+
# be omitted completely.
151+
pdfuaid_description = xmp_document.createElement("rdf:Description")
152+
pdfuaid_description.setAttribute("rdf:about", "")
153+
pdfuaid_description.setAttribute("xmlns:pdfuaid", "http://www.aiim.org/pdfua/ns/id/")
154+
pdfuaid_part = xmp_document.createElement("pdfuaid:part")
155+
pdfuaid_part_text = xmp_document.createTextNode("1")
156+
pdfuaid_part.appendChild(pdfuaid_part_text)
157+
pdfuaid_description.appendChild(pdfuaid_part)
158+
rdf_root.appendChild(pdfuaid_description)
159+
160+
metadata.stream.set_data(xmp_document.toxml().encode("utf-8"))
161+
162+
writer.write("output.pdf")
163+
```
164+
165+
For further details on modifying the structure, please refer to {py:mod}`xml.dom.minidom`.
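
The DOM steps can also be tried without a PDF file. The following is a minimal sketch that uses only the standard library: it applies the same `createElement` / `setAttribute` / `appendChild` calls to a hand-written XMP packet and then re-parses the result to check that the new element ends up in the PDF/UA namespace. `XMP_PACKET` and `add_pdfua_part` are illustrative names and are not part of pypdf; a real packet would typically contain more properties.

```python
from xml.dom import minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"

# A minimal, hand-written XMP packet used only for this illustration.
XMP_PACKET = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    f'<rdf:RDF xmlns:rdf="{RDF_NS}">'
    '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:format>application/pdf</dc:format>"
    "</rdf:Description>"
    "</rdf:RDF>"
    "</x:xmpmeta>"
)


def add_pdfua_part(packet: str, part: str = "1") -> str:
    """Return `packet` with a PDF/UA identifier description appended."""
    document = minidom.parseString(packet)
    rdf_root = document.getElementsByTagNameNS(RDF_NS, "RDF")[0]

    description = document.createElement("rdf:Description")
    description.setAttribute("rdf:about", "")
    description.setAttribute("xmlns:pdfuaid", PDFUAID_NS)

    part_element = document.createElement("pdfuaid:part")
    # The text node matters: the element needs content to carry the part number.
    part_element.appendChild(document.createTextNode(part))
    description.appendChild(part_element)
    rdf_root.appendChild(description)
    return document.toxml()


updated = add_pdfua_part(XMP_PACKET)

# Re-parse the serialized packet and look the element up by namespace.
reparsed = minidom.parseString(updated)
parts = reparsed.getElementsByTagNameNS(PDFUAID_NS, "part")
print(parts[0].firstChild.data)  # 1
```

Re-parsing the serialized output is a cheap way to catch a missing `xmlns:` declaration, since the parser rejects elements whose prefix is not bound to a namespace.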
