Commit d587391

Update xml session
1 parent 650f362 commit d587391

1 file changed

+143
-15
lines changed

10_Working_with_XML.md

Lines changed: 143 additions & 15 deletions
@@ -11,7 +11,7 @@ the decoder follows straight after the opening of a file, a website or an OAI-PM
1111

1212
Let's start with this simple record:
1313

14-
```XML
14+
```xml
1515
<?xml version="1.0" encoding="utf-8"?>
1616
<record>
1717
<title>GRM</title>
@@ -21,10 +21,17 @@ Lets start with this simple record
2121
```
2222

2323

24-
Lets open it:
24+
Let's open it with the following Flux:
2525

26-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E
26+
```text
27+
inputFile
28+
| open-file
29+
| as-records
30+
| print
31+
;
32+
```
2733

34+
[See it here in the Playground.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+as-records%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E)
2835

2936
Next, let's decode the file and encode it as YAML.
3037

@@ -33,7 +40,17 @@ Handlers a specific helpers that decode xml in a certain way, based on the metad
3340

3441
For now we need the `handle-generic-xml` function.
3542

36-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E
43+
```text
44+
inputFile
45+
| open-file
46+
| decode-xml
47+
| handle-generic-xml
48+
| encode-yaml
49+
| print
50+
;
51+
```
52+
53+
[See it here in the Playground.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E)
3754

3855

3956
You see this as result:
@@ -59,41 +76,142 @@ See:
5976
<title attribute="test">Test value</title>
6077
```
6178

62-
=>
79+
With the Flux:
80+
81+
```text
82+
inputFile
83+
| open-file
84+
| decode-xml
85+
| handle-generic-xml
86+
| encode-yaml
87+
| print
88+
;
89+
```
6390

6491
```yaml
6592
title:
6693
attribute: "test"
6794
value: "Test value"
6895
```
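
The mapping shown above (attributes become fields, the element text lands in a `value` subfield) can be imitated outside of Metafacture. The following is only a minimal Python sketch of the idea, not how `handle-generic-xml` is implemented; the function name `element_to_fields` and its `attribute_marker` parameter are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def element_to_fields(element, attribute_marker=""):
    """Map one XML element to a dict (hypothetical helper, not Metafacture code).

    Attributes become fields (optionally prefixed with a marker) and the
    element text is stored under "value".
    """
    fields = {attribute_marker + name: val for name, val in element.attrib.items()}
    text = (element.text or "").strip()
    if text:
        fields["value"] = text
    for child in element:
        # Simplification: a repeated child tag overwrites the earlier one.
        fields[child.tag] = element_to_fields(child, attribute_marker)
    return fields

title = ET.fromstring('<title attribute="test">Test value</title>')
result = element_to_fields(title)
print(result)
```

The printed dict has the same two keys as the YAML above, which is why even an element without attributes ends up with a `value` subfield.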
6996
70-
For our example above to get rid of the value subfields in the yaml we need to change the hirachy:
97+
[To get rid of the value subfields in the YAML for our example above, we need to change the hierarchy:](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+fix%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22title.value%22%2C%22@title%22%29%0Amove_field%28%22@title%22%2C%22title%22%29%0Amove_field%28%22author.value%22%2C%22@author%22%29%0Amove_field%28%22@author%22%2C%22author%22%29%0Amove_field%28%22datePublished.value%22%2C%22@datePublished%22%29%0Amove_field%28%22@datePublished%22%2C%22datePublished%22%29&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E)
98+
99+
100+
```text
101+
inputFile
102+
| open-file
103+
| decode-xml
104+
| handle-generic-xml
105+
| fix(transformationFile)
106+
| encode-yaml
107+
| print
108+
;
109+
```
110+
111+
With Fix:
112+
```perl
113+
move_field("title.value","@title")
114+
move_field("@title","title")
115+
move_field("author.value","@author")
116+
move_field("@author","author")
117+
move_field("datePublished.value","@datePublished")
118+
move_field("@datePublished","datePublished")
119+
```
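
The two-step move in the Fix above (first to a temporary `@` field, then back to the original name) can be traced on a plain data structure. This is a rough Python sketch under the assumption that a move simply removes the source and overwrites the target; the `move_field` function here is a hypothetical stand-in and does not claim to match the exact semantics of the Fix function of the same name.

```python
def move_field(record, source, target):
    """Move the value at dotted path `source` to dotted path `target`.

    Hypothetical stand-in for illustration; an existing target is overwritten.
    """
    *parents, leaf = source.split(".")
    node = record
    for key in parents:
        node = node[key]
    value = node.pop(leaf)

    *parents, leaf = target.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

record = {
    "title": {"value": "GRM"},
    "author": {"value": "Sibille Berg"},
    "datePublished": {"value": "2019"},
}

for name in ("title", "author", "datePublished"):
    move_field(record, name + ".value", "@" + name)  # park the value in a temporary field
    move_field(record, "@" + name, name)             # put it back under the original name

print(record)
```

After the loop each field holds its string directly instead of a nested `value` subfield.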
120+
121+
But when you encode it to XML, the value subfields are also kept, like this:
122+
123+
```text
124+
inputFile
125+
| open-file
126+
| decode-xml
127+
| handle-generic-xml
128+
| encode-xml
129+
| print
130+
;
131+
```
132+
Results in:
133+
134+
```xml
135+
<?xml version="1.0" encoding="UTF-8"?>
136+
<records>
137+
138+
<record>
139+
<title>
140+
<value>GRM</value>
141+
</title>
142+
<author>
143+
<value>Sibille Berg</value>
144+
</author>
145+
<datePublished>
146+
<value>2019</value>
147+
</datePublished>
148+
</record>
149+
150+
</records>
151+
```
71152

72-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+fix%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22title.value%22%2C%22@title%22%29%0Amove_field%28%22@title%22%2C%22title%22%29%0Amove_field%28%22author.value%22%2C%22@author%22%29%0Amove_field%28%22@author%22%2C%22author%22%29%0Amove_field%28%22datePublished.value%22%2C%22@datePublished%22%29%0Amove_field%28%22@datePublished%22%2C%22datePublished%22%29&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E
153+
[Playground Link](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+encode-xml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E)
73154

155+
Keep in mind that XML elements can have attributes and a value. The encoder can also produce simple flat XML records.
74156

75-
But when you encode it to XML
157+
To get flat records, you have to add a specific option when encoding XML: `| encode-xml(valueTag="value")`. Then it results in:
76158

77-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%0A%7C+encode-xml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle%3EGRM%3C/title%3E%0A++%3Cauthor%3ESibille+Berg%3C/author%3E%0A++%3CdatePublished%3E2019%3C/datePublished%3E%0A%3C/record%3E
159+
```xml
160+
<?xml version="1.0" encoding="UTF-8"?>
161+
<records>
78162

79-
The value subfields are also kept. Keep in mind that xml elements can have attributes and a value. But also the encoder enable simple flat xml records too.
163+
<record>
164+
<title>GRM</title>
165+
<author>Sibille Berg</author>
166+
<datePublished>2019</datePublished>
167+
</record>
80168

81-
You have to add a specific option when encoding xml: `(valueTag="value")`
169+
</records>
170+
171+
```
82172

83173
If you want to create the other elements as attributes, you have to tell Metafacture which elements are attributes by setting an attribute marker with the option `attributeMarker` in `handle-generic-xml`.
84174
Here I use `@` as attribute marker:
85175

176+
```text
177+
inputFile
178+
| open-file
179+
| decode-xml
180+
| handle-generic-xml(attributeMarker="@")
181+
| encode-xml(attributeMarker="@",valueTag="value")
182+
| print
183+
;
184+
```
86185

87-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%28attributeMarker%3D%22@%22%29%0A%7C+encode-xml%28attributeMarker%3D%22@%22%2CvalueTag%3D%22value%22%29%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle+attribute%3D%22test%22%3ETest+value%3C/title%3E%0A%3C/record%3E
186+
[Playground Link](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%28attributeMarker%3D%22@%22%29%0A%7C+encode-xml%28attributeMarker%3D%22@%22%2CvalueTag%3D%22value%22%29%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle+attribute%3D%22test%22%3ETest+value%3C/title%3E%0A%3C/record%3E)
88187

89188
When you encode it as yaml you see the magic behind it:
90189

91-
https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%28attributeMarker%3D%22@%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle+attribute%3D%22test%22%3ETest+value%3C/title%3E%0A%3C/record%3E
190+
```text
191+
inputFile
192+
| open-file
193+
| decode-xml
194+
| handle-generic-xml(attributeMarker="@")
195+
| encode-yaml
196+
| print
197+
;
198+
```
92199

200+
[Playground Link](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A%7C+decode-xml%0A%7C+handle-generic-xml%28attributeMarker%3D%22@%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22utf-8%22%3F%3E%0A%3Crecord%3E%0A++%3Ctitle+attribute%3D%22test%22%3ETest+value%3C/title%3E%0A%3C/record%3E)
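
The reverse direction, turning marked fields back into XML attributes and the `value` field back into element text, can be sketched in a few lines of Python. This only illustrates the idea behind the `attributeMarker` and `valueTag` options; `fields_to_element` and its parameters are hypothetical names and not part of Metafacture.

```python
import xml.etree.ElementTree as ET

def fields_to_element(tag, fields, attribute_marker="@", value_tag="value"):
    """Build an XML element from a dict (hypothetical helper for illustration)."""
    element = ET.Element(tag)
    for name, content in fields.items():
        if name.startswith(attribute_marker):
            # Marked fields become attributes, without the marker.
            element.set(name[len(attribute_marker):], content)
        elif name == value_tag:
            # The value field becomes the element text.
            element.text = content
        elif isinstance(content, dict):
            element.append(fields_to_element(name, content, attribute_marker, value_tag))
        else:
            ET.SubElement(element, name).text = content
    return element

title = fields_to_element("title", {"@attribute": "test", "value": "Test value"})
xml_text = ET.tostring(title, encoding="unicode")
print(xml_text)
```

Without the marker the encoder would have no way to tell an attribute from a child element, which is why both steps in the Flux need the same marker.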
93201

94202
Another important thing when working with XML data sets is to specify the record tag. The default is the tag `record`, but other data sets use different tags to separate records:
95203

96-
https://metafacture.org/playground/?flux=%22http%3A//www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml%22%0A%7C+open-http%0A%7C+decode-xml%0A%7C+handle-generic-xml%28recordtagname%3D%22lido%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B
204+
```text
205+
"http://www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml"
206+
| open-http
207+
| decode-xml
208+
| handle-generic-xml(recordtagname="lido")
209+
| encode-yaml
210+
| print
211+
;
212+
```
213+
214+
[Playground Link](https://metafacture.org/playground/?flux=%22http%3A//www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml%22%0A%7C+open-http%0A%7C+decode-xml%0A%7C+handle-generic-xml%28recordtagname%3D%22lido%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B)
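
Why the record tag matters can be seen in a small Python sketch: records are found by tag name, so looking for the default `record` in data that wraps its records in another tag finds nothing. The sample data with `collection` and `item` tags and the function `split_records` are made up for this illustration and are not taken from Metafacture or LIDO.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: records are wrapped in <item>, not <record>.
SAMPLE = (
    "<collection>"
    "<item><title>A</title></item>"
    "<item><title>B</title></item>"
    "</collection>"
)

def split_records(xml_text, record_tag="record"):
    """Return every element named `record_tag` as one record."""
    root = ET.fromstring(xml_text)
    return list(root.iter(record_tag))

default_titles = [r.findtext("title") for r in split_records(SAMPLE)]
item_titles = [r.findtext("title") for r in split_records(SAMPLE, record_tag="item")]
print(default_titles)
print(item_titles)
```

In this sketch the default tag yields no records at all, while passing the tag the data actually uses yields one record per `item`.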
97215

98216

99217
## Bonus: Working with namespaces
@@ -104,14 +222,24 @@ the option `emitnamespace="true"` for the `handle-generic-xml` command.
104222

105223
Add this option to the previous example and see that there are elements belonging to lido as well as skos.
106224

225+
```text
226+
"http://www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml"
227+
| open-http
228+
| decode-xml
229+
| handle-generic-xml(recordtagname="lido", emitnamespace="true")
230+
| encode-yaml
231+
| print
232+
;
233+
```
234+
107235
See this in the Playground [here](https://metafacture.org/playground/?flux=%22http%3A//www.lido-schema.org/documents/examples/LIDO-v1.1-Example_FMobj00154983-LaPrimavera.xml%22%0A%7C+open-http%0A%7C+decode-xml%0A%7C+handle-generic-xml%28recordtagname%3D%22lido%22%2C+emitnamespace%3D%22true%22%29%0A%7C+encode-yaml%0A%7C+print%0A%3B).
108236

109237
When you want to add the namespace definitions to the output, Metafacture does not know them by itself; you have to tell Metafacture
110238
the namespaces when using `encode-xml`, either by a file with the option `namespacefile` or in the Flux with the option `namespaces`.
111239

112240
See here an example for adding namespaces in the flux:
113241

114-
```
242+
```text
115243
inputFile
116244
| open-file
117245
| as-lines
