
Commit 650f362

Update csv session
1 parent 0e9e805 commit 650f362

File tree

1 file changed

+72
-9
lines changed

09_Working_with_CSV.md

Lines changed: 72 additions & 9 deletions
@@ -36,17 +36,49 @@ See that the elements have no name literal names but are only numbers.
3636
Because the CSV has a header, we need to add the option `(hasHeader="true")` to `decode-csv` in the Flux.
3737

3838

39-
You can extract specified fields while converting to another tabular format by using the fix. This is quite handy for analysis of specific fields or to generate reports.
39+
You can extract specified fields while converting to another tabular format by using the fix. This is quite handy for analysis of specific fields or to generate reports. In the following example we only keep three columns (`"ISBN"`, `"Title"`, `"Author"`):
4040

41-
https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasHeader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%0A%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29
41+
Flux:
4242

43-
By default Metafactures `decode-csv` expects that CSV fields are separated by comma ‘,’ and strings are quoted with double qoutes ‘”‘ or single quotes `'`. You can specify other characters as separator or quotes with the option ‘separator’ and clean special quote signs with the fix:
43+
```text
44+
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
45+
| open-http
46+
| as-lines
47+
| decode-csv(hasHeader="true")
48+
| fix(transformationFile)
49+
| encode-csv(includeHeader="true")
50+
| print
51+
;
52+
```
53+
54+
With Fix:
55+
```
56+
retain("ISBN","Title","Author")
57+
```
58+
59+
[See the example in the Playground](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasHeader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%0A%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
60+
61+
By default, Metafacture's `decode-csv` expects that CSV fields are separated by a comma `,` and that strings are quoted with double quotes `"` or single quotes `'`. You can specify another separator character with the option `separator` and clean up special quote signs with the fix. (In contrast to Catmandu, quote characters cannot be manipulated by the decoder directly yet.)
62+
63+
Flux:
4464

45-
See:
65+
```text
66+
"12157;$The Journal of Headache and Pain$;2193-1801"
67+
| read-string
68+
| as-lines
69+
| decode-csv(separator=";")
70+
| fix(transformationFile)
71+
| encode-csv(separator="\t", includeheader="true")
72+
| print;
73+
```
4674

47-
https://metafacture.org/playground/?flux=%2212157%3B%24The+Journal+of+Headache+and+Pain%24%3B2193-1801%22%0A%7C+read-string%0A%7C+as-lines%0A%7C+decode-csv%28separator%3D%22%3B%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+includeheader%3D%22true%22%29%0A%7C+print%3B&transformation=replace_all%28%22%3F%22%2C%22%5E\\%24%7C\\%24%24%22%2C%22%22%29
75+
Fix:
76+
77+
```
78+
replace_all("?","^\\$|\\$$","")
79+
```
4880

49-
(Different to Catmandu quote-chars cannot be manipulated by the decoder directly.)
81+
[See the example in the Playground.](https://metafacture.org/playground/?flux=%2212157%3B%24The+Journal+of+Headache+and+Pain%24%3B2193-1801%22%0A%7C+read-string%0A%7C+as-lines%0A%7C+decode-csv%28separator%3D%22%3B%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+includeheader%3D%22true%22%29%0A%7C+print%3B&transformation=replace_all%28%22%3F%22%2C%22%5E\\%24%7C\\%24%24%22%2C%22%22%29)
5082

5183
In the example above we read the string as a little CSV fragment using the `read-string` command for our small test. It will read the tiny CSV string which uses “;” and “$” as separation and quotation characters.
5284
The string is then read line by line by `as-lines` and decoded as CSV with the separator `;`.
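To make the regular expression in the Fix above easier to follow, here is a minimal Python sketch of the same idea. It is not how Metafacture works internally; the helper name `decode_line` is made up for illustration, and the naive split would not cope with a separator inside a quoted value.

```python
import re

def decode_line(line, separator=";"):
    """Split one line on the separator and strip a leading/trailing '$' from each value."""
    # Same pattern as in the Fix: '^\$' matches a leading dollar sign,
    # '\$$' matches a trailing one; both are replaced by an empty string.
    return [re.sub(r"^\$|\$$", "", value) for value in line.split(separator)]

fields = decode_line("12157;$The Journal of Headache and Pain$;2193-1801")
# fields == ["12157", "The Journal of Headache and Pain", "2193-1801"]
```

Values without a `$` at either end, such as `12157`, pass through unchanged.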
@@ -55,13 +87,44 @@ With a little fix you can
5587

5688
## Writing CSVs
5789

58-
When exporting data a tabular format you can change the field names in the header or omit the header:
90+
When harvesting data in tabular format, you can also change the field names in the header or omit the header:
91+
92+
Flux:
93+
94+
```text
95+
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
96+
| open-http
97+
| as-lines
98+
| decode-csv(hasheader="true")
99+
| fix(transformationFile)
100+
| encode-csv(includeHeader="true")
101+
| print;
102+
```
103+
104+
Fix:
59105

60-
https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%3B&transformation=move_field%28%22ISBN%22%2C%22A%22%29%0Amove_field%28%22Title%22%2C%22B%22%29%0Amove_field%28%22Author%22%2C%22C%22%29%0A%0Aretain%28%22A%22%2C%22B%22%2C%22C%22%29
106+
```perl
107+
move_field("ISBN","A")
108+
move_field("Title","B")
109+
move_field("Author","C")
110+
111+
retain("A","B","C")
112+
```
113+
114+
[See the example in the Playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%3B&transformation=move_field%28%22ISBN%22%2C%22A%22%29%0Amove_field%28%22Title%22%2C%22B%22%29%0Amove_field%28%22Author%22%2C%22C%22%29%0A%0Aretain%28%22A%22%2C%22B%22%2C%22C%22%29)
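The combined effect of `move_field` and `retain` on the header can be approximated in Python as follows. This is only a sketch under assumptions: the helper `rename_and_retain` and the sample row are hypothetical and not taken from goodreads.csv.

```python
import csv
import io

def rename_and_retain(csv_text, mapping):
    """Keep only the columns named in `mapping` and write them under their new names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(mapping.values())  # new header, e.g. A,B,C
    for row in reader:
        # Columns that are not in the mapping (here: Rating) are dropped.
        writer.writerow(row[old] for old in mapping)
    return out.getvalue()

# Hypothetical sample data for illustration only.
sample = "ISBN,Title,Author,Rating\n123,Some Title,Some Author,4\n"
renamed = rename_and_retain(sample, {"ISBN": "A", "Title": "B", "Author": "C"})
```

Leaving out the `writer.writerow(mapping.values())` line would produce the same rows without a header.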
61115

62116
You can transform the data to a TSV file, which uses the separator `\t` and has no header, like this:
63117

64-
https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+noQuotes%3D%22true%22%29%0A%7C+print%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29
118+
```text
119+
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
120+
| open-http
121+
| as-lines
122+
| decode-csv(hasheader="true")
123+
| encode-csv(separator="\t", noQuotes="true")
124+
| print;
125+
```
126+
127+
[See example in playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+noQuotes%3D%22true%22%29%0A%7C+print%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
65128

66129
When you create a CSV by exporting complex/nested data structures to a tabular format, you must “flatten” the data structure. Also
67130
you have to make sure that the order and number of elements is the same in every record; otherwise the header does not match the records.
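What "flattening" and a fixed column order mean can be sketched in Python. This is an illustration of the general idea, not of Metafacture's own behaviour; the functions `flatten` and `to_csv`, the dotted key naming, and the sample records are assumptions made up for this example.

```python
import csv
import io

def flatten(record, prefix=""):
    """Turn nested dicts/lists into one flat dict with dotted keys, e.g. 'author.name'."""
    flat = {}
    # List elements are numbered from 1, dict entries keep their key.
    items = record.items() if isinstance(record, dict) else enumerate(record, start=1)
    for key, value in items:
        name = f"{prefix}{key}"
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def to_csv(records, columns):
    """Write every record with the same columns in the same order; missing values stay empty."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return out.getvalue()

# Hypothetical records: different key order, and the second one has no subjects.
records = [
    {"title": "Book One", "author": {"name": "A. Writer"}, "subject": ["x", "y"]},
    {"author": {"name": "B. Writer"}, "title": "Book Two"},
]
table = to_csv(records, ["title", "author.name", "subject.1", "subject.2"])
```

Because the column list is fixed up front, the second record still yields four cells (two of them empty), so every row lines up with the header.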
