docs/09_Working_with_CSV.md
Lines changed: 15 additions & 20 deletions
@@ -8,13 +8,13 @@ parent: Tutorial
# Lesson 9: Working with CSV and TSV files
-CSV and TSV files are widely-used to store and exchange simple structured data. Many open datasets are published as CSV or TSV files, e.g. datahub.io. Within the library community CSV files are used for the distribution of title lists (KBART), e.g Knowledge Base+.
+CSV and TSV files are widely used to store and exchange simple structured data. Many open datasets are published as CSV or TSV files, see e.g. datahub.io. Within the library community, CSV files are used for the distribution of title lists (KBART), e.g. by Knowledge Base+.
-Metafacture implements an decoder and encoder for both formats: decode-csv and encode-csv.
+Metafacture implements a decoder and an encoder which you can use for both formats: `decode-csv` and `encode-csv`.
@@ -40,11 +40,11 @@ Now you can convert the data to different formats, like JSON, YAML and XML by de
[See in the playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29+//+or+encode-xml+or+encode-yaml%0A%7C+print%0A%3B)
-See that the elements have no name literal names but are only numbers.
-But the csv has a header we need to add the option `(hasHeader="true")` to `decode-csv` in the flux.
+Note that the elements have no literal names, only numbers.
+As the CSV has a header, we need to add the option `(hasHeader="true")` to `decode-csv` in the Flux.
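The effect of `hasHeader` can be illustrated outside Metafacture. The following is a minimal plain-Python sketch using the standard `csv` module, not Metafacture's decoder: without a header, every field can only be named by its position and the header row itself turns up as a data record; with a header, the first row supplies the field names. The 1-based numbering is an assumption of this sketch (it is not a statement about how `decode-csv` counts), and the sample data is made up.

```python
import csv
import io

# Made-up sample with a header row (hypothetical data, not goodreads.csv).
SAMPLE = "ISBN,Title,Author\n0123456789,Some Title,Some Author\n"


def decode_without_header(text):
    """Name fields by column position; the 1-based numbering is an assumption of this sketch."""
    rows = csv.reader(io.StringIO(text))
    return [{str(i): value for i, value in enumerate(row, start=1)} for row in rows]


def decode_with_header(text):
    """Use the first row as field names, similar in spirit to hasHeader="true"."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for record in decode_without_header(SAMPLE):
        print(record)  # the header line is treated as an ordinary record
    for record in decode_with_header(SAMPLE):
        print(record)
```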
-You can extract specified fields while converting to another tabular format by using the fix. This is quite handy for analysis of specific fields or to generate reports. In the following example we only keep three columns (`ISBN"`,`"Title"`,`"Author"`):
+You can extract specific fields while converting to another tabular format by using the Fix. This is quite handy for analysing specific fields or generating reports. In the following example we only keep three columns (`"ISBN"`, `"Title"`, `"Author"`):
Flux:
@@ -66,7 +66,7 @@ retain("ISBN","Title","Author")
[See the example in the Playground](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasHeader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%0A%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
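The `retain` step in the linked example keeps three columns. As a rough plain-Python analogue (standard `csv` module, not Metafacture's Fix), the sketch below reads a CSV with a header and writes back only the requested columns. The sample data and its extra `Pages` column are made up for illustration.

```python
import csv
import io

# Made-up sample; the "Pages" column exists only to show that it gets dropped.
SAMPLE = "ISBN,Title,Author,Pages\n0123456789,Some Title,Some Author,321\n"


def keep_columns(text, columns):
    """Re-encode CSV text keeping only `columns` (in that order), header included."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every field that is not in `columns`.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in reader:
        writer.writerow(record)
    return out.getvalue()


if __name__ == "__main__":
    print(keep_columns(SAMPLE, ["ISBN", "Title", "Author"]), end="")
```

Passing the column list explicitly also fixes the column order of the output, independent of the order in the input file.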
-By default Metafactures `decode-csv` expects that CSV fields are separated by comma ‘,’ and strings are quoted with double qoutes ‘”‘ or single quotes `'`. You can specify other characters as separator or quotes with the option ‘separator’ and clean special quote signs with the fix. (In contrast to Catmandu quote-chars cannot be manipulated by the decoder directly, yet.)
+By default, Metafacture's `decode-csv` expects CSV fields to be separated by a comma `,` and strings to be quoted with double quotes `"` or single quotes `'`. You can specify another separator character with the option `separator`, and remove other quote signs using the Fix. (In contrast to Catmandu, the quote characters cannot be set in the decoder directly yet.)
[See the example in the Playground.](https://metafacture.org/playground/?flux=%2212157%3B%24The+Journal+of+Headache+and+Pain%24%3B2193-1801%22%0A%7C+read-string%0A%7C+as-lines%0A%7C+decode-csv%28separator%3D%22%3B%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+includeheader%3D%22true%22%29%0A%7C+print%3B&transformation=replace_all%28%22%3F%22%2C%22%5E\\%24%7C\\%24%24%22%2C%22%22%29)
-In the example above we read the string as a little CSV fragment using the `read-string` command for our small test. It will read the tiny CSV string which uses “;” and “$” as separation and quotation characters.
+In the example above, we use the `read-string` command to read a small CSV fragment for testing. The fragment uses `;` as the separator and `$` as the quotation character.
Each line is then read by `as-lines` and decoded as CSV with the separator `;`.
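Unlike `decode-csv` as described above, Python's standard `csv` module lets you set the quote character directly. The sketch below is a plain-Python analogue, not Metafacture code, and decodes the same fragment as the linked example. With only the separator set, the `$` signs remain part of the value and have to be stripped afterwards; the linked Fix does this with `replace_all` and a regular expression, and the `re.sub` call here only mimics that step. With the quote character set as well, the `$` signs are consumed during parsing.

```python
import csv
import io
import re

# The fragment from the playground example: ";" separates fields, "$" quotes them.
FRAGMENT = "12157;$The Journal of Headache and Pain$;2193-1801"


def decode(text, separator=",", quote='"'):
    """Parse CSV-like text with configurable separator and quote characters."""
    return list(csv.reader(io.StringIO(text), delimiter=separator, quotechar=quote))


def strip_dollar_quotes(value):
    """Remove a leading or trailing "$", mimicking the clean-up Fix in the example."""
    return re.sub(r"^\$|\$$", "", value)


if __name__ == "__main__":
    # Only the separator is set: the "$" signs stay in the title and are removed afterwards.
    row = decode(FRAGMENT, separator=";")[0]
    print([strip_dollar_quotes(v) for v in row])
    # Separator and quote character are set: the "$" signs are consumed while parsing.
    print(decode(FRAGMENT, separator=";", quote="$")[0])
```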
-With a little fix you can
-
## Writing CSVs
When exporting data in tabular format, you can also change the field names in the header or omit the header:
@@ -121,7 +119,7 @@ retain("A","B","C")
[See the example in the playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%3B&transformation=move_field%28%22ISBN%22%2C%22A%22%29%0Amove_field%28%22Title%22%2C%22B%22%29%0Amove_field%28%22Author%22%2C%22C%22%29%0A%0Aretain%28%22A%22%2C%22B%22%2C%22C%22%29)
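The linked example renames `ISBN`, `Title` and `Author` to `A`, `B` and `C` with `move_field` and then keeps only those fields. A plain-Python analogue with the standard `csv` module (not Metafacture) could look roughly like this. The `include_header` flag is a hypothetical parameter of this sketch that shows how the header could be omitted, and the sample data is made up.

```python
import csv
import io

SAMPLE = "ISBN,Title,Author,Pages\n0123456789,Some Title,Some Author,321\n"  # made-up data


def rename_columns(text, mapping, include_header=True):
    """Rename fields old -> new via `mapping`, keep only the renamed ones, optionally write a header."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(mapping.values()), lineterminator="\n")
    if include_header:
        writer.writeheader()
    for record in reader:
        # Build a new record under the new names; unmapped fields such as "Pages" are left out.
        writer.writerow({new: record[old] for old, new in mapping.items()})
    return out.getvalue()


if __name__ == "__main__":
    mapping = {"ISBN": "A", "Title": "B", "Author": "C"}
    print(rename_columns(SAMPLE, mapping), end="")
    print(rename_columns(SAMPLE, mapping, include_header=False), end="")
```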
-You can transform the data to an tsv file with the separator \t which has no header like this.
+You can transform the data into a TSV file without a header, using the separator `\t`, like this:
@@ -134,17 +132,14 @@ You can transform the data to an tsv file with the separator \t which has no hea
[See the example in the playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+noQuotes%3D%22true%22%29%0A%7C+print%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
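The linked example writes tab-separated output with `separator="\t"` and `noQuotes="true"` and without a header. A plain-Python analogue using the standard `csv` module (not Metafacture's encoder) could look like this; the sample records are made up. With quoting switched off, a value that itself contains a tab or a line break can no longer be written unambiguously, so Python's writer raises `csv.Error` in that case unless an escape character is configured.

```python
import csv
import io

# Made-up records standing in for decoded CSV rows.
RECORDS = [
    {"ISBN": "0123456789", "Title": "Some Title", "Author": "Some Author"},
    {"ISBN": "9876543210", "Title": "Another Title", "Author": "Another Author"},
]


def to_tsv(records, columns):
    """Write records as tab-separated lines, without a header and without quoting."""
    out = io.StringIO()
    # QUOTE_NONE never quotes; with no escapechar, a value containing a tab or
    # line break makes writerow raise csv.Error instead of producing a broken row.
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_NONE, lineterminator="\n")
    for record in records:
        writer.writerow([record[column] for column in columns])
    return out.getvalue()


if __name__ == "__main__":
    print(to_tsv(RECORDS, ["ISBN", "Title", "Author"]), end="")
```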
-When you create a CSV from a by export complex/nested data structures to a tabular format, you must “flatten” the datastructure. Also
-you have to be aware that the order and number of elements in every record is the same otherwise the header does not match the records.
-
-But could be done with Metafacture. But be aware that the nested structure if repeatble elements are provided have to be the identical every time. Otherwise the header and the csv file do not fit:
+When you export complex/nested data structures to a tabular format such as CSV, you must “flatten” the data structure. You also have to make sure that the order and number of elements is the same in every record, because the header has to match the records.
+In particular, the nested structure of repeatable elements must be identical in every record. Otherwise the [header and the CSV file do not fit](https://metafacture.org/playground/?flux=%22https%3A//lobid.org/organisations/search%3Fq%3Dk%25C3%25B6ln%26size%3D10%22%0A%7C+open-http%28accept%3D%22application/json%22%29%0A%7C+as-records%0A%7C+decode-json%28recordpath%3D%22member%22%29%0A%7C+flatten%0A%7C+encode-csv%28includeheader%3D%22true%22%29%0A%7C+print%3B).
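A small plain-Python sketch (not Metafacture's `flatten` command) shows why identical structures matter. Nested values are turned into dotted column names, and the header is taken from the first record. The dotted, 1-based naming scheme and the sample records are assumptions made for illustration. As soon as one record has more repeated values than the first, it produces columns that the header does not contain.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into dotted keys, e.g. {"a": [{"b": 1}]} -> {"a.1.b": 1}."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value, start=1))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, f"{prefix}.{key}" if prefix else str(key)))
    return flat


def columns_outside_header(records):
    """Flatten all records, take the header from the first one, report columns it lacks."""
    flat = [flatten(record) for record in records]
    header = list(flat[0])
    return header, [sorted(set(row) - set(header)) for row in flat]


# Made-up records: the second one has two phone numbers, the first only one.
RECORDS = [
    {"name": "Library A", "phone": ["111"]},
    {"name": "Library B", "phone": ["222", "333"]},
]

if __name__ == "__main__":
    header, extra = columns_outside_header(RECORDS)
    print(header)
    print(extra)
```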
Exercises:
--[Decode this csv keep the header.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A...%0A...%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%22id%22%2C%22name%22%2C%22creator%22%0A%221%22%2C%22Book+1%22%2C%22Maxi+Muster%22%0A%222%22%2C%22Book+2%22%2C%22Sandy+Sample%22)
--[Create a tsv with the record idenfier (`_id`), title (`245` > `title`) and isbn (`020` > `isbn`) from a marc dump.](https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%0A%7C+fix%28transformationFile%29%0A%7C+flatten%0A%7C+encode-csv%28includeHeader%3D%22TRUE%22%2C+separator%3D%22\t%22%2C+noQuotes%3D%22false%22%29%0A%7C+print%0A%3B&transformation=)
+- [Decode this CSV while keeping the header.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A...%0A...%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%22id%22%2C%22name%22%2C%22creator%22%0A%221%22%2C%22Book+1%22%2C%22Maxi+Muster%22%0A%222%22%2C%22Book+2%22%2C%22Sandy+Sample%22)
+- [Create a TSV with the record identifier (`_id`), title (`245` > `title`) and ISBN (`020` > `isbn`) from a MARC dump.](https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%0A%7C+fix%28transformationFile%29%0A%7C+flatten%0A%7C+encode-csv%28includeHeader%3D%22TRUE%22%2C+separator%3D%22\t%22%2C+noQuotes%3D%22false%22%29%0A%7C+print%0A%3B&transformation=)