Commit 9f8ca77

Fix typos and grammar
1 parent 63d7c6d commit 9f8ca77

1 file changed

+15
-20
lines changed

docs/09_Working_with_CSV.md

Lines changed: 15 additions & 20 deletions
@@ -8,13 +8,13 @@ parent: Tutorial
88

99
# Lesson 9: Working with CSV and TSV files
1010

11-
CSV and TSV files are widely-used to store and exchange simple structured data. Many open datasets are published as CSV or TSV files, e.g. datahub.io. Within the library community CSV files are used for the distribution of title lists (KBART), e.g Knowledge Base+.
11+
CSV and TSV files are widely used to store and exchange simple structured data. Many open datasets are published as CSV or TSV files, see e.g. datahub.io. Within the library community CSV files are used for the distribution of title lists (KBART), e.g. Knowledge Base+.
1212

13-
Metafacture implements an decoder and encoder for both formats: decode-csv and encode-csv.
13+
Metafacture implements a decoder and an encoder which you can use for both formats: `decode-csv` and `encode-csv`.
1414

1515
## Reading CSVs
1616

17-
So get some CSV data to work with:
17+
Get some CSV data to work with:
1818

1919
```text
2020
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
@@ -24,9 +24,9 @@ So get some CSV data to work with:
2424
;
2525
```
2626

27-
It shows a CSV file with a header row at the beginnung.
27+
It shows a CSV file with a header row at the beginning.
2828

29-
Now you can convert the data to different formats, like JSON, YAML and XML by decoding the data as csv and encoding it in the desired format:
29+
Convert the data to different serializations, like JSON, YAML and XML by decoding the data as CSV and encoding it in the desired serialization:
3030

3131
```
3232
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
@@ -40,11 +40,11 @@ Now you can convert the data to different formats, like JSON, YAML and XML by de
4040

4141
[See in playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%0A%7C+encode-json%28prettyPrinting%3D%22true%22%29+//+or+encode-xml+or+encode-yaml%0A%7C+print%0A%3B)
4242

43-
See that the elements have no name literal names but are only numbers.
44-
But the csv has a header we need to add the option `(hasHeader="true")` to `decode-csv` in the flux.
43+
See that the elements have no literal names but only numbers.
44+
As the CSV has a header we need to add the option `(hasHeader="true")` to `decode-csv` in the Flux.
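
A minimal sketch of the adjusted workflow, assembled from the commands used in the playground links of this lesson. It assumes the same goodreads.csv URL and only adds the `hasHeader` option to the earlier JSON conversion:

```text
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
| open-http
| as-lines
| decode-csv(hasHeader="true")
| encode-json(prettyPrinting="true")
| print
;
```

With this option the values of the header row should be used as element names instead of numbers.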
4545

4646

47-
You can extract specified fields while converting to another tabular format by using the fix. This is quite handy for analysis of specific fields or to generate reports. In the following example we only keep three columns (`ISBN"`,`"Title"`,`"Author"`):
47+
You can extract specified fields while converting to another tabular format by using the Fix. This is quite handy for analysis of specific fields or to generate reports. In the following example we only keep three columns (`"ISBN"`,`"Title"`,`"Author"`):
4848

4949
Flux:
5050

@@ -66,7 +66,7 @@ retain("ISBN","Title","Author")
6666

6767
[See the example in the Playground](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasHeader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%0A%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
6868

69-
By default Metafactures `decode-csv` expects that CSV fields are separated by comma ‘,’ and strings are quoted with double qoutes ‘”‘ or single quotes `'`. You can specify other characters as separator or quotes with the option separator and clean special quote signs with the fix. (In contrast to Catmandu quote-chars cannot be manipulated by the decoder directly, yet.)
69+
By default Metafacture's `decode-csv` expects that CSV fields are separated by comma `,` and strings are quoted with double quotes `"` or single quotes `'`. You can specify other characters as separator or quotes with the option `separator` and clean special quote signs using the Fix. (In contrast to Catmandu quote-chars cannot be manipulated by the decoder directly, yet.)
7070

7171
Flux:
7272

@@ -88,11 +88,9 @@ replace_all("?","^\\$|\\$$","")
8888

8989
[See the example in the Playground.](https://metafacture.org/playground/?flux=%2212157%3B%24The+Journal+of+Headache+and+Pain%24%3B2193-1801%22%0A%7C+read-string%0A%7C+as-lines%0A%7C+decode-csv%28separator%3D%22%3B%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+includeheader%3D%22true%22%29%0A%7C+print%3B&transformation=replace_all%28%22%3F%22%2C%22%5E\\%24%7C\\%24%24%22%2C%22%22%29)
9090

91-
In the example above we read the string as a little CSV fragment using the `read-string` command for our small test. It will read the tiny CSV string which uses “;” and “$” as separation and quotation characters.
91+
In the example above we read the string as a little CSV fragment using the `read-string` command for our small test. It will read the tiny CSV string which uses `;` and `$` as separation and quotation characters.
9292
The string is then split into lines by `as-lines` and decoded as CSV with the separator `;`.
9393

94-
With a little fix you can
95-
9694
## Writing CSVs
9795

9896
When harvesting data in tabular format you can also change the field names in the header or omit the header:
@@ -121,7 +119,7 @@ retain("A","B","C")
121119

122120
[See example in the playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28includeHeader%3D%22true%22%29%0A%7C+print%3B&transformation=move_field%28%22ISBN%22%2C%22A%22%29%0Amove_field%28%22Title%22%2C%22B%22%29%0Amove_field%28%22Author%22%2C%22C%22%29%0A%0Aretain%28%22A%22%2C%22B%22%2C%22C%22%29)
123121

124-
You can transform the data to an tsv file with the separator \t which has no header like this.
122+
You can transform the data to a TSV file with the separator `\t` which has no header like this:
125123

126124
```text
127125
"https://lib.ugent.be/download/librecat/data/goodreads.csv"
@@ -134,17 +132,14 @@ You can transform the data to an tsv file with the separator \t which has no hea
134132

135133
[See example in playground.](https://metafacture.org/playground/?flux=%22https%3A//lib.ugent.be/download/librecat/data/goodreads.csv%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-csv%28hasheader%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-csv%28separator%3D%22\t%22%2C+noQuotes%3D%22true%22%29%0A%7C+print%3B&transformation=retain%28%22ISBN%22%2C%22Title%22%2C%22Author%22%29)
136134

137-
When you create a CSV from a by export complex/nested data structures to a tabular format, you must “flatten” the datastructure. Also
138-
you have to be aware that the order and number of elements in every record is the same otherwise the header does not match the records.
139-
140-
But could be done with Metafacture. But be aware that the nested structure if repeatble elements are provided have to be the identical every time. Otherwise the header and the csv file do not fit:
135+
When you export complex/nested data structures to a tabular format such as CSV, you must “flatten” the data structure. Also make sure that the order and number of elements is the same in every record, so that the header matches the records.
141136

142-
https://metafacture.org/playground/?flux=%22https%3A//lobid.org/organisations/search%3Fq%3Dk%25C3%25B6ln%26size%3D10%22%0A%7C+open-http%28accept%3D%22application/json%22%29%0A%7C+as-records%0A%7C+decode-json%28recordpath%3D%22member%22%29%0A%7C+flatten%0A%7C+encode-csv%28includeheader%3D%22true%22%29%0A%7C+print%3B
137+
So: make sure that the nested structure of repeatable elements is identical every time. Otherwise the [header and the CSV file do not fit](https://metafacture.org/playground/?flux=%22https%3A//lobid.org/organisations/search%3Fq%3Dk%25C3%25B6ln%26size%3D10%22%0A%7C+open-http%28accept%3D%22application/json%22%29%0A%7C+as-records%0A%7C+decode-json%28recordpath%3D%22member%22%29%0A%7C+flatten%0A%7C+encode-csv%28includeheader%3D%22true%22%29%0A%7C+print%3B).
143138

144139
Exercises:
145140

146-
- [Decode this csv keep the header.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A...%0A...%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%22id%22%2C%22name%22%2C%22creator%22%0A%221%22%2C%22Book+1%22%2C%22Maxi+Muster%22%0A%222%22%2C%22Book+2%22%2C%22Sandy+Sample%22)
147-
- [Create a tsv with the record idenfier (`_id`), title (`245` > `title`) and isbn (`020` > `isbn`) from a marc dump.](https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%0A%7C+fix%28transformationFile%29%0A%7C+flatten%0A%7C+encode-csv%28includeHeader%3D%22TRUE%22%2C+separator%3D%22\t%22%2C+noQuotes%3D%22false%22%29%0A%7C+print%0A%3B&transformation=)
141+
- [Decode this CSV while keeping the header.](https://metafacture.org/playground/?flux=inputFile%0A%7C+open-file%0A...%0A...%0A%7C+encode-yaml%0A%7C+print%0A%3B&data=%22id%22%2C%22name%22%2C%22creator%22%0A%221%22%2C%22Book+1%22%2C%22Maxi+Muster%22%0A%222%22%2C%22Book+2%22%2C%22Sandy+Sample%22)
142+
- [Create a TSV with the record identifier (`_id`), title (`245` > `title`) and ISBN (`020` > `isbn`) from a MARC dump.](https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%0A%7C+fix%28transformationFile%29%0A%7C+flatten%0A%7C+encode-csv%28includeHeader%3D%22TRUE%22%2C+separator%3D%22\t%22%2C+noQuotes%3D%22false%22%29%0A%7C+print%0A%3B&transformation=)
148143

149144
---------------
150145
