@@ -24,6 +24,7 @@ rows efficiently without having to load the whole file into memory at once.
2424
2525**Table of contents**
2626
27+ * [CSV format](#csv-format)
2728* [Usage](#usage)
2829 * [Decoder](#decoder)
2930 * [Encoder](#encoder)
@@ -32,6 +33,77 @@ rows efficiently without having to load the whole file into memory at once.
3233* [License](#license)
3334* [More](#more)
3435
36+ ## CSV format
37+
38+ CSV (Comma-Separated Values or, less commonly, Character-Separated Values) is a
39+ very simple text-based format for storing a large number of (uniform) records,
40+ such as a list of user records or log entries.
41+
42+ ```
43+ Alice,30
44+ Bob,50
45+ Carol,40
46+ Dave,30
47+ ```
48+
49+ While this may look somewhat trivial, this simplicity comes at a price. CSV is
50+ limited to untyped, two-dimensional data, so there's no standard way to store
51+ nested structures or to differentiate a boolean value from a string or an
52+ integer.
53+
54+ CSV allows for optional field names. Whether field names are used is
55+ application-dependent, so this library makes no attempt at *guessing* whether
56+ the first line contains field names or field values. For many common use cases
57+ it's a good idea to include them like this:
58+
59+ ```
60+ name,age
61+ Alice,30
62+ Bob,50
63+ Carol,40
64+ Dave,30
65+ ```
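To illustrate why the caller has to make this choice, here is a minimal sketch using Python's standard `csv` module as a stand-in (it is not this library's API, and the `read_rows` / `read_records` helper names are made up for this example). The same input can be read either way, and all values come back as untyped strings:

```python
import csv
import io

DATA = "name,age\nAlice,30\nBob,50\n"

def read_rows(text):
    """Treat every line as field values, including the first one."""
    return list(csv.reader(io.StringIO(text)))

def read_records(text):
    """Treat the first line as field names and map later lines to them."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(DATA)
records = read_records(DATA)

print(rows[0])     # the header line is just another row here
print(records[0])  # fields are keyed by name, values are still strings

# CSV carries no type information, so conversions are explicit.
ages = [int(record["age"]) for record in records]
print(ages)
```

Nothing in the file itself says which of the two readings is correct, so the decision stays with the application.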
66+
67+ CSV allows handling field values that contain spaces or the delimiting comma
68+ (think of URLs or user-provided descriptions) by enclosing them with quotes like
69+ this:
70+
71+ ```
72+ name,comment
73+ Alice,"Yes, I like cheese"
74+ Bob,"Hello World!"
75+ ```
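As a rough sketch of what quote handling involves, the following Python snippet (standard `csv` module, used here only for illustration; `encode_row` is a hypothetical helper) compares naive comma splitting with a quote-aware parser on the data above:

```python
import csv
import io

TEXT = 'name,comment\nAlice,"Yes, I like cheese"\nBob,"Hello World!"\n'

# Naive splitting on commas breaks the quoted field into two pieces.
naive = [line.split(",") for line in TEXT.splitlines()]

# A quote-aware parser keeps the comma inside the field and drops the quotes.
parsed = list(csv.reader(io.StringIO(TEXT)))

def encode_row(fields):
    """Write one row, quoting only the fields that need it."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue()

print(naive[1])
print(parsed[1])
print(encode_row(["Alice", "Yes, I like cheese"]))
```

The naive split yields three fields for Alice's line, while the quote-aware parser yields the intended two.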
76+
77+ > Note that these more advanced parsing rules are often handled inconsistently
78+ by other applications. Nowadays, these parsing rules are defined as part of
79+ [RFC 4180](https://tools.ietf.org/html/rfc4180), but many applications
80+ started using some CSV variant long before this standard was defined.
81+
82+ Some applications refer to CSV as Character-Separated Values, simply because
83+ using another delimiter (such as semicolon or tab) is a rather common approach
84+ to avoid the need to enclose common values in quotes. This is particularly
85+ common for European systems that use a comma as decimal separator.
86+
87+ ```
88+ name;comment
89+ Alice;Yes, I like cheese
90+ Bob;Turn 22,5 degree clockwise
91+ ```
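A short sketch of reading such a file, again assuming Python's standard `csv` module rather than this library: the delimiter is passed explicitly, and the commas in the values then need no quoting.

```python
import csv
import io

TEXT = (
    "name;comment\n"
    "Alice;Yes, I like cheese\n"
    "Bob;Turn 22,5 degree clockwise\n"
)

# With ";" as the delimiter, commas are ordinary characters inside a field.
rows = list(csv.reader(io.StringIO(TEXT), delimiter=";"))

for row in rows:
    print(row)
```

Reading the same text with the default comma delimiter would split the lines in the wrong places, so reader and writer have to agree on the delimiter in advance.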
92+
93+ CSV files are often limited to ASCII characters for best interoperability.
94+ However, many legacy CSV files use ISO 8859-1 encoding or some other
95+ variant. Newer CSV files are usually best saved as UTF-8 and may thus also
96+ contain special characters from the Unicode range. The text encoding is usually
97+ application-dependent, so your best bet would be to convert to (or assume) UTF-8
98+ consistently.
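One possible way to normalize a legacy file is sketched below in Python. The sample data and the `to_utf8` helper are made up for this example, and the source encoding is assumed to be known, because it generally cannot be detected reliably from the bytes alone:

```python
def to_utf8(raw, source_encoding):
    """Re-encode raw bytes from a known source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Hypothetical legacy export saved as ISO 8859-1.
legacy = "name,city\nZoë,Köln\n".encode("iso-8859-1")

converted = to_utf8(legacy, "iso-8859-1")
print(converted.decode("utf-8"))

# Decoding the legacy bytes as UTF-8 directly fails instead of guessing.
try:
    legacy.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
print(legacy_is_valid_utf8)
```

Converting once at the boundary means everything downstream can assume UTF-8.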
99+
100+ Despite its shortcomings, CSV is widely used and this is unlikely to change any
101+ time soon. In particular, CSV is a very common export format for a lot of tools
102+ to interface with spreadsheet processors (such as Excel, Calc etc.). This means
103+ that CSV is often used for historical reasons. Using CSV to store structured
104+ application data is usually not a good idea nowadays, but exporting to CSV for
105+ known applications is a very reasonable approach.
106+
35107## Usage
36108
37109### Decoder