Commit 7ee75ad

Improve examples and documentation for CSV format
1 parent 64a010a commit 7ee75ad

3 files changed: +116 -2 lines changed

README.md

Lines changed: 72 additions & 0 deletions
@@ -24,6 +24,7 @@ rows efficiently without having to load the whole file into memory at once.

 **Table of contents**

+* [CSV format](#csv-format)
 * [Usage](#usage)
   * [Decoder](#decoder)
   * [Encoder](#encoder)
@@ -32,6 +33,77 @@ rows efficiently without having to load the whole file into memory at once.
 * [License](#license)
 * [More](#more)

+## CSV format
+
+CSV (Comma-Separated Values or less commonly Character-Separated Values) is a
+very simple text-based format for storing a large number of (uniform) records,
+such as a list of user records or log entries.
+
+```
+Alice,30
+Bob,50
+Carol,40
+Dave,30
+```
+
+While this may look somewhat trivial, this simplicity comes at a price. CSV is
+limited to untyped, two-dimensional data, so there's no standard way of storing
+any nested structures or of differentiating a boolean value from a string or
+integer.
+
+CSV allows for optional field names. Whether field names are used is
+application-dependent, so this library makes no attempt at *guessing* whether
+the first line contains field names or field values. For many common use cases
+it's a good idea to include them like this:
+
+```
+name,age
+Alice,30
+Bob,50
+Carol,40
+Dave,30
+```
+
+CSV allows handling field values that contain spaces or the delimiting comma
+(think of URLs or user-provided descriptions) by enclosing them with quotes like
+this:
+
+```
+name,comment
+Alice,"Yes, I like cheese"
+Bob,"Hello World!"
+```
+
+> Note that these more advanced parsing rules are often handled inconsistently
+by other applications. Nowadays, these parsing rules are defined as part of
+[RFC 4180](https://tools.ietf.org/html/rfc4180); however, many applications
+started using some CSV variant long before this standard was defined.
+
+Some applications refer to CSV as Character-Separated Values, simply because
+using another delimiter (such as semicolon or tab) is a rather common approach
+to avoid the need to enclose common values in quotes. This is particularly
+common for European systems that use a comma as decimal separator.
+
+```
+name;comment
+Alice;Yes, I like cheese
+Bob;Turn 22,5 degree clockwise
+```
+
+CSV files are often limited to only ASCII characters for best interoperability.
+However, many legacy CSV files use ISO 8859-1 encoding or some other
+variant. Newer CSV files are usually best saved as UTF-8 and may thus also
+contain special characters from the Unicode range. The text encoding is usually
+application-dependent, so your best bet would be to convert to (or assume) UTF-8
+consistently.
+
+Despite its shortcomings, CSV is widely used and this is unlikely to change any
+time soon. In particular, CSV is a very common export format for a lot of tools
+to interface with spreadsheet processors (such as Excel, Calc etc.). This means
+that CSV is often used for historical reasons and using CSV to store structured
+application data is usually not a good idea nowadays – but exporting to CSV for
+known applications is a very reasonable approach.
+
 ## Usage

 ### Decoder
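
As a rough illustration of the quoting, delimiter, and field-name conventions described in the CSV format section above, the sketch below runs PHP's built-in `str_getcsv()` over the sample lines from the README. It is a stand-in for experimentation, not this library's `Decoder`, and the variable names are made up for the example.

```php
<?php

// Quoted field: the comma inside the quotes is part of the value.
$quoted = str_getcsv('Alice,"Yes, I like cheese"', ',', '"', '\\');

// Semicolon-delimited record: embedded commas need no quoting here.
$semicolon = str_getcsv('Bob;Turn 22,5 degree clockwise', ';', '"', '\\');

// Optional field names: the application (not the parser) decides that
// the first line is a header and combines it with each record.
$header = str_getcsv('name,age', ',', '"', '\\');
$record = array_combine($header, str_getcsv('Alice,30', ',', '"', '\\'));

var_dump($quoted, $semicolon, $record);
```

Every value comes back as a string (the age is `'30'`, not an integer), which is the "untyped" limitation the README text mentions.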

examples/01-count.php

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+<?php
+
+use Clue\React\Csv\Decoder;
+use React\EventLoop\Factory;
+use React\Stream\ReadableResourceStream;
+use React\Stream\WritableResourceStream;
+
+require __DIR__ . '/../vendor/autoload.php';
+
+$loop = Factory::create();
+
+$exit = 0;
+$in = new ReadableResourceStream(STDIN, $loop);
+$info = new WritableResourceStream(STDERR, $loop);
+
+$delimiter = isset($argv[1]) ? $argv[1] : ',';
+
+$decoder = new Decoder($in, $delimiter);
+
+$count = 0;
+$decoder->on('data', function () use (&$count) {
+    ++$count;
+});
+
+$decoder->on('end', function () use (&$count) {
+    echo $count . PHP_EOL;
+});
+
+$decoder->on('error', function (Exception $e) use (&$count, &$exit, $info) {
+    $info->write('ERROR after record ' . $count . ': ' . $e->getMessage() . PHP_EOL);
+    $exit = 1;
+});
+
+$info->write('You can pipe/write a valid CSV stream to STDIN' . PHP_EOL);
+$info->write('The resulting number of records (rows) will be printed to STDOUT' . PHP_EOL);
+$info->write('Invalid CSV will raise an error on STDERR and exit with code 1' . PHP_EOL);
+
+$loop->run();
+
+exit($exit);
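
For comparison, the same counting idea can be sketched with blocking reads and PHP's built-in `fgetcsv()`. This is not how the streaming example above works internally. The helper name `countCsvRecords()` is hypothetical, and skipping blank lines is a choice made for this sketch rather than a statement about the library's behavior.

```php
<?php

/**
 * Hypothetical helper: counts CSV records on a stream using blocking reads.
 */
function countCsvRecords($stream, $delimiter = ',')
{
    $count = 0;
    while (($row = fgetcsv($stream, 0, $delimiter, '"', '\\')) !== false) {
        // fgetcsv() reports a blank line as array(null); skip those here.
        if ($row !== array(null)) {
            ++$count;
        }
    }

    return $count;
}

// Use an in-memory stream in place of STDIN for the demonstration.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "name;comment\nAlice;Yes, I like cheese\nBob;Turn 22,5 degree clockwise\n");
rewind($stream);

$records = countCsvRecords($stream, ';');
fclose($stream);

echo $records . PHP_EOL; // prints 3: the header line counts as a record too
```

The header line is counted like any other record because nothing here guesses whether the first line holds field names.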
Lines changed: 4 additions & 2 deletions
@@ -15,8 +15,10 @@
 $out = new WritableResourceStream(STDOUT, $loop);
 $info = new WritableResourceStream(STDERR, $loop);

-$decoder = new Decoder($in);
-$encoder = new Encoder($out);
+$delimiter = isset($argv[1]) ? $argv[1] : ',';
+
+$decoder = new Decoder($in, $delimiter);
+$encoder = new Encoder($out, $delimiter);
 $decoder->pipe($encoder);

 $decoder->on('error', function (Exception $e) use ($info, &$exit) {
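
The change above passes the same delimiter to both the decoding and the encoding side. A minimal sketch of that round trip, using PHP's built-in `str_getcsv()` and `fputcsv()` rather than this library's `Decoder` and `Encoder`, could look as follows. The helper name `reencodeCsvLine()` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: parses one CSV line and writes it back out,
 * using the same delimiter on both sides.
 */
function reencodeCsvLine($line, $delimiter = ',')
{
    $fields = str_getcsv($line, $delimiter, '"', '\\');

    // fputcsv() needs a stream, so write to memory and read the result back.
    $buffer = fopen('php://memory', 'w+');
    fputcsv($buffer, $fields, $delimiter, '"', '\\');
    rewind($buffer);
    $encoded = stream_get_contents($buffer);
    fclose($buffer);

    return $encoded;
}

// With ';' on both sides, the decimal comma survives without quoting.
$encoded = reencodeCsvLine('Bob;22,5', ';');
echo $encoded; // prints "Bob;22,5" followed by a newline
```

Using a different delimiter on the output side would still produce valid CSV, but fields such as `22,5` would then need to be quoted.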
