Commit 813855b

thisispaul authored and PowerKiKi committed
Fix CSV delimiter detection on line breaks
The CSV Reader now correctly ignores line breaks inside enclosures, which allows it to determine the delimiter correctly.

Fixes #716
Fixes #717
1 parent 54efe88 commit 813855b

4 files changed: 62 additions, 5 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
 - Xls file cause the exception during open by Xls reader - [#402](https://github.com/PHPOffice/PhpSpreadsheet/issues/402)
 - Skip non numeric value in SUMIF - [#618](https://github.com/PHPOffice/PhpSpreadsheet/pull/618)
 - OFFSET should allow omitted height and width - [#561](https://github.com/PHPOffice/PhpSpreadsheet/issues/561)
+- Correctly determine delimiter when CSV contains line breaks inside enclosures - [#716](https://github.com/PHPOffice/PhpSpreadsheet/issues/716)

 ## [1.4.1] - 2018-09-30


src/PhpSpreadsheet/Reader/Csv.php

Lines changed: 37 additions & 5 deletions
@@ -163,11 +163,7 @@ protected function inferSeparator()

         // Count how many times each of the potential delimiters appears in each line
         $numberLines = 0;
-        while (($line = fgets($this->fileHandle)) !== false && (++$numberLines < 1000)) {
-            // Drop everything that is enclosed to avoid counting false positives in enclosures
-            $enclosure = preg_quote($this->enclosure, '/');
-            $line = preg_replace('/(' . $enclosure . '.*' . $enclosure . ')/U', '', $line);
-
+        while (($line = $this->getNextLine()) !== false && (++$numberLines < 1000)) {
             $countLine = [];
             for ($i = strlen($line) - 1; $i >= 0; --$i) {
                 $char = $line[$i];
@@ -230,6 +226,42 @@ function ($sum, $value) use ($median) {
         return $this->skipBOM();
     }

+    /**
+     * Get the next full line from the file.
+     *
+     * @param string $line
+     *
+     * @return bool|string
+     */
+    private function getNextLine($line = '')
+    {
+        // Get the next line in the file
+        $newLine = fgets($this->fileHandle);
+
+        // Return false if there is no next line
+        if ($newLine === false) {
+            return false;
+        }
+
+        // Add the new line to the line passed in
+        $line = $line . $newLine;
+
+        // Drop everything that is enclosed to avoid counting false positives in enclosures
+        $enclosure = preg_quote($this->enclosure, '/');
+        $line = preg_replace('/(' . $enclosure . '.*' . $enclosure . ')/U', '', $line);
+
+        // See if we have any enclosures left in the line
+        $matches = [];
+        preg_match('/(' . $enclosure . ')/', $line, $matches);
+
+        // if we still have an enclosure then we need to read the next line aswell
+        if (count($matches) > 0) {
+            $line = $this->getNextLine($line);
+        }
+
+        return $line;
+    }
+
     /**
      * Return worksheet info (Name, Last Column Letter, Last Column Index, Total Rows, Total Columns).
      *
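The recursive `getNextLine()` in the diff can be sketched as a standalone loop to show the idea in isolation: keep appending physical lines and stripping complete enclosure pairs until no unmatched enclosure remains. The function name `readLogicalLine()` and the in-memory stream are illustrative assumptions, not PhpSpreadsheet API. Unlike the pattern in the diff, this sketch adds the PCRE `s` modifier: without it, `.` does not match newline characters, so an enclosure spanning the joined lines would not be stripped.

```php
<?php

/**
 * Hypothetical helper (not PhpSpreadsheet API): return the next logical CSV
 * record from $handle with every enclosed section removed, or false at EOF.
 */
function readLogicalLine($handle, string $enclosure = '"')
{
    $quoted = preg_quote($enclosure, '/');
    $line = '';

    do {
        $newLine = fgets($handle);
        if ($newLine === false) {
            // End of file (this also covers an enclosure that is never closed)
            return false;
        }
        $line .= $newLine;

        // Strip complete enclosure pairs: "U" makes ".*" lazy, and "s" lets "."
        // match the newlines that accumulate when physical lines are joined
        $line = preg_replace('/' . $quoted . '.*' . $quoted . '/Us', '', $line);

        // A leftover enclosure means the record continues on the next physical line
    } while (preg_match('/' . $quoted . '/', $line) === 1);

    return $line;
}

// Header plus the first and last records of the test fixture, in an in-memory stream
$csv = "Name,Copy,URL\n"
    . "Test,\"This is a test\nwith line breaks\nthat breaks the\ndelimiters\",http://google.com\n"
    . "Test,\"This is a test\",http://google.com\n";
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

$records = [];
while (($line = readLogicalLine($handle)) !== false) {
    $records[] = $line;
}
fclose($handle);

// Each stripped record now contains exactly two commas (prints 2 three times)
foreach ($records as $record) {
    echo substr_count($record, ','), "\n";
}
```

The loop keeps the same end-of-file behavior as the diff (returning false), but an unusually long multi-line field does not grow the call stack the way the recursive version does.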

tests/PhpSpreadsheetTests/Reader/CsvTest.php

Lines changed: 6 additions & 0 deletions
@@ -43,6 +43,12 @@ public function providerDelimiterDetection()
                 'C2',
                 '25,5',
             ],
+            [
+                __DIR__ . '/../../data/Reader/CSV/line_break_in_enclosure.csv',
+                ',',
+                'A3',
+                'Test',
+            ],
             [
                 __DIR__ . '/../../data/Reader/HTML/csv_with_angle_bracket.csv',
                 ',',

tests/data/Reader/CSV/line_break_in_enclosure.csv

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+Name,Copy,URL
+Test,"This is a test
+with line breaks
+that breaks the
+delimiters",http://google.com
+Test,"This is a test
+with line breaks
+that breaks the
+delimiters",http://google.com
+Test,"This is a test
+with line breaks
+that breaks the
+delimiters",http://google.com
+Test,"This is a test
+with line breaks
+that breaks the
+delimiters",http://google.com
+Test,"This is a test",http://google.com
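To see why this fixture defeats per-line counting, the snippet below compares comma counts on physical lines with field counts on logical records, using PHP's built-in `fgetcsv()` as the enclosure-aware reader. It is an illustrative check only and reproduces just the header plus the first and last records of the fixture in an in-memory stream.

```php
<?php

// Header plus the first and last records of line_break_in_enclosure.csv
$csv = "Name,Copy,URL\n"
    . "Test,\"This is a test\nwith line breaks\nthat breaks the\ndelimiters\",http://google.com\n"
    . "Test,\"This is a test\",http://google.com\n";

// Naive view: count commas on each physical line, ignoring enclosures
$naiveCounts = [];
foreach (explode("\n", rtrim($csv, "\n")) as $physicalLine) {
    $naiveCounts[] = substr_count($physicalLine, ',');
}

// Enclosure-aware view: fgetcsv() joins physical lines inside "..." into one record
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);
$fieldCounts = [];
while (($record = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $fieldCounts[] = count($record);
}
fclose($handle);

echo implode(', ', $naiveCounts), "\n"; // 2, 1, 0, 0, 1, 2
echo implode(', ', $fieldCounts), "\n"; // 3, 3, 3
```

For this fixture, the old loop (physical lines via `fgets()`, stripping only same-line enclosures) sees the same uneven counts as the naive view, because none of the quoted text contains a comma.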
