Commit 39bba33

Author: Mark Baker
Merge pull request #2640 from PHPOffice/CSV-Reader-Dataype-casting-improvements
Csv reader datatype casting improvements
2 parents 23631cb + dd981a3 commit 39bba33

10 files changed, +480 -10 lines changed

CHANGELOG.md

Lines changed: 12 additions & 2 deletions
@@ -9,13 +9,23 @@ and this project adheres to [Semantic Versioning](https://semver.org).

 ### Added

-- Implementation of the ISREF() information function
+- Implementation of the ISREF() information function.
+- Added support for reading "formatted" numeric values from Csv files; although default behaviour of reading these values as strings is preserved.
+
+  (i.e. a value of "12,345.67" will be read as numeric `12345.67`, not as a string `"12,345.67"`.)
+
+  This functionality is locale-aware, using the server's locale settings to identify the thousands and decimal separators.

 ### Changed

 - Gnumeric Reader now loads number formatting for cells.
 - Gnumeric Reader now correctly identifies selected worksheet.
 - Some Refactoring of the Ods Reader, moving all formula and address translation from Ods to Excel into a separate class to eliminate code duplication and ensure consistency.
+- Make Boolean Conversion in Csv Reader locale-aware when using the String Value Binder.
+
+  This is determined by the Calculation Engine locale setting.
+
+  (i.e. `"Vrai"` will be converted to a boolean `true` if the Locale is set to `fr`.)

 ### Deprecated

@@ -27,7 +37,7 @@ and this project adheres to [Semantic Versioning](https://semver.org).

 ### Fixed

-- Fixed behaviour of XLSX font style vertical align settings
+- Fixed behaviour of XLSX font style vertical align settings.
 - Resolved formula translations to handle separators (row and column) for array functions as well as for function argument separators; and cleanly handle nesting levels.

   Note that this method is used when translating Excel functions between en and other locale languages, as well as when converting formulae between different spreadsheet formats (e.g. Ods to Excel).

docs/topics/accessing-cells.md

Lines changed: 5 additions & 5 deletions
@@ -37,9 +37,7 @@ $spreadsheet->getActiveSheet()
 ### Creating a new Cell

 If you make a call to `getCell()`, and the cell doesn't already exist, then
-PhpSpreadsheet will (by default) create the cell for you. If you don't want
-to create a new cell, then you can pass a second argument of false, and then
-`getCell()` will return a null if the cell doesn't exist.
+PhpSpreadsheet will create that cell for you.

 ### BEWARE: Cells assigned to variables as a Detached Reference

@@ -532,7 +530,7 @@ types of entered data using a cell's `setValue()` method (the
 Optionally, the default behaviour of PhpSpreadsheet can be modified,
 allowing easier data entry. For example, a
 `\PhpOffice\PhpSpreadsheet\Cell\AdvancedValueBinder` class is available.
-It automatically converts percentages, number in scientific format, and
+It automatically converts percentages, numbers in scientific format, and
 dates entered as strings to the correct format, also setting the cell's
 style information. The following example demonstrates how to set the
 value binder in PhpSpreadsheet:
@@ -577,7 +575,9 @@ $stringValueBinder->setNumericConversion(false)
 \PhpOffice\PhpSpreadsheet\Cell\Cell::setValueBinder( $stringValueBinder );
 ```

-**Creating your own value binder is relatively straightforward.** When more specialised
+### Creating your own value binder
+
+Creating your own value binder is relatively straightforward. When more specialised
 value binding is required, you can implement the
 `\PhpOffice\PhpSpreadsheet\Cell\IValueBinder` interface or extend the existing
 `\PhpOffice\PhpSpreadsheet\Cell\DefaultValueBinder` or

docs/topics/reading-files.md

Lines changed: 38 additions & 0 deletions
@@ -560,6 +560,44 @@ Xlsx | NO | Xls | NO | Xml | NO |
 Ods | NO | SYLK | NO | Gnumeric | NO |
 CSV | YES | HTML | NO

+
+### Reading formatted Numbers from a CSV File
+
+Unfortunately, numbers in a CSV file may be formatted as strings.
+If that number is a simple integer or float (with a decimal `.` separator) without any thousands separator, then it will be treated as a number.
+However, if the value has a thousands separator (e.g. `12,345`), or a decimal separator that isn't a `.` (e.g. `123,45` for a European locale), then it will be loaded as a string with that formatting.
+If you want the Csv Reader to convert that value to a numeric when it loads the file, then you need to tell it to do so. The `castFormattedNumberToNumeric()` method lets you do this.
+
+(Assuming that our server is configured with German locale settings: otherwise it may be necessary to call `setlocale()` before loading the file.)
+```php
+$inputFileType = 'Csv';
+$inputFileName = './sampleData/example1.de.csv';
+
+/** It may be necessary to call setlocale() first if this is not your default locale */
+// setlocale(LC_ALL, 'de_DE.UTF-8', 'deu_deu');
+
+/** Create a new Reader of the type defined in $inputFileType **/
+$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader($inputFileType);
+/** Enable loading numeric values formatted with German , decimal separator and . thousands separator **/
+$reader->castFormattedNumberToNumeric(true);
+
+/** Load the file to a Spreadsheet Object **/
+$spreadsheet = $reader->load($inputFileName);
+```
+This will attempt to load those formatted numeric values as numbers, based on the server's locale settings.
+
+If you want to load those values as numbers, but also to retain the formatting as a number format mask, then you can pass a boolean `true` as a second argument to the `castFormattedNumberToNumeric()` method to tell the Reader to identify the format masking to use for that value. This option does have an arbitrary limit of 6 decimal places.
+
+If your Csv file includes other formats for numbers (currencies, scientific format, etc.), then you should probably also use the Advanced Value Binder to handle these cases.
+
+Applies to:
+
+Reader | Y/N |Reader | Y/N |Reader | Y/N |
+----------|:---:|--------|:---:|--------------|:---:|
+Xlsx | NO | Xls | NO | Xml | NO |
+Ods | NO | SYLK | NO | Gnumeric | NO |
+CSV | YES | HTML | NO
+
 ### A Brief Word about the Advanced Value Binder

 When loading data from a file that contains no formatting information,
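
The conversion that the new documentation section describes can be illustrated without the library. The following is a minimal standalone sketch, not the Reader's actual code path: `parseFormattedNumber()` is a hypothetical helper, and the separators are passed in explicitly as an assumption, whereas the Reader takes them from the server's locale settings.

```php
<?php

/**
 * Hypothetical helper (not part of PhpSpreadsheet): convert a
 * locale-formatted numeric string to an int or float.
 * The input is returned unchanged if it is not numeric after normalisation.
 *
 * @return int|float|string
 */
function parseFormattedNumber(string $value, string $thousands, string $decimal)
{
    // Drop every thousands separator, then map the decimal separator to '.'
    $numeric = str_replace([$thousands, $decimal], ['', '.'], $value);
    if (!is_numeric($numeric)) {
        return $value;
    }

    // A decimal separator in the original string means the result is a float
    return (strpos($value, $decimal) !== false) ? (float) $numeric : (int) $numeric;
}

var_dump(parseFormattedNumber('1.234,56', '.', ',')); // float(1234.56)
var_dump(parseFormattedNumber('12,345', ',', '.'));   // int(12345)
var_dump(parseFormattedNumber('n/a', ',', '.'));      // string(3) "n/a"
```

Note that this sketch does not check where the separators appear, so a value such as `1,2,3` with `,` as the thousands separator would also be read as `123`.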

src/PhpSpreadsheet/Reader/Csv.php

Lines changed: 60 additions & 2 deletions
@@ -2,12 +2,14 @@

 namespace PhpOffice\PhpSpreadsheet\Reader;

+use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
 use PhpOffice\PhpSpreadsheet\Cell\Cell;
 use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
 use PhpOffice\PhpSpreadsheet\Reader\Csv\Delimiter;
 use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
 use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
 use PhpOffice\PhpSpreadsheet\Spreadsheet;
+use PhpOffice\PhpSpreadsheet\Style\NumberFormat;

 class Csv extends BaseReader
 {
@@ -91,6 +93,16 @@ class Csv extends BaseReader
      */
     private $testAutodetect = true;

+    /**
+     * @var bool
+     */
+    protected $castFormattedNumberToNumeric = false;
+
+    /**
+     * @var bool
+     */
+    protected $preserveNumericFormatting = false;
+
     /**
      * Create a new CSV Reader instance.
      */
@@ -294,6 +306,14 @@ private function setAutoDetect(?string $value): ?string
         return $retVal;
     }

+    public function castFormattedNumberToNumeric(
+        bool $castFormattedNumberToNumeric,
+        bool $preserveNumericFormatting = false
+    ): void {
+        $this->castFormattedNumberToNumeric = $castFormattedNumberToNumeric;
+        $this->preserveNumericFormatting = $preserveNumericFormatting;
+    }
+
     /**
      * Loads PhpSpreadsheet from file into PhpSpreadsheet instance.
      */
@@ -330,6 +350,7 @@ public function loadIntoExisting(string $filename, Spreadsheet $spreadsheet): Sp
             $columnLetter = 'A';
             foreach ($rowData as $rowDatum) {
                 $this->convertBoolean($rowDatum, $preserveBooleanString);
+                $numberFormatMask = $this->convertFormattedNumber($rowDatum);
                 if ($rowDatum !== '' && $this->readFilter->readCell($columnLetter, $currentRow)) {
                     if ($this->contiguous) {
                         if ($noOutputYet) {
@@ -339,6 +360,10 @@ public function loadIntoExisting(string $filename, Spreadsheet $spreadsheet): Sp
                     } else {
                         $outRow = $currentRow;
                     }
+                    // Set basic styling for the value (Note that this could be overloaded by styling in a value binder)
+                    $sheet->getCell($columnLetter . $outRow)->getStyle()
+                        ->getNumberFormat()
+                        ->setFormatCode($numberFormatMask);
                     // Set cell value
                     $sheet->getCell($columnLetter . $outRow)->setValue($rowDatum);
                 }
@@ -365,16 +390,49 @@ public function loadIntoExisting(string $filename, Spreadsheet $spreadsheet): Sp
     private function convertBoolean(&$rowDatum, bool $preserveBooleanString): void
     {
         if (is_string($rowDatum) && !$preserveBooleanString) {
-            if (strcasecmp('true', $rowDatum) === 0) {
+            if (strcasecmp(Calculation::getTRUE(), $rowDatum) === 0 || strcasecmp('true', $rowDatum) === 0) {
                 $rowDatum = true;
-            } elseif (strcasecmp('false', $rowDatum) === 0) {
+            } elseif (strcasecmp(Calculation::getFALSE(), $rowDatum) === 0 || strcasecmp('false', $rowDatum) === 0) {
                 $rowDatum = false;
             }
         } elseif ($rowDatum === null) {
             $rowDatum = '';
         }
     }

+    /**
+     * Convert numeric strings to int or float values.
+     *
+     * @param mixed $rowDatum
+     */
+    private function convertFormattedNumber(&$rowDatum): string
+    {
+        $numberFormatMask = NumberFormat::FORMAT_GENERAL;
+        if ($this->castFormattedNumberToNumeric === true && is_string($rowDatum)) {
+            $numeric = str_replace(
+                [StringHelper::getThousandsSeparator(), StringHelper::getDecimalSeparator()],
+                ['', '.'],
+                $rowDatum
+            );
+
+            if (is_numeric($numeric)) {
+                $decimalPos = strpos($rowDatum, StringHelper::getDecimalSeparator());
+                if ($this->preserveNumericFormatting === true) {
+                    $numberFormatMask = (strpos($rowDatum, StringHelper::getThousandsSeparator()) !== false)
+                        ? '#,##0' : '0';
+                    if ($decimalPos !== false) {
+                        $decimals = strlen($rowDatum) - $decimalPos - 1;
+                        $numberFormatMask .= '.' . str_repeat('0', min($decimals, 6));
+                    }
+                }
+
+                $rowDatum = ($decimalPos !== false) ? (float) $numeric : (int) $numeric;
+            }
+        }
+
+        return $numberFormatMask;
+    }
+
     public function getDelimiter(): ?string
     {
         return $this->delimiter;
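
To make the mask-building branch of `convertFormattedNumber()` easier to follow, here is a standalone sketch of the same steps. It is an illustration only: `deriveFormatMask()` is a hypothetical name, the separators are passed as parameters instead of being read from `StringHelper`, and the input is assumed to have already been recognised as numeric.

```php
<?php

/**
 * Hypothetical helper (not part of PhpSpreadsheet): build a number format
 * mask from the way a numeric string was written.
 */
function deriveFormatMask(string $value, string $thousands, string $decimal): string
{
    // Any thousands separator selects the grouped integer mask
    $mask = (strpos($value, $thousands) !== false) ? '#,##0' : '0';

    $decimalPos = strpos($value, $decimal);
    if ($decimalPos !== false) {
        // Count the characters after the decimal separator, capped at 6 places
        $decimals = strlen($value) - $decimalPos - 1;
        $mask .= '.' . str_repeat('0', min($decimals, 6));
    }

    return $mask;
}

echo deriveFormatMask('12,345.678', ',', '.'), PHP_EOL; // #,##0.000
echo deriveFormatMask('1.234,5', '.', ','), PHP_EOL;    // #,##0.0
echo deriveFormatMask('3,14159265', '.', ','), PHP_EOL; // 0.000000
```

The mask itself always uses `,` for grouping and `.` for decimals, whatever separators the source value used; only the detection step depends on the locale separators.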

tests/PhpSpreadsheetTests/Reader/Csv/CsvIssue2232Test.php

Lines changed: 39 additions & 1 deletion
@@ -2,6 +2,7 @@

 namespace PhpOffice\PhpSpreadsheetTests\Reader\Csv;

+use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
 use PhpOffice\PhpSpreadsheet\Cell\Cell;
 use PhpOffice\PhpSpreadsheet\Cell\IValueBinder;
 use PhpOffice\PhpSpreadsheet\Cell\StringValueBinder;
@@ -31,7 +32,7 @@ protected function tearDown(): void
      * @param mixed $b2Value
      * @param mixed $b3Value
      */
-    public function testEncodings(bool $useStringBinder, ?bool $preserveBoolString, $b2Value, $b3Value): void
+    public function testBooleanConversions(bool $useStringBinder, ?bool $preserveBoolString, $b2Value, $b3Value): void
     {
         if ($useStringBinder) {
             $binder = new StringValueBinder();
@@ -60,4 +61,41 @@ public function providerIssue2232(): array
             [true, true, 'FaLSe', 'tRUE'],
         ];
     }
+
+    /**
+     * @dataProvider providerIssue2232locale
+     *
+     * @param mixed $b4Value
+     * @param mixed $b5Value
+     */
+    public function testBooleanConversionsLocaleAware(bool $useStringBinder, ?bool $preserveBoolString, $b4Value, $b5Value): void
+    {
+        if ($useStringBinder) {
+            $binder = new StringValueBinder();
+            if (is_bool($preserveBoolString)) {
+                $binder->setBooleanConversion($preserveBoolString);
+            }
+            Cell::setValueBinder($binder);
+        }
+
+        Calculation::getInstance()->setLocale('fr');
+
+        $reader = new Csv();
+        $filename = 'tests/data/Reader/CSV/issue.2232.csv';
+        $spreadsheet = $reader->load($filename);
+        $sheet = $spreadsheet->getActiveSheet();
+        self::assertSame($b4Value, $sheet->getCell('B4')->getValue());
+        self::assertSame($b5Value, $sheet->getCell('B5')->getValue());
+        $spreadsheet->disconnectWorksheets();
+    }
+
+    public function providerIssue2232locale(): array
+    {
+        return [
+            [true, true, 'Faux', 'Vrai'],
+            [true, true, 'Faux', 'Vrai'],
+            [false, false, false, true],
+            [false, false, false, true],
+        ];
+    }
 }
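
The locale-aware boolean matching that these tests exercise can be sketched in isolation. This is a simplified illustration rather than the Reader's code: `convertLocalisedBoolean()` is a hypothetical helper, and the localised labels (`'VRAI'` and `'FAUX'` here, assumed for French) are passed in as arguments, whereas the commit obtains them from `Calculation::getTRUE()` and `Calculation::getFALSE()`.

```php
<?php

/**
 * Hypothetical helper (not part of PhpSpreadsheet): map a string to a
 * boolean if it matches the localised or the English label, ignoring case.
 * Any other string is returned unchanged.
 *
 * @return bool|string
 */
function convertLocalisedBoolean(string $value, string $localTrue, string $localFalse)
{
    // The English labels stay valid alongside the localised ones
    if (strcasecmp($localTrue, $value) === 0 || strcasecmp('true', $value) === 0) {
        return true;
    }
    if (strcasecmp($localFalse, $value) === 0 || strcasecmp('false', $value) === 0) {
        return false;
    }

    return $value;
}

var_dump(convertLocalisedBoolean('Vrai', 'VRAI', 'FAUX'));  // bool(true)
var_dump(convertLocalisedBoolean('FaLSe', 'VRAI', 'FAUX')); // bool(false)
var_dump(convertLocalisedBoolean('Oui', 'VRAI', 'FAUX'));   // string(3) "Oui"
```

Checking the English labels as a fallback mirrors the `||` conditions added to `convertBoolean()`, so files written with `true`/`false` still convert when a non-English locale is active.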
