Skip to content

Commit 9b122cd

Browse files
committed
Use Composer/Pcre Part 1 of Many
The native preg functions (preg_match, preg_replace, etc.) often require us to add a lot of useless boilerplate code to satisfy Phpstan, Scrutinizer, etc. Composer/Pcre offers us a way to remove that boilerplate, thereby giving us a cleaner codebase. I decided to try it on a few modules, and saw a result that clearly demonstrated the usefulness of doing this (aside from the cleaner codebase). Sample 22_Reader_issue1767 reads an Xlsx spreadsheet with complex sheet names used in defined names, and writes it to Xlsx and Xls output files. When I changed Writer/Xls/Parser to use Composer/Pcre, this sample failed in many different places writing the Xls file. It turns out that some regexes were failing not because the string didn't match, but because the regex encountered "catastrophic backtracing". Composer/Pcre throws an exception when this happens; the native preg_match does return false, but we were not checking for that. The regexes in question are now changed to something which works, and formal unit tests are added for them. Finding this previously undetected error indicates that we should proceed with this change. An alternative to using Composer/Pcre would be to test for false after all the preg calls. I have done this in the two samples changed with this PR. That seems adequate for a small number of changes, but it really just makes for more clutter considering the large number of regexps that we use in our code. I think Composer/Pcre is a better choice. It isn't quite transparent. Composer forces all regexps to use PREG_UNMATCHED_AS_NULL, so some match fields will now be null instead of null-string (or non-existent if the unmatched field comes at the end). Our test suite doesn't report any problem (yet) due to this change, although Phpstan is sensitive to it. Several Phpstan annotations were eliminated due to this change, but some others are now needed. It is not necessary to do this all at once. This PR addresses all the calls in Writer. I intend to address other components in several tickets.
1 parent fb757cf commit 9b122cd

File tree

18 files changed

+387
-191
lines changed

18 files changed

+387
-191
lines changed

composer.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,7 @@
7979
"ext-xmlwriter": "*",
8080
"ext-zip": "*",
8181
"ext-zlib": "*",
82+
"composer/pcre": "^3.3",
8283
"maennchen/zipstream-php": "^2.1 || ^3.0",
8384
"markbaker/complex": "^3.0",
8485
"markbaker/matrix": "^3.0",

composer.lock

Lines changed: 80 additions & 80 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

samples/Pdf/21b_Pdf.php

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
<?php
22

3+
use PhpOffice\PhpSpreadsheet\Exception as SpreadsheetException;
34
use PhpOffice\PhpSpreadsheet\IOFactory;
45
use PhpOffice\PhpSpreadsheet\Worksheet\PageSetup;
56
use PhpOffice\PhpSpreadsheet\Writer\Pdf\Dompdf;
@@ -21,7 +22,7 @@ function replaceBody(string $html): string
2122
</body>
2223
EOF;
2324

24-
return preg_replace($bodystring, $bodyrepl, $html) ?? '';
25+
return preg_replace($bodystring, $bodyrepl, $html) ?? throw new SpreadsheetException('preg failed');
2526
}
2627

2728
require __DIR__ . '/../Header.php';

samples/Pdf/21c_Pdf.php

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
<?php
22

3+
use PhpOffice\PhpSpreadsheet\Exception as SpreadsheetException;
34
use PhpOffice\PhpSpreadsheet\IOFactory;
45
use PhpOffice\PhpSpreadsheet\Spreadsheet;
56
use PhpOffice\PhpSpreadsheet\Writer\Pdf\Mpdf;
@@ -16,7 +17,7 @@ function addHeadersFootersMpdf2000(string $html): string
1617
odd-footer-name: html_myFooter2;
1718
1819
EOF;
19-
$html = preg_replace('/@page page0 {/', $pagerepl, $html) ?? '';
20+
$html = preg_replace('/@page page0 {/', $pagerepl, $html) ?? throw new SpreadsheetException('preg 1 failed');
2021
$bodystring = '/<body>/';
2122
$simulatedBodyStart = Mpdf::SIMULATED_BODY_START;
2223
$bodyrepl = <<<EOF
@@ -40,7 +41,7 @@ function addHeadersFootersMpdf2000(string $html): string
4041
4142
EOF;
4243

43-
return preg_replace($bodystring, $bodyrepl, $html) ?? '';
44+
return preg_replace($bodystring, $bodyrepl, $html) ?? throw new SpreadsheetException('preg 2 failed');
4445
}
4546

4647
$spreadsheet = new Spreadsheet();

src/PhpSpreadsheet/Shared/StringHelper.php

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -638,4 +638,9 @@ public static function testStringAsNumeric(string $textValue): float|string
638638

639639
return (is_numeric(substr($textValue, 0, strlen((string) $v)))) ? $v : $textValue;
640640
}
641+
642+
public static function strlenAllowNull(?string $string): int
643+
{
644+
return strlen("$string");
645+
}
641646
}

src/PhpSpreadsheet/Writer/Html.php

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
namespace PhpOffice\PhpSpreadsheet\Writer;
44

5+
use Composer\Pcre\Preg;
56
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
67
use PhpOffice\PhpSpreadsheet\Cell\Cell;
78
use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
@@ -646,13 +647,13 @@ private function writeImageInCell(string $coordinates): string
646647
$filename = $drawing->getPath();
647648

648649
// Strip off eventual '.'
649-
$filename = (string) preg_replace('/^[.]/', '', $filename);
650+
$filename = Preg::replace('/^[.]/', '', $filename);
650651

651652
// Prepend images root
652653
$filename = $this->getImagesRoot() . $filename;
653654

654655
// Strip off eventual '.' if followed by non-/
655-
$filename = (string) preg_replace('@^[.]([^/])@', '$1', $filename);
656+
$filename = Preg::replace('@^[.]([^/])@', '$1', $filename);
656657

657658
// Convert UTF8 data to PCDATA
658659
$filename = htmlspecialchars($filename, Settings::htmlEntityFlags());
@@ -1411,7 +1412,7 @@ private function generateRowCellData(Worksheet $worksheet, null|Cell|string $cel
14111412

14121413
// Converts the cell content so that spaces occuring at beginning of each new line are replaced by &nbsp;
14131414
// Example: " Hello\n to the world" is converted to "&nbsp;&nbsp;Hello\n&nbsp;to the world"
1414-
$cellData = (string) preg_replace('/(?m)(?:^|\\G) /', '&nbsp;', $cellData);
1415+
$cellData = Preg::replace('/(?m)(?:^|\\G) /', '&nbsp;', $cellData);
14151416

14161417
// convert newline "\n" to '<br>'
14171418
$cellData = nl2br($cellData);
@@ -1587,9 +1588,9 @@ private function generateRow(Worksheet $worksheet, array $values, int $row, stri
15871588
if ($worksheet->hyperlinkExists($coordinate) && !$worksheet->getHyperlink($coordinate)->isInternal()) {
15881589
$url = $worksheet->getHyperlink($coordinate)->getUrl();
15891590
$urlDecode1 = html_entity_decode($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
1590-
$urlTrim = preg_replace('/^\\s+/u', '', $urlDecode1) ?? $urlDecode1;
1591-
$parseScheme = preg_match('/^([\\w\\s]+):/u', strtolower($urlTrim), $matches);
1592-
if ($parseScheme === 1 && !in_array($matches[1], ['http', 'https', 'file', 'ftp', 's3'], true)) {
1591+
$urlTrim = Preg::replace('/^\\s+/u', '', $urlDecode1);
1592+
$parseScheme = Preg::isMatch('/^([\\w\\s]+):/u', strtolower($urlTrim), $matches);
1593+
if ($parseScheme && !in_array($matches[1], ['http', 'https', 'file', 'ftp', 's3'], true)) {
15931594
$cellData = htmlspecialchars($url, Settings::htmlEntityFlags());
15941595
} else {
15951596
$cellData = '<a href="' . htmlspecialchars($url, Settings::htmlEntityFlags()) . '" title="' . htmlspecialchars($worksheet->getHyperlink($coordinate)->getTooltip(), Settings::htmlEntityFlags()) . '">' . $cellData . '</a>';
@@ -1738,8 +1739,8 @@ public static function formatColorStatic(string $value, string $format): string
17381739
$matches = [];
17391740

17401741
$color_regex = '/^\\[[a-zA-Z]+\\]/';
1741-
if (preg_match($color_regex, $format, $matches)) {
1742-
$color = str_replace(['[', ']'], '', $matches[0]);
1742+
if (Preg::isMatch($color_regex, $format, $matches)) {
1743+
$color = str_replace(['[', ']'], '', $matches[0]); // @phpstan-ignore-line
17431744
$color = strtolower($color);
17441745
}
17451746

src/PhpSpreadsheet/Writer/Ods/Content.php

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
namespace PhpOffice\PhpSpreadsheet\Writer\Ods;
44

5+
use Composer\Pcre\Preg;
56
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
67
use PhpOffice\PhpSpreadsheet\Calculation\Exception as CalculationException;
78
use PhpOffice\PhpSpreadsheet\Cell\Cell;
@@ -227,16 +228,16 @@ private function writeCells(XMLWriter $objWriter, RowCellIterator $cells): void
227228
}
228229
}
229230
if (isset($attributes['ref'])) {
230-
if (preg_match('/^([A-Z]{1,3})([0-9]{1,7})(:([A-Z]{1,3})([0-9]{1,7}))?$/', (string) $attributes['ref'], $matches) == 1) {
231+
if (Preg::isMatch('/^([A-Z]{1,3})([0-9]{1,7})(:([A-Z]{1,3})([0-9]{1,7}))?$/', (string) $attributes['ref'], $matches)) {
231232
$matrixRowSpan = 1;
232233
$matrixColSpan = 1;
233234
if (isset($matches[3])) {
234235
$minRow = (int) $matches[2];
235236
// https://github.com/phpstan/phpstan/issues/11602
236-
$maxRow = (int) $matches[5]; // @phpstan-ignore-line
237+
$maxRow = (int) $matches[5];
237238
$matrixRowSpan = $maxRow - $minRow + 1;
238239
$minCol = Coordinate::columnIndexFromString($matches[1]);
239-
$maxCol = Coordinate::columnIndexFromString($matches[4]); // @phpstan-ignore-line
240+
$maxCol = Coordinate::columnIndexFromString($matches[4]);
240241
$matrixColSpan = $maxCol - $minCol + 1;
241242
}
242243
$objWriter->writeAttribute('table:number-matrix-columns-spanned', "$matrixColSpan");

src/PhpSpreadsheet/Writer/Ods/Formula.php

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,10 @@
22

33
namespace PhpOffice\PhpSpreadsheet\Writer\Ods;
44

5+
use Composer\Pcre\Preg;
56
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
67
use PhpOffice\PhpSpreadsheet\DefinedName;
8+
use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
79

810
class Formula
911
{
@@ -33,14 +35,13 @@ public function convertFormula(string $formula, string $worksheetName = ''): str
3335

3436
private function convertDefinedNames(string $formula): string
3537
{
36-
$splitCount = preg_match_all(
38+
$splitCount = Preg::matchAllWithOffsets(
3739
'/' . Calculation::CALCULATION_REGEXP_DEFINEDNAME . '/mui',
3840
$formula,
39-
$splitRanges,
40-
PREG_OFFSET_CAPTURE
41+
$splitRanges
4142
);
4243

43-
$lengths = array_map('strlen', array_column($splitRanges[0], 0));
44+
$lengths = array_map([StringHelper::class, 'strlenAllowNull'], array_column($splitRanges[0], 0));
4445
$offsets = array_column($splitRanges[0], 1);
4546
$values = array_column($splitRanges[0], 0);
4647

@@ -60,14 +61,13 @@ private function convertDefinedNames(string $formula): string
6061

6162
private function convertCellReferences(string $formula, string $worksheetName): string
6263
{
63-
$splitCount = preg_match_all(
64+
$splitCount = Preg::matchAllWithOffsets(
6465
'/' . Calculation::CALCULATION_REGEXP_CELLREF_RELATIVE . '/mui',
6566
$formula,
66-
$splitRanges,
67-
PREG_OFFSET_CAPTURE
67+
$splitRanges
6868
);
6969

70-
$lengths = array_map('strlen', array_column($splitRanges[0], 0));
70+
$lengths = array_map([StringHelper::class, 'strlenAllowNull'], array_column($splitRanges[0], 0));
7171
$offsets = array_column($splitRanges[0], 1);
7272

7373
$worksheets = $splitRanges[2];
@@ -76,7 +76,7 @@ private function convertCellReferences(string $formula, string $worksheetName):
7676

7777
// Replace any commas in the formula with semi-colons for Ods
7878
// If by chance there are commas in worksheet names, then they will be "fixed" again in the loop
79-
// because we've already extracted worksheet names with our preg_match_all()
79+
// because we've already extracted worksheet names with our Preg::matchAllWithOffsets()
8080
$formula = str_replace(',', ';', $formula);
8181
while ($splitCount > 0) {
8282
--$splitCount;

0 commit comments

Comments
 (0)