Commit 3b07468

Allow Xlsx Reader to Specify ParseHuge Release390 (#4518)
* Allow Xlsx Reader to Specify ParseHuge Release390

Fix #4260. A number of Security Advisories related to libxml_options were opened. In the end, we disabled the ability to specify any libxml_options. However, some users were adversely affected because they needed LIBXML_PARSEHUGE for some of their files. Having finally obtained access to a file demonstrating this problem, we can restore this ability.

- The operation is potentially dangerous, a vector for memory leaks and out-of-memory errors. It is not recommended unless absolutely needed.
- It will not be permitted as a global (static) property with the ability to adversely affect other users on the same server.
- It will instead be implemented as an instance property of Xlsx Reader (default to false), with a setter. I do not see a use case for a getter.
- People will need to set this property individually for each file which they think needs it.
- This change will be backported to all supported releases.
- The sheer size and processing time for the file involved makes it impractical to add a formal test case. It has, nevertheless, been tested satisfactorily.

* Spurious Space

* Update CHANGELOG.md
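The instance-property design described in the commit message can be sketched in isolation. The class below is a hypothetical stand-in (`DemoXmlReader` and its `libxmlOptions()` and `parse()` methods are not part of PhpSpreadsheet); it only assumes PHP's built-in SimpleXML extension and mirrors the pattern of a per-instance flag, off by default, that decides whether `LIBXML_PARSEHUGE` is passed to the parser.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a reader class; NOT part of PhpSpreadsheet.
 * The flag is an instance property (not static), so it cannot leak
 * into other readers running in the same process.
 */
final class DemoXmlReader
{
    private bool $parseHuge = false;

    public function setParseHuge(bool $parseHuge): void
    {
        $this->parseHuge = $parseHuge;
    }

    /** Exposed only so this sketch can be inspected; the commit adds no getter. */
    public function libxmlOptions(): int
    {
        return $this->parseHuge ? LIBXML_PARSEHUGE : 0;
    }

    public function parse(string $xml): SimpleXMLElement
    {
        $element = simplexml_load_string(
            $xml,
            SimpleXMLElement::class,
            $this->libxmlOptions()
        );
        if ($element === false) {
            throw new RuntimeException('Unable to parse XML');
        }

        return $element;
    }
}

// One reader opts in; the other keeps the safe default.
$default = new DemoXmlReader();
$optedIn = new DemoXmlReader();
$optedIn->setParseHuge(true);

$doc = $optedIn->parse('<root><item>42</item></root>');
echo (string) $doc->item, PHP_EOL; // prints "42"
```

With the real reader, the equivalent step would presumably be calling `setParseHuge(true)` on an `Xlsx` reader instance before loading the one file that needs it, and leaving the default in place for every other file.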
1 parent d893ec3 commit 3b07468

2 files changed: +25 -7 lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com)
 and this project adheres to [Semantic Versioning](https://semver.org).
 
-# TBD - 3.9.2
+# 2025-06-22 - 3.9.2
 
 ### Changed

@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org).
 
 - TEXT and TIMEVALUE functions. [Issue #4249](https://github.com/PHPOffice/PhpSpreadsheet/issues/4249) [PR #4355](https://github.com/PHPOffice/PhpSpreadsheet/pull/4355)
 - Removing Columns/Rows Containing Merged Cells. Backport of [PR #4465](https://github.com/PHPOffice/PhpSpreadsheet/pull/4465)
+- Allow Xlsx Reader to Specify ParseHuge. [Issue #4260](https://github.com/PHPOffice/PhpSpreadsheet/issues/4260) [PR #4518](https://github.com/PHPOffice/PhpSpreadsheet/pull/4518)
 
 ## 2025-02-07 - 3.9.1

src/PhpSpreadsheet/Reader/Xlsx.php

Lines changed: 23 additions & 6 deletions
@@ -60,6 +60,19 @@ class Xlsx extends BaseReader
 
     private array $sharedFormulae = [];
 
+    private bool $parseHuge = false;
+
+    /**
+     * Allow use of LIBXML_PARSEHUGE.
+     * This option can lead to memory leaks and failures,
+     * and is not recommended. But some very large spreadsheets
+     * seem to require it.
+     */
+    public function setParseHuge(bool $parseHuge): void
+    {
+        $this->parseHuge = $parseHuge;
+    }
+
     /**
      * Create a new Xlsx Reader instance.
      */
@@ -121,8 +134,8 @@ private function loadZip(string $filename, string $ns = '', bool $replaceUnclose
         }
         $rels = @simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             $ns
         );

@@ -136,8 +149,8 @@ private function loadZipNonamespace(string $filename, string $ns): SimpleXMLElem
         $contents = $this->getFromZipArchive($this->zip, $filename);
         $rels = simplexml_load_string(
             $this->getSecurityScannerOrThrow()->scan($contents),
-            'SimpleXMLElement',
-            0,
+            SimpleXMLElement::class,
+            $this->parseHuge ? LIBXML_PARSEHUGE : 0,
             ($ns === '' ? $ns : '')
         );

@@ -250,7 +263,9 @@ public function listWorksheetInfo(string $filename): array
                             $this->zip,
                             $fileWorksheetPath
                         )
-                    )
+                    ),
+                    null,
+                    $this->parseHuge ? LIBXML_PARSEHUGE : 0
                 );
                 $xml->setParserProperty(2, true);

@@ -2005,7 +2020,9 @@ private function readRibbon(Spreadsheet $excel, string $customUITarget, ZipArchi
             // exists and not empty if the ribbon have some pictures (other than internal MSO)
             $UIRels = simplexml_load_string(
                 $this->getSecurityScannerOrThrow()
-                    ->scan($dataRels)
+                    ->scan($dataRels),
+                SimpleXMLElement::class,
+                $this->parseHuge ? LIBXML_PARSEHUGE : 0
             );
             if (false !== $UIRels) {
                 // we need to save id and target to avoid parsing customUI.xml and "guess" if it's a pseudo callback who load the image
