
Commit f04bea4

gh-90949: add Expat API to prevent XML deadly allocations (CVE-2025-59375) (#139234)
Expose the XML Expat 2.7.2 mitigation APIs to disallow use of disproportional amounts of dynamic memory from within an Expat parser (see CVE-2025-59375 for instance). The exposed APIs are available on Expat parsers, that is, parsers created by `xml.parsers.expat.ParserCreate()`, as:

- `parser.SetAllocTrackerActivationThreshold(threshold)`, and
- `parser.SetAllocTrackerMaximumAmplification(max_factor)`.
1 parent 0aab07c commit f04bea4
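A minimal usage sketch of the two new methods, assuming a Python build that exposes them. They are guarded with `hasattr` because older interpreters, or builds linked against Expat older than 2.7.2, may lack them. The helpers `make_parser`, `parse` and `is_out_of_memory` are hypothetical names; the error check relies on the documented `xml.parsers.expat.errors.codes` mapping and on the limiter reporting Expat's "out of memory" error, which is what the tests in this commit match against.

```python
from xml.parsers import expat
from xml.parsers.expat import errors


def make_parser(threshold=None, max_factor=None):
    """Create a root parser, optionally tightening the allocation limits.

    Hypothetical helper: None keeps Expat's defaults (64 MiB, 100.0).
    """
    parser = expat.ParserCreate()
    wants_limits = threshold is not None or max_factor is not None
    # The methods may be missing on older Pythons or older Expat builds.
    supported = hasattr(parser, "SetAllocTrackerActivationThreshold")
    if wants_limits and not supported:
        raise RuntimeError("allocation tracker API is not available")
    if threshold is not None:
        parser.SetAllocTrackerActivationThreshold(threshold)
    if max_factor is not None:
        parser.SetAllocTrackerMaximumAmplification(max_factor)
    return parser


def is_out_of_memory(exc):
    """True if an ExpatError carries Expat's 'out of memory' error code."""
    return exc.code == errors.codes[errors.XML_ERROR_NO_MEMORY]


def parse(data, **limits):
    """Parse a complete document, mapping allocation-limit hits to MemoryError."""
    parser = make_parser(**limits)
    try:
        parser.Parse(data, True)
    except expat.ExpatError as exc:
        if is_out_of_memory(exc):
            raise MemoryError("XML rejected by Expat allocation limits") from exc
        raise
    return parser
```

Raising when limits are requested but unsupported is a deliberate choice in this sketch: silently parsing without the protection the caller asked for would be easy to miss. Mapping the rejection to `MemoryError` is likewise only one option; a dedicated exception type may suit an application better.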


8 files changed: 583 additions & 29 deletions

Doc/library/pyexpat.rst

Lines changed: 57 additions & 0 deletions
@@ -72,6 +72,13 @@ The :mod:`xml.parsers.expat` module contains two functions:
    *encoding* [1]_ is given it will override the implicit or explicit encoding of the
    document.
 
+   .. _xmlparser-non-root:
+
+   Parsers created through :func:`!ParserCreate` are called "root" parsers,
+   in the sense that they do not have any parent parser attached. Non-root
+   parsers are created by :meth:`parser.ExternalEntityParserCreate
+   <xmlparser.ExternalEntityParserCreate>`.
+
    Expat can optionally do XML namespace processing for you, enabled by providing a
    value for *namespace_separator*. The value must be a one-character string; a
    :exc:`ValueError` will be raised if the string has an illegal length (``None``
@@ -231,6 +238,55 @@ XMLParser Objects
    .. versionadded:: 3.13
 
 
+:class:`!xmlparser` objects have the following methods to mitigate some
+common XML vulnerabilities.
+
+.. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)
+
+   Sets the number of allocated bytes of dynamic memory needed to activate
+   protection against disproportionate use of RAM.
+
+   By default, parser objects have an allocation activation threshold of 64 MiB,
+   or equivalently 67,108,864 bytes.
+
+   An :exc:`ExpatError` is raised if this method is called on a
+   |xml-non-root-parser| parser.
+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
+   should not be used as they may have no special meaning.
+
+   .. versionadded:: next
+
+.. method:: xmlparser.SetAllocTrackerMaximumAmplification(max_factor, /)
+
+   Sets the maximum amplification factor between direct input and bytes
+   of dynamic memory allocated.
+
+   The amplification factor is calculated as ``allocated / direct``
+   while parsing, where ``direct`` is the number of bytes read from
+   the primary document in parsing and ``allocated`` is the number
+   of bytes of dynamic memory allocated in the parser hierarchy.
+
+   The *max_factor* value must be a non-NaN :class:`float` value greater than
+   or equal to 1.0. Amplification factors greater than 100.0 can be observed
+   near the start of parsing even with benign files in practice. In particular,
+   the activation threshold should be carefully chosen to avoid false positives.
+
+   By default, parser objects have a maximum amplification factor of 100.0.
+
+   An :exc:`ExpatError` is raised if this method is called on a
+   |xml-non-root-parser| parser or if *max_factor* is outside the valid range.
+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
+   should not be used as they may have no special meaning.
+
+   .. note::
+
+      The maximum amplification factor is only considered if the threshold
+      that can be adjusted :meth:`.SetAllocTrackerActivationThreshold` is
+      exceeded.
+
+   .. versionadded:: next
+
+
 :class:`xmlparser` objects have the following attributes:
 
 
@@ -954,3 +1010,4 @@ The ``errors`` module has the following attributes:
    not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
    and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
 
+.. |xml-non-root-parser| replace:: :ref:`non-root <xmlparser-non-root>`

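The rule documented above (activation threshold first, then ``allocated / direct`` compared against *max_factor*) can be modelled in a few lines of pure Python. This is a simplified, hypothetical model for reasoning about settings, not Expat's implementation: it assumes the factor is only evaluated once ``allocated`` exceeds the threshold, as the note states, and it treats ``direct == 0`` as an infinite factor, which is an assumption of this sketch.

```python
import math

KiB = 1024
MiB = 1024 * KiB

DEFAULT_THRESHOLD = 64 * MiB  # documented default: 67,108,864 bytes
DEFAULT_MAX_FACTOR = 100.0    # documented default


def amplification(allocated, direct):
    """Amplification factor ``allocated / direct``."""
    if direct == 0:
        return math.inf  # assumption: nothing read yet counts as unbounded
    return allocated / direct


def is_tolerated(allocated, direct,
                 threshold=DEFAULT_THRESHOLD, max_factor=DEFAULT_MAX_FACTOR):
    """Hypothetical model of the activation threshold / maximum factor rule."""
    # Documented constraint: non-NaN and >= 1.0 (NaN fails this comparison).
    if not max_factor >= 1.0:
        raise ValueError("'max_factor' must be at least 1.0")
    # The factor is only considered once the threshold is exceeded.
    if allocated <= threshold:
        return True
    return amplification(allocated, direct) <= max_factor
```

With the defaults, 10 MiB allocated while reading 1 KiB is tolerated (factor 10,240, but the 64 MiB threshold has not been exceeded), whereas 65 MiB allocated from 100 KiB of input (factor 665.6) is rejected. A threshold of 0 makes the factor apply from the first allocation, which is how the tests in this commit force the check.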
Doc/whatsnew/3.15.rst

Lines changed: 10 additions & 0 deletions
@@ -553,6 +553,16 @@ unittest
   (Contributed by Garry Cairns in :gh:`134567`.)
 
 
+xml.parsers.expat
+-----------------
+
+* Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
+  and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
+  to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
+  disproportional amounts of dynamic memory from within an Expat parser.
+  (Contributed by Bénédikt Tran in :gh:`90949`.)
+
+
 zlib
 ----
 
Include/pyexpat.h

Lines changed: 5 additions & 0 deletions
@@ -52,6 +52,11 @@ struct PyExpat_CAPI
     int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
     /* might be NULL for expat < 2.6.0 */
     XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
+    /* might be NULL for expat < 2.7.2 */
+    XML_Bool (*SetAllocTrackerActivationThreshold)(
+        XML_Parser parser, unsigned long long activationThresholdBytes);
+    XML_Bool (*SetAllocTrackerMaximumAmplification)(
+        XML_Parser parser, float maxAmplificationFactor);
     /* always add new stuff to the end! */
 };

Lib/test/test_pyexpat.py

Lines changed: 199 additions & 1 deletion
@@ -1,14 +1,18 @@
 # XXX TypeErrors on calling handlers, or on bad return values from a
 # handler, are obscure and unhelpful.
 
+import abc
+import functools
 import os
+import re
 import sys
 import sysconfig
+import textwrap
 import unittest
 import traceback
 from io import BytesIO
 from test import support
-from test.support import os_helper
+from test.support import import_helper, os_helper
 from test.support import sortdict
 from unittest import mock
 from xml.parsers import expat
@@ -821,5 +825,199 @@ def start_element(name, _):
         self.assertEqual(started, ['doc'])
 
 
+class AttackProtectionTestBase(abc.ABC):
+    """
+    Base class for testing protections against XML payloads with
+    disproportionate amplification.
+
+    The protections being tested should detect and prevent attacks
+    that leverage disproportionate amplification from small inputs.
+    """
+
+    @staticmethod
+    def exponential_expansion_payload(*, nrows, ncols, text='.'):
+        """Create a billion laughs attack payload.
+
+        Be careful: the number of total items is pow(n, k), thereby
+        requiring at least pow(ncols, nrows) * sizeof(text) memory!
+        """
+        template = textwrap.dedent(f"""\
+            <?xml version="1.0"?>
+            <!DOCTYPE doc [
+                <!ENTITY row0 "{text}">
+                <!ELEMENT doc (#PCDATA)>
+            {{body}}
+            ]>
+            <doc>&row{nrows};</doc>
+        """).rstrip()
+
+        body = '\n'.join(
+            f'<!ENTITY row{i + 1} "{f"&row{i};" * ncols}">'
+            for i in range(nrows)
+        )
+        body = textwrap.indent(body, ' ' * 4)
+        return template.format(body=body)
+
+    def test_payload_generation(self):
+        # self-test for exponential_expansion_payload()
+        payload = self.exponential_expansion_payload(nrows=2, ncols=3)
+        self.assertEqual(payload, textwrap.dedent("""\
+            <?xml version="1.0"?>
+            <!DOCTYPE doc [
+                <!ENTITY row0 ".">
+                <!ELEMENT doc (#PCDATA)>
+                <!ENTITY row1 "&row0;&row0;&row0;">
+                <!ENTITY row2 "&row1;&row1;&row1;">
+            ]>
+            <doc>&row2;</doc>
+        """).rstrip())
+
+    def assert_root_parser_failure(self, func, /, *args, **kwargs):
+        """Check that func(*args, **kwargs) is invalid for a sub-parser."""
+        msg = "parser must be a root parser"
+        self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)
+
+    @abc.abstractmethod
+    def assert_rejected(self, func, /, *args, **kwargs):
+        """Assert that func(*args, **kwargs) triggers the attack protection.
+
+        Note: this method must ensure that the attack protection being tested
+        is the one that is actually triggered at runtime, e.g., by matching
+        the exact error message.
+        """
+
+    @abc.abstractmethod
+    def set_activation_threshold(self, parser, threshold):
+        """Set the activation threshold for the tested protection."""
+
+    @abc.abstractmethod
+    def set_maximum_amplification(self, parser, max_factor):
+        """Set the maximum amplification factor for the tested protection."""
+
+    @abc.abstractmethod
+    def test_set_activation_threshold__threshold_reached(self):
+        """Test when the activation threshold is exceeded."""
+
+    @abc.abstractmethod
+    def test_set_activation_threshold__threshold_not_reached(self):
+        """Test when the activation threshold is not exceeded."""
+
+    def test_set_activation_threshold__invalid_threshold_type(self):
+        parser = expat.ParserCreate()
+        setter = functools.partial(self.set_activation_threshold, parser)
+
+        self.assertRaises(TypeError, setter, 1.0)
+        self.assertRaises(TypeError, setter, -1.5)
+        self.assertRaises(ValueError, setter, -5)
+
+    def test_set_activation_threshold__invalid_threshold_range(self):
+        _testcapi = import_helper.import_module("_testcapi")
+        parser = expat.ParserCreate()
+        setter = functools.partial(self.set_activation_threshold, parser)
+
+        self.assertRaises(OverflowError, setter, _testcapi.ULLONG_MAX + 1)
+
+    def test_set_activation_threshold__fail_for_subparser(self):
+        parser = expat.ParserCreate()
+        subparser = parser.ExternalEntityParserCreate(None)
+        setter = functools.partial(self.set_activation_threshold, subparser)
+        self.assert_root_parser_failure(setter, 12345)
+
+    @abc.abstractmethod
+    def test_set_maximum_amplification__amplification_exceeded(self):
+        """Test when the amplification factor is exceeded."""
+
+    @abc.abstractmethod
+    def test_set_maximum_amplification__amplification_not_exceeded(self):
+        """Test when the amplification factor is not exceeded."""
+
+    def test_set_maximum_amplification__infinity(self):
+        inf = float('inf')  # an 'inf' threshold is allowed by Expat
+        parser = expat.ParserCreate()
+        self.assertIsNone(self.set_maximum_amplification(parser, inf))
+
+    def test_set_maximum_amplification__invalid_max_factor_type(self):
+        parser = expat.ParserCreate()
+        setter = functools.partial(self.set_maximum_amplification, parser)
+
+        self.assertRaises(TypeError, setter, None)
+        self.assertRaises(TypeError, setter, 'abc')
+
+    def test_set_maximum_amplification__invalid_max_factor_range(self):
+        parser = expat.ParserCreate()
+        setter = functools.partial(self.set_maximum_amplification, parser)
+
+        msg = re.escape("'max_factor' must be at least 1.0")
+        self.assertRaisesRegex(expat.ExpatError, msg, setter, float('nan'))
+        self.assertRaisesRegex(expat.ExpatError, msg, setter, 0.99)
+
+    def test_set_maximum_amplification__fail_for_subparser(self):
+        parser = expat.ParserCreate()
+        subparser = parser.ExternalEntityParserCreate(None)
+        setter = functools.partial(self.set_maximum_amplification, subparser)
+        self.assert_root_parser_failure(setter, 123.45)
+
+
+@unittest.skipIf(expat.version_info < (2, 7, 2), "requires Expat >= 2.7.2")
+class MemoryProtectionTest(AttackProtectionTestBase, unittest.TestCase):
+
+    # NOTE: with the default Expat configuration, the billion laughs protection
+    # may hit before the allocation limiter if exponential_expansion_payload()
+    # is not carefully parametrized. As such, the payloads should be chosen so
+    # that either the allocation limiter is hit before other protections are
+    # triggered or no protection at all is triggered.
+
+    def assert_rejected(self, func, /, *args, **kwargs):
+        """Check that func(*args, **kwargs) hits the allocation limit."""
+        msg = r"out of memory: line \d+, column \d+"
+        self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)
+
+    def set_activation_threshold(self, parser, threshold):
+        return parser.SetAllocTrackerActivationThreshold(threshold)
+
+    def set_maximum_amplification(self, parser, max_factor):
+        return parser.SetAllocTrackerMaximumAmplification(max_factor)
+
+    def test_set_activation_threshold__threshold_reached(self):
+        parser = expat.ParserCreate()
+        # Choose a threshold expected to be always reached.
+        self.set_activation_threshold(parser, 3)
+        # Check that the threshold is reached by choosing a small factor
+        # and a payload whose peak amplification factor exceeds it.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)
+        self.assert_rejected(parser.Parse, payload, True)
+
+    def test_set_activation_threshold__threshold_not_reached(self):
+        parser = expat.ParserCreate()
+        # Choose a threshold expected to be never reached.
+        self.set_activation_threshold(parser, pow(10, 5))
+        # Check that the threshold is reached by choosing a small factor
+        # and a payload whose peak amplification factor exceeds it.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)
+        self.assertIsNotNone(parser.Parse(payload, True))
+
+    def test_set_maximum_amplification__amplification_exceeded(self):
+        parser = expat.ParserCreate()
+        # Unconditionally enable maximum activation factor.
+        self.set_activation_threshold(parser, 0)
+        # Choose a max amplification factor expected to always be exceeded.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        # Craft a payload for which the peak amplification factor is > 1.0.
+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)
+        self.assert_rejected(parser.Parse, payload, True)
+
+    def test_set_maximum_amplification__amplification_not_exceeded(self):
+        parser = expat.ParserCreate()
+        # Unconditionally enable maximum activation factor.
+        self.set_activation_threshold(parser, 0)
+        # Choose a max amplification factor expected to never be exceeded.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1e4))
+        # Craft a payload for which the peak amplification factor is < 1e4.
+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)
+        self.assertIsNotNone(parser.Parse(payload, True))
+
+
 if __name__ == "__main__":
     unittest.main()
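The tests rely on payloads whose expansion grows as ``ncols ** nrows``. For small sizes this can be checked directly with a default parser by counting the character data Expat delivers after entity expansion. In this sketch, `billion_laughs_payload` is a hypothetical, simplified stand-in for `exponential_expansion_payload()` (same entity structure, different whitespace), and `expanded_text_length` is likewise a hypothetical helper.

```python
from xml.parsers import expat


def billion_laughs_payload(nrows, ncols, text="."):
    """Hypothetical stand-in for exponential_expansion_payload()."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE doc [",
             f'    <!ENTITY row0 "{text}">']
    # Each entity row{i+1} references row{i} exactly ncols times.
    for i in range(nrows):
        refs = f"&row{i};" * ncols
        lines.append(f'    <!ENTITY row{i + 1} "{refs}">')
    lines += ["    <!ELEMENT doc (#PCDATA)>", "]>", f"<doc>&row{nrows};</doc>"]
    return "\n".join(lines)


def expanded_text_length(payload):
    """Parse the payload and count the character data after expansion."""
    parser = expat.ParserCreate()
    chunks = []
    # Expat may split character data into several chunks; collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    return sum(map(len, chunks))
```

For ``ncols=10, nrows=4`` (the payload used in the activation-threshold tests) the document expands to 10,000 characters, while ``ncols=1, nrows=2`` expands to a single character. Both stay well below the documented 64 MiB default threshold, which is why the tests lower the threshold explicitly before expecting a rejection.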
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
+and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
+to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
+disproportional amounts of dynamic memory from within an Expat parser.
+Patch by Bénédikt Tran.
