Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
c1c23fb
Expose XML Expat 2.7.2 mitigation APIs
picnixz Sep 22, 2025
12bef9c
add tests
picnixz Sep 22, 2025
1d7e599
docs
picnixz Sep 22, 2025
3dcd9bd
NEWS
picnixz Sep 22, 2025
192fe08
Merge branch 'main' into feat/xml/mitigation-api-90949
picnixz Sep 22, 2025
0ecbd55
fix docs
picnixz Sep 22, 2025
07445ad
fix tests
picnixz Sep 22, 2025
9c7371f
regen SBOM
picnixz Sep 22, 2025
1085584
remove unused include
picnixz Sep 22, 2025
c10fe91
fix possible error handling
picnixz Sep 22, 2025
18d175f
undef macro after usage
picnixz Sep 22, 2025
911b2b7
Update Lib/test/test_pyexpat.py
picnixz Sep 22, 2025
d636685
update comments
picnixz Sep 22, 2025
b951065
use better test names
picnixz Sep 22, 2025
e11bf14
simplify roles usage
picnixz Sep 22, 2025
3e45613
prevent reparse deferral of Expat to blow up
picnixz Sep 22, 2025
fb83fb5
test better numeric values
picnixz Sep 22, 2025
7f91f2e
update docs
picnixz Sep 22, 2025
64af05c
avoid deprecated `XML_GetError{Line,Column}Number`
picnixz Sep 22, 2025
b01e53d
raise `NotImplementedError` for unavailable mitigation APIs
picnixz Sep 22, 2025
cd040bf
improve various wordings
picnixz Sep 23, 2025
a09cd15
avoid SBOM alteration
picnixz Sep 23, 2025
a3fc3b3
regen files
picnixz Sep 23, 2025
5afe1ad
improve test assertions
picnixz Sep 23, 2025
b6949dd
improve test genericity
picnixz Sep 23, 2025
bdbd382
split tests even if they could end up duplicated
picnixz Sep 23, 2025
0c03735
raise AttributeError when method is not available and reorder interface
picnixz Sep 23, 2025
220f3e2
reoder docs
picnixz Sep 23, 2025
9d538e4
address review
picnixz Sep 24, 2025
209f300
update capsule API
picnixz Sep 24, 2025
8407cfc
amend docs
picnixz Sep 24, 2025
e898811
use "NOTE:"
picnixz Sep 24, 2025
ce8cb48
address final comments
picnixz Sep 26, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 57 additions & 0 deletions Doc/library/pyexpat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,14 @@
*encoding* [1]_ is given it will override the implicit or explicit encoding of the
document.

.. _xmlparser-non-root:

Parsers created through :func:`!ParserCreate` are called "root" parsers,
in the sense that they do not have any parent parser attached. Non-root
parsers are created by :meth:`parser.ExternalEntityParserCreate
<xmlparser.ExternalEntityParserCreate>`.

Expat can optionally do XML namespace processing for you, enabled by providing a

Check warning on line 82 in Doc/library/pyexpat.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:attr reference target not found: EndElementHandler [ref.attr]

Check warning on line 82 in Doc/library/pyexpat.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:attr reference target not found: StartElementHandler [ref.attr]
value for *namespace_separator*. The value must be a one-character string; a
:exc:`ValueError` will be raised if the string has an illegal length (``None``
is considered the same as omission). When namespace processing is enabled,
Expand Down Expand Up @@ -231,7 +238,56 @@
.. versionadded:: 3.13


:class:`!xmlparser` objects have the following methods to mitigate some
common XML vulnerabilities.

.. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)

Sets the number of allocated bytes of dynamic memory needed to activate
protection against disproportionate use of RAM.

By default, parser objects have an allocation activation threshold of 64 MiB,
or equivalently 67,108,864 bytes.

An :exc:`ExpatError` is raised if this method is called on a
|xml-non-root-parser| parser.
The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
should not be used as they may have no special meaning.

.. versionadded:: next

.. method:: xmlparser.SetAllocTrackerMaximumAmplification(max_factor, /)

Sets the maximum amplification factor between direct input and bytes
of dynamic memory allocated.

The amplification factor is calculated as ``allocated / direct``
while parsing, where ``direct`` is the number of bytes read from
the primary document in parsing and ``allocated`` is the number
of bytes of dynamic memory allocated in the parser hierarchy.

The *max_factor* value must be a non-NaN :class:`float` value greater than
or equal to 1.0. Amplification factors greater than 100.0 can be observed
near the start of parsing even with benign files in practice. In particular,
the activation threshold should be carefully chosen to avoid false positives.

By default, parser objects have a maximum amplification factor of 100.0.

An :exc:`ExpatError` is raised if this method is called on a
|xml-non-root-parser| parser or if *max_factor* is outside the valid range.
The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
should not be used as they may have no special meaning.

.. note::

The maximum amplification factor is only considered if the threshold
that can be adjusted :meth:`.SetAllocTrackerActivationThreshold` is
exceeded.

.. versionadded:: next


:class:`xmlparser` objects have the following attributes:

Check warning on line 290 in Doc/library/pyexpat.rst

View workflow job for this annotation

GitHub Actions / Docs / Docs

py:class reference target not found: xmlparser [ref.class]


.. attribute:: xmlparser.buffer_size
Expand Down Expand Up @@ -954,3 +1010,4 @@
not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
and https://www.iana.org/assignments/character-sets/character-sets.xhtml.

.. |xml-non-root-parser| replace:: :ref:`non-root <xmlparser-non-root>`
10 changes: 10 additions & 0 deletions Doc/whatsnew/3.15.rst
Original file line number Diff line number Diff line change
Expand Up @@ -540,6 +540,16 @@ unittest
(Contributed by Garry Cairns in :gh:`134567`.)


xml.parsers.expat
-----------------

* Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
disproportional amounts of dynamic memory from within an Expat parser.
(Contributed by Bénédikt Tran in :gh:`90949`.)


zlib
----

Expand Down
5 changes: 5 additions & 0 deletions Include/pyexpat.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,11 @@ struct PyExpat_CAPI
int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
/* might be NULL for expat < 2.6.0 */
XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
/* might be NULL for expat < 2.7.2 */
XML_Bool (*SetAllocTrackerActivationThreshold)(
XML_Parser parser, unsigned long long activationThresholdBytes);
XML_Bool (*SetAllocTrackerMaximumAmplification)(
XML_Parser parser, float maxAmplificationFactor);
/* always add new stuff to the end! */
};

200 changes: 199 additions & 1 deletion Lib/test/test_pyexpat.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,18 @@
# XXX TypeErrors on calling handlers, or on bad return values from a
# handler, are obscure and unhelpful.

import abc
import functools
import os
import re
import sys
import sysconfig
import textwrap
import unittest
import traceback
from io import BytesIO
from test import support
from test.support import os_helper
from test.support import import_helper, os_helper
from test.support import sortdict
from unittest import mock
from xml.parsers import expat
Expand Down Expand Up @@ -821,5 +825,199 @@ def start_element(name, _):
self.assertEqual(started, ['doc'])


class AttackProtectionTestBase(abc.ABC):
"""
Base class for testing protections against XML payloads with
disproportionate amplification.

The protections being tested should detect and prevent attacks
that leverage disproportionate amplification from small inputs.
"""

@staticmethod
def exponential_expansion_payload(*, nrows, ncols, text='.'):
"""Create a billion laughs attack payload.

Be careful: the number of total items is pow(n, k), thereby
requiring at least pow(ncols, nrows) * sizeof(text) memory!
"""
template = textwrap.dedent(f"""\
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY row0 "{text}">
<!ELEMENT doc (#PCDATA)>
{{body}}
]>
<doc>&row{nrows};</doc>
""").rstrip()

body = '\n'.join(
f'<!ENTITY row{i + 1} "{f"&row{i};" * ncols}">'
for i in range(nrows)
)
body = textwrap.indent(body, ' ' * 4)
return template.format(body=body)

def test_payload_generation(self):
# self-test for exponential_expansion_payload()
payload = self.exponential_expansion_payload(nrows=2, ncols=3)
self.assertEqual(payload, textwrap.dedent("""\
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY row0 ".">
<!ELEMENT doc (#PCDATA)>
<!ENTITY row1 "&row0;&row0;&row0;">
<!ENTITY row2 "&row1;&row1;&row1;">
]>
<doc>&row2;</doc>
""").rstrip())

def assert_root_parser_failure(self, func, /, *args, **kwargs):
"""Check that func(*args, **kwargs) is invalid for a sub-parser."""
msg = "parser must be a root parser"
self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)

@abc.abstractmethod
def assert_rejected(self, func, /, *args, **kwargs):
"""Assert that func(*args, **kwargs) triggers the attack protection.

Note: this method must ensure that the attack protection being tested
is the one that is actually triggered at runtime, e.g., by matching
the exact error message.
"""

@abc.abstractmethod
def set_activation_threshold(self, parser, threshold):
"""Set the activation threshold for the tested protection."""

@abc.abstractmethod
def set_maximum_amplification(self, parser, max_factor):
"""Set the maximum amplification factor for the tested protection."""

@abc.abstractmethod
def test_set_activation_threshold__threshold_reached(self):
"""Test when the activation threshold is exceeded."""

@abc.abstractmethod
def test_set_activation_threshold__threshold_not_reached(self):
"""Test when the activation threshold is not exceeded."""

def test_set_activation_threshold__invalid_threshold_type(self):
parser = expat.ParserCreate()
setter = functools.partial(self.set_activation_threshold, parser)

self.assertRaises(TypeError, setter, 1.0)
self.assertRaises(TypeError, setter, -1.5)
self.assertRaises(ValueError, setter, -5)

def test_set_activation_threshold__invalid_threshold_range(self):
_testcapi = import_helper.import_module("_testcapi")
parser = expat.ParserCreate()
setter = functools.partial(self.set_activation_threshold, parser)

self.assertRaises(OverflowError, setter, _testcapi.ULLONG_MAX + 1)

def test_set_activation_threshold__fail_for_subparser(self):
parser = expat.ParserCreate()
subparser = parser.ExternalEntityParserCreate(None)
setter = functools.partial(self.set_activation_threshold, subparser)
self.assert_root_parser_failure(setter, 12345)

@abc.abstractmethod
def test_set_maximum_amplification__amplification_exceeded(self):
"""Test when the amplification factor is exceeded."""

@abc.abstractmethod
def test_set_maximum_amplification__amplification_not_exceeded(self):
"""Test when the amplification factor is not exceeded."""

def test_set_maximum_amplification__infinity(self):
inf = float('inf') # an 'inf' threshold is allowed by Expat
parser = expat.ParserCreate()
self.assertIsNone(self.set_maximum_amplification(parser, inf))

def test_set_maximum_amplification__invalid_max_factor_type(self):
parser = expat.ParserCreate()
setter = functools.partial(self.set_maximum_amplification, parser)

self.assertRaises(TypeError, setter, None)
self.assertRaises(TypeError, setter, 'abc')

def test_set_maximum_amplification__invalid_max_factor_range(self):
parser = expat.ParserCreate()
setter = functools.partial(self.set_maximum_amplification, parser)

msg = re.escape("'max_factor' must be at least 1.0")
self.assertRaisesRegex(expat.ExpatError, msg, setter, float('nan'))
self.assertRaisesRegex(expat.ExpatError, msg, setter, 0.99)

def test_set_maximum_amplification__fail_for_subparser(self):
parser = expat.ParserCreate()
subparser = parser.ExternalEntityParserCreate(None)
setter = functools.partial(self.set_maximum_amplification, subparser)
self.assert_root_parser_failure(setter, 123.45)


@unittest.skipIf(expat.version_info < (2, 7, 2), "requires Expat >= 2.7.2")
class MemoryProtectionTest(AttackProtectionTestBase, unittest.TestCase):

# NOTE: with the default Expat configuration, the billion laughs protection
# may hit before the allocation limiter if exponential_expansion_payload()
# is not carefully parametrized. As such, the payloads should be chosen so
# that either the allocation limiter is hit before other protections are
# triggered or no protection at all is triggered.

def assert_rejected(self, func, /, *args, **kwargs):
"""Check that func(*args, **kwargs) hits the allocation limit."""
msg = r"out of memory: line \d+, column \d+"
self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)

def set_activation_threshold(self, parser, threshold):
return parser.SetAllocTrackerActivationThreshold(threshold)

def set_maximum_amplification(self, parser, max_factor):
return parser.SetAllocTrackerMaximumAmplification(max_factor)

def test_set_activation_threshold__threshold_reached(self):
parser = expat.ParserCreate()
# Choose a threshold expected to be always reached.
self.set_activation_threshold(parser, 3)
# Check that the threshold is reached by choosing a small factor
# and a payload whose peak amplification factor exceeds it.
self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
payload = self.exponential_expansion_payload(ncols=10, nrows=4)
self.assert_rejected(parser.Parse, payload, True)

def test_set_activation_threshold__threshold_not_reached(self):
parser = expat.ParserCreate()
# Choose a threshold expected to be never reached.
self.set_activation_threshold(parser, pow(10, 5))
# Check that the threshold is reached by choosing a small factor
# and a payload whose peak amplification factor exceeds it.
self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
payload = self.exponential_expansion_payload(ncols=10, nrows=4)
self.assertIsNotNone(parser.Parse(payload, True))

def test_set_maximum_amplification__amplification_exceeded(self):
parser = expat.ParserCreate()
# Unconditionally enable maximum activation factor.
self.set_activation_threshold(parser, 0)
# Choose a max amplification factor expected to always be exceeded.
self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
# Craft a payload for which the peak amplification factor is > 1.0.
payload = self.exponential_expansion_payload(ncols=1, nrows=2)
self.assert_rejected(parser.Parse, payload, True)

def test_set_maximum_amplification__amplification_not_exceeded(self):
parser = expat.ParserCreate()
# Unconditionally enable maximum activation factor.
self.set_activation_threshold(parser, 0)
# Choose a max amplification factor expected to never be exceeded.
self.assertIsNone(self.set_maximum_amplification(parser, 1e4))
# Craft a payload for which the peak amplification factor is < 1e4.
payload = self.exponential_expansion_payload(ncols=1, nrows=2)
self.assertIsNotNone(parser.Parse(payload, True))


if __name__ == "__main__":
unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
disproportional amounts of dynamic memory from within an Expat parser.
Patch by Bénédikt Tran.
Loading
Loading