Merged
Changes from 12 commits
Commits
33 commits
c1c23fb
Expose XML Expat 2.7.2 mitigation APIs
picnixz Sep 22, 2025
12bef9c
add tests
picnixz Sep 22, 2025
1d7e599
docs
picnixz Sep 22, 2025
3dcd9bd
NEWS
picnixz Sep 22, 2025
192fe08
Merge branch 'main' into feat/xml/mitigation-api-90949
picnixz Sep 22, 2025
0ecbd55
fix docs
picnixz Sep 22, 2025
07445ad
fix tests
picnixz Sep 22, 2025
9c7371f
regen SBOM
picnixz Sep 22, 2025
1085584
remove unused include
picnixz Sep 22, 2025
c10fe91
fix possible error handling
picnixz Sep 22, 2025
18d175f
undef macro after usage
picnixz Sep 22, 2025
911b2b7
Update Lib/test/test_pyexpat.py
picnixz Sep 22, 2025
d636685
update comments
picnixz Sep 22, 2025
b951065
use better test names
picnixz Sep 22, 2025
e11bf14
simplify roles usage
picnixz Sep 22, 2025
3e45613
prevent reparse deferral of Expat to blow up
picnixz Sep 22, 2025
fb83fb5
test better numeric values
picnixz Sep 22, 2025
7f91f2e
update docs
picnixz Sep 22, 2025
64af05c
avoid deprecated `XML_GetError{Line,Column}Number`
picnixz Sep 22, 2025
b01e53d
raise `NotImplementedError` for unavailable mitigation APIs
picnixz Sep 22, 2025
cd040bf
improve various wordings
picnixz Sep 23, 2025
a09cd15
avoid SBOM alteration
picnixz Sep 23, 2025
a3fc3b3
regen files
picnixz Sep 23, 2025
5afe1ad
improve test assertions
picnixz Sep 23, 2025
b6949dd
improve test genericity
picnixz Sep 23, 2025
bdbd382
split tests even if they could end up duplicated
picnixz Sep 23, 2025
0c03735
raise AttributeError when method is not available and reorder interface
picnixz Sep 23, 2025
220f3e2
reoder docs
picnixz Sep 23, 2025
9d538e4
address review
picnixz Sep 24, 2025
209f300
update capsule API
picnixz Sep 24, 2025
8407cfc
amend docs
picnixz Sep 24, 2025
e898811
use "NOTE:"
picnixz Sep 24, 2025
ce8cb48
address final comments
picnixz Sep 26, 2025
47 changes: 47 additions & 0 deletions Doc/library/pyexpat.rst
Expand Up @@ -231,7 +231,54 @@
.. versionadded:: 3.13


:class:`xmlparser` objects have the following methods to mitigate some

Check warning on line 234 in Doc/library/pyexpat.rst

GitHub Actions / Docs / Docs

py:class reference target not found: xmlparser [ref.class]
Member

py:class reference target not found: xmlparser [ref.class]

Seems that the xmlparser class is not documented, this introduces new reference warnings.

Member Author

Huh, why wasn't it caught by the CI? For now, I'll just suppress the link. We don't have a class description actually. We don't have a real description for this class, except some "XML Expat parser". What's important are the methods, not the constructor (which is actually not available; users need to create a parser through expat.ParserCreate()). I can add:

.. class:: xmlparser

   The type of an Expat XML parser created by :func:`ParserCreate`.

if you want.

well-known XML vulnerabilities.

.. method:: xmlparser.SetAllocTrackerMaximumAmplification(max_factor, /)

Sets the maximum amplification factor between direct input and bytes
of dynamic memory allocated.

By default, parser objects have a maximum amplification factor of 100.

The amplification factor is calculated as ``allocated / direct``
while parsing, where ``direct`` is the number of bytes read from
the primary document in parsing and ``allocated`` is the number
of bytes of dynamic memory allocated in the parser hierarchy.

The *max_factor* value must be a non-NaN :class:`float` value greater than
or equal to 1.0. Amplification factors greater than 100 can be observed
near the start of parsing even with benign files in practice. As such, the
upper bound must be chosen carefully so as to avoid false positives.

An :exc:`ExpatError` is raised if this method is called by a non-root

Check warning on line 254 in Doc/library/pyexpat.rst

GitHub Actions / Docs / Docs

py:attr reference target not found: ExpatError.column [ref.attr]
parser or if *max_factor* is outside the valid range. The corresponding
:attr:`~.ExpatError.lineno` and :attr:`~.ExpatError.column` should not be
used as they will have no special meaning.

.. note::

The maximum amplification factor is only considered if the threshold
specified by :meth:`.SetAllocTrackerActivationThreshold` is reached.

.. versionadded:: next
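
As a reading aid for the documentation above, here is a minimal sketch of how the two settings interact. It is a simplified model, not Expat's implementation: the helper name `would_trip` is hypothetical, the handling of `direct == 0` is an assumption, and the exact comparison operators Expat uses are not stated in this PR.

```python
# Illustrative model only; this is NOT how Expat tracks allocations internally.
MIB = 1024 * 1024
DEFAULT_MAX_FACTOR = 100.0    # default maximum amplification factor (from the docs)
DEFAULT_THRESHOLD = 64 * MIB  # default activation threshold (from the docs)


def would_trip(direct, allocated,
               max_factor=DEFAULT_MAX_FACTOR,
               threshold=DEFAULT_THRESHOLD):
    """Hypothetical helper: return True if the modeled limit is exceeded.

    direct    -- bytes read from the primary document
    allocated -- bytes of dynamic memory allocated in the parser hierarchy
    """
    if allocated < threshold:
        # Below the activation threshold the factor is not considered at all.
        return False
    # Assumption: treat an empty input as one byte to avoid dividing by zero.
    factor = allocated / max(direct, 1)
    return factor > max_factor


# 50x amplification, but far below 64 MiB: the protection is not active.
print(would_trip(direct=1_000, allocated=50_000))
# 128 MiB allocated for 1 kB of input: active and well above a factor of 100.
print(would_trip(direct=1_000, allocated=128 * MIB))
```

Lowering the threshold to 0, as the tests in this PR do, makes the factor apply from the first allocation, which is why even a tiny payload can then fail with `max_factor=1.0`.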

.. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)

Sets the number of allocated bytes of dynamic memory needed to activate
protection against disproportionate use of RAM.

By default, parser objects have an allocation activation threshold of 64 MiB,
or equivalently 67,108,864 bytes.

An :exc:`ExpatError` is raised if this method is called by a non-root parser.

Check warning on line 274 in Doc/library/pyexpat.rst

GitHub Actions / Docs / Docs

py:attr reference target not found: ExpatError.column [ref.attr]
The corresponding :attr:`~.ExpatError.lineno` and :attr:`~.ExpatError.column`
should not be used as they will have no special meaning.

.. versionadded:: next
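
A hedged usage sketch for the two methods documented above. It assumes, based on the commit "raise AttributeError when method is not available", that the methods are simply absent on interpreters or Expat builds without this support, so it probes with `getattr` and falls back to an unconfigured parser. The helper name `make_hardened_parser` is hypothetical.

```python
from xml.parsers import expat


def make_hardened_parser(max_factor=100.0, threshold=64 * 1024 * 1024):
    """Return (parser, hardened); hardened is True if both limits were set."""
    parser = expat.ParserCreate()
    # Assumption: missing support shows up as a missing attribute.
    set_threshold = getattr(parser, "SetAllocTrackerActivationThreshold", None)
    set_factor = getattr(parser, "SetAllocTrackerMaximumAmplification", None)
    if set_threshold is None or set_factor is None:
        return parser, False
    # Both calls must be made on a root parser, as this one is.
    set_threshold(threshold)
    set_factor(max_factor)
    return parser, True


parser, hardened = make_hardened_parser()
parser.Parse(b"<doc>hello</doc>", True)
print("allocation limits configured:", hardened)
```

The values passed here are the documented defaults, so a benign document parses the same way whether or not the limits could be set.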


:class:`xmlparser` objects have the following attributes:

Check warning on line 281 in Doc/library/pyexpat.rst

GitHub Actions / Docs / Docs

py:class reference target not found: xmlparser [ref.class]


.. attribute:: xmlparser.buffer_size
10 changes: 10 additions & 0 deletions Doc/whatsnew/3.15.rst
Expand Up @@ -540,6 +540,16 @@ unittest
(Contributed by Garry Cairns in :gh:`134567`.)


xml.parsers.expat
-----------------

* Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
disproportional amounts of dynamic memory from within an Expat parser.
(Contributed by Bénédikt Tran in :gh:`90949`.)


zlib
----

94 changes: 93 additions & 1 deletion Lib/test/test_pyexpat.py
Expand Up @@ -2,13 +2,15 @@
# handler, are obscure and unhelpful.

import os
import re
import sys
import sysconfig
import unittest
import textwrap
import traceback
from io import BytesIO
from test import support
from test.support import os_helper
from test.support import import_helper, os_helper
from test.support import sortdict
from unittest import mock
from xml.parsers import expat
Expand Down Expand Up @@ -821,5 +823,95 @@ def start_element(name, _):
self.assertEqual(started, ['doc'])


class AttackProtectionTest(unittest.TestCase):

    def billion_laughs(self, ncols, nrows, text='.', indent=' '):
        """Create a billion laughs payload.

        Be careful: the total number of items is pow(ncols, nrows), thereby
        requiring at least pow(ncols, nrows) * sizeof(text) memory!
        """
        body = textwrap.indent('\n'.join(
            f'<!ENTITY row{i + 1} "{f"&row{i};" * ncols}">'
            for i in range(nrows)
        ), indent)
        return f"""\
<?xml version="1.0"?>
<!DOCTYPE doc [
{indent}<!ENTITY row0 "{text}">
{indent}<!ELEMENT doc (#PCDATA)>
{body}
]>
<doc>&row{nrows};</doc>
"""

    def test_set_alloc_tracker_maximum_amplification(self):
        # On WASI, the maximum amplification factor of the payload may differ,
        # so we craft a payload that is likely to yield an allocation factor
        # way larger than 1.0 and way smaller than 10^5.
        payload = self.billion_laughs(1, 2)

        p = expat.ParserCreate()
        # Unconditionally enable maximum amplification factor.
        p.SetAllocTrackerActivationThreshold(0)
        # Use a max amplification factor likely to be below the real one.
        self.assertIsNone(p.SetAllocTrackerMaximumAmplification(1.0))
        msg = r"out of memory: line \d+, column \d+"
        self.assertRaisesRegex(expat.ExpatError, msg, p.Parse, payload)

        # Re-create a parser as the current parser is now in an error state.
        p = expat.ParserCreate()
        # Unconditionally enable maximum amplification factor.
        p.SetAllocTrackerActivationThreshold(0)
        self.assertIsNone(p.SetAllocTrackerMaximumAmplification(10_000))
        self.assertIsNotNone(p.Parse(payload))

    def test_set_alloc_tracker_maximum_amplification_invalid_args(self):
        parser = expat.ParserCreate()
        f = parser.SetAllocTrackerMaximumAmplification

        msg = re.escape("'max_factor' must be at least 1.0")
        self.assertRaisesRegex(expat.ExpatError, msg, f, float('nan'))
        self.assertRaisesRegex(expat.ExpatError, msg, f, 0.99)

        subparser = parser.ExternalEntityParserCreate(None)
        fsub = subparser.SetAllocTrackerMaximumAmplification
        msg = re.escape("parser must be a root parser")
        self.assertRaisesRegex(expat.ExpatError, msg, fsub, 1.0)

    def test_set_alloc_tracker_activation_threshold(self):
        # Run the test with EXPAT_MALLOC_DEBUG=2 to detect those constants.
        MAX_ALLOC = 17333
        MIN_ALLOC = 1096

        payload = self.billion_laughs(10, 4)

        p = expat.ParserCreate()
        p.SetAllocTrackerActivationThreshold(MAX_ALLOC + 1)
        self.assertIsNone(p.SetAllocTrackerMaximumAmplification(1.0))
        # Check that we never reach the activation threshold.
        self.assertIsNotNone(p.Parse(payload))

        p = expat.ParserCreate()
        p.SetAllocTrackerActivationThreshold(MIN_ALLOC - 1)
        # Check that we always reach the activation threshold.
        self.assertIsNone(p.SetAllocTrackerMaximumAmplification(1.0))
        msg = r"out of memory: line \d+, column \d+"
        self.assertRaisesRegex(expat.ExpatError, msg, p.Parse, payload)

    def test_set_alloc_tracker_activation_threshold_overflown_args(self):
        _testcapi = import_helper.import_module("_testcapi")
        parser = expat.ParserCreate()
        f = parser.SetAllocTrackerActivationThreshold
        self.assertRaises(OverflowError, f, _testcapi.ULLONG_MAX + 1)

    def test_set_alloc_tracker_activation_threshold_invalid_args(self):
        parser = expat.ParserCreate()
        subparser = parser.ExternalEntityParserCreate(None)
        f = subparser.SetAllocTrackerActivationThreshold
        msg = re.escape("parser must be a root parser")
        self.assertRaisesRegex(expat.ExpatError, msg, f, 12345)


if __name__ == "__main__":
unittest.main()
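
To make the size warning in the `billion_laughs` docstring concrete, here is a standalone sketch of the same payload shape outside the test class. It is not the test helper itself: the function names are hypothetical, the two-space DTD indentation is an arbitrary choice, and it relies on a stock parser expanding internal entities when only a character data handler is set.

```python
from xml.parsers import expat


def billion_laughs(ncols, nrows, text="."):
    """Build a small nested-entity payload (hypothetical standalone variant)."""
    lines = [f'  <!ENTITY row0 "{text}">', "  <!ELEMENT doc (#PCDATA)>"]
    for i in range(nrows):
        # Each row references the previous row `ncols` times.
        refs = f"&row{i};" * ncols
        lines.append(f'  <!ENTITY row{i + 1} "{refs}">')
    dtd = "\n".join(lines)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE doc [\n{dtd}\n]>\n"
        f"<doc>&row{nrows};</doc>\n"
    )


def expanded_text(payload):
    """Parse the payload and return the character data seen inside <doc>."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    return "".join(chunks)


# Two references per row over three rows expand to 2 ** 3 copies of the text.
payload = billion_laughs(2, 3)
print(len(payload), "bytes of input ->", len(expanded_text(payload)), "characters")
```

Keeping `ncols` and `nrows` small matters here for the same reason as in the tests: the expanded size grows as `pow(ncols, nrows)`.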
@@ -0,0 +1,5 @@
Add :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
and :func:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
disproportional amounts of dynamic memory from within an Expat parser. Patch
by Bénédikt Tran.
4 changes: 2 additions & 2 deletions Misc/sbom.spdx.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

136 changes: 135 additions & 1 deletion Modules/clinic/pyexpat.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Modules/expat/expat_external.h
Expand Up @@ -39,6 +39,8 @@

#ifndef Expat_External_INCLUDED
# define Expat_External_INCLUDED 1
/* Required so that functions in expat.h are declared */
#include "expat_config.h"
/* Namespace external symbols to allow multiple libexpat version to
co-exist. */
#include "pyexpatns.h"
2 changes: 2 additions & 0 deletions Modules/expat/pyexpatns.h
Expand Up @@ -82,6 +82,8 @@
#define XmlPrologStateInit PyExpat_XmlPrologStateInit
#define XmlPrologStateInitExternalEntity PyExpat_XmlPrologStateInitExternalEntity
#define XML_ResumeParser PyExpat_XML_ResumeParser
#define XML_SetAllocTrackerActivationThreshold PyExpat_XML_SetAllocTrackerActivationThreshold
#define XML_SetAllocTrackerMaximumAmplification PyExpat_XML_SetAllocTrackerMaximumAmplification
#define XML_SetAttlistDeclHandler PyExpat_XML_SetAttlistDeclHandler
#define XML_SetBase PyExpat_XML_SetBase
#define XML_SetBillionLaughsAttackProtectionActivationThreshold PyExpat_XML_SetBillionLaughsAttackProtectionActivationThreshold