
Commit 6661123

gh-90949: expose Expat API to tune exponential expansion protections (#139368)
Expose the XML Expat 2.7.2 APIs to tune protections against "billion laughs" [1] attacks.

The exposed APIs are available on Expat parsers, that is, parsers created by `xml.parsers.expat.ParserCreate()`, as:

- `parser.SetBillionLaughsAttackProtectionActivationThreshold(threshold)`, and
- `parser.SetBillionLaughsAttackProtectionMaximumAmplification(max_factor)`.

This completes the work in f04bea4 and improves the existing related documentation.

[1]: https://en.wikipedia.org/wiki/Billion_laughs_attack
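A minimal usage sketch of the two new methods on a root parser. It assumes a Python build that exposes them; the `hasattr` guard simply skips tuning on older builds. The helper name `make_hardened_parser` is hypothetical, and the values are illustrative: they mirror the usual defaults mentioned in the documentation change (8 MiB and 100) rather than recommend settings.

```python
import xml.parsers.expat as expat


def make_hardened_parser(threshold=8 * 1024 * 1024, max_factor=100.0):
    """Create a root Expat parser with explicit billion-laughs limits.

    Hypothetical helper: the tuning calls are skipped when the running
    Python does not expose them.
    """
    parser = expat.ParserCreate()
    if hasattr(parser, "SetBillionLaughsAttackProtectionActivationThreshold"):
        # Number of output bytes (direct + indirect) after which the
        # amplification check becomes active.
        parser.SetBillionLaughsAttackProtectionActivationThreshold(threshold)
        # Maximum tolerated (direct + indirect) / direct ratio; it must be a
        # non-NaN float >= 1.0, otherwise ExpatError is raised.
        parser.SetBillionLaughsAttackProtectionMaximumAmplification(max_factor)
    return parser


chunks = []
parser = make_hardened_parser()
parser.CharacterDataHandler = chunks.append
parser.Parse(b"<root>hello</root>", True)
print("".join(chunks))  # hello
```

Both setters must be called on the parser returned by `ParserCreate()`; per the documentation change, calling them on a non-root parser raises `ExpatError`.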
1 parent 48d0d0d commit 6661123


8 files changed: +382, -17 lines


Doc/library/pyexpat.rst

Lines changed: 63 additions & 5 deletions
@@ -238,16 +238,71 @@ XMLParser Objects
    .. versionadded:: 3.13


-:class:`!xmlparser` objects have the following methods to mitigate some
-common XML vulnerabilities.
+:class:`!xmlparser` objects have the following methods to tune protections
+against some common XML vulnerabilities.
+
+.. method:: xmlparser.SetBillionLaughsAttackProtectionActivationThreshold(threshold, /)
+
+   Sets the number of output bytes needed to activate protection against
+   `billion laughs`_ attacks.
+
+   The number of output bytes includes amplification from entity expansion
+   and reading DTD files.
+
+   Parser objects usually have a protection activation threshold of 8 MiB,
+   but the actual default value depends on the underlying Expat library.
+
+   An :exc:`ExpatError` is raised if this method is called on a
+   |xml-non-root-parser| parser.
+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
+   should not be used as they may have no special meaning.
+
+   .. note::
+
+      Activation thresholds below 4 MiB are known to break support for DITA 1.3
+      payload and are hence not recommended.
+
+   .. versionadded:: next
+
+.. method:: xmlparser.SetBillionLaughsAttackProtectionMaximumAmplification(max_factor, /)
+
+   Sets the maximum tolerated amplification factor for protection against
+   `billion laughs`_ attacks.
+
+   The amplification factor is calculated as ``(direct + indirect) / direct``
+   while parsing, where ``direct`` is the number of bytes read from
+   the primary document in parsing and ``indirect`` is the number of
+   bytes added by expanding entities and reading of external DTD files.
+
+   The *max_factor* value must be a non-NaN :class:`float` value greater than
+   or equal to 1.0. Peak amplifications of factor 15,000 for the entire payload
+   and of factor 30,000 in the middle of parsing have been observed with small
+   benign files in practice. In particular, the activation threshold should be
+   carefully chosen to avoid false positives.
+
+   Parser objects usually have a maximum amplification factor of 100,
+   but the actual default value depends on the underlying Expat library.
+
+   An :exc:`ExpatError` is raised if this method is called on a
+   |xml-non-root-parser| parser or if *max_factor* is outside the valid range.
+   The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
+   should not be used as they may have no special meaning.
+
+   .. note::
+
+      The maximum amplification factor is only considered if the threshold
+      that can be adjusted by :meth:`.SetBillionLaughsAttackProtectionActivationThreshold`
+      is exceeded.
+
+   .. versionadded:: next

 .. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)

    Sets the number of allocated bytes of dynamic memory needed to activate
    protection against disproportionate use of RAM.

-   By default, parser objects have an allocation activation threshold of 64 MiB,
-   or equivalently 67,108,864 bytes.
+   Parser objects usually have an allocation activation threshold of 64 MiB,
+   but the actual default value depends on the underlying Expat library.

    An :exc:`ExpatError` is raised if this method is called on a
    |xml-non-root-parser| parser.
@@ -271,7 +326,8 @@ common XML vulnerabilities.
    near the start of parsing even with benign files in practice. In particular,
    the activation threshold should be carefully chosen to avoid false positives.

-   By default, parser objects have a maximum amplification factor of 100.0.
+   Parser objects usually have a maximum amplification factor of 100,
+   but the actual default value depends on the underlying Expat library.

    An :exc:`ExpatError` is raised if this method is called on a
    |xml-non-root-parser| parser or if *max_factor* is outside the valid range.
@@ -1010,4 +1066,6 @@ The ``errors`` module has the following attributes:
    not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
    and https://www.iana.org/assignments/character-sets/character-sets.xhtml.

+
+.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack
 .. |xml-non-root-parser| replace:: :ref:`non-root <xmlparser-non-root>`
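The documented interplay between the two settings can be modeled in plain Python. This is a simplified sketch, not Expat's implementation. It assumes that protection is active once the running total `direct + indirect` reaches the activation threshold, and that only then is the factor `(direct + indirect) / direct` compared with *max_factor*. It also assumes a factor of 1.0 before any direct bytes are read. The function names are hypothetical, and the model raises `ValueError` where the real setter raises `ExpatError`.

```python
import math


def amplification_factor(direct: int, indirect: int) -> float:
    """Return (direct + indirect) / direct, as documented for max_factor."""
    if direct == 0:
        # Assumption: nothing read from the primary document yet means
        # no amplification.
        return 1.0
    return (direct + indirect) / direct


def is_tolerated(direct: int, indirect: int, threshold: int, max_factor: float) -> bool:
    """Model whether parsing may continue under the given limits."""
    # Mirrors the documented valid range; the real API raises ExpatError.
    if math.isnan(max_factor) or max_factor < 1.0:
        raise ValueError("max_factor must be a non-NaN float >= 1.0")
    if direct + indirect < threshold:
        # Below the activation threshold the factor is not considered.
        return True
    return amplification_factor(direct, indirect) <= max_factor


# 200 bytes of document whose entities add 10,000 bytes of output.
print(amplification_factor(200, 10_000))           # 51.0
print(is_tolerated(200, 10_000, 8 * 2**20, 1.0))  # True: threshold not reached
print(is_tolerated(200, 10_000, 0, 50.0))          # False: 51.0 > 50.0
```

Setting the threshold to 0, as the new tests do, makes the factor check apply from the first byte. A large threshold lets even a factor limit of 1.0 pass small documents.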

Doc/whatsnew/3.15.rst

Lines changed: 10 additions & 2 deletions
@@ -558,10 +558,18 @@ xml.parsers.expat

 * Add :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
   and :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
-  to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
-  disproportional amounts of dynamic memory from within an Expat parser.
+  to :ref:`xmlparser <xmlparser-objects>` objects to tune protections against
+  disproportional amounts of dynamic memory usage from within an Expat parser.
   (Contributed by Bénédikt Tran in :gh:`90949`.)

+* Add :meth:`~xml.parsers.expat.xmlparser.SetBillionLaughsAttackProtectionActivationThreshold`
+  and :meth:`~xml.parsers.expat.xmlparser.SetBillionLaughsAttackProtectionMaximumAmplification`
+  to :ref:`xmlparser <xmlparser-objects>` objects to tune protections against
+  `billion laughs`_ attacks.
+  (Contributed by Bénédikt Tran in :gh:`90949`.)
+
+.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack
+

 zlib
 ----

Include/pyexpat.h

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,11 @@ struct PyExpat_CAPI
         XML_Parser parser, unsigned long long activationThresholdBytes);
     XML_Bool (*SetAllocTrackerMaximumAmplification)(
         XML_Parser parser, float maxAmplificationFactor);
+    /* might be NULL for expat < 2.4.0 */
+    XML_Bool (*SetBillionLaughsAttackProtectionActivationThreshold)(
+        XML_Parser parser, unsigned long long activationThresholdBytes);
+    XML_Bool (*SetBillionLaughsAttackProtectionMaximumAmplification)(
+        XML_Parser parser, float maxAmplificationFactor);
     /* always add new stuff to the end! */
 };

6267

Lib/test/test_pyexpat.py

Lines changed: 58 additions & 0 deletions
@@ -958,6 +958,64 @@ def test_set_maximum_amplification__fail_for_subparser(self):
         self.assert_root_parser_failure(setter, 123.45)


+@unittest.skipIf(expat.version_info < (2, 4, 0), "requires Expat >= 2.4.0")
+class ExpansionProtectionTest(AttackProtectionTestBase, unittest.TestCase):
+
+    def assert_rejected(self, func, /, *args, **kwargs):
+        """Check that func(*args, **kwargs) hits the amplification limit."""
+        msg = (
+            r"limit on input amplification factor \(from DTD and entities\) "
+            r"breached: line \d+, column \d+"
+        )
+        self.assertRaisesRegex(expat.ExpatError, msg, func, *args, **kwargs)
+
+    def set_activation_threshold(self, parser, threshold):
+        return parser.SetBillionLaughsAttackProtectionActivationThreshold(threshold)
+
+    def set_maximum_amplification(self, parser, max_factor):
+        return parser.SetBillionLaughsAttackProtectionMaximumAmplification(max_factor)
+
+    def test_set_activation_threshold__threshold_reached(self):
+        parser = expat.ParserCreate()
+        # Choose a threshold expected to be always reached.
+        self.set_activation_threshold(parser, 3)
+        # Check that the threshold is reached by choosing a small factor
+        # and a payload whose peak amplification factor exceeds it.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)
+        self.assert_rejected(parser.Parse, payload, True)
+
+    def test_set_activation_threshold__threshold_not_reached(self):
+        parser = expat.ParserCreate()
+        # Choose a threshold expected to be never reached.
+        self.set_activation_threshold(parser, pow(10, 5))
+        # Check that the threshold is not reached, even with a small factor
+        # and a payload whose peak amplification factor exceeds it.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        payload = self.exponential_expansion_payload(ncols=10, nrows=4)
+        self.assertIsNotNone(parser.Parse(payload, True))
+
+    def test_set_maximum_amplification__amplification_exceeded(self):
+        parser = expat.ParserCreate()
+        # Unconditionally enable the maximum amplification factor check.
+        self.set_activation_threshold(parser, 0)
+        # Choose a max amplification factor expected to always be exceeded.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1.0))
+        # Craft a payload for which the peak amplification factor is > 1.0.
+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)
+        self.assert_rejected(parser.Parse, payload, True)
+
+    def test_set_maximum_amplification__amplification_not_exceeded(self):
+        parser = expat.ParserCreate()
+        # Unconditionally enable the maximum amplification factor check.
+        self.set_activation_threshold(parser, 0)
+        # Choose a max amplification factor expected to never be exceeded.
+        self.assertIsNone(self.set_maximum_amplification(parser, 1e4))
+        # Craft a payload for which the peak amplification factor is < 1e4.
+        payload = self.exponential_expansion_payload(ncols=1, nrows=2)
+        self.assertIsNotNone(parser.Parse(payload, True))
+
+
 @unittest.skipIf(expat.version_info < (2, 7, 2), "requires Expat >= 2.7.2")
 class MemoryProtectionTest(AttackProtectionTestBase, unittest.TestCase):

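The tests rely on an `exponential_expansion_payload(ncols, nrows)` helper from `AttackProtectionTestBase`, whose body is not part of this diff. A plausible stand-alone equivalent is sketched below. The function names, the entity layout, and the default base text `"x"` are assumptions, not the helper's actual implementation. Each entity references the previous one `ncols` times, so the root text expands to `ncols ** nrows` copies of the base string while the document itself stays small.

```python
import xml.parsers.expat as expat


def exponential_expansion_payload(ncols: int, nrows: int, text: str = "x") -> str:
    """Build a small "billion laughs"-style document (hypothetical helper).

    Entity e0 holds `text`; each entity e{i} references e{i-1} `ncols`
    times, so expanding e{nrows} yields `text * ncols ** nrows`.
    """
    lines = ["<!DOCTYPE root [", f'<!ENTITY e0 "{text}">']
    for i in range(1, nrows + 1):
        refs = f"&e{i - 1};" * ncols
        lines.append(f'<!ENTITY e{i} "{refs}">')
    lines.append("]>")
    lines.append(f"<root>&e{nrows};</root>")
    return "\n".join(lines)


def expanded_text(payload: str) -> str:
    """Parse `payload` with a default-configured parser; return the root text."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    return "".join(chunks)


payload = exponential_expansion_payload(ncols=3, nrows=3)
print(len(expanded_text(payload)))  # 27
```

With default limits such a small payload parses normally. On a parser where `SetBillionLaughsAttackProtectionActivationThreshold(0)` and `SetBillionLaughsAttackProtectionMaximumAmplification(1.0)` were called, an expanding payload is expected to raise `ExpatError`, which is what `test_set_maximum_amplification__amplification_exceeded` checks.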
9631021

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 Add :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerActivationThreshold`
 and :meth:`~xml.parsers.expat.xmlparser.SetAllocTrackerMaximumAmplification`
-to :ref:`xmlparser <xmlparser-objects>` objects to prevent use of
-disproportional amounts of dynamic memory from within an Expat parser.
+to :ref:`xmlparser <xmlparser-objects>` objects to tune protections against
+disproportional amounts of dynamic memory usage from within an Expat parser.
 Patch by Bénédikt Tran.

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+Add
+:meth:`~xml.parsers.expat.xmlparser.SetBillionLaughsAttackProtectionActivationThreshold`
+and
+:meth:`~xml.parsers.expat.xmlparser.SetBillionLaughsAttackProtectionMaximumAmplification`
+to :ref:`xmlparser <xmlparser-objects>` objects to tune protections against
+`billion laughs <https://en.wikipedia.org/wiki/Billion_laughs_attack>`_ attacks.
+Patch by Bénédikt Tran.

Modules/clinic/pyexpat.c.h

Lines changed: 147 additions & 3 deletions
(Generated file; diff not rendered.)
