@@ -72,6 +72,13 @@ The :mod:`xml.parsers.expat` module contains two functions:
7272 *encoding* [1]_ is given it will override the implicit or explicit encoding of the
7373 document.
7474
75+ .. _xmlparser-non-root:
76+
77+ Parsers created through :func:`!ParserCreate` are called "root" parsers,
78+ in the sense that they do not have any parent parser attached. Non-root
79+ parsers are created by :meth:`parser.ExternalEntityParserCreate
80+ <xmlparser.ExternalEntityParserCreate>`.
81+
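As a minimal sketch of the root versus non-root distinction described above: the variable names are illustrative, and passing ``None`` as the context is an assumption made to keep the example short (real code usually forwards the context received by an external entity handler).

```python
from xml.parsers import expat

# A parser made by ParserCreate() has no parent: it is a "root" parser.
root = expat.ParserCreate()

# A parser derived from an existing parser is a non-root parser.
# Assumption: None is used as the context purely for illustration.
child = root.ExternalEntityParserCreate(None)

# Both objects are parser objects of the same type, but they are distinct.
print(type(root) is type(child), root is child)
```

Only the object returned by ``ParserCreate`` is a root parser here; ``child`` has ``root`` as its parent.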
7582 Expat can optionally do XML namespace processing for you, enabled by providing a
7683 value for *namespace_separator*. The value must be a one-character string; a
7784 :exc:`ValueError` will be raised if the string has an illegal length (``None``
@@ -231,6 +238,111 @@ XMLParser Objects
231238 .. versionadded:: 3.13
232239
233240
241+ :class:`!xmlparser` objects have the following methods to tune protections
242+ against some common XML vulnerabilities.
243+
244+ .. method:: xmlparser.SetBillionLaughsAttackProtectionActivationThreshold(threshold, /)
245+
246+ Sets the number of output bytes needed to activate protection against
247+ `billion laughs`_ attacks.
248+
249+ The number of output bytes includes amplification from entity expansion
250+ and reading DTD files.
251+
252+ Parser objects usually have a protection activation threshold of 8 MiB,
253+ but the actual default value depends on the underlying Expat library.
254+
255+ An :exc:`ExpatError` is raised if this method is called on a
256+ |xml-non-root-parser| parser.
257+ The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
258+ should not be used as they may have no special meaning.
259+
260+ .. note::
261+
262+ Activation thresholds below 4 MiB are known to break support for DITA 1.3
263+ payloads and are hence not recommended.
264+
265+ .. versionadded:: next
266+
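A hedged sketch of how the activation threshold might be raised on a root parser. The method name comes from the text above; the helper ``set_bl_threshold`` and the 16 MiB value are illustrative assumptions. Because the method is new and its availability depends on the Python build and the underlying Expat library, the sketch checks for it before calling.

```python
from xml.parsers import expat

MIB = 1024 * 1024


def set_bl_threshold(parser, threshold):
    """Set the activation threshold if the parser supports it.

    Returns True when the setter was called, False when it is unavailable.
    """
    setter = getattr(
        parser, "SetBillionLaughsAttackProtectionActivationThreshold", None
    )
    if setter is None:
        return False
    setter(threshold)
    return True


# Root parser: the only kind of parser on which the setter may be called.
parser = expat.ParserCreate()
applied = set_bl_threshold(parser, 16 * MIB)  # 16 MiB is an arbitrary choice
```

Calling the same setter on a non-root parser is documented above to raise ``ExpatError``, so code that handles sub-parsers should configure the root parser instead.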
267+ .. method:: xmlparser.SetBillionLaughsAttackProtectionMaximumAmplification(max_factor, /)
268+
269+ Sets the maximum tolerated amplification factor for protection against
270+ `billion laughs`_ attacks.
271+
272+ The amplification factor is calculated as ``(direct + indirect) / direct``
273+ while parsing, where ``direct`` is the number of bytes read from
274+ the primary document and ``indirect`` is the number of
275+ bytes added by expanding entities and reading external DTD files.
276+
277+ The *max_factor* value must be a non-NaN :class:`float` value greater than
278+ or equal to 1.0. Peak amplifications of factor 15,000 for the entire payload
279+ and of factor 30,000 in the middle of parsing have been observed with small
280+ benign files in practice. In particular, the activation threshold should be
281+ carefully chosen to avoid false positives.
282+
283+ Parser objects usually have a maximum amplification factor of 100,
284+ but the actual default value depends on the underlying Expat library.
285+
286+ An :exc:`ExpatError` is raised if this method is called on a
287+ |xml-non-root-parser| parser or if *max_factor* is outside the valid range.
288+ The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
289+ should not be used as they may have no special meaning.
290+
291+ .. note::
292+
293+ The maximum amplification factor is only considered if the threshold
294+ that can be adjusted by :meth:`.SetBillionLaughsAttackProtectionActivationThreshold`
295+ is exceeded.
296+
297+ .. versionadded:: next
298+
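The interplay between the activation threshold and the maximum amplification factor can be illustrated with a small worked calculation. This is a simplified model in plain Python, not Expat's implementation: the function names are hypothetical, and the exact comparison operators are assumptions.

```python
def amplification(direct, indirect):
    """Return (direct + indirect) / direct for positive ``direct``."""
    if direct <= 0:
        raise ValueError("direct must be positive")
    return (direct + indirect) / direct


def would_trip(direct, indirect, threshold, max_factor):
    """Simplified model: protection applies only once the output byte count
    reaches the activation threshold AND the factor exceeds max_factor."""
    output_bytes = direct + indirect
    return output_bytes >= threshold and amplification(direct, indirect) > max_factor


KIB, MIB = 1024, 1024 * 1024

# 1 KiB of input expanding by 99 KiB gives a factor of 100.0,
# but the output stays far below an 8 MiB threshold.
small = would_trip(1 * KIB, 99 * KIB, 8 * MIB, 100.0)

# 1 MiB of input expanding by 200 MiB gives a factor of 201.0
# and exceeds the 8 MiB threshold.
large = would_trip(1 * MIB, 200 * MIB, 8 * MIB, 100.0)
```

In this model a small document with a high factor is tolerated because it never reaches the threshold, which is why the two settings should be chosen together.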
299+ .. method:: xmlparser.SetAllocTrackerActivationThreshold(threshold, /)
300+
301+ Sets the number of allocated bytes of dynamic memory needed to activate
302+ protection against disproportionate use of RAM.
303+
304+ Parser objects usually have an allocation activation threshold of 64 MiB,
305+ but the actual default value depends on the underlying Expat library.
306+
307+ An :exc:`ExpatError` is raised if this method is called on a
308+ |xml-non-root-parser| parser.
309+ The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
310+ should not be used as they may have no special meaning.
311+
312+ .. versionadded:: next
313+
314+ .. method:: xmlparser.SetAllocTrackerMaximumAmplification(max_factor, /)
315+
316+ Sets the maximum amplification factor between direct input and bytes
317+ of dynamic memory allocated.
318+
319+ The amplification factor is calculated as ``allocated / direct``
320+ while parsing, where ``direct`` is the number of bytes read from
321+ the primary document and ``allocated`` is the number
322+ of bytes of dynamic memory allocated in the parser hierarchy.
323+
324+ The *max_factor* value must be a non-NaN :class:`float` value greater than
325+ or equal to 1.0. Amplification factors greater than 100.0 can be observed
326+ near the start of parsing even with benign files in practice. In particular,
327+ the activation threshold should be carefully chosen to avoid false positives.
328+
329+ Parser objects usually have a maximum amplification factor of 100,
330+ but the actual default value depends on the underlying Expat library.
331+
332+ An :exc:`ExpatError` is raised if this method is called on a
333+ |xml-non-root-parser| parser or if *max_factor* is outside the valid range.
334+ The corresponding :attr:`~ExpatError.lineno` and :attr:`~ExpatError.offset`
335+ should not be used as they may have no special meaning.
336+
337+ .. note::
338+
339+ The maximum amplification factor is only considered if the threshold
340+ that can be adjusted by :meth:`.SetAllocTrackerActivationThreshold`
341+ is exceeded.
342+
343+ .. versionadded:: next
344+
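Since an out-of-range *max_factor* is reported through ``ExpatError``, a caller may prefer to validate the value first. The sketch below is one possible approach under stated assumptions: ``is_valid_max_factor`` and ``set_alloc_max_factor`` are hypothetical helpers, and the setter is looked up dynamically because its availability depends on the Python build and the underlying Expat library.

```python
import math
from xml.parsers import expat


def is_valid_max_factor(value):
    """Check the documented constraint: a non-NaN float that is >= 1.0."""
    f = float(value)
    return not math.isnan(f) and f >= 1.0


def set_alloc_max_factor(parser, max_factor):
    """Validate, then call the setter if this build provides it.

    Returns True when the setter was called, False when it is unavailable.
    """
    if not is_valid_max_factor(max_factor):
        raise ValueError("max_factor must be a non-NaN float >= 1.0")
    setter = getattr(parser, "SetAllocTrackerMaximumAmplification", None)
    if setter is None:
        return False
    setter(float(max_factor))
    return True


parser = expat.ParserCreate()  # root parser
applied = set_alloc_max_factor(parser, 150.0)  # 150.0 is an arbitrary choice
```

Validating up front gives a clearer error than an ``ExpatError`` whose ``lineno`` and ``offset`` carry no useful meaning.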
345+
234346 :class:`xmlparser` objects have the following attributes:
235347
236348
@@ -954,3 +1066,6 @@ The ``errors`` module has the following attributes:
9541066 not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
9551067 and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
9561068
1069+
1070+ .. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack
1071+ .. |xml-non-root-parser| replace:: :ref:`non-root <xmlparser-non-root>`