Skip to content

HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through

High
frenzymadness published GHSA-5jfw-gq64-q45f Nov 19, 2024

Package

pip lxml-html-clean (pip)

Affected versions

< 0.4.0

Patched versions

0.4.0

Description

Impact

The HTML Parser in lxml does not properly handle context-switching for special HTML tags such as <svg>, <math> and <noscript>. This behavior deviates from how web browsers parse and interpret such tags. Specifically, content in CSS comments is ignored by lxml_html_clean but may be interpreted differently by web browsers, enabling malicious scripts to bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks, compromising the security of users relying on lxml_html_clean in default configuration for sanitizing untrusted HTML content.

Patches

Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml 0.4.0, which addresses this issue.

Workarounds

As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent the exploitation of this vulnerability:

  • remove_tags: Specify tags to remove - their content is moved to their parents' tags.
  • kill_tags: Specify tags to be removed completely.
  • allow_tags: Restrict the set of permissible tags, excluding context-switching tags like <svg>, <math> and <noscript>.

References

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
High
Privileges required
None
User interaction
None
Scope
Unchanged
Confidentiality
High
Integrity
Low
Availability
High

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:L/A:H

CVE ID

CVE-2024-52595

Weaknesses

Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users. Learn more on MITRE.

Improper Neutralization of Script in Attributes in a Web Page

The product does not neutralize or incorrectly neutralizes javascript: or other URIs from dangerous attributes within tags, such as onmouseover, onload, onerror, or style. Learn more on MITRE.

Incomplete List of Disallowed Inputs

The product implements a protection mechanism that relies on a list of inputs (or properties of inputs) that are not allowed by policy or otherwise require other action to neutralize before additional processing takes place, but the list is incomplete, leading to resultant weaknesses. Learn more on MITRE.

Credits