Description
To Reproduce
See also the example script at the bottom.
- Use options to allow only `a` tags, only the `href` attribute, and only the `https` scheme.
- Sanitize a string where the href's protocol contains HTML entities in decimal or hex format, but lacking the trailing semicolon.
  - Example of HTML entities: `&#112` (decimal), `&#x70` (hex)
  - Example of a full string to pass to sanitize-html: `<a href=htt&#112://example.com>ClickMe</a>`
Expected behavior
Expected: The href attribute is stripped from the string. (<a>ClickMe</a>)
Actual: The href remains, with the ampersand in the attribute escaped to &amp;. This results in a tag like: <a href="htt&amp;#112://example.com">ClickMe</a>
Describe the bug
If the HTML entities are correctly formatted with a trailing semicolon, we get the expected output. But the whole point of sanitization is to handle bad input. 😄
In any case, this is clearly not the https scheme, so the attribute should be stripped.
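To illustrate the check this report expects, here is a minimal sketch that decodes numeric character references with or without the trailing semicolon (as browsers do in attribute values) before extracting the scheme. The helpers `decodeNumericEntities`, `schemeOf` and `isAllowedHref` are hypothetical names made up for this example. They are not part of sanitize-html and do not reflect how it is implemented. The sketch also assumes that an href without any scheme should be rejected, which is stricter than a real sanitizer may want.

```javascript
// Hypothetical helpers for illustration only; not sanitize-html's API.

// Decode decimal (&#112) and hex (&#x70) character references,
// whether or not they end with a semicolon.
function decodeNumericEntities(value) {
  return value.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched rather than throwing.
    return codePoint > 0 && codePoint <= 0x10ffff
      ? String.fromCodePoint(codePoint)
      : match;
  });
}

// Return the lowercased scheme of an href after decoding, or null if none.
function schemeOf(href) {
  const decoded = decodeNumericEntities(href).trim();
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(decoded);
  return match ? match[1].toLowerCase() : null;
}

// Assumption: hrefs with no scheme at all are rejected in this sketch.
function isAllowedHref(href, allowedSchemes) {
  const scheme = schemeOf(href);
  return scheme !== null && allowedSchemes.includes(scheme);
}

console.log(schemeOf('htt&#112://example.com'));
console.log(schemeOf('htt&#x70://example.com'));
console.log(isAllowedHref('htt&#112://example.com', ['https']));
console.log(isAllowedHref('https://example.com', ['https']));
```

With this kind of pre-decoding, the decimal and hex variants from the steps above both resolve to the `http` scheme and would be rejected when only `https` is allowed.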
Security Consideration
I don't think this constitutes a security vulnerability. When the browser sees <a href="htt&amp;#112://example.com">ClickMe</a>, it decodes the attribute to htt&#112://example.com and, because htt& is not a valid scheme, resolves it as a relative URL. On a page opened from disk that becomes file:///htt&#112://example.com, which is to say, a link to /htt& with everything after the # treated as the page hash. Maybe there's some vulnerability if you know the user happens to have a file in a specific spot on their local system with a name ending in an ampersand?
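The relative-URL behavior can be checked with the WHATWG `URL` class built into Node.js, which follows the same URL parsing rules that browsers use. This is a minimal sketch. The `file:///` base URL is an assumption standing in for a page opened from the local filesystem, and the input is the href value after the browser has decoded `&amp;` back to `&`.

```javascript
// The href value as the browser sees it after decoding &amp; to &.
const decodedHref = 'htt&#112://example.com';

// Assumed base URL: a page opened from the local filesystem.
const resolved = new URL(decodedHref, 'file:///');

console.log(resolved.protocol); // 'file:' (inherited from the base URL)
console.log(resolved.pathname); // '/htt&'
console.log(resolved.hash);     // '#112://example.com'
```

Because `&` cannot appear in a scheme, the parser never treats `htt&` as a protocol, so the link stays on whatever origin the base URL has.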
Details
Version of Node.js: v22.13.1
Server Operating System: macOS 15.3 (24D60) (also reproducible on Ubuntu, but I'm not sure of the version)
Additional context:
n/a
Screenshots
n/a
Example Script
const sanitizeHtml = require('sanitize-html');
// Only allow https links
const options = {
  allowedTags: [
    'a',
  ],
  allowedAttributes: {
    a: [
      'href',
    ],
  },
  allowedSchemes: [
    'https',
  ],
};
// Example 1
// Input: <a href=htt&#112://example.com>ClickMe</a>
// Expected: <a>ClickMe</a>
// Actual: <a href="htt&amp;#112://example.com">ClickMe</a>
// Notes:
// * The HTML entity `&#112` lacks the trailing semicolon
// * The browser resolves the resulting href as a relative URL, e.g. file:///htt&#112://example.com for a page loaded from disk
console.log(sanitizeHtml('<a href=htt&#112://example.com>ClickMe</a>', options));
// Example 2
// Input: <a href=htt&#112;://example.com>ClickMe</a>
// Expected: <a>ClickMe</a>
// Actual: <a>ClickMe</a>
// Notes:
// * Unlike Example 1, this one includes the trailing semicolon in the HTML entity
console.log(sanitizeHtml('<a href=htt&#112;://example.com>ClickMe</a>', options));
// Example 3
// Input: <a href=htt&#x70://example.com>ClickMe</a>
// Expected: <a>ClickMe</a>
// Actual: <a href="htt&amp;#x70://example.com">ClickMe</a>
// Notes:
// * This is the same as Example 1, except it uses hex encoding rather than decimal
console.log(sanitizeHtml('<a href=htt&#x70://example.com>ClickMe</a>', options));
// Example 4
// Input: <a href=j&#97v&#97script:&#97lert(1)>ClickMe</a>
// Expected: <a>ClickMe</a>
// Actual: <a href="j&amp;#97v&amp;#97script:&amp;#97lert(1)">ClickMe</a>
// Notes:
// * This is _NOT_ an XSS vulnerability: the browser treats this link like `file:///j&#97v&#97script:&#97lert(1)`
console.log(sanitizeHtml('<a href=j&#97v&#97script:&#97lert(1)>ClickMe</a>', options));