Commit 0314bbe

committed
add advisory
1 parent ae6b6ae commit 0314bbe

1 file changed

+36
-18
lines changed

@@ -1,29 +1,47 @@
11
---
2-
title: "Typo3 HTML Sanitizer By-passing via the noscript tag"
3-
date: 2023-07-26
2+
title: "Masterminds/html5-php parser differential"
3+
date: 2023-11-14
44
tags:
5+
- "parser differential"
56
- "xss"
67
- "mxss"
78
- "bypass"
89
advisory: true
9-
origin: https://github.com/advisories/GHSA-59jf-3q9v-rh6g
10-
cves:
11-
- CVE-2023-38500
10+
origin: https://github.com/Masterminds/html5-php/issues/241
11+
cves:
1212
ghsas:
13-
- "GHSA-59jf-3q9v-rh6g"
1413
---
15-
# Problem
16-
Due to an encoding issue in the serialization layer, malicious markup nested in a noscript element was not encoded correctly. noscript is disabled in the default configuration, but might have been enabled in custom scenarios. This allows bypassing the cross-site scripting mechanism of typo3/html-sanitizer.
14+
## Observations
15+
We have noticed a number of parsing differences between the Masterminds/html5-php parser and the HTML5 specification. We believe the root cause boils down to the [use](https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Parser/DOMTreeBuilder.php#L172) of PHP's default parser APIs, such as [loadHTML](https://www.php.net/manual/en/domdocument.loadhtml.php) and [DOMImplementation](https://www.php.net/manual/en/class.domimplementation.php). PHP's lack of HTML5 support is [known](https://wiki.php.net/rfc/domdocument_html5_parser), and we have contacted the PHP project asking it to state this more clearly in the [documentation](https://www.php.net/manual/en/domdocument.loadhtml.php) in order to raise awareness of the resulting security issues.
16+
This behavior becomes security-relevant when HTML sanitizers use the Masterminds/html5-php parser. We have come across multiple PHP sanitizers that can be bypassed because they parse their input with Masterminds/html5-php.
1717

18-
# Solution
19-
Update to `typo3/html-sanitizer` versions 1.5.1 or 2.1.2 that fix the problem described.
18+
## Exploitation
19+
The following examples show the differentials and how attackers can leverage them to bypass sanitizers.
2020

21-
# Credits
22-
Thanks to David Klein and Yaniv Nizry who reported this issue, and to TYPO3 security team members Oliver Hader and Benjamin Franzke who fixed the issue.
21+
### Comments
22+
According to the [XML specification](https://www.w3.org/TR/xml/#sec-comments) (XHTML), comments must end with the characters `-->`.
23+
On the other hand, the [HTML specification](https://html.spec.whatwg.org/multipage/syntax.html#comments) states that a comment's text _'must not start with the string `>`, nor start with the string `->`'_.
24+
When a browser parses the following string, the comment ends before the `p` tag. When `Masterminds/html5-php` parses it, the `p` tag is treated as part of the comment:
25+
* Input: `<!---><p>`
26+
* Browser (HTML5 specification) output: `<!----><p></p>`
27+
* Masterminds/html5-php parser output: `<!---><p>-->`
2328

24-
# References
25-
* [TYPO3-CORE-SA-2023-002](https://typo3.org/security/advisory/typo3-core-sa-2023-002)
26-
* [GHSA-59jf-3q9v-rh6g](https://github.com/TYPO3/html-sanitizer/security/advisories/GHSA-59jf-3q9v-rh6g)
27-
* https://nvd.nist.gov/vuln/detail/CVE-2023-38500
28-
* [TYPO3/html-sanitizer@e3026f5](https://github.com/TYPO3/html-sanitizer/commit/e3026f589fef0be8c3574ee3f0a0bfbe33d7ebdb)
29-
* https://typo3.org/security/advisory/typo3-core-sa-2023-002
29+
An attacker can submit the payload `<!---><xss>-->`. The parser treats the `xss` tag as part of a comment, whereas the browser ends the comment right before it and renders the `xss` tag as an element.
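
The two comment-termination rules can be compared with a minimal Python sketch. The function names `html5_comment_end` and `xml_style_comment_end` are hypothetical, the HTML5 rule is simplified, and the XML-style function only imitates the lenient behavior described above; it is not the actual Masterminds/html5-php code.

```python
PAYLOAD = "<!---><xss>-->"


def html5_comment_end(s: str) -> int:
    """Index just past the comment that opens s, per a simplified HTML5 rule."""
    assert s.startswith("<!--")
    body = s[4:]
    # HTML5: a comment whose text would start with '>' or '->' ends right there.
    if body.startswith(">"):
        return 5
    if body.startswith("->"):
        return 6
    # Otherwise the comment ends at the first '-->' or '--!>'.
    ends = [(body.find(t), len(t)) for t in ("-->", "--!>")]
    ends = [(i, n) for i, n in ends if i != -1]
    if not ends:
        return len(s)  # unterminated: the comment runs to end of input
    i, n = min(ends)
    return 4 + i + n


def xml_style_comment_end(s: str) -> int:
    """Index just past the comment when only '-->' may close it (assumed rule)."""
    assert s.startswith("<!--")
    i = s.find("-->", 4)
    return len(s) if i == -1 else i + 3


browser_cut = html5_comment_end(PAYLOAD)
lenient_cut = xml_style_comment_end(PAYLOAD)
print(repr(PAYLOAD[browser_cut:]))  # markup left over after the HTML5 comment
print(repr(PAYLOAD[lenient_cut:]))  # markup left over after the XML-style comment
```

Under the HTML5 rule the `<xss>` tag is left over as markup, while under the XML-style rule nothing is left over. A sanitizer that trusts the second view therefore never inspects the tag.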
30+
31+
### Processing instruction (PI) elements ([known](https://github.com/Masterminds/html5-php#known-issues-or-things-we-designed-against-the-spec), but we have encountered sanitizer bypasses because of it)
32+
Processing instructions exist in the [XML specification](https://www.w3.org/TR/xml/#sec-pi), but in [HTML5](https://html.spec.whatwg.org/#determining-the-character-encoding:~:text=A%20sequence%20of%20bytes%20starting%20with%3A%200x3C%200x3F%20(%60%3C%3F%60)) the characters `<?` open a comment that ends at the first occurrence of the greater-than sign `>`.
33+
34+
Attackers can create the following processing instruction: `<?xml >s<img src=x onerror=alert(1)> ?>`. While Masterminds/html5-php sees no `img` tag, the browser creates a comment, ends it at the first `>` character, and renders the `img` tag.
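
As a rough illustration, the sketch below splits the payload under each rule. Both function names are hypothetical, and the XML-style split is an assumed simplification of how a PI-aware parser consumes the input, not code taken from Masterminds/html5-php.

```python
PI_PAYLOAD = "<?xml >s<img src=x onerror=alert(1)> ?>"


def split_bogus_comment(s: str) -> tuple[str, str]:
    """HTML5-style: '<?' opens a comment that ends at the first '>'."""
    end = s.find(">")
    if end == -1:
        return s, ""
    return s[: end + 1], s[end + 1 :]


def split_processing_instruction(s: str) -> tuple[str, str]:
    """XML-style: a processing instruction ends at the first '?>'."""
    end = s.find("?>")
    if end == -1:
        return s, ""
    return s[: end + 2], s[end + 2 :]


comment, browser_rest = split_bogus_comment(PI_PAYLOAD)
pi, parser_rest = split_processing_instruction(PI_PAYLOAD)
print(repr(browser_rest))  # what the browser goes on to parse as markup
print(repr(parser_rest))   # what a PI-aware parser has left to parse
```

The browser-side remainder still contains the `img` tag with its `onerror` handler, whereas the PI-style split consumes the entire payload.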
35+
36+
### Foreign content elements
37+
HTML5 introduced two foreign elements ([math](https://html.spec.whatwg.org/#mathml) and [svg](https://html.spec.whatwg.org/#svg-0)) which follow different parsing rules than HTML. Masterminds/html5-php does not take this into account, which causes further parsing differentials and [sanitizer bypasses](https://research.securitum.com/dompurify-bypass-using-mxss/) such as:
38+
* `<svg><p><style><!--</style><xss>--></style>`
39+
40+
### `noscript` element
41+
Depending on whether [scripting](https://html.spec.whatwg.org/#the-noscript-element) is enabled (it is enabled by default in browsers), the `noscript` element parses its content differently:
42+
* If scripting is enabled, the content is parsed as raw text
43+
* If scripting is disabled, the content is parsed as HTML
44+
45+
Masterminds/html5-php parses `noscript` as if scripting were disabled, which differs from how browsers parse it by default.
46+
This is not wrong per se, but it can still lead to [mXSS](https://cure53.de/fp170.pdf), for example:
47+
`<noscript><p alt="</noscript><img src=x onerror=alert(1)>">`
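
The effect of the scripting flag on this payload can be sketched with a toy tokenizer. This is a simplified model written for illustration only: `tag_names`, `TAG`, and `NOSCRIPT_END` are hypothetical names, and the regular expressions cover just enough HTML syntax for this example.

```python
import re

NOSCRIPT_PAYLOAD = '<noscript><p alt="</noscript><img src=x onerror=alert(1)>">'

# A start or end tag; quoted attribute values may contain '>' without ending the tag.
TAG = re.compile(r"""<(/?[a-zA-Z][^\s/>]*)(?:"[^"]*"|'[^']*'|[^>"'])*>""")
NOSCRIPT_END = re.compile(r"</noscript\s*>", re.IGNORECASE)


def tag_names(html: str, scripting: bool) -> list[str]:
    """List the tags a toy tokenizer sees, with scripting enabled or disabled."""
    names, pos = [], 0
    while (m := TAG.search(html, pos)) is not None:
        name = m.group(1).lower()
        names.append(name)
        pos = m.end()
        if scripting and name == "noscript":
            # Scripting enabled: the content is raw text up to the first
            # '</noscript>', even if that sequence sits inside an attribute.
            end = NOSCRIPT_END.search(html, pos)
            if end is None:
                break
            names.append("/noscript")
            pos = end.end()
    return names


print(tag_names(NOSCRIPT_PAYLOAD, scripting=False))  # sanitizer-side view
print(tag_names(NOSCRIPT_PAYLOAD, scripting=True))   # browser-side view
```

With scripting disabled, the `img` markup stays inside the `alt` attribute value of the `p` tag. With scripting enabled, the `</noscript>` inside the attribute closes the element early and the `img` tag becomes live markup.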
