|
1 | 1 | ---
|
2 |
| -title: "Typo3 HTML Sanitizer By-passing via the noscript tag" |
3 |
| -date: 2023-07-26 |
| 2 | +title: "Masterminds/html5-php parser differential" |
| 3 | +date: 2023-11-14 |
4 | 4 | tags:
|
| 5 | + - "parser differential" |
5 | 6 | - "xss"
|
6 | 7 | - "mxss"
|
7 | 8 | - "bypass"
|
8 | 9 | advisory: true
|
9 |
| -origin: https://github.com/advisories/GHSA-59jf-3q9v-rh6g |
10 |
| -cves: |
11 |
| - - CVE-2023-38500 |
| 10 | +origin: https://github.com/Masterminds/html5-php/issues/241 |
| 11 | +cves: |
12 | 12 | ghsas:
|
13 |
| - - "GHSA-59jf-3q9v-rh6g" |
14 | 13 | ---
|
15 |
| -# Problem |
16 |
| -Due to an encoding issue in the serialization layer, malicious markup nested in a noscript element was not encoded correctly. noscript is disabled in the default configuration, but might have been enabled in custom scenarios. This allows bypassing the cross-site scripting mechanism of typo3/html-sanitizer. |
| 14 | +## Observations |
| 15 | +We have noticed a number of parsing differences between the Masterminds/html5-php parser and the HTML5 specification. We think the root cause of these issues boils down to the [use](https://github.com/Masterminds/html5-php/blob/master/src/HTML5/Parser/DOMTreeBuilder.php#L172) of PHP’s default parser APIs ([loadHTML](https://www.php.net/manual/en/domdocument.loadhtml.php), [DOMImplementation](https://www.php.net/manual/en/class.domimplementation.php), etc.). PHP’s lack of HTML5 support is [known](https://wiki.php.net/rfc/domdocument_html5_parser), and we contacted the PHP maintainers asking them to make this clearer in the [documentation](https://www.php.net/manual/en/domdocument.loadhtml.php) in order to raise awareness of these security issues. |
| 16 | +This behavior becomes security-relevant when HTML sanitizers use the Masterminds/html5-php parser. We have come across multiple PHP sanitizers that are vulnerable to bypasses due to using Masterminds/html5-php. |
17 | 17 |
|
18 |
| -# Solution |
19 |
| -Update to `typo3/html-sanitizer` versions 1.5.1 or 2.1.2 that fix the problem described. |
| 18 | +## Exploitation |
| 19 | +Here are examples of the differentials, and how attackers can leverage these in order to bypass sanitizers. |
20 | 20 |
|
21 |
| -# Credits |
22 |
| -Thanks to David Klein and Yaniv Nizry who reported this issue, and to TYPO3 security team members Oliver Hader and Benjamin Franzke who fixed the issue. |
| 21 | +### Comments |
| 22 | +According to the [XML specification](https://www.w3.org/TR/xml/#sec-comments) (XHTML), comments must end with the characters `-->`. |
| 23 | +On the other hand, the [HTML specification](https://html.spec.whatwg.org/multipage/syntax.html#comments) states that a comment's text _'must not start with the string `>`, nor start with the string `->`'_. |
| 24 | +When a browser parses the following string, the comment ends before the `p` tag. When `Masterminds/html5-php` parses it, the `p` tag is treated as part of the comment: |
| 25 | +* Input: `<!---><p>` |
| 26 | +* Browser (HTML5 specification) output: `<!----><p></p>` |
| 27 | +* Masterminds/html5-php parser output: `<!---><p>-->` |
23 | 28 |
|
24 |
| -# References |
25 |
| -* [TYPO3-CORE-SA-2023-002](https://typo3.org/security/advisory/typo3-core-sa-2023-002) |
26 |
| -* [GHSA-59jf-3q9v-rh6g](https://github.com/TYPO3/html-sanitizer/security/advisories/GHSA-59jf-3q9v-rh6g) |
27 |
| -* https://nvd.nist.gov/vuln/detail/CVE-2023-38500 |
28 |
| -* [TYPO3/html-sanitizer@e3026f5](https://github.com/TYPO3/html-sanitizer/commit/e3026f589fef0be8c3574ee3f0a0bfbe33d7ebdb) |
29 |
| -* https://typo3.org/security/advisory/typo3-core-sa-2023-002 |
| 29 | +An attacker can submit the payload `<!---><xss>-->`. The parser treats the `xss` tag as part of a comment, whereas the browser ends the comment right before it and renders the `xss` tag as an element. |
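
The two comment rules can be modeled in a few lines of Python. This is an illustrative sketch only: `html5_comment_end` and `xml_style_comment_end` are hypothetical helper names, the first is a simplified reading of the HTML5 tokenizer's comment states, and the second models the behavior reported above rather than the actual Masterminds/html5-php code.

```python
def html5_comment_end(markup: str) -> int:
    """Return the index just past the first comment, HTML5-style (simplified)."""
    assert markup.startswith("<!--")
    rest = markup[4:]
    # An empty comment is closed abruptly by ">" or "->" right after "<!--".
    for abrupt in (">", "->"):
        if rest.startswith(abrupt):
            return 4 + len(abrupt)
    # Otherwise the comment runs to the first "-->" or "--!>", or to end of input.
    hits = [(rest.find(t), len(t)) for t in ("-->", "--!>")]
    hits = [(i, n) for i, n in hits if i != -1]
    if not hits:
        return len(markup)
    i, n = min(hits)
    return 4 + i + n


def xml_style_comment_end(markup: str) -> int:
    """Return the index just past the first comment if only "-->" can close it."""
    assert markup.startswith("<!--")
    i = markup.find("-->", 4)
    return len(markup) if i == -1 else i + 3


payload = "<!---><xss>-->"
# Whatever follows the comment is what each parser still treats as markup.
print("HTML5 rule leaves:    ", repr(payload[html5_comment_end(payload):]))
print("XML-style rule leaves:", repr(payload[xml_style_comment_end(payload):]))
```

Under the HTML5 rule the `xss` tag survives outside the comment, while under the XML-style rule nothing is left after the comment. That gap is what a sanitizer built on the second rule fails to see.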
| 30 | + |
| 31 | +### Processing instruction (PI) elements ([known](https://github.com/Masterminds/html5-php#known-issues-or-things-we-designed-against-the-spec), but we encountered sanitizer bypasses due to this) |
| 32 | +Processing instructions exist in the [XML specification](https://www.w3.org/TR/xml/#sec-pi), but in [HTML5](https://html.spec.whatwg.org/#determining-the-character-encoding:~:text=A%20sequence%20of%20bytes%20starting%20with%3A%200x3C%200x3F%20(%60%3C%3F%60)) the characters `<?` open a comment that ends at the first occurrence of the greater-than sign `>`. |
| 33 | + |
| 34 | +Attackers can craft the processing instruction `<?xml >s<img src=x onerror=alert(1)> ?>`. While Masterminds/html5-php produces no `img` tag, the browser creates a comment, ends it at the first `>` character, and renders the `img` tag. |
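
A minimal Python sketch of the two termination rules, assuming an XML-style parser closes a processing instruction only at `?>`, while HTML5 closes the resulting comment at the first `>`. The helper names `xml_pi_end` and `html5_bogus_comment_end` are hypothetical and do not correspond to any library API.

```python
def xml_pi_end(markup: str) -> int:
    """Index just past a processing instruction that only "?>" can close."""
    assert markup.startswith("<?")
    i = markup.find("?>", 2)
    return len(markup) if i == -1 else i + 2


def html5_bogus_comment_end(markup: str) -> int:
    """Index just past the comment that HTML5 opens at "<?" and closes at ">"."""
    assert markup.startswith("<?")
    i = markup.find(">", 2)
    return len(markup) if i == -1 else i + 1


payload = "<?xml >s<img src=x onerror=alert(1)> ?>"
# Text after the construct is parsed as ordinary markup by that parser.
print("XML-style rule leaves:", repr(payload[xml_pi_end(payload):]))
print("HTML5 rule leaves:    ", repr(payload[html5_bogus_comment_end(payload):]))
```

With the XML-style rule the whole payload is consumed, so a sanitizer sees nothing to remove. With the HTML5 rule the `img` tag is left as live markup.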
| 35 | + |
| 36 | +### Foreign content elements |
| 37 | +HTML5 introduced two foreign elements ([math](https://html.spec.whatwg.org/#mathml) and [svg](https://html.spec.whatwg.org/#svg-0)) whose content follows different parsing rules than HTML. Masterminds/html5-php doesn’t take this into account, which causes further parsing differentials and [sanitizer bypasses](https://research.securitum.com/dompurify-bypass-using-mxss/) such as: |
| 38 | +* `<svg><p><style><!--</style><xss>--></style>` |
| 39 | + |
| 40 | +### `noscript` element |
| 41 | +Depending on whether [scripting](https://html.spec.whatwg.org/#the-noscript-element) is enabled (it is enabled by default in browsers), the content of the `noscript` element is parsed differently: |
| 42 | +* If scripting is enabled, the content is parsed as raw text |
| 43 | +* If scripting is disabled, the content is parsed as HTML |
| 44 | + |
| 45 | +Masterminds/html5-php parses as if scripting were disabled, which differs from the default parsing in browsers. |
| 46 | +This is not wrong per se, but it can still lead to [mXSS](https://cure53.de/fp170.pdf), for example: |
| 47 | +`<noscript><p alt="</noscript><img src=x onerror=alert(1)>">` |
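
The effect of the two modes on this payload can be approximated with Python's standard `html.parser`, which by default treats `noscript` content as ordinary HTML (similar to the scripting-disabled mode). The scripting-enabled branch below is a hand-written simplification that assumes the input starts with `<noscript>` and treats everything up to the first `</noscript` as raw text. The names `TagCollector`, `tags_scripting_disabled`, and `tags_scripting_enabled` are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def tags_scripting_disabled(markup: str) -> list:
    # noscript content is parsed as HTML, so quoted attribute values stay intact.
    return start_tags(markup)


def tags_scripting_enabled(markup: str) -> list:
    # Simplified model: raw text runs until the first "</noscript", even if that
    # string sits inside what looks like an attribute value.
    opener = "<noscript>"
    assert markup.lower().startswith(opener)
    body = markup[len(opener):]
    end = body.lower().find("</noscript")
    if end == -1:
        return ["noscript"]
    gt = body.find(">", end)
    after = "" if gt == -1 else body[gt + 1:]
    return ["noscript"] + start_tags(after)


payload = '<noscript><p alt="</noscript><img src=x onerror=alert(1)>">'
print("scripting disabled:", tags_scripting_disabled(payload))
print("scripting enabled: ", tags_scripting_enabled(payload))
```

In the scripting-disabled view the `img` markup is only an attribute value of `p`, so a sanitizer has no reason to strip it. In the scripting-enabled view it becomes a real element.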