Skip to content

Commit 70ce239

Browse files
committed
HTML API: Improve script tag escape state processing.
Addresses some edge cases parsing of script tag contents: - "<!-->" remains in the unescaped state and does not enter the escaped state. - Contents in the escaped state that end with "<script" do not enter double-escaped state. - "\f" (Form Feed) was missing as a tag name terminating character. Developed in WordPress/wordpress-develop#9397 and WordPress/wordpress-develop#9402. Props jonsurrell, dmsnell. See #63738. Built from https://develop.svn.wordpress.org/trunk@60649 git-svn-id: https://core.svn.wordpress.org/trunk@59985 1a063a9b-81f0-0310-95a4-ce76da25c4cd
1 parent a87cdb1 commit 70ce239

File tree

2 files changed

+46
-15
lines changed

2 files changed

+46
-15
lines changed

wp-includes/html-api/class-wp-html-tag-processor.php

Lines changed: 45 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1556,24 +1556,33 @@ private function skip_script_data(): bool {
15561556
}
15571557

15581558
/*
1559-
* Unlike with "-->", the "<!--" only transitions
1560-
* into the escaped mode if not already there.
1561-
*
1562-
* Inside the escaped modes it will be ignored; and
1563-
* should never break out of the double-escaped
1564-
* mode and back into the escaped mode.
1565-
*
1566-
* While this requires a mode change, it does not
1567-
* impact the parsing otherwise, so continue
1568-
* parsing after updating the state.
1559+
* "<!--" only transitions from _unescaped_ to _escaped_. This byte sequence is only
1560+
* significant in the _unescaped_ state and is ignored in any other state.
15691561
*/
15701562
if (
1563+
'unescaped' === $state &&
15711564
'!' === $html[ $at ] &&
15721565
'-' === $html[ $at + 1 ] &&
15731566
'-' === $html[ $at + 2 ]
15741567
) {
1575-
$at += 3;
1576-
$state = 'unescaped' === $state ? 'escaped' : $state;
1568+
$at += 3;
1569+
1570+
/*
1571+
* The parser is ready to enter the _escaped_ state, but may remain in the
1572+
* _unescaped_ state. This occurs when "<!--" is immediately followed by a
1573+
* sequence of 0 or more "-" followed by ">". This is similar to abruptly closed
1574+
* HTML comments like "<!-->" or "<!--->".
1575+
*
1576+
* Note that this check may advance the position significantly and requires a
1577+
* length check to prevent bad offsets on inputs like `<script><!---------`.
1578+
*/
1579+
$at += strspn( $html, '-', $at );
1580+
if ( $at < $doc_length && '>' === $html[ $at ] ) {
1581+
++$at;
1582+
continue;
1583+
}
1584+
1585+
$state = 'escaped';
15771586
continue;
15781587
}
15791588

@@ -1610,8 +1619,30 @@ private function skip_script_data(): bool {
16101619
*/
16111620
$at += 6;
16121621
$c = $html[ $at ];
1613-
if ( ' ' !== $c && "\t" !== $c && "\r" !== $c && "\n" !== $c && '/' !== $c && '>' !== $c ) {
1614-
++$at;
1622+
if (
1623+
/**
1624+
* These characters trigger state transitions of interest:
1625+
*
1626+
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state}
1627+
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-escaped-end-tag-name-state}
1628+
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-start-state}
1629+
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-end-state}
1630+
*
1631+
* The "\r" character is not present in the above references. However, "\r" must be
1632+
* treated the same as "\n". This is because the HTML Standard requires newline
1633+
* normalization during preprocessing which applies this replacement.
1634+
*
1635+
* - @see https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
1636+
* - @see https://infra.spec.whatwg.org/#normalize-newlines
1637+
*/
1638+
'>' !== $c &&
1639+
' ' !== $c &&
1640+
"\n" !== $c &&
1641+
'/' !== $c &&
1642+
"\t" !== $c &&
1643+
"\f" !== $c &&
1644+
"\r" !== $c
1645+
) {
16151646
continue;
16161647
}
16171648

wp-includes/version.php

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@
1616
*
1717
* @global string $wp_version
1818
*/
19-
$wp_version = '6.9-alpha-60648';
19+
$wp_version = '6.9-alpha-60649';
2020

2121
/**
2222
* Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.

0 commit comments

Comments
 (0)