Skip to content

HTML API: Improve script tag escape state processing #9397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: trunk
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
6ad9951
Move script data length checks to top of loop
sirreal Jul 10, 2025
b3b3177
Remove parser_state change in skip_script_data
sirreal Jul 10, 2025
ca16e0e
Remove more length checks
sirreal Jul 10, 2025
0456be7
Improve documentation
sirreal Jul 10, 2025
ea6f7d3
Improve comment explaining early return logic
sirreal Jul 31, 2025
4be62b9
Improve loop comment
sirreal Aug 6, 2025
df2affa
Add script tag processing tests
sirreal Aug 6, 2025
d0cbb00
Remove problematic tests
sirreal Aug 6, 2025
69f3bce
Merge branch 'html-api/improve-skip-script-data-len-checks' into html…
sirreal Aug 6, 2025
c509f9d
Revert "Remove problematic tests"
sirreal Aug 6, 2025
de91e09
Add test that reveals bad offset
sirreal Aug 6, 2025
f041a9c
Ensure the escaped state is not entered on abruptly closed comments
sirreal Aug 6, 2025
2b6833c
Add unclosed script tag test
sirreal Aug 6, 2025
bba0547
Prevent script close tag from being found
sirreal Aug 6, 2025
728d13f
Fix typos in explanatory comments
sirreal Aug 6, 2025
1b4478f
Reword explanatory comment.
dmsnell Aug 6, 2025
d22ef9a
Merge branch 'trunk' into html-api/ensure-script-data-states-parse-co…
sirreal Aug 7, 2025
360d896
fixup! Merge branch 'trunk' into html-api/ensure-script-data-states-p…
sirreal Aug 7, 2025
9fd074f
Update `<!--` comments
sirreal Aug 7, 2025
840f6aa
Add more tests and improve coverage
sirreal Aug 7, 2025
f7bcfb4
more tests more coverage
sirreal Aug 7, 2025
e9dd022
Improve language about abruptly closed comment
sirreal Aug 7, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 46 additions & 15 deletions src/wp-includes/html-api/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1496,13 +1496,39 @@ private function skip_script_data(): bool {
while ( false !== $at && $at < $doc_length ) {
$at += strcspn( $html, '-<', $at );

/*
* *IMPORTANT:* Any changes to this loop *must* ensure the conditions described in this
* comment remain valid.
*
* The rest of this loop matches different byte sequences. If a script close tag is not
* found, the function will return false. The script close tag is the longest byte
* sequenced to match. Therefore, a single length check for at least 8 additional
* bytes allows for an early `false` return OR subsequent matches without length checks.
*
* $at may be here.
* ↓
* </script>
* ╰──┬───╯
* $at + 8 additional bytes are required for a non-false return value.
*
* The length of shorter matches is already satisfied:
*
* $at may be here.
* ↓
* -->
* ├╯
* $at + 2 additional characters does not require an additional length check.
*/
if ( $at + 8 >= $doc_length ) {
return false;
}

/*
* For all script states a "-->" transitions
* back into the normal unescaped script mode,
* even if that's the current state.
*/
if (
$at + 2 < $doc_length &&
'-' === $html[ $at ] &&
'-' === $html[ $at + 1 ] &&
'>' === $html[ $at + 2 ]
Expand All @@ -1512,10 +1538,6 @@ private function skip_script_data(): bool {
continue;
}

if ( $at + 1 >= $doc_length ) {
return false;
}

/*
* Everything of interest past here starts with "<".
* Check this character and advance position regardless.
Expand All @@ -1537,13 +1559,29 @@ private function skip_script_data(): bool {
* parsing after updating the state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these changes interact with the existing comment? Did we get this comment wrong the first time? Did you review the comment above for updates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment isn't wrong, although I don't understand exactly what the last paragraph is trying to say. I've pushed a change to clarify and simplify it.

*/
if (
$at + 2 < $doc_length &&
'unescaped' === $state &&
'!' === $html[ $at ] &&
'-' === $html[ $at + 1 ] &&
'-' === $html[ $at + 2 ]
) {
$at += 3;
$state = 'unescaped' === $state ? 'escaped' : $state;
$at += 3;

/*
* The parser is ready to enter the `escaped` state but may remain in the
* `unescaped` state if there is immediately is a sequence of any number of 0 or
* more "-" characters followed by ">". This is similar to abruptly closed HTML
* comments like "<!-->" or "<!--->".
*
* Note that this check may have advanced the position significantly and requires
* a length check to prevent bad offsets on inputs like `<script><!---------`.
*/
$at += strspn( $html, '-', $at );
if ( $at < $doc_length && '>' === $html[ $at ] ) {
++$at;
continue;
}

$state = 'escaped';
continue;
}

Expand All @@ -1561,7 +1599,6 @@ private function skip_script_data(): bool {
* proceed scanning to the next potential token in the text.
*/
if ( ! (
$at + 6 < $doc_length &&
( 's' === $html[ $at ] || 'S' === $html[ $at ] ) &&
( 'c' === $html[ $at + 1 ] || 'C' === $html[ $at + 1 ] ) &&
( 'r' === $html[ $at + 2 ] || 'R' === $html[ $at + 2 ] ) &&
Expand All @@ -1579,13 +1616,9 @@ private function skip_script_data(): bool {
* "<script123" should not end a script region even though
* "<script" is found within the text.
*/
if ( $at + 6 >= $doc_length ) {
continue;
}
$at += 6;
$c = $html[ $at ];
if ( ' ' !== $c && "\t" !== $c && "\r" !== $c && "\n" !== $c && '/' !== $c && '>' !== $c ) {
++$at;
continue;
}

Expand All @@ -1611,8 +1644,6 @@ private function skip_script_data(): bool {
}

if ( $this->bytes_already_parsed >= $doc_length ) {
$this->parser_state = self::STATE_INCOMPLETE_INPUT;

return false;
}

Expand Down
48 changes: 48 additions & 0 deletions tests/phpunit/tests/html-api/wpHtmlTagProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -3046,4 +3046,52 @@ public static function data_alphabet_by_characters_uppercase() {
yield strtoupper( $data[0] ) => array( strtoupper( $data[0] ) );
}
}

/**
* Test that script tags are parsed correctly.
*
* Script tag parsing is very complicated, see the following resources for more details:
*
* - https://html.spec.whatwg.org/multipage/parsing.html#script-data-state
* - https://html.spec.whatwg.org/multipage/scripting.html#restrictions-for-contents-of-script-elements
*
* @ticket 63738
*
* @dataProvider data_script_tag
*/
public function test_script_tag_parsing( string $input, bool $closes ) {
$processor = new WP_HTML_Tag_Processor( $input );

if ( $closes ) {
$this->assertTrue( $processor->next_token(), 'Expected to find complete script tag.' );
$this->assertSame( 'SCRIPT', $processor->get_tag() );
return;
}

$this->assertFalse( $processor->next_token(), 'Expected to fail next_token().' );
$this->assertTrue( $processor->paused_at_incomplete_token(), 'Expected an incomplete SCRIPT tag token.' );
}

/**
* Data provider.
*/
public static function data_script_tag(): array {
return array(
'Basic script tag' => array( '<script></script>', true ),
'Script with type attribute' => array( '<script type="text/javascript"></script>', true ),
'Script data escaped' => array( '<script><!--</script>', true ),
'Script data double-escaped exit (comment)' => array( '<script><!--<script>--></script>', true ),
'Script data double-escaped exit (closed)' => array( '<script><!--<script></script></script>', true ),
'Script data double-escaped exit (closed/truncated)' => array( '<script><!--<script></script </script>', true ),
'Script data no double-escape' => array( '<script><!-- --><script></script>', true ),
'Script data no double-escape (short comment)' => array( '<script><!--><script></script>', true ),
'Script data almost double-escaped' => array( '<script><!--<script</script>', true ),

'Script tag with self-close flag (ignored)' => array( '<script />', false ),
'Script data double-escaped' => array( '<script><!--<script></script>', false ),

'Unclosed script in escaped state' => array( '<script><!--------------', false ),
'Unclosed script in double escaped state' => array( '<script><!--<script ', false ),
);
}
}
Loading