HTML API: Improve script tag escape state processing #9397

sirreal · 2025-08-06T17:01:40Z

✅ (merged to trunk) ~~This includes #9230 (merged here).~~

Trac ticket: https://core.trac.wordpress.org/ticket/63738

Address two small HTML API mis-parses of script contents.

<script>
<!-->
<script>
</script>🎉

In this case, the parser previously switched from unescaped to escaped on a sequence like <!-->. That is incorrect: an abruptly closed empty comment does not transition to the escaped state, so the parser remains in the unescaped state.

In the example, the double-escaped state should not be reached on <script> (because the processor should not be in the escaped state) and the script tag should close correctly at </script>.

before / after

<script>
<!--
<script</script>🎉

In this case, <script< is correctly recognized as a sequence that should not transition from escaped to double-escaped. However, the parser incorrectly advanced beyond the following < character, which starts the script close tag, and so did not close the script at </script>.
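Both cases can be modeled with a minimal sketch of the three script data states (unescaped, escaped, double-escaped). This is written from the HTML specification's tokenizer rules, not copied from the code in this PR. The names `find_script_closer()` and `is_script_name_at()` are hypothetical, and the function assumes it is given the text that follows an opening script tag.

```php
<?php
/**
 * Hypothetical helper: true when "script" (any case) starts at $at and is
 * followed by whitespace, "/" or ">", so the tag name is exactly SCRIPT.
 */
function is_script_name_at( string $html, int $at ): bool {
	if ( $at + 6 >= strlen( $html ) ) {
		return false; // Six letters plus one terminating byte must fit.
	}
	return 0 === strcasecmp( substr( $html, $at, 6 ), 'script' )
		&& false !== strpos( " \t\n\f\r/>", $html[ $at + 6 ] );
}

/**
 * Sketch only: returns the byte offset of the "<" that starts the closing
 * SCRIPT tag, or null when the script content is never closed.
 */
function find_script_closer( string $html, int $at = 0 ): ?int {
	$state = 'unescaped';
	$len   = strlen( $html );

	while ( $at < $len ) {
		$at += strcspn( $html, '-<', $at ); // Only "-" and "<" matter.
		if ( $at >= $len ) {
			break;
		}

		if ( '-' === $html[ $at ] ) {
			if ( '-->' === substr( $html, $at, 3 ) ) {
				$state = 'unescaped'; // "-->" leaves both escaped states.
				$at   += 3;
			} else {
				$at++;
			}
			continue;
		}

		$lt = $at++; // Consume only the "<"; what follows is re-examined.

		if ( 'unescaped' === $state && '!--' === substr( $html, $at, 3 ) ) {
			// Skip only the "!". The dashes stay in place so that "<!-->"
			// and "<!--->" are seen as "-->" and return to unescaped.
			$state = 'escaped';
			$at++;
			continue;
		}

		if ( $at < $len && '/' === $html[ $at ] ) {
			if ( is_script_name_at( $html, $at + 1 ) ) {
				if ( 'double-escaped' !== $state ) {
					return $lt; // A real closing tag.
				}
				$state = 'escaped'; // Only ends the double escape.
			}
			continue;
		}

		if ( 'escaped' === $state && is_script_name_at( $html, $at ) ) {
			$state = 'double-escaped';
		}
		// For "<script<", nothing past the first "<" was consumed, so the
		// second "<" is examined on the next iteration.
	}

	return null;
}
```

With this sketch, `find_script_closer( "\n<!-->\n<script>\n</script>🎉" )` returns 16 and `find_script_closer( "\n<!--\n<script</script>🎉" )` returns 13. In both cases that is the offset of the `</script>` that precedes the emoji.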

before / after

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Copilot

Pull Request Overview

This pull request fixes HTML API script tag parsing by correcting two edge cases in script content state transitions. The fixes ensure proper handling of abruptly closed comments and prevent incorrect position advancement when checking for script tag transitions.

Fixed handling of abruptly closed comment sequences like <!--> that should remain in the unescaped state
Corrected position advancement when checking for script tag transitions to prevent skipping close tags
Added comprehensive test coverage for various script tag parsing scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
src/wp-includes/html-api/class-wp-html-tag-processor.php	Core logic fixes for script tag parsing state transitions and position handling
tests/phpunit/tests/html-api/wpHtmlTagProcessor.php	Added comprehensive test cases covering various script tag parsing scenarios

github-actions · 2025-08-06T17:14:57Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2025-08-06T17:16:35Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell · 2025-08-06T19:58:37Z

src/wp-includes/html-api/class-wp-html-tag-processor.php

@@ -1537,13 +1559,29 @@ private function skip_script_data(): bool {
 			 * parsing after updating the state.


Do these changes interact with the existing comment? Did we get this comment wrong the first time? Did you review the comment above for updates?

The comment isn't wrong, although I don't understand exactly what the last paragraph is trying to say. I've pushed a change to clarify and simplify it.

tests/phpunit/tests/html-api/wpHtmlTagProcessor.php

dmsnell

I’ve run some tests with coverage to confirm the changes in behavior.

With the new tests from this PR, but the code in trunk, we have two test failures:

Coverage here matches the existing coverage in trunk.

`trunk` tests with `trunk` code show the same non-tested coverage.
This one never matched, which I guess is expected with the change here.

New tests, new code
For now this is mostly an observation. The untested lines of code bothered me before, but I didn’t know why we weren’t hitting them. I think the string-length optimization you discovered is part of the explanation.

sirreal · 2025-08-07T09:29:05Z

Good reminder on the coverage reports. I've added more tests and all the lines in skip_script_data are now hit with the exception of this condition, the body of this is never entered:

wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php

Lines 1631 to 1633 in 057f585

    if ( $this->bytes_already_parsed >= $doc_length ) {
    	return false;
    }

I think that it's impossible to hit now (after [60617] / #9230). It may always have been impossible. However, I don't mind leaving that check even if it's redundant.

I believe the condition would be hit on inputs like <script></script that end right after the closing script tag name. With the early length check it shouldn't be possible to hit this condition because the function would already have returned false here:

wordpress-develop/src/wp-includes/html-api/class-wp-html-tag-processor.php

Lines 1531 to 1533 in 057f585

    if ( $at + 8 >= $doc_length ) {
    	return false;
    }
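The arithmetic behind that early return can be worked through on the `<script></script` input mentioned above. This is a rough sketch, not the processor's code: `script_closer_can_fit()` is a hypothetical helper, and it assumes `$at` is the offset at which a closing tag would have to start.

```php
<?php
/**
 * Hypothetical helper mirroring the early check `$at + 8 >= $doc_length`.
 * Assumption: $at is the offset at which a closer would have to start.
 */
function script_closer_can_fit( int $at, int $doc_length ): bool {
	// "</script" is 8 bytes and needs at least one more byte after it
	// (whitespace, "/" or ">") before it counts as a closing tag.
	return $at + 8 < $doc_length;
}

$html = '<script></script';   // 16 bytes, ends right after the tag name.
$at   = strlen( '<script>' ); // 8: where the script content begins.

var_dump( script_closer_can_fit( $at, strlen( $html ) ) );       // bool(false)
var_dump( script_closer_can_fit( $at, strlen( $html . '>' ) ) ); // bool(true)
```

Under that assumption, the truncated input is rejected before the later `bytes_already_parsed` check could be reached, which is consistent with the coverage report never entering that branch.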

dmsnell · 2025-08-07T18:07:11Z

Is it possible to have Copilot not pollute the PR with its comments? Can it not run locally, or at least leave comments as the result of some command you run? Or is it only designed to run from the GitHub interface and leave erroneous comments?

dmsnell

It took me a while, but I manually recreated the entire SCRIPT state machine in DOT and then minimized it piecemeal until I had this.

Then I verified the code paths in the old and new skip_script_data() and found that the old one has the mistakes while the new one matches the diagram.

Finally I reviewed your blog post and found that our state machines are equivalent, though you show more directly that in double escaped mode there is no need to complete the SCRIPT tag.

On that note, the way I read your diagram, the </script transitions into Close SCRIPT tag are misleading, because the only exit out of the element is a complete closing tag whose name is SCRIPT. This means that if you encounter </script- in those places, we stay in script data or in script data escaped.

The code is correct, and it’s easy to argue the diagram is correct, but I found that slightly confusing. It’s also why I chose names for my diagram instead of literals, because the outbound closing tag might contain attributes and unwanted self-closing flags.
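The point about `</script-` can be illustrated with a small check on the tag-name boundary. This is only a sketch under the HTML specification's rule that a tag name ends at whitespace, `/`, or `>`. `is_script_closer_at()` is a hypothetical name and is not part of the HTML API.

```php
<?php
/**
 * Hypothetical helper: true when a closing tag whose name is exactly
 * SCRIPT starts at byte offset $at. Attributes or a self-closing flag
 * may follow the name, but the name itself must end right there.
 */
function is_script_closer_at( string $html, int $at ): bool {
	// \G anchors the match at $at; the character class lists the bytes
	// that may terminate a tag name.
	return 1 === preg_match( '~\G</script[ \t\n\f\r/>]~i', $html, $m, 0, $at );
}

var_dump( is_script_closer_at( '</script>', 0 ) );          // bool(true)
var_dump( is_script_closer_at( '</SCRIPT type="x">', 0 ) ); // bool(true)
var_dump( is_script_closer_at( '</script/>', 0 ) );         // bool(true)
var_dump( is_script_closer_at( '</script-', 0 ) );          // bool(false)
var_dump( is_script_closer_at( '</script', 0 ) );           // bool(false)
```

The last two inputs leave the parser in script data (or script data escaped), because no complete SCRIPT tag name was seen.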

Thanks for your patience and for adding the tests. These are good finds. The rules for consuming vs. reconsuming are easy to overlook.

sirreal · 2025-08-08T11:32:19Z

Thanks for diving in and doing your own investigation, I'm happy to have my research confirmed ❤️

The blog post we're referencing is available here: Safe JSON in script tags: How not to break a site

> On that note, the way I read your diagram, the </script transitions into Close SCRIPT tag are misleading, because the only exit out of the element is a complete closing tag whose name is SCRIPT. This means that if you encounter </script- in those places, we stay in script data or in script data escaped.

That's a good point; it was a bit ambiguous. I alluded to the trailing character in a footnote, but it wasn't sufficiently clear. I added a note after the diagram to help clarify, because this is an important point.

dmsnell · 2025-08-08T22:47:30Z

It’s fine to keep these separate, but would we want to combine this with #9402? Maybe merge it into here?

This is mostly about accounting and not about code.

sirreal added 14 commits August 6, 2025 13:51

Move script data length checks to top of loop

6ad9951

Remove parser_state change in skip_script_data

b3b3177

The parser state is managed externally and should not be changed in skip_script_data

Remove more length checks

ca16e0e

Improve documentation

0456be7

Improve comment explaining early return logic

ea6f7d3

Improve loop comment

4be62b9

Add script tag processing tests

df2affa

Remove problematic tests

d0cbb00

Merge branch 'html-api/improve-skip-script-data-len-checks' into html…

69f3bce

…-api/ensure-script-data-states-parse-correctly

Revert "Remove problematic tests"

c509f9d

This reverts commit d0cbb00.

Add test that reveals bad offset

de91e09

Ensure the escaped state is not entered on abruptly closed comments

f041a9c

 do not enter the escaped state.

Add unclosed script tag test

2b6833c

Prevent script close tag from being found

bba0547

HTML like: <script><!--<script</script>🎉 Would not encounter the closing script tag

sirreal requested review from dmsnell and Copilot August 6, 2025 17:13

sirreal changed the title ~~HTML API: Improve script tag escaping processing~~ HTML API: Improve script tag escape state processing Aug 6, 2025

Copilot AI reviewed Aug 6, 2025

sirreal marked this pull request as ready for review August 6, 2025 17:14

Fix typos in explanatory comments

728d13f

Co-authored-by: Copilot <[email protected]>

sirreal requested a review from Copilot August 6, 2025 17:17

dmsnell reviewed Aug 6, 2025

dmsnell reviewed Aug 6, 2025

dmsnell reviewed Aug 6, 2025

Reword explanatory comment.

1b4478f

sirreal mentioned this pull request Aug 7, 2025

HTML API: Reduce skip_script_data length checks #9230

Closed

sirreal added 5 commits August 7, 2025 10:35

Merge branch 'trunk' into html-api/ensure-script-data-states-parse-co…

d22ef9a

…rrectly

fixup! Merge branch 'trunk' into html-api/ensure-script-data-states-p…

360d896

…arse-correctly

Update <!-- comments

9fd074f

Add more tests and improve coverage

840f6aa

more tests more coverage

f7bcfb4

sirreal requested a review from Copilot August 7, 2025 09:20

sirreal requested review from dmsnell and Copilot and removed request for Copilot August 7, 2025 09:29

Improve language about abruptly closed comment

e9dd022

sirreal requested a review from Copilot August 7, 2025 09:55

dmsnell approved these changes Aug 8, 2025
