HTML API: Improve script tag escape state processing #9397

Open: wants to merge 22 commits into trunk

Member

@sirreal sirreal commented Aug 6, 2025

✅ (merged to trunk) This includes #9230 (merged here).

Trac ticket: https://core.trac.wordpress.org/ticket/63738

Address two small HTML API mis-parses of script contents.


<script>
<!-->
<script>
</script>🎉

In this case, the parser incorrectly switched from unescaped to escaped on a sequence like <!-->. Such an abruptly closed empty comment should not transition to escaped; the parser should remain in the unescaped state.

In the example, the double-escaped state should not be reached on <script> (because the processor should not be in the escaped state) and the script tag should close correctly at </script>.
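As an illustration only, and not the code changed in this PR, the rule for the comment-like opener can be sketched as follows. The function name `opener_enters_escaped_state()` is hypothetical; the sketch assumes the HTML tokenizer behavior where extra dashes after `<!--` are skipped and an immediate `>` returns to plain script data.

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): reports whether the
 * comment-like opener at $at leaves the tokenizer in the escaped state.
 *
 * Assumes $at points at the "<" of a "<!--" found in unescaped script data.
 */
function opener_enters_escaped_state( string $text, int $at ): bool {
	if ( '<!--' !== substr( $text, $at, 4 ) ) {
		return false; // Not an opener at all.
	}

	// Skip the opener and any further dashes ("<!---", "<!----", ...).
	$at += 4;
	$at += strspn( $text, '-', $at );

	// A ">" directly after the dashes closes the span at once,
	// so the tokenizer is back in the unescaped state.
	return ! ( $at < strlen( $text ) && '>' === $text[ $at ] );
}

var_dump( opener_enters_escaped_state( "<!-->\n<script>\n</script>", 0 ) ); // bool(false)
var_dump( opener_enters_escaped_state( "<!--\n<script>", 0 ) );             // bool(true)
```

With the first input the scanner stays unescaped, so the later `<script>` cannot trigger the double-escaped state and `</script>` closes the element.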

before / after


<script>
<!--
<script</script>🎉

In this case, <script< is correctly recognized as a sequence that should not transition from escaped to double-escaped. However, the parser incorrectly advances past the following < character, which starts the script close tag, and therefore does not close correctly at </script>.
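The consume-versus-reconsume distinction can be sketched like this. It is a simplified model, not the PR's implementation: the function name `after_escaped_script_opener()` and the state labels are hypothetical, and `$at` is assumed to point at the byte right after a `<script` seen in the escaped state.

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): decides what happens after
 * "<script" is seen while in the escaped state.
 *
 * Returns the next state and the offset at which scanning should resume.
 */
function after_escaped_script_opener( string $text, int $at ): array {
	// Whitespace, "/" or ">" terminates the tag name.
	$is_terminator = $at < strlen( $text ) && 1 === strspn( $text, " \t\f\r\n/>", $at, 1 );

	if ( $is_terminator ) {
		// Consume the terminator and enter the double-escaped state.
		return array( 'double-escaped', $at + 1 );
	}

	// Reconsume: do not step past this byte, it may begin "</script>".
	return array( 'escaped', $at );
}

$text = "<!--\n<script</script>";
$at   = strpos( $text, '<script' ) + 7; // Byte after the tag name.

list( $state, $next ) = after_escaped_script_opener( $text, $at );
var_dump( $state );                    // string(7) "escaped"
var_dump( substr( $text, $next, 9 ) ); // string(9) "</script>"
```

Because the offset is left unchanged in the second branch, the scanner still sees the `<` that begins the closing tag.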

before / after


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

@sirreal sirreal requested review from dmsnell and Copilot August 6, 2025 17:13
@sirreal sirreal changed the title from "HTML API: Improve script tag escaping processing" to "HTML API: Improve script tag escape state processing" Aug 6, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This pull request fixes HTML API script tag parsing by correcting two edge cases in script content state transitions. The fixes ensure proper handling of abruptly closed comments and prevent incorrect position advancement when checking for script tag transitions.

  • Fixed handling of abruptly closed comment sequences like <!--> that should remain in unescaped state
  • Corrected position advancement when checking for script tag transitions to prevent skipping close tags
  • Added comprehensive test coverage for various script tag parsing scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/wp-includes/html-api/class-wp-html-tag-processor.php | Core logic fixes for script tag parsing state transitions and position handling |
| tests/phpunit/tests/html-api/wpHtmlTagProcessor.php | Added comprehensive test cases covering various script tag parsing scenarios |

@sirreal sirreal marked this pull request as ready for review August 6, 2025 17:14

github-actions bot commented Aug 6, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions bot commented Aug 6, 2025

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal requested a review from Copilot August 6, 2025 17:17
Copilot

This comment was marked as off-topic.

@@ -1537,13 +1559,29 @@ private function skip_script_data(): bool {
* parsing after updating the state.
Member

Do these changes interact with the existing comment? Did we get this comment wrong the first time? Did you review the comment above for updates?

Member Author

The comment isn't wrong, although I don't understand exactly what the last paragraph is trying to say. I've pushed a change to clarify and simplify it.

Member

@dmsnell dmsnell left a comment

I’ve run some tests with coverage to confirm the changes in behavior.

With the new tests from this PR, but the code in trunk, we have two test failures:

[Screenshot 2025-08-06 at 3 51 36 PM]

Coverage here matches the existing coverage in trunk.

[Screenshot 2025-08-06 at 3 49 16 PM]

This one never matched, which I guess is expected with the change here.

[Screenshot 2025-08-06 at 3 49 32 PM]

`trunk` tests with `trunk` code show the same non-tested coverage.

[Screenshot 2025-08-06 at 3 50 58 PM]

New tests, new code

[Screenshot 2025-08-06 at 3 52 33 PM]

For now this is mostly an observation. The untested lines of code bothered me before, but I didn’t know why we weren’t hitting them. I think the string-length optimization you discovered is part of the explanation.

@sirreal sirreal requested a review from Copilot August 7, 2025 09:20
Member Author

sirreal commented Aug 7, 2025

Good reminder on the coverage reports. I've added more tests, and all the lines in skip_script_data are now hit except for this condition, whose body is never entered:

if ( $this->bytes_already_parsed >= $doc_length ) {
	return false;
}

I think that it's impossible to hit now (after [60617] / #9230). It may always have been impossible. However, I don't mind leaving that check even if it's redundant.

I believe the condition would be hit on inputs like <script></script that end right after the closing script tag name. With the early length check it shouldn't be possible to hit this condition because the function would already have returned false here:

if ( $at + 8 >= $doc_length ) {
	return false;
}
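To make the arithmetic concrete, here is a small sketch of how that early check plays out on the input mentioned above. It rests on one reading of the check: a complete `</script>` closer is nine bytes long, so one starting at `$at` only fits if byte offset `$at + 8` exists. The helper name `has_room_for_script_closer()` is hypothetical, and in the real method `$at` is wherever the scan currently stands, not necessarily the start of a closer.

```php
<?php
/**
 * Hypothetical helper mirroring the early length check quoted above:
 * "</script>" is nine bytes long, so a closer starting at $at can only
 * fit when byte offset $at + 8 exists in the document.
 */
function has_room_for_script_closer( string $html, int $at ): bool {
	return $at + 8 < strlen( $html );
}

$html = '<script></script';    // 16 bytes, ends right after the tag name.
$at   = strpos( $html, '</' ); // Offset 8.

var_dump( has_room_for_script_closer( $html, $at ) );       // bool(false)
var_dump( has_room_for_script_closer( $html . '>', $at ) ); // bool(true)
```

Under that reading, the truncated input already fails the early check, which is consistent with the later `bytes_already_parsed` condition never being entered.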

@sirreal sirreal requested review from dmsnell and Copilot and removed request for Copilot August 7, 2025 09:29
Copilot

This comment was marked as off-topic.

@sirreal sirreal requested a review from Copilot August 7, 2025 09:55
Copilot

This comment was marked as off-topic.

Member

dmsnell commented Aug 7, 2025

Is it possible to have Copilot not pollute the PR with its comments? Can it not run locally, or at least leave comments as the result of some command you run? Or is it only designed to run from the GitHub interface and leave erroneous comments?

Member

@dmsnell dmsnell left a comment

It took me a while, but I manually recreated the entire SCRIPT state machine in dot and then minimized it piecemeal until I had this.

[Image: script-states]

Then I verified the code paths in the old and new skip_script_data() and found that the old one has the mistakes while the new one matches the diagram.

Finally I reviewed your blog post and found that our state machines are equivalent, though you show more directly that in double escaped mode there is no need to complete the SCRIPT tag.

On that note, the way I read your diagram, the </script transitions into Close SCRIPT tag are misleading, because the only exit out of the element is a complete closing tag whose name is SCRIPT. This means that if you encounter </script- in those places, we stay in script data or in script data escaped.

The code is correct, and it’s easy to argue the diagram is correct, but I found that slightly confusing. It’s also why I chose names for my diagram instead of literals, because the outbound closing tag might contain attributes and unwanted self-closing flags.
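The point about `</script-` can be illustrated with a small check on what follows the tag name. This is a sketch under assumptions, not the processor's code: `is_script_closer_name_at()` is a hypothetical name, and it only confirms that the tag name is terminated. The closing tag itself still has to be completed by a `>`, possibly after attributes or a self-closing flag.

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): does a SCRIPT closing
 * tag name end at this position? $at points at the "<" of a candidate.
 */
function is_script_closer_name_at( string $text, int $at ): bool {
	// The tag name is matched ASCII case-insensitively.
	if ( 0 !== strcasecmp( '</script', substr( $text, $at, 8 ) ) ) {
		return false;
	}

	// The name must be followed by whitespace, "/" or ">".
	$after = $at + 8;
	return $after < strlen( $text ) && 1 === strspn( $text, " \t\f\r\n/>", $after, 1 );
}

var_dump( is_script_closer_name_at( '</script>', 0 ) );       // bool(true)
var_dump( is_script_closer_name_at( '</SCRIPT attr>', 0 ) );  // bool(true)
var_dump( is_script_closer_name_at( '</script-', 0 ) );       // bool(false)
```

In the `</script-` case the scanner stays in script data (or script data escaped), which is why the diagram's literal `</script` edges can read as misleading.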

Thanks for your patience and for adding the tests. These are good finds. The rules for consuming vs. reconsuming are easy to overlook.

Member Author

sirreal commented Aug 8, 2025

Thanks for diving in and doing your own investigation. I'm happy to have my research confirmed ❤️

The blog post we're referencing is available here: Safe JSON in script tags: How not to break a site

> On that note, the way I read your diagram, the </script transitions into Close SCRIPT tag are misleading, because the only exit out of the element is a complete closing tag whose name is SCRIPT. This means that if you encounter </script- in those places, we stay in script data or in script data escaped.

That's a good point; it was a bit ambiguous. I alluded to the trailing character in a footnote, but it wasn't sufficiently clear. I added a note after the diagram to help clarify because this is an important point.

Member

dmsnell commented Aug 8, 2025

It’s fine to keep these separate, but would we want to combine this with #9402? Maybe merge it into here?

This is mostly about accounting and not about code.


2 participants