@sirreal sirreal commented Aug 2, 2024

Trac ticket: https://core.trac.wordpress.org/ticket/61810

When the Tag Processor (or HTML Processor) attempts to parse certain incomplete script tags, the parser enters an infinite loop and will hang indefinitely. The conditions to reach this situation are:

  • Input HTML ends with an open script tag.
  • The final character of input is - or <.

If these conditions are satisfied, the parser will enter an infinite loop and hang when it attempts to parse the script tag.

Example problematic inputs:

  • <script>-
  • <script><

Creating a processor and calling next_tag() will hang. In both cases, next_tag() should return false with the processor in incomplete token state.
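The expected behavior can be checked with a small probe. This is a minimal sketch, not part of the patch: `probe_incomplete_script()` is a hypothetical helper, and it assumes WordPress 6.5 or later is loaded so that `WP_HTML_Tag_Processor` and its `paused_at_incomplete_token()` method are available. Outside WordPress it returns `null` instead of failing.

```php
<?php
/**
 * Hypothetical helper: reports how the Tag Processor handles a document
 * that may end in the middle of a token.
 *
 * Returns null when WordPress is not loaded, because
 * WP_HTML_Tag_Processor only exists inside WordPress.
 */
function probe_incomplete_script( string $html ): ?array {
	if ( ! class_exists( 'WP_HTML_Tag_Processor' ) ) {
		return null;
	}

	$processor = new WP_HTML_Tag_Processor( $html );

	return array(
		// Whether next_tag() matched a tag.
		'found_tag'  => $processor->next_tag(),
		// Whether the processor stopped because the input ended mid-token.
		'incomplete' => $processor->paused_at_incomplete_token(),
	);
}
```

With the fix applied, `probe_incomplete_script( '<script>-' )` should report `found_tag` as `false` and `incomplete` as `true`. On an unpatched version the call is expected to hang, so run it only where a timeout is acceptable.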

Diagnosis

This bug occurred because the code that advances the parser position ($at++) sat on the right-hand side of a short-circuiting condition: if ( $too_short || '<' !== $html[ $at++ ] ) {}. If the left-hand side (LHS) is true, the right-hand side (RHS) is never evaluated and the processor does not advance.

Other code in the block advances the parser under most conditions. Only when - or < was the final character of the document did the LHS evaluate to true with nothing else moving the position forward, and the parser entered an infinite loop.

This was difficult to spot because of its similarity to many other blocks in the parser like this:

if (
  $at + 2 < $doc_length &&
  '-' === $html[ $at ] &&
  '-' === $html[ $at + 1 ] &&
  '>' === $html[ $at + 2 ]
) {}
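The short-circuit problem from the diagnosis can be reproduced in isolation. The sketch below is a heavily simplified stand-in for the loop in `skip_script_data()`, not the real method: `scan_steps()`, its `$use_fix` switch, and the `$max_steps` guard are assumptions made for illustration. It scans only the script contents and gives up after a fixed number of iterations so that the buggy variant cannot hang.

```php
<?php
/**
 * Hypothetical, simplified scanner; NOT the real skip_script_data().
 *
 * Returns the number of loop iterations needed to finish, or null when
 * the cursor stops advancing and the step limit is exceeded.
 */
function scan_steps( string $html, bool $use_fix, int $max_steps = 1000 ): ?int {
	$doc_length = strlen( $html );
	$at         = 0;
	$steps      = 0;

	while ( $at < $doc_length ) {
		if ( ++$steps > $max_steps ) {
			return null; // No progress: this would be an infinite loop.
		}

		// Jump to the next "-" or "<"; advances by 0 when already on one.
		$at += strcspn( $html, '-<', $at );

		if ( $use_fix ) {
			// Fixed shape: bail out on truncated input before the increment.
			if ( $at + 1 >= $doc_length ) {
				break;
			}
			if ( '<' !== $html[ $at++ ] ) {
				continue;
			}
		} else {
			// Buggy shape: when the LHS is true, `$at++` is never evaluated.
			if ( $at + 1 >= $doc_length || '<' !== $html[ $at++ ] ) {
				continue;
			}
		}
	}

	return $steps;
}
```

Under these assumptions, `scan_steps( '-', false )` and `scan_steps( '<', false )` return `null` because the cursor never moves, while the same inputs with `$use_fix` set to `true` finish in one step. Input such as `'abc'` finishes in both variants, which matches the observation that only a trailing - or < triggers the hang.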

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions bot commented Aug 2, 2024

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the html-api/fix-script-tag-infinite-loop branch from 6bc5b09 to 8c3fe3a August 2, 2024 13:51
sirreal added 2 commits August 2, 2024 17:06
The infinite loop was caused by the parser-advancing increment not being
evaluated when an `||` condition short-circuited. If the first operand
was true, the `$at++` code was never reached.

This happened when the parser stopped on "-" or "<" as the final
character of the input HTML inside an open script tag.
@@ -1431,8 +1431,15 @@ private function skip_script_data(): bool {
  continue;
  }

  // Everything of interest past here starts with "<".
- if ( $at + 1 >= $doc_length || '<' !== $html[ $at++ ] ) {
+ if ( $at + 1 >= $doc_length ) {

Member Author

I've chosen to break here because there's no way to find the script closer. The document is incomplete, so breaking and returning false below enters the correct code path. This could also return false directly instead of breaking out of the loop.

This bound could likely be raised. We need to match at least <!-- from here, so the check could use $at + 3 instead.

Suggested change
- if ( $at + 1 >= $doc_length ) {
+ // The parser needs to match at least `<!--` from here.
+ if ( $at + 3 >= $doc_length ) {

I also think this could apply more aggressive length checks earlier, then skip all of the subsequent length checks. Ultimately, we need to find a script closer in order to return true. It doesn't matter how much of the script contents are parsed: if the script can't be closed, the parser will stop on incomplete input. After this line:

We could include this to eagerly reach incomplete input if we know a closer cannot be found.

if ( $at + 8 > $doc_length ) { return false; }

Here's my thinking:

/* Things could happen in the loop, but they're irrelevant if a `</script>` cannot be found  */

// $at:  V
<script> ----->␃
// $at+8         ^

/* A script closer could fit here and is ultimately found */

// $at:          V
<script> bla bla </script>␃
// $at+8                 ^

/* A script closer would fit so processing continues */
// $at:          V
<script> bla bla -            more document…␃
// $at+8                 ^
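The length reasoning in these diagrams can be written as a small predicate. This is a sketch under assumptions, not code from the patch: `script_closer_can_fit()` is a hypothetical helper, and it takes the 8 in `$at + 8` to be the byte length of `</script`.

```php
<?php
/**
 * Hypothetical helper, not part of the HTML API: reports whether a
 * `</script` sequence could still fit between $at and the end of input.
 */
function script_closer_can_fit( string $html, int $at ): bool {
	$closer_length = strlen( '</script' ); // 8 bytes.

	// Same test as `$at + 8 > $doc_length`, inverted.
	return $at + $closer_length <= strlen( $html );
}
```

For `'<script> ----->'` with `$at` just past the opening tag, fewer than 8 bytes remain, so the helper returns `false` and a parser could report incomplete input immediately. For `'<script> bla bla </script>'` with `$at` at the start of the closer, it returns `true` and scanning would continue.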

Maybe the proposed bugfix could be landed and included in WordPress 6.6, with further optimizations landed in a separate change to trunk.

@dmsnell curious to hear your thoughts.

Member

these are valid points. what I like about advancing one character at a time is that in some cases, I leaned on that to ensure we didn't count wrong. a targeted fix sounds nice now, and then a follow-up that skips ahead can be tested in isolation, particularly for performance.

Member

I think I also like returning false directly - no need to further confuse control flow

Member Author

I've started work on this in #9230

@sirreal sirreal marked this pull request as ready for review August 2, 2024 15:54

github-actions bot commented Aug 2, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

pento pushed a commit that referenced this pull request Aug 2, 2024
When the Tag Processor (or HTML Processor) attempts to parse certain
incomplete script tags, the parser enters an infinite loop and will
hang indefinitely. The conditions to reach this situation are:

- Input HTML ends with an open script tag.
- The final character of input is `-` or `<`.

The infinite loop was caused by the parser-advancing increment not being
evaluated when an `||` condition short-circuited. If the first operand
was true, the `$at++` code was never reached.

This patch resolves the issue.

Developed in #7128
Discussed in https://core.trac.wordpress.org/ticket/61810

Follow-up to [55203].

Props: dmsnell, jonsurrell.
Fixes #61810.


git-svn-id: https://develop.svn.wordpress.org/trunk@58845 602fd350-edb4-49c9-b593-d223f7449a82

dmsnell commented Aug 2, 2024

Merged in [58845]
bdef9de

@dmsnell dmsnell closed this Aug 2, 2024
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Aug 2, 2024
Built from https://develop.svn.wordpress.org/trunk@58845


git-svn-id: http://core.svn.wordpress.org/trunk@58241 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@dmsnell dmsnell deleted the html-api/fix-script-tag-infinite-loop branch August 2, 2024 23:52
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Aug 3, 2024
Built from https://develop.svn.wordpress.org/trunk@58845


git-svn-id: https://core.svn.wordpress.org/trunk@58241 1a063a9b-81f0-0310-95a4-ce76da25c4cd
aslamdoctor pushed a commit to aslamdoctor/wordpress-develop that referenced this pull request Dec 28, 2024
git-svn-id: https://develop.svn.wordpress.org/trunk@58845 602fd350-edb4-49c9-b593-d223f7449a82