
HTML API: Handle \f in skip_script_data tag matching #9402

Status: Open. Wants to merge 2 commits into base: trunk.
25 changes: 24 additions & 1 deletion src/wp-includes/html-api/class-wp-html-tag-processor.php
@@ -1610,7 +1610,30 @@ private function skip_script_data(): bool {
*/
$at += 6;
$c = $html[ $at ];
if ( ' ' !== $c && "\t" !== $c && "\r" !== $c && "\n" !== $c && '/' !== $c && '>' !== $c ) {
if (
/*
* These characters trigger state transitions of interest:
*
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state}
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-escaped-end-tag-name-state}
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-start-state}
* - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-end-state}
*
* The "\r" character is not present in the above references. However, "\r" must be
* treated the same as "\n". This is because the HTML Standard requires newline
* normalization during preprocessing which applies this replacement.
*
* - @see https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
* - @see https://infra.spec.whatwg.org/#normalize-newlines
*/
'>' !== $c &&
' ' !== $c &&
"\n" !== $c &&
'/' !== $c &&
"\t" !== $c &&
"\f" !== $c &&
"\r" !== $c
Review comment on lines +1629 to +1635 (Member Author):

I ordered these by how often I'd expect each character to appear in this position. I don't expect any real performance improvement from that part of the change, but I also don't see any downside to having '>' and ' ' (space) appear as the first and second match opportunities.

Reply (Member):

My understanding is that these comparisons are likely to be performed in parallel and executed before the CPU even reaches these lines, so I would guess you're right that the ordering won't help; without measurement, though, I lean toward not knowing. It shouldn't matter in any case, and unless someone has realistic benchmarks on realistic data, I would be skeptical of any performance claims about the position of these items.

) {
++$at;
continue;
}
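The condition above ends the tag name on any of seven bytes following `</script`. As a standalone sketch of the same check, assuming a plain string and byte offset rather than the Tag Processor's internal cursor, it can be written with `strspn()`. The helpers `is_script_tag_name_terminator()` and `closes_script_at()` are hypothetical, not WordPress functions, and the bounds check is an addition that the excerpted code does not show.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): does the byte at $at end a
 * "script" tag name? The HTML tokenizer ends it on tab, LF, FF, space, "/"
 * and ">"; "\r" is included because input preprocessing turns it into "\n".
 */
function is_script_tag_name_terminator( string $html, int $at ): bool {
	if ( $at >= strlen( $html ) ) {
		// The input ends before the tag name has been terminated.
		return false;
	}

	// strspn() returns 1 only if the single byte at $at is in the set.
	return 1 === strspn( $html, "\t\n\f\r />", $at, 1 );
}

/**
 * Hypothetical helper: is there a complete "</script" end tag name
 * (ASCII case-insensitive) starting at byte offset $at?
 */
function closes_script_at( string $html, int $at ): bool {
	// Need room for "</" plus the six bytes of "script".
	if ( $at + 8 > strlen( $html ) || '</' !== substr( $html, $at, 2 ) ) {
		return false;
	}

	// Case-insensitive comparison of the six bytes after "</".
	if ( 0 !== substr_compare( $html, 'script', $at + 2, 6, true ) ) {
		return false;
	}

	return is_script_tag_name_terminator( $html, $at + 8 );
}
```

For example, closes_script_at( "</script\f>", 0 ) returns true, while closes_script_at( "</scripts>", 0 ) returns false because the "s" continues the tag name instead of ending it.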
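The code comment's reasoning about "\r" follows from Infra's "normalize newlines" algorithm linked in the diff: replace every CR LF pair with LF, then replace every remaining CR with LF. A minimal PHP sketch of that algorithm (the name normalize_newlines() is hypothetical) shows why a lone "\r" after </script ends up as "\n" in the spec's view of the input.

```php
<?php
/**
 * Hypothetical sketch of Infra's "normalize newlines": replace each CR LF
 * pair with LF, then replace every remaining CR with LF.
 */
function normalize_newlines( string $input ): string {
	// str_replace() applies the search array in order, so every "\r\n"
	// is collapsed before any lone "\r" is rewritten.
	return str_replace( array( "\r\n", "\r" ), "\n", $input );
}
```

Applied to "</script\r>" it yields "</script\n>", whose terminator is a line feed. Listing "\r" directly in the condition presumably lets the Tag Processor reach the same decision while scanning the original, unnormalized bytes.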
2 changes: 2 additions & 0 deletions tests/phpunit/tests/html-api/wpHtmlTagProcessor.php
@@ -2012,6 +2012,7 @@ public function test_script_tag_parsing( string $input, bool $closes ) {
public static function data_script_tag(): array {
return array(
'Basic script tag' => array( '<script></script>', true ),
'Basic script tag with </script\f> close' => array( "<script></script\f>", true ),
Review comment (Member):

Given that we're testing a class of terminations here, we could extend these cases to cover all of the relevant characters. For now I think this patch is good to go, but there would be valuable work for someone in refactoring some of the existing tests from the original build of the Tag Processor.

There's probably also something to be said for recreating the state machine from the spec and testing each of its branches.

'Script with type attribute' => array( '<script type="text/javascript"></script>', true ),
'Script data escaped' => array( '<script><!--</script>', true ),
'Script data double-escaped exit (comment)' => array( '<script><!--<script>--></script>', true ),
@@ -2021,6 +2022,7 @@ public static function data_script_tag(): array {

'Script tag with self-close flag (ignored)' => array( '<script />', false ),
'Script data double-escaped' => array( '<script><!--<script></script>', false ),
'Basic script tag double-escaped with <script\f' => array( "<script><!--<script\f</script>", false ),
);
}
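Following the review suggestion to cover the whole class of terminations, a data provider could generate one case per terminating character plus a negative case. This is a minimal sketch: the function name and case labels are hypothetical, in the test class it would be a public static method alongside data_script_tag(), and the true/false expectations follow the tokenizer states linked in the diff rather than an actual test run.

```php
<?php
/**
 * Hypothetical data provider: one closing-tag case per byte that may follow
 * "</script" and end the tag name, plus one byte that must not.
 */
function data_script_tag_name_terminators(): array {
	$terminators = array(
		'tab'             => "\t",
		'line feed'       => "\n",
		'form feed'       => "\f",
		'carriage return' => "\r",
		'space'           => ' ',
		'solidus'         => '/',
		'greater-than'    => '>',
	);

	$cases = array();
	foreach ( $terminators as $name => $c ) {
		// A trailing ">" is appended so that each end tag is complete.
		$cases[ "Basic script tag with </script + {$name} close" ] = array( "<script></script{$c}>", true );
	}

	// U+000B VERTICAL TAB is not HTML whitespace, so it does not end the
	// tag name and the SCRIPT element is never closed.
	$cases['Script tag not closed by </script + vertical tab'] = array( "<script></script\v>", false );

	return $cases;
}
```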
