Skip to content
Open
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 18 additions & 1 deletion src/wp-includes/html-api/class-wp-html-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1040,8 +1040,17 @@ public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
$token_name = $this->get_token_name();

if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
try {
$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not perhaps lead a developer to think that they reached the end of the document, when in reality the nesting is too large (or the max bookmarks exceeded)? In that way, I think an exception is more helpful. Otherwise, wouldn't every loop over tokens in a doc need to do something like:

while ( $p->next_tag() ) {
    // ...
}
if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
     // Handle max bookmark error.
}

This would put the exception case in the regular code that always runs. Since exceeding the max bookmarks should be exceptional, I would think an exception is preferred:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
     if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
          // Handle max bookmark error.
     }
}

But since this is the only exception that WP_HTML_Tag_Processor throws (currently), then it could be just:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
    // Handle max bookmark error.
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since this is the only exception that WP_HTML_Tag_Processor throws

This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

Will this not perhaps lead a developer to think that they reached the end of the document

That's already the case, the HTML processor has avoided throwing errors and exposes problems through some getters. Primarily, ::get_last_error() should be used:

<?php
require '/wordpress/wp-load.php';
echo '<plaintext>';
echo "WordPress " . wp_get_wp_version() . "\n";

$p = WP_HTML_Processor::create_fragment('<table><tbody>unsupported');
while( $p->next_token() ) {
  var_dump($p->get_tag());  
}
// Need to check error status.
var_dump( $p->get_last_error() );
var_dump( $p->get_unsupported_exception()->getMessage() );

When these APIs throw errors that callers are supposed to handle, it's just too easy to bring down users' sites with errors that aren't actionable for them. It's true that superficially "end of document" is the same as "error." It seems preferable that a document silently fail to fully parse instead of crashing and bringing down a site.

In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

Yes, sorry, I meant WP_HTML_Processor.

In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.

OK, makes sense to me.

}
throw $e;
}

$this->state->current_token = new WP_HTML_Token(
$this->bookmark_token(),
$bookmark_name,
$token_name,
$this->has_self_closing_flag(),
$this->release_internal_bookmark_on_destruct
Expand Down Expand Up @@ -1151,6 +1160,12 @@ public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exhausted bookmarks throw a generic Exception.

This block catches the exceptions thrown by insert_virtual_token().

if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
}
// Rethrow any other exceptions for higher-level handling.
throw $e;
}
}

Expand Down Expand Up @@ -6287,6 +6302,8 @@ private function insert_foreign_element( WP_HTML_Token $token, bool $only_add_to
*
* @since 6.7.0
*
* @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bookmark_token() actually throws, but it's not handled here. The annotation may not be appropriate.

* @param string $token_name Name of token to create and insert into the stack of open elements.
* @param string|null $bookmark_name Optional. Name to give bookmark for created virtual node.
* Defaults to auto-creating a bookmark name.
Expand Down
59 changes: 58 additions & 1 deletion tests/phpunit/tests/html-api/wpHtmlProcessor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1068,7 +1068,7 @@ public function test_ensure_next_token_method_extensibility( $html, $expected_to
/**
* Ensure that lowercased tag_name query matches tags case-insensitively.
*
* @group 62427
* @ticket 62427
*/
public function test_next_tag_lowercase_tag_name() {
// The upper case <DIV> is irrelevant but illustrates the case-insentivity.
Expand All @@ -1079,4 +1079,61 @@ public function test_next_tag_lowercase_tag_name() {
$processor = WP_HTML_Processor::create_fragment( '<svg><RECT>' );
$this->assertTrue( $processor->next_tag( array( 'tag_name' => 'rect' ) ) );
}

/**
* Ensure that the processor does not throw errors in cases of extreme HTML nesting.
*
* @ticket 64394
*
* @expectedIncorrectUsage WP_HTML_Tag_Processor::set_bookmark
*/
public function test_deep_nesting_fails_process_without_error() {
$html = str_repeat( '<i>', WP_HTML_Processor::MAX_BOOKMARKS * 2 );
$processor = WP_HTML_Processor::create_fragment( $html );

// The fragment parser starts with a few context tokens already bookmarked.
$reached_tokens = ( function () {
return count( $this->bookmarks );
} )->call( $processor );
while ( $processor->next_token() ) {
++$reached_tokens;
}
$this->assertSame( WP_HTML_Processor::MAX_BOOKMARKS, $reached_tokens );
}

/**
* @ticket 64394
*
* @expectedIncorrectUsage WP_HTML_Tag_Processor::set_bookmark
*/
public function test_deep_nesting_fails_processing_virtual_tokens_without_error() {
$html = str_repeat( '<table><td>', WP_HTML_Processor::MAX_BOOKMARKS * 2 );
$processor = WP_HTML_Processor::create_fragment( $html );

// The fragment parser starts with a few context tokens already bookmarked.
$reached_tokens = ( function () {
return count( $this->bookmarks );
} )->call( $processor );
while ( $processor->next_token() ) {
++$reached_tokens;
}

/*
* This test has some variability depending on how the virtual tokens align.
* It will produce 1 real, 2 virtual, 1 real.
*
* "<table><td><table><td>…" produces:
* └─TABLE
* └─TBODY (virtual)
* └─TR (virtual)
* └─TD
* └─TABLE
* └─TBODY (virtual)
* └─TR (virtual)
* └─TD
* └─…
*/
$this->assertGreaterThanOrEqual( WP_HTML_Processor::MAX_BOOKMARKS - 1, $reached_tokens );
$this->assertLessThanOrEqual( WP_HTML_Processor::MAX_BOOKMARKS + 1, $reached_tokens );
}
}
Loading