Skip to content

HTML Processor perf ideas#32

Draft
sirreal wants to merge 53 commits intotrunkfrom
autoresearch/html-perf-2026-03-13
Draft

HTML Processor perf ideas#32
sirreal wants to merge 53 commits intotrunkfrom
autoresearch/html-perf-2026-03-13

Conversation

@sirreal
Copy link
Owner

@sirreal sirreal commented Mar 14, 2026

Trac ticket:

Use of AI Tools


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

sirreal added 29 commits March 13, 2026 22:48
…ex pointer

- Cache strlen($this->html) in a property to avoid repeated function calls
- Convert recursive next_visitable_token() to iterative while loop
- Replace array_shift() with index pointer for element queue consumption

Benchmark: 2453ms → 2386ms (~2.7% improvement)
…checks

- parse_next_tag() is only called from base_class_next_token() which
  already calls after_tag(), so the second call was redundant
- Guard the update-flushing logic with a check for non-empty updates,
  avoiding function call overhead in the read-only hot path

Benchmark: 2386ms → 2282ms (~4.4% improvement)
…rty access

Cache $this->html and $this->bytes_already_parsed in local variables
to reduce property access overhead in the inner attribute parsing loop.
Also inlines skip_whitespace() calls within the method.
Replace in_array() with constant array for O(1) isset() lookup.
Add early returns for common cases (#text, #comment, html doctype)
to avoid unnecessary property access and method calls.

Benchmark: 2282ms → 2204ms (~3.4% improvement)
get_tag() is called multiple times per token (from step(), step_in_body(),
get_token_name(), etc.). Cache the uppercase tag name on first computation
and clear it in after_tag().

Benchmark: 2204ms → 2132ms (~3.3% improvement)
- Cache get_tag() result to avoid redundant substr+strtoupper per token
- Replace get_token_type() + conditional sigil with direct parser_state
  check, eliminating a method call and string interpolation per token
- Applied across all 17 step_in_* method entry points

Benchmark: 2204ms → 2108ms (~4.4% improvement from tag cache + op pattern)
… methods

Inline the sigil computation into the $op concatenation in step_in_caption,
step_in_table_body, step_in_row, and step_in_cell.
Skip the null byte and whitespace detection loops when the text
starts with a regular character. Most text nodes contain visible
content, so this avoids unnecessary strspn calls.
…content check

Avoid temporary array allocation in the hot per-token path.
Also convert bookmark_token() from throwing to returning null
on failure, moving the exception to insert_virtual_node() only.
Avoid the int-to-string conversion in bookmark_token() by passing
the counter directly as the bookmark key. PHP arrays support int
keys natively, avoiding a string allocation per token.
…trcspn

Replace strspn() + strcspn() combo for tag name detection with a direct
character range comparison and a single strcspn() call. Since alphabetic
characters are not in the delimiter set, one strcspn() from the tag name
start computes the full tag name length. Moves the doc_length bounds
check before the character access for safety.
…t_token_name()

In all step_in_* methods, replace $this->get_token_name() with
$this->state->current_token->node_name. The token name is already
computed in step() and stored on the WP_HTML_Token, so reading it
directly avoids a method call and switch dispatch per token.
The operation string (e.g. '+DIV', '-DIV', '#text') was recomputed in
every step_in_* method via string concatenation and is_tag_closer()
calls. Compute it once in step() and store as a property that all
dispatch methods read directly.
During step(), current_element is always null, so the virtual element
check in the overridden is_tag_closer() always falls through to the
parent method. Call parent::is_tag_closer() directly to skip the
unnecessary is_virtual() dispatch chain.
Replace method calls to expects_closer() with inline checks in
next_visitable_token() and step(). These hot-path loops call
expects_closer() for every token, and inlining eliminates method
dispatch overhead for the common HTML-namespace case.
Add a pre-computed is_pop boolean property to WP_HTML_Stack_Event to
avoid string comparison per event. Merge the two separate is_pop blocks
in next_visitable_token() into one to reduce branching.
Fast-path the two most common token types (matched tags and text nodes)
to avoid the method call and switch dispatch of get_token_name(). Tags
call get_tag() directly, text nodes return '#text' immediately.
Maintain a cached reference to the top element of the stack, updated
on push, pop, and remove_node. This avoids calling end() on every
current_node() access, eliminating function call overhead in the
hot path where current_node() is called multiple times per step.
In the push/pop handler closures, use parent::is_tag_closer() instead of
$this->is_tag_closer() to bypass the is_virtual() dispatch chain. Also
cache current_token in a local variable and simplify the provenance
computation to short-circuit on the is_virtual check.
In push/pop handlers, avoid calling change_parsing_namespace() when the
namespace is already 'html' (the common case). This skips the method
call, in_array validation, and property assignment for the vast majority
of tokens.
When is_virtual is false, current_token is guaranteed to be set (the
is_virtual check already validates this). Remove the redundant isset()
call from the provenance ternary in push/pop handlers.
The operation string property is no longer read — all checks use the
is_pop boolean instead. Skip the assignment in the constructor.
Replace string-based operation parameter with a boolean is_pop flag
passed directly from the call sites. Eliminates a string comparison
in the constructor for every stack event (~1.5M per benchmark run).
…events

Replace the 'virtual'/'real' provenance string with a boolean is_virtual
flag. This eliminates string allocation and comparison, replacing them
with a direct boolean assignment and check.
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a comprehensive performance optimization effort for the HTML Processor, specifically targeting the next_token() method's throughput. It incorporates numerous micro-optimizations identified through dedicated autoresearch, aiming to significantly reduce execution time and memory usage during HTML tokenization. The changes involve refactoring core logic, introducing caching mechanisms, and streamlining data structures to enhance efficiency without altering existing behavior.

Highlights

  • Performance Improvements: Implemented 23 distinct micro-optimizations to the HTML Processor's tokenization process, resulting in a 27.6% improvement in mean execution time.
  • Caching and Reduced Computations: Introduced caching for HTML document length ($this->html_length) and uppercase tag names ($this->tag_name_cache), and pre-computed operation strings ($this->current_op) to avoid redundant function calls and string manipulations in hot paths.
  • Iterative Processing: Converted the recursive next_visitable_token() method to an iterative loop with an index pointer, reducing overhead associated with recursion.
  • Optimized Attribute Handling: Introduced lazy attribute parsing using scan_next_attribute() and ensure_attributes_parsed(), deferring the creation of WP_HTML_Attribute_Token objects until they are explicitly needed.
  • Streamlined Stack Events: Refactored WP_HTML_Stack_Event to use boolean flags (is_pop, is_virtual) instead of string-based operations and provenance, and optimized push/pop handlers to use parent::is_tag_closer() and direct boolean parameters.
  • Faster Lookup Mechanisms: Replaced in_array() checks with isset() on a constant lookup table (ELEMENTS_WITHOUT_A_CLOSER) for determining elements that do not expect a closing tag, and used direct character checks for tag name parsing.
  • Reduced Object Allocations: Optimized WP_HTML_Open_Elements by caching the current_node to avoid repeated end() calls on the stack array, and streamlined bookmark handling to return integers instead of strings.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • autoresearch.checks.sh
    • Added a new script to run HTML API tests with suppressed success output, showing only errors.
  • autoresearch.md
    • Added a new document detailing the objectives, metrics, how-to-run instructions, files in scope, constraints, a comprehensive list of committed performance wins, dead ends, and architectural notes for HTML Tag Processor performance research.
  • autoresearch.sh
    • Added a new script to perform syntax checks, run hyperfine benchmarks on bench.php, and extract performance metrics (mean, stddev, min, max execution times).
  • bench.php
    • Added a new benchmark script that loads html-standard.html and iterates through all tokens using WP_HTML_Processor::next_token().
  • bootstrap-html-api.php
    • Added a new bootstrap file for HTML API benchmarks, including necessary require_once statements for HTML API classes and polyfills for WordPress functions like esc_attr, __, _doing_it_wrong, and wp_kses_uri_attributes.
  • src/wp-includes/html-api/class-wp-html-open-elements.php
    • Added a current_node_cache property to store a cached reference to the last node on the stack.
    • Updated current_node() to return the cached node instead of calling end() on the stack.
    • Modified current_node_is() to utilize the current_node_cache for improved performance.
    • Updated pop(), push(), and remove_node() methods to maintain and update the current_node_cache consistently.
  • src/wp-includes/html-api/class-wp-html-processor.php
    • Added ELEMENTS_WITHOUT_A_CLOSER constant for efficient lookup of void and special atomic HTML elements.
    • Introduced current_op property to cache the operation string for the current token, avoiding re-computation.
    • Added element_queue_index property to manage iterative processing of the element queue.
    • Refactored push and pop handlers to use parent::is_tag_closer() and pass boolean is_pop and is_virtual directly to WP_HTML_Stack_Event constructor, and optimized namespace change logic.
    • Converted next_visitable_token() from a recursive function to an iterative loop, utilizing element_queue_index and is_pop for efficiency.
    • Updated is_tag_closer() to use the is_pop property from current_element.
    • Modified is_virtual() to directly check the is_virtual property of current_element.
    • Optimized expects_closer() with early returns for comments, text nodes, and doctypes, and used isset() on ELEMENTS_WITHOUT_A_CLOSER for faster checks.
    • Streamlined step() by using direct isset() checks for void/atomic elements and pre-computing $this->current_op.
    • Optimized step() to use direct comparisons instead of in_array() for math integration node types.
    • Modified bookmark_token() to return null on failure instead of throwing an exception and to return an integer bookmark name.
    • Updated seek() to reset element_queue_index when seeking to a new bookmark.
    • Added a null check for the result of bookmark_token() in insert_virtual_node() to handle allocation failures.
    • Optimized various step_in_* methods (step_initial, step_before_html, step_before_head, step_in_head, step_in_head_noscript, step_after_head, step_in_body, step_in_table, step_in_caption, step_in_column_group, step_in_table_body, step_in_row, step_in_cell, step_in_select, step_in_select_in_table, step_in_template, step_after_body, step_in_frameset, step_after_frameset, step_after_after_body, step_after_after_frameset, step_in_foreign_content) to use cached current_token->node_name and current_op for improved performance.
  • src/wp-includes/html-api/class-wp-html-stack-event.php
    • Replaced string-based operation and provenance properties with boolean is_pop and is_virtual properties for better performance and clarity.
    • Updated the constructor to accept boolean parameters for is_pop and is_virtual.
  • src/wp-includes/html-api/class-wp-html-tag-processor.php
    • Added html_length property to cache the byte length of the HTML document, avoiding repeated strlen() calls.
    • Introduced tag_name_cache property to store the uppercase tag name, computed on first access per token.
    • Added attributes_dirty and attribute_scan_start properties to support lazy parsing of attributes.
    • Initialized html_length in the constructor.
    • Updated base_class_next_token() to use html_length and to employ scan_next_attribute() for deferred attribute parsing.
    • Modified base_class_next_token() to save and restore new attribute-related properties during token processing.
    • Updated skip_rcdata() and skip_script_data() to use the cached html_length.
    • Removed a redundant after_tag() call from parse_next_tag().
    • Optimized tag name parsing in parse_next_tag() using direct character checks and strcspn for improved speed.
    • Updated parse_next_tag() to use the cached doc_length (html_length).
    • Optimized parse_next_attribute() by using local variables for html, doc_length, and at, reducing property access overhead.
    • Added scan_next_attribute() as a lightweight method to advance the cursor past attributes without creating objects.
    • Introduced ensure_attributes_parsed() to create attribute objects only when they are accessed.
    • Updated after_tag() to guard update flushing with emptiness checks and to reset tag_name_cache and attributes_dirty.
    • Added ensure_attributes_parsed() call in class_name_updates_to_attributes_updates() to ensure attributes are parsed before class name updates are applied.
    • Updated apply_attributes_updates() to refresh html_length after modifying the HTML content.
    • Added ensure_attributes_parsed() calls in get_attribute(), get_attribute_names_with_prefix(), set_attribute(), and remove_attribute() to ensure attributes are available when needed.
    • Optimized get_tag() to use tag_name_cache for efficient retrieval of uppercase tag names.
    • Added a fast-path check in subdivide_text_appropriately() to quickly determine if text is not a null sequence or whitespace-only.
Activity
  • This pull request is marked for code review only, with further discussion directed to a Trac ticket.
  • The changes represent a series of performance optimizations identified through dedicated autoresearch.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant number of performance optimizations for the HTML Processor, which are well-documented in the autoresearch.md file. The changes in the core PHP classes are focused on reducing function calls, avoiding string comparisons, and implementing lazy parsing for attributes, all of which are excellent for performance. The new benchmarking scripts are a great addition for measuring these improvements. I have a couple of suggestions to make these scripts more robust.

set -euo pipefail

# Run HTML API tests — suppress success output, only show errors
./vendor/bin/phpunit -c tests/phpunit/tests/html-api/phpunit.xml --stop-on-error --stop-on-failure --stop-on-warning --stop-on-defect 2>&1 | tail -5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of | tail -5 is a bit fragile as it might hide important error details if the output from phpunit is longer than 5 lines. A more robust approach would be to capture the full output and display it only when the command fails. This ensures you see all relevant error information when tests fail, without cluttering the output on success.

Suggested change
./vendor/bin/phpunit -c tests/phpunit/tests/html-api/phpunit.xml --stop-on-error --stop-on-failure --stop-on-warning --stop-on-defect 2>&1 | tail -5
if ! output=$(./vendor/bin/phpunit -c tests/phpunit/tests/html-api/phpunit.xml --stop-on-error --stop-on-failure --stop-on-warning --stop-on-defect 2>&1); then
echo "$output"
exit 1
fi

Comment on lines +17 to +22
$data = json_decode(file_get_contents($argv[1]), true);
$r = $data["results"][0];
printf("METRIC mean_ms=%.1f\n", $r["mean"] * 1000);
printf("METRIC stddev_ms=%.1f\n", $r["stddev"] * 1000);
printf("METRIC min_ms=%.1f\n", $r["min"] * 1000);
printf("METRIC max_ms=%.1f\n", $r["max"] * 1000);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The inline PHP script for extracting metrics is not very robust. If file_get_contents or json_decode fails, or if the JSON structure is not what's expected, the script will fail with a PHP error. It would be better to add some error handling to provide clearer error messages.

Suggested change
$data = json_decode(file_get_contents($argv[1]), true);
$r = $data["results"][0];
printf("METRIC mean_ms=%.1f\n", $r["mean"] * 1000);
printf("METRIC stddev_ms=%.1f\n", $r["stddev"] * 1000);
printf("METRIC min_ms=%.1f\n", $r["min"] * 1000);
printf("METRIC max_ms=%.1f\n", $r["max"] * 1000);
$json = file_get_contents($argv[1]);
if (false === $json) {
fwrite(STDERR, "Error: Failed to read temp file: {$argv[1]}\n");
exit(1);
}
$data = json_decode($json, true);
if (null === $data) {
fwrite(STDERR, "Error: Failed to decode JSON from file: {$argv[1]}\n");
exit(1);
}
if (!isset($data["results"][0])) {
fwrite(STDERR, "Error: Unexpected JSON structure in file: {$argv[1]}\n");
exit(1);
}
$r = $data["results"][0];
printf("METRIC mean_ms=%.1f\n", $r["mean"] * 1000);
printf("METRIC stddev_ms=%.1f\n", $r["stddev"] * 1000);
printf("METRIC min_ms=%.1f\n", $r["min"] * 1000);
printf("METRIC max_ms=%.1f\n", $r["max"] * 1000);

sirreal added 24 commits March 14, 2026 11:05
Non-element tokens (text, comments, etc.) are always immediately popped
from the open elements stack on the next step() call. This bypasses the
actual stack push/pop and creates the event directly, avoiding: stack
array manipulation, after_element_push/pop callbacks, pop handler event
creation, and breadcrumb push/pop for tokens that always cancel out.
~110ms improvement.
Inline the text node handling from step_in_body() directly in step()
when the insertion mode is IN_BODY. Avoids the method call overhead,
variable assignments, and switch dispatch for the most common token
type in the most common insertion mode. ~40ms improvement.
In the text node fast path, create the stack event directly instead of
going through insert_html_element(). Since we already know the token
is a non-element text node, skip the method call and redundant check.
~20ms improvement.
Text tokens in the IN_BODY insertion mode don't need bookmarks for
read-only tokenization. Skip bookmark_token(), set_bookmark(), and
WP_HTML_Span allocation. Create a lightweight WP_HTML_Token with no
bookmark or destructor callback. ~65ms improvement.
Replace method call with inline logic to avoid function call overhead
on every token. For full parsers (no context_node), this is just a
direct call to current_node(). ~20ms improvement.
Make is_closing_tag protected so the subclass can access it directly.
Inline the is_tag_closer() check in step() to avoid method call
overhead on every token. For start tags (most common), this
short-circuits on is_closing_tag=false. ~12ms improvement.
Add set_bookmark_fast() to tag processor that skips state checks,
array_key_exists, and count() overflow guard. Since bookmarks use
monotonically increasing integer names and old bookmarks are released
when tokens are destroyed, overflow can't happen. ~14ms improvement.
Move the current_op string computation after the text node fast path,
since fast-pathed text tokens never use the op string. Marginal.
Move the text node fast path to right after token parsing, inside the
subdivide_text_appropriately block. This skips all tag-specific
computations (adjusted_current_node, is_matched_tag, is_closer,
is_start_tag, token_name ternary chain) for text tokens. ~24ms.
Replace bookmark_token() method call with inline code, eliminating
one method call per non-text token. Also removes the null check since
set_bookmark_fast doesn't fail. Marginal.
Make token_starts_at and token_length protected so the subclass can
inline the self-closing flag check. For non-matched tags (text,
comments), short-circuits on is_matched_tag=false. For matched tags,
avoids method call overhead. ~35ms improvement.
Make tag_name_starts_at, tag_name_length, tag_name_cache protected.
Inline the strtoupper(substr()) tag name computation, compute
token_name before is_closer (to use cached value for BR check), and
eliminate two get_tag() method calls per matched tag. ~25ms.
Store the is_tag_closer result from step() in a property and read it
in push/pop handlers instead of calling parent::is_tag_closer() per
push and pop. Saves two method calls per stack operation. ~30ms.
The root-node bookmark only exists in fragment parsers. Guard the
string comparison with isset(context_node) so full parsers avoid
the comparison entirely. ~14ms improvement.
Replace count() comparison with isset() for checking if the event
queue has more events. Avoids function call overhead. Marginal.
Check namespace first in push handler. For HTML-namespace elements
(the vast majority), this avoids the integration_node_type access
entirely. Marginal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant