feat: refactor dom manipulation #355

fabianwgl · 2025-11-25T12:43:07Z

User description

PR Type

Enhancement

Description

Replace regex-based DOM manipulation with DOMDocument for safer HTML parsing
Add UTF-8 encoding support using mb_encode_numericentity to handle special characters
Implement wrapper tag strategy to preserve HTML fragments during parsing
Add security escaping (esc_url, esc_attr) for output sanitization
Convert attribute validation from regex to DOM element methods
Add comprehensive test suite comparing old and new implementations
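
The wrapper-tag and encoding steps listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `sketch_parse_fragment()` is hypothetical, and it assumes the `dom` and `mbstring` extensions are loaded. It returns only the names of the top-level nodes so the effect of the wrapper is easy to inspect.

```php
<?php
// Hypothetical helper (not part of the PR): parse an HTML fragment through a
// wrapper element and report the names of its top-level nodes.
function sketch_parse_fragment( string $buffer ): array {
    if ( $buffer === '' ) {
        return array(); // Nothing to parse.
    }

    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // Collect parse errors silently.

    // Wrap the fragment so sibling nodes at the root stay siblings.
    $wrapped = '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>';

    // Turn every non-ASCII character into a numeric entity before parsing.
    $encoded = mb_encode_numericentity( $wrapped, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
    $dom->loadHTML( $encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );

    libxml_clear_errors();
    libxml_use_internal_errors( $previous ); // Restore the caller's setting.

    $names   = array();
    $wrapper = $dom->getElementsByTagName( 'cookiebot-wrapper' )->item( 0 );
    if ( $wrapper ) {
        foreach ( $wrapper->childNodes as $node ) {
            $names[] = $node->nodeName;
        }
    }
    return $names;
}
```

Calling it with `'<div></div><script src="other.js"></script><script src="tracking.js"></script>'` should list the `div` and both `script` elements as direct children of the wrapper rather than nested inside each other.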

Diagram Walkthrough

flowchart LR
  A["Regex-based<br/>manipulation"] -->|"Replace with"| B["DOMDocument<br/>parsing"]
  B -->|"Wrap buffer"| C["cookiebot-wrapper<br/>tag"]
  C -->|"Parse with"| D["mb_encode_numericentity<br/>UTF-8 encoding"]
  D -->|"Modify"| E["Script elements<br/>via DOM API"]
  E -->|"Extract & escape"| F["Safe HTML<br/>output"]
  G["Test suite"] -->|"Validates"| H["Old vs New<br/>equivalence"]

File Walkthrough

Relevant files

Enhancement

helper.php: `Convert script manipulation from regex to DOMDocument` (src/lib/helper.php, +64/-79)
- Refactored `cookiebot_addons_manipulate_script()` from regex-based to DOMDocument-based implementation
- Added UTF-8 encoding using `mb_encode_numericentity()` to handle special characters safely
- Implemented wrapper tag strategy to preserve HTML fragments and prevent unwanted nesting
- Replaced regex pattern matching with DOM element iteration and attribute manipulation
- Added early return for empty buffers and proper error suppression with `libxml_use_internal_errors()`
- Added imports for `DOMDocument` and `DOMElement` classes

Script_Loader_Tag.php: `Replace regex parsing with DOMDocument in script loader` (src/lib/script_loader_tag/Script_Loader_Tag.php, +78/-22)
- Refactored `cookiebot_add_consent_attribute_to_tag()` to use DOMDocument instead of regex for script tag parsing
- Replaced regex-based attribute validation with new `validate_attributes_for_consent_ignore_dom()` method using DOM element API
- Added UTF-8 encoding support via `mb_encode_numericentity()` for consistent HTML parsing
- Added security escaping with `esc_url()` and `esc_attr()` for output sanitization
- Converted string-based ID validation to DOM attribute checking logic
- Added imports for `DOMDocument` and `DOMElement` classes

Tests

compare_output.php: `Add test suite for DOM refactoring validation` (tests/compare_output.php, +141/-0)
- Created comprehensive test suite comparing new DOMDocument implementation against old regex-based logic
- Implemented `normalize_html()` function to handle whitespace, quote normalization, and attribute sorting
- Added test cases for simple scripts, multiple scripts, attributes, existing types, and fragments
- Tests both `cookiebot_addons_manipulate_script()` and `Script_Loader_Tag` implementations
- Provides detailed output showing normalized and raw HTML for failed test cases

old_logic.php: `Store original regex implementations for testing` (tests/old_logic.php, +101/-0)
- Preserved original regex-based implementations as reference for testing
- Contains `cookiebot_addons_manipulate_script_old()` function with original regex patterns
- Contains `Script_Loader_Tag_Old` class with original regex-based attribute validation
- Serves as baseline for comparison tests to ensure behavioral equivalence

mock_wp_functions.php: `Add WordPress function mocks for testing` (tests/mock_wp_functions.php, +59/-0)
- Created mock implementations of WordPress functions for standalone testing
- Mocked security functions: `esc_url()`, `esc_attr()`, `esc_html()`
- Mocked option functions: `get_option()`, `update_option()`
- Mocked utility functions: `add_action()`, `add_filter()`, `apply_filters()`
- Defined required constants for plugin directory paths

Miscellaneous

debug_dom.php: `Add DOMDocument parsing debug script` (debug_dom.php, +15/-0)
- Created debugging script to test DOMDocument parsing behavior
- Demonstrates wrapper tag strategy for preserving HTML fragments
- Shows UTF-8 encoding with `mb_encode_numericentity()` usage
- Outputs parsed node structure for verification

CodeAnt-AI Description

Switch script consent tagging to DOM parsing with regression tests

What Changed

Script handling now loads fragments through DOMDocument so keyword-matched scripts consistently get type="text/plain" and data-cookieconsent attributes instead of relying on fragile regex.
Ignored scripts are rewritten through DOM parsing only when their IDs match the expected handle pattern and they lack prior consent attributes, while URLs and consent tags are escaped before rendering.
A comparison test suite with WordPress function mocks exercises the new helper and loader tag code against the legacy logic to catch regressions.
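
As a rough illustration of the keyword-matched tagging described above, the sketch below marks every script whose markup contains a keyword. It is not the PR's implementation: the function name `sketch_tag_scripts()` is hypothetical, and the shape of `$keywords` (substring mapped to a list of consent categories) is an assumption made for this example.

```php
<?php
// Hypothetical sketch: $keywords maps a substring to consent categories
// (assumed shape, not confirmed by the PR).
function sketch_tag_scripts( string $buffer, array $keywords ): string {
    if ( $buffer === '' ) {
        return $buffer;
    }

    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $encoded  = mb_encode_numericentity(
        '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>',
        array( 0x80, 0x10FFFF, 0, 0x1FFFFF ),
        'UTF-8'
    );
    $dom->loadHTML( $encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $modified = false;
    // Copy the live node list into an array before changing attributes.
    foreach ( iterator_to_array( $dom->getElementsByTagName( 'script' ) ) as $script ) {
        $markup = $dom->saveHTML( $script );
        foreach ( $keywords as $needle => $categories ) {
            if ( strpos( $markup, (string) $needle ) !== false ) {
                $script->setAttribute( 'type', 'text/plain' );
                $script->setAttribute( 'data-cookieconsent', implode( ',', $categories ) );
                $modified = true;
                break;
            }
        }
    }

    $wrapper = $dom->getElementsByTagName( 'cookiebot-wrapper' )->item( 0 );
    if ( ! $modified || ! $wrapper ) {
        return $buffer; // Leave untouched input byte-for-byte identical.
    }

    $output = '';
    foreach ( $wrapper->childNodes as $node ) {
        $output .= $dom->saveHTML( $node );
    }
    return $output;
}
```

Returning the original buffer when nothing matched keeps DOMDocument's reserialization from altering markup that did not need to change.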

Impact

✅ Stable consent tagging for keyword-matched scripts
✅ Accurate ignore flags for identified third-party scripts
✅ Confidence from regression tests

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

codeant-ai · 2025-11-25T12:43:17Z

CodeAnt AI is reviewing your PR.

sonarqubecloud · 2025-11-25T12:43:39Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

qodo-code-review · 2025-11-25T12:43:42Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪	Inadequate test escaping
Description: Mock escaping functions `esc_url()`, `esc_attr()`, and `esc_html()` either return input unchanged or only partially escape, which can mask output-escaping issues during tests and allow unsafe HTML/URLs to pass validation unnoticed.
mock_wp_functions.php [13-21]
Referred Code
if (!function_exists('esc_url')) {
    function esc_url($url) { return $url; }
}
if (!function_exists('esc_attr')) {
    function esc_attr($text) { return htmlspecialchars($text, ENT_QUOTES); }
}
if (!function_exists('esc_html')) {
    function esc_html($text) { return htmlspecialchars($text, ENT_QUOTES); }
}

	Unsafe debug output
Description: Debug script echoes DOM-parsed HTML content directly to STDOUT without escaping or output control, which if left accessible in production could disclose or reflect untrusted HTML content.
debug_dom.php [11-15]
Referred Code
echo "Child Nodes: " . $dummy->childNodes->length . "\n";
foreach ($dummy->childNodes as $node) {
    echo "Node: " . $node->nodeName . "\n";
    echo "Content: " . $dom->saveHTML($node) . "\n";
}
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.
Status: No auditing: The newly added DOM parsing and script manipulation perform security-relevant modifications without any audit logging of actions taken or outcomes.
Referred Code
if ( empty( $buffer ) ) {
    return $buffer;
}

// Use DOMDocument to safely parse and modify the script tag
$dom = new DOMDocument();

// Suppress errors for partial HTML
$libxml_previous_state = libxml_use_internal_errors( true );

// Wrap buffer in a custom tag to ensure correct parsing of fragments (e.g. multiple siblings at root)
// This prevents DOMDocument from trying to fix structure by nesting siblings
$wrapped_buffer = '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>';

// Load HTML with UTF-8 encoding hack
// The mb_convert_encoding is to ensure we don't have encoding issues
// We use LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to avoid adding <html><body> wrappers automatically
// Replacement for deprecated mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')
$encoded_buffer = mb_encode_numericentity( $wrapped_buffer, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
... (clipped 52 lines)
Learn more about managing compliance generic rules or creating your own custom rules

	Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation
Status: Error handling: DOMDocument loadHTML errors are suppressed and not logged, and failure to parse or missing wrapper returns original buffer without contextual error reporting.
Referred Code
// Suppress errors for partial HTML
$libxml_previous_state = libxml_use_internal_errors( true );

// Wrap buffer in a custom tag to ensure correct parsing of fragments (e.g. multiple siblings at root)
// This prevents DOMDocument from trying to fix structure by nesting siblings
$wrapped_buffer = '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>';

// Load HTML with UTF-8 encoding hack
// The mb_convert_encoding is to ensure we don't have encoding issues
// We use LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to avoid adding <html><body> wrappers automatically
// Replacement for deprecated mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')
$encoded_buffer = mb_encode_numericentity( $wrapped_buffer, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );

libxml_use_internal_errors( $libxml_previous_state );

$scripts  = $dom->getElementsByTagName( 'script' );
$modified = false;

// Convert DOMNodeList to array to avoid modification issues during iteration
$script_nodes = array();
... (clipped 45 lines)
Learn more about managing compliance generic rules or creating your own custom rules

	Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities
Status: Input handling: External HTML fragments are parsed and rewritten without explicit validation of $keywords or buffer content beyond emptiness, which may need additional safeguards depending on sources.
Referred Code
if ( empty( $buffer ) ) {
    return $buffer;
}

// Use DOMDocument to safely parse and modify the script tag
$dom = new DOMDocument();

// Suppress errors for partial HTML
$libxml_previous_state = libxml_use_internal_errors( true );

// Wrap buffer in a custom tag to ensure correct parsing of fragments (e.g. multiple siblings at root)
// This prevents DOMDocument from trying to fix structure by nesting siblings
$wrapped_buffer = '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>';

// Load HTML with UTF-8 encoding hack
// The mb_convert_encoding is to ensure we don't have encoding issues
// We use LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to avoid adding <html><body> wrappers automatically
// Replacement for deprecated mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')
$encoded_buffer = mb_encode_numericentity( $wrapped_buffer, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
... (clipped 52 lines)
Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-11-25T12:44:35Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category Suggestion Impact

Possible issue

Preserve original script tag attributes

Use DOMDocument to parse the script tag and add consent attributes, which
preserves existing attributes like async and defer, instead of completely
replacing the tag and losing them.

src/lib/script_loader_tag/Script_Loader_Tag.php [82-85]

-// If we have a match, we completely replace the tag with our own constructed one
-// This is safer than parsing for this specific case as we know exactly what we want
-//phpcs:ignore WordPress.WP.EnqueuedResources.NonEnqueuedScript
-return '<script src="' . esc_url( $src ) . '" type="text/plain" data-cookieconsent="' . esc_attr( implode( ',', $this->tags[ $handle ] ) ) . '"></script>';
+// If we have a match, we parse the tag to add attributes, preserving existing ones.
+$dom = new DOMDocument();
+// Suppress errors for partial HTML and handle encoding
+$libxml_previous_state = libxml_use_internal_errors( true );
+$encoded_tag = mb_encode_numericentity( $tag, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
+$dom->loadHTML( $encoded_tag, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
+libxml_use_internal_errors( $libxml_previous_state );
 
+$scripts = $dom->getElementsByTagName( 'script' );
+if ( $scripts->length > 0 ) {
+    /** @var DOMElement $script */
+    $script = $scripts->item( 0 );
+    $script->setAttribute( 'type', 'text/plain' );
+    $script->setAttribute( 'data-cookieconsent', esc_attr( implode( ',', $this->tags[ $handle ] ) ) );
+    return $dom->saveHTML( $script );
+}
+


Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies a bug where important script attributes like async or id are discarded, and proposes a robust fix using DOMDocument which aligns with the PR's refactoring goals.

High


pantoaibot · 2025-11-25T12:45:10Z

PR Summary:

Refactor script tag manipulation to use DOMDocument instead of regex; add DOM-based validation and tests.

Replaced regex-based parsing in cookiebot_addons_manipulate_script (src/lib/helper.php) with DOMDocument:
- Wraps fragments in a <cookiebot-wrapper> tag, encodes input for UTF-8, suppresses libxml errors, iterates script nodes, and updates attributes (remove type/data-cookieconsent, then set type="text/plain" and data-cookieconsent).
- Early-return when buffer is empty; returns original buffer when no modification occurred.
- Removes previous PCRE/JIT fallback logic.
- Adds use imports for DOMDocument and DOMElement.
Script_Loader_Tag (src/lib/script_loader_tag/Script_Loader_Tag.php):
- Uses DOMDocument to parse/modify individual script tags when marking them as ignored; adds data-cookieconsent="ignore" via DOM.
- Replaces regex attribute validation with validate_attributes_for_consent_ignore_dom(), which checks the script id and existing data-cookieconsent attribute using DOMElement methods and a whitelist of valid suffixes.
- Escapes output for known-tag replacement using esc_url() and esc_attr().
- Added helper methods: extract_base_id_from_inline_id and the DOM-based validator.
Tests and tooling:
- Added tests and mocks (tests/compare_output.php, tests/mock_wp_functions.php) and old_logic.php to compare new DOM-based behavior against the original regex implementation.
- Added debug_dom.php for manual DOM parsing inspection.
Behavior/compatibility notes:
- DOMDocument can change attribute ordering/formatting (tests normalize differences) — outputs may differ in formatting but aim to preserve semantic behavior.
- Slight behavior change: empty input now returns immediately; if no scripts are modified the original buffer is returned unchanged.
- Removed regex stacklimit fallback; parsing is now DOM-based (more robust but slightly heavier).
No external dependency updates.

Reviewed by Panto AI

pantoaibot · 2025-11-25T12:47:39Z

debug_dom.php

+<?php
+$buffer = '<div></div><script src="other.js"></script><script src="tracking.js"></script>';
+$wrapped_buffer = '<dummy>' . $buffer . '</dummy>';
+
+$dom = new DOMDocument();
+$encoded_buffer = mb_encode_numericentity( $wrapped_buffer, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
+$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
+
+$dummy = $dom->getElementsByTagName('dummy')->item(0);
+
+echo "Child Nodes: " . $dummy->childNodes->length . "\n";
+foreach ($dummy->childNodes as $node) {
+    echo "Node: " . $node->nodeName . "\n";
+    echo "Content: " . $dom->saveHTML($node) . "\n";
+}


[NITPICK] This file appears to be a local debugging script left in the commit. Remove debug_dom.php from the PR (or move it to a non-shipped dev-only location). Leaving debug utilities in the repository can accidentally expose information and increases maintenance noise.

pantoaibot · 2025-11-25T12:47:41Z

src/lib/helper.php

+		$dom = new DOMDocument();
+
+		// Suppress errors for partial HTML


[CRITICAL_BUG] The new logic depends on the DOM extension (it instantiates new DOMDocument()). If ext-dom is not available in the target environment, this will trigger a fatal error. Add a graceful fallback: either keep the original regex-based implementation as a fallback, or check class_exists('DOMDocument')/extension_loaded('dom') and then return $buffer or run the regex approach. Ensure the code never triggers a fatal error in environments without DOM.

if (!class_exists('DOMDocument') || !extension_loaded('dom')) {
    // Fallback to regex-based implementation
    // (Insert the old regex logic here, or simply return $buffer)
    return $buffer;
}

pantoaibot · 2025-11-25T12:47:43Z

src/lib/helper.php

+		$libxml_previous_state = libxml_use_internal_errors( true );
+
+		// Wrap buffer in a custom tag to ensure correct parsing of fragments (e.g. multiple siblings at root)
+		// This prevents DOMDocument from trying to fix structure by nesting siblings
+		$wrapped_buffer = '<cookiebot-wrapper>' . $buffer . '</cookiebot-wrapper>';

-		$normalized_buffer = preg_replace( '/(<script(.*?)\/>)/is', '<script$2></script>', $buffer );
+		// Load HTML with UTF-8 encoding hack
+		// The mb_convert_encoding is to ensure we don't have encoding issues
+		// We use LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to avoid adding <html><body> wrappers automatically
+		// Replacement for deprecated mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')
+		$encoded_buffer = mb_encode_numericentity( $wrapped_buffer, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
+		$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );

-		if ( $normalized_buffer !== null ) {
-			$buffer = $normalized_buffer;
+		libxml_use_internal_errors( $libxml_previous_state );
+


[PERFORMANCE_OPTIMIZATION] You call libxml_use_internal_errors(true) and then restore the previous state after loadHTML but don't clear libxml errors. Call libxml_clear_errors() after loadHTML (before restoring the previous state) to avoid accumulating libxml errors in long-running processes or tests.

$dom->loadHTML( $encoded_buffer, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();
libxml_use_internal_errors( $libxml_previous_state );
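
One way to keep the suppress, clear, and restore steps together is a small wrapper around `loadHTML()`. The sketch below is only an illustration under stated assumptions: `sketch_load_html_quietly()` is a hypothetical name that does not appear in the PR, and the input is assumed to be non-empty (newer PHP versions reject an empty string).

```php
<?php
// Hypothetical helper: load HTML while keeping libxml's error state tidy.
function sketch_load_html_quietly( DOMDocument $dom, string $html ): bool {
    // Remember the caller's setting and switch to internal error collection.
    $previous = libxml_use_internal_errors( true );
    try {
        return $dom->loadHTML( $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    } finally {
        // Drop the collected errors so they do not pile up across calls,
        // then restore whatever the caller had configured.
        libxml_clear_errors();
        libxml_use_internal_errors( $previous );
    }
}
```

Using `finally` means the previous state is restored even if parsing throws, which a straight-line sequence of calls would not guarantee.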

pantoaibot · 2025-11-25T12:47:44Z

src/lib/helper.php

+				}
+			}
+		}
+
+		if ( $modified ) {
+			// Save HTML
+			// We extract children of our wrapper
+			$wrapper = $dom->getElementsByTagName( 'cookiebot-wrapper' )->item( 0 );
+			if ( $wrapper ) {
+				$output = '';
+				foreach ( $wrapper->childNodes as $node ) {
+					$output .= $dom->saveHTML( $node );
+				}
+				return $output;


[VALIDATION] Behavior change: the old implementation updated option 'cookiebot_regex_stacklimit' on a regex fallback. The new DOM-based path removed that fallback and no longer writes that option. If other code relies on that option being set to detect regex failures, adapt or preserve that behavior in a DOM fallback branch to maintain compatibility.

// In the fallback branch (regex failure), preserve the option update
if ($updated_scripts === null) {
    $updated_scripts = $buffer;
    if (get_option('cookiebot_regex_stacklimit') === false) {
        update_option('cookiebot_regex_stacklimit', 1);
    }
}

pantoaibot · 2025-11-25T12:47:46Z

src/lib/script_loader_tag/Script_Loader_Tag.php

+			// Use DOMDocument to safely parse and modify the script tag
+			$dom = new DOMDocument();
+
+			// Suppress errors for partial HTML
+			$libxml_previous_state = libxml_use_internal_errors( true );
+
+			// Load HTML with UTF-8 encoding hack
+			// The mb_convert_encoding is to ensure we don't have encoding issues
+			// Replacement for deprecated mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8')
+			$encoded_tag = mb_encode_numericentity( $tag, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
+			$dom->loadHTML( $encoded_tag, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
+


[CRITICAL_BUG] This method now uses DOMDocument (new DOMDocument() at lines 96-100). As with helper.php, you must guard against a missing ext-dom (class DOMDocument not found) or add a fallback. Without the DOM extension, this will trigger a fatal error. Either check class_exists('DOMDocument') and return the original $tag, or fall back to the previous preg_replace_callback approach.

if ( ! class_exists( 'DOMDocument' ) ) {
    // Fallback to previous regex approach or return original $tag
    return preg_replace_callback(
        '/<script\s*(?<atts>[^>]*)>/',
        function ( $tag ) use ( $handle ) {
            if ( ! self::validate_attributes_for_consent_ignore( $handle, $tag['atts'] ) ) {
                return $tag[0];
            }
            return str_replace( '<script ', '<script data-cookieconsent="ignore" ', $tag[0] );
        },
        $tag
    );
}
// ... continue with DOMDocument logic

pantoaibot · 2025-11-25T12:47:48Z

tests/compare_output.php

+    $html = preg_replace_callback('/<script([^>]*)>/', function($matches) {
+        $attrs = $matches[1];
+        preg_match_all('/(\w+)(?:="([^"]*)")?/', $attrs, $attr_matches, PREG_SET_ORDER);
+        $sorted_attrs = [];
+        foreach ($attr_matches as $m) {
+            $name = $m[1];
+            $value = isset($m[2]) ? $m[2] : '';
+            $sorted_attrs[$name] = $value;
+        }
+        ksort($sorted_attrs);
+        $new_attrs = '';
+        foreach ($sorted_attrs as $k => $v) {
+            if ($v === '') {
+                $new_attrs .= " $k";
+            } else {
+                $new_attrs .= " $k=\"$v\"";
+            }
+        }
+        return "<script$new_attrs>";
+    }, $html);


[REFACTORING] The attribute sorting regex in normalize_html uses /(\w+)(?:="([^"]*)")?/ which doesn't match attribute names containing hyphens (e.g. data-cookieconsent, data-cfasync). Update the attribute name pattern to allow hyphens (e.g. [A-Za-z0-9_-]+) so attribute sorting is robust and tests don't produce false negatives because of attribute name parsing issues.

// In normalize_html, update the attribute regex to allow hyphens in attribute names
$html = preg_replace_callback('/<script([^>]*)>/', function($matches) {
    $attrs = $matches[1];
    // Updated regex: attribute names can include hyphens and underscores
    preg_match_all('/([A-Za-z0-9_\-]+)(?:="([^"]*)")?/', $attrs, $attr_matches, PREG_SET_ORDER);
    $sorted_attrs = [];
    foreach ($attr_matches as $m) {
        $name = $m[1];
        $value = isset($m[2]) ? $m[2] : '';
        $sorted_attrs[$name] = $value;
    }
    ksort($sorted_attrs);
    $new_attrs = '';
    foreach ($sorted_attrs as $k => $v) {
        if ($v === '') {
            $new_attrs .= " $k";
        } else {
            $new_attrs .= " $k=\"$v\"";
        }
    }
    return "<script$new_attrs>";
}, $html);
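
To see why the hyphen matters, the short sketch below runs both patterns over the same attribute string and collects the names each one captures. The helper `sketch_attribute_names()` is a hypothetical name used only for this comparison.

```php
<?php
// Hypothetical helper: return the attribute names a pattern captures.
function sketch_attribute_names( string $attrs, string $pattern ): array {
    preg_match_all( $pattern, $attrs, $matches, PREG_SET_ORDER );
    return array_map(
        function ( $m ) {
            return $m[1]; // First capture group holds the attribute name.
        },
        $matches
    );
}

$attrs = ' data-cookieconsent="ignore" src="a.js"';

// Original pattern: \w does not match "-", so the name is split in two.
$old = sketch_attribute_names( $attrs, '/(\w+)(?:="([^"]*)")?/' );
// $old is array( 'data', 'cookieconsent', 'src' )

// Suggested pattern: hyphens are part of the name.
$new = sketch_attribute_names( $attrs, '/([A-Za-z0-9_-]+)(?:="([^"]*)")?/' );
// $new is array( 'data-cookieconsent', 'src' )
```

With the original pattern, `data-cookieconsent="ignore"` is recorded as a bare `data` attribute plus a `cookieconsent` attribute, so the normalized output no longer reflects the real markup.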

pantoaibot · 2025-11-25T12:47:50Z

tests/mock_wp_functions.php

+if (!function_exists('esc_url')) {
+    function esc_url($url) { return $url; }
+}
+if (!function_exists('esc_attr')) {
+    function esc_attr($text) { return htmlspecialchars($text, ENT_QUOTES); }
+}


[NITPICK] Test helper esc_attr returns htmlspecialchars(..., ENT_QUOTES) while esc_url returns original value. This difference is OK for tests but can hide encoding differences when new code uses esc_url/esc_attr. Consider making esc_url a lightweight normalizer (e.g. return filter_var($url, FILTER_SANITIZE_URL)) to better reflect production behavior in tests.

function esc_url($url) { return filter_var($url, FILTER_SANITIZE_URL); }
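
Note that `FILTER_SANITIZE_URL` strips characters such as spaces but leaves quotes and angle brackets in place, so on its own it would still let attribute-breaking input through a test. A slightly stricter test double could also entity-encode the result. The sketch below is an assumption-laden stand-in, not WordPress's `esc_url()`, which does considerably more (for example protocol checks); the name `sketch_mock_esc_url()` is hypothetical and chosen so it cannot collide with the real function.

```php
<?php
// Hypothetical stricter test double. This is NOT equivalent to WordPress's
// esc_url(); it only avoids returning attribute-breaking characters verbatim.
function sketch_mock_esc_url( string $url ): string {
    // Remove characters that are not allowed in URLs (spaces, for example).
    $sanitized = filter_var( $url, FILTER_SANITIZE_URL );

    // Encode quotes, ampersands, and angle brackets for attribute context.
    return htmlspecialchars( (string) $sanitized, ENT_QUOTES, 'UTF-8' );
}
```

A mock like this makes a missing or misplaced `esc_url()` call visible in the comparison output, because unescaped and escaped URLs no longer serialize identically.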

pantoaibot · 2025-11-25T12:47:52Z

Reviewed up to commit:0dcba638733eb2176a7a3eb572eb3f500c76cbbc

Additional Suggestion

src/lib/script_loader_tag/Script_Loader_Tag.php, line:218-256

validate_attributes_for_consent_ignore_dom reimplements ID matching logic using string checks and a small list of suffixes. This can miss edge cases compared to the original regex (for example different suffix patterns or subtle ID forms). Reuse the original preg_match pattern (converted to operate on the DOMElement's id attribute) or centralize the matching logic so both extract_base_id_from_inline_id() and this validator agree on allowed suffixes. That avoids regressions for inline script IDs created by WP or build tools.

private function validate_attributes_for_consent_ignore_dom( $script_handle, $id, $script ) {
    // Use the same regex as extract_base_id_from_inline_id for suffixes
    $base_id = preg_replace( '/-js-(extra|after|before)$/', '', $id );
    if ( $base_id !== $script_handle ) {
        return false;
    }
    if ( $script->hasAttribute( 'data-cookieconsent' ) ) {
        return false;
    }
    return true;
}
// Or, move the regex pattern to a shared method and use it in both places.
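
A shared helper along the lines suggested above might look like the following sketch. It assumes WordPress's usual id convention for enqueued scripts (`{handle}-js`, plus `-js-extra`, `-js-before`, `-js-after`, and `-js-translations` for inline blocks); the PR's actual suffix list is not shown in this thread, so the pattern and the name `sketch_base_id()` are illustrative only.

```php
<?php
// Hypothetical shared helper: strip the WordPress-style id suffix to recover
// the script handle. The suffix list is an assumption, not taken from the PR.
function sketch_base_id( string $id ): string {
    $base = preg_replace( '/-js(?:-(?:extra|before|after|translations))?$/', '', $id );
    return $base === null ? $id : $base; // Fall back to the input on regex failure.
}
```

Both the extractor and the validator could then compare `sketch_base_id( $id )` against the handle, so the two code paths cannot drift apart on which suffixes they accept.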

Reviewed by Panto AI

codeant-ai · 2025-11-25T12:50:48Z

src/lib/script_loader_tag/Script_Loader_Tag.php

+				$script->setAttribute( 'data-cookieconsent', 'ignore' );
+
+				// Save HTML
+				return $dom->saveHTML( $script );
+			}


Suggestion: After the DOM validation, inject the data-cookieconsent attribute into the original $tag string instead of returning $dom->saveHTML() so the markup is not mutated by DOMDocument serialization. [possible issue]
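
The idea can be sketched as follows: DOMDocument is used only to inspect the tag, and the attribute is then spliced into the original string. This is a minimal illustration rather than the plugin's code. `sketch_mark_ignored()` is a hypothetical name, and the handle/id validation the PR performs is left out for brevity.

```php
<?php
// Hypothetical sketch: validate with DOMDocument, modify the original string.
function sketch_mark_ignored( string $tag ): string {
    if ( trim( $tag ) === '' ) {
        return $tag;
    }

    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $encoded  = mb_encode_numericentity( $tag, array( 0x80, 0x10FFFF, 0, 0x1FFFFF ), 'UTF-8' );
    $dom->loadHTML( $encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    // Inspect the parsed element, but never serialize it.
    $script = $dom->getElementsByTagName( 'script' )->item( 0 );
    if ( ! $script instanceof DOMElement || $script->hasAttribute( 'data-cookieconsent' ) ) {
        return $tag;
    }

    // Insert the attribute into the first <script in the original markup.
    $result = preg_replace( '/<script\b/i', '<script data-cookieconsent="ignore"', $tag, 1 );
    return $result === null ? $tag : $result;
}
```

Because the returned string is the original tag plus one attribute, quoting style, attribute order, and inline script content stay exactly as WordPress produced them.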

codeant-ai · 2025-11-25T12:50:52Z

CodeAnt AI finished reviewing your PR.

feat: refactor dom manipulation

0dcba63

qodo-code-review bot added the Review effort 3/5 label Nov 25, 2025

codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Nov 25, 2025
