Commit d0c6c2d

HTML API: Replace single PCRE call with new UTF-8 pipeline.
The only PCRE in the HTML API was used to validate a given attribute name when setting an attribute. This change relies on the new UTF-8 `wp_has_noncharacters()` method, removing the reliance on the PCRE extension and unifying behaviors across PHP runtime environments.
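For context on what a PCRE-free noncharacter check involves, here is a minimal sketch of a byte-level scan. It is not the WordPress implementation of `wp_has_noncharacters()`; the function name `example_has_noncharacters` is hypothetical, and the sketch assumes its input is already well-formed UTF-8. It looks for the encodings of U+FDD0 through U+FDEF and of the last two code points in each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, and so on up to U+10FFFF).

```php
<?php
/**
 * Hypothetical stand-in, for illustration only. Assumes well-formed UTF-8.
 * Returns true if $text contains a Unicode noncharacter.
 */
function example_has_noncharacters( string $text ): bool {
	$length = strlen( $text );
	$at     = 0;

	while ( $at < $length ) {
		// Noncharacters can only start with 0xEF (BMP) or 0xF0-0xF4 (planes 1-16).
		$at += strcspn( $text, "\xEF\xF0\xF1\xF2\xF3\xF4", $at );
		if ( $at >= $length ) {
			return false;
		}

		$b0 = ord( $text[ $at ] );

		if ( 0xEF === $b0 && $at + 2 < $length ) {
			$b1 = ord( $text[ $at + 1 ] );
			$b2 = ord( $text[ $at + 2 ] );

			// U+FDD0-U+FDEF encode as EF B7 90 through EF B7 AF.
			if ( 0xB7 === $b1 && $b2 >= 0x90 && $b2 <= 0xAF ) {
				return true;
			}

			// U+FFFE and U+FFFF encode as EF BF BE and EF BF BF.
			if ( 0xBF === $b1 && ( 0xBE === $b2 || 0xBF === $b2 ) ) {
				return true;
			}
		} elseif ( $b0 >= 0xF0 && $at + 3 < $length ) {
			$b1 = ord( $text[ $at + 1 ] );
			$b2 = ord( $text[ $at + 2 ] );
			$b3 = ord( $text[ $at + 3 ] );

			// U+nFFFE and U+nFFFF: second byte is 8F, 9F, AF, or BF,
			// third byte is BF, fourth byte is BE or BF.
			if (
				0x8F === ( $b1 & 0xCF ) &&
				0xBF === $b2 &&
				( 0xBE === $b3 || 0xBF === $b3 )
			) {
				return true;
			}
		}

		++$at;
	}

	return false;
}
```

Because `strcspn()` skips directly to the next candidate lead byte, ASCII-only names (the common case for attribute names) are rejected or accepted after a single pass with no per-character decoding.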
1 parent ad0c62c commit d0c6c2d

1 file changed: +16 -28 lines changed

src/wp-includes/html-api/class-wp-html-tag-processor.php

Lines changed: 16 additions & 28 deletions
@@ -3930,41 +3930,29 @@ public function set_attribute( $name, $value ): bool {
 			return false;
 		}
 
-		/*
+		$name_length = strlen( $name );
+
+		/**
 		 * WordPress rejects more characters than are strictly forbidden
 		 * in HTML5. This is to prevent additional security risks deeper
 		 * in the WordPress and plugin stack. Specifically the
 		 * less-than (<) greater-than (>) and ampersand (&) aren't allowed.
 		 *
-		 * The use of a PCRE match enables looking for specific Unicode
-		 * code points without writing a UTF-8 decoder. Whereas scanning
-		 * for one-byte characters is trivial (with `strcspn`), scanning
-		 * for the longer byte sequences would be more complicated. Given
-		 * that this shouldn't be in the hot path for execution, it's a
-		 * reasonable compromise in efficiency without introducing a
-		 * noticeable impact on the overall system.
-		 *
 		 * @see https://html.spec.whatwg.org/#attributes-2
-		 *
-		 * @todo As the only regex pattern maybe we should take it out?
-		 *       Are Unicode patterns available broadly in Core?
 		 */
-		if ( preg_match(
-			'~[' .
-				// Syntax-like characters.
-				'"\'>&</ =' .
-				// Control characters.
-				'\x{00}-\x{1F}' .
-				// HTML noncharacters.
-				'\x{FDD0}-\x{FDEF}' .
-				'\x{FFFE}\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}' .
-				'\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}' .
-				'\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}' .
-				'\x{CFFFE}\x{CFFFF}\x{DFFFE}\x{DFFFF}\x{EFFFE}\x{EFFFF}\x{FFFFE}\x{FFFFF}' .
-				'\x{10FFFE}\x{10FFFF}' .
-			']~Ssu',
-			$name
-		) ) {
+		if (
+			0 === $name_length ||
+			// Syntax-like characters.
+			strcspn( $name, '"\'>&</ =' ) !== $name_length ||
+			// Control characters.
+			strcspn(
+				$name,
+				"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" .
+				"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F"
+			) !== $name_length ||
+			// Unicode noncharacters.
+			wp_has_noncharacters( $name )
+		) {
 			_doing_it_wrong(
 				__METHOD__,
 				__( 'Invalid attribute name.' ),
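
The single-byte part of the new condition relies on `strcspn()` returning the length of the leading run that contains none of the listed bytes: if that length differs from the string length, a forbidden byte is present. A minimal standalone sketch of that idea follows. The function name `example_has_forbidden_ascii` is hypothetical, and the sketch deliberately leaves out the noncharacter check that the commit delegates to `wp_has_noncharacters()`.

```php
<?php
/**
 * Hypothetical helper, for illustration only. Mirrors the empty-name,
 * syntax-character, and control-character checks from the diff above.
 */
function example_has_forbidden_ascii( string $name ): bool {
	$name_length = strlen( $name );

	// An empty attribute name is never valid.
	if ( 0 === $name_length ) {
		return true;
	}

	// Syntax-like characters: quotes, angle brackets, ampersand, slash, space, equals.
	if ( strcspn( $name, '"\'>&</ =' ) !== $name_length ) {
		return true;
	}

	// Control characters U+0000 through U+001F, built as a 32-byte mask.
	$controls = implode( '', array_map( 'chr', range( 0x00, 0x1F ) ) );

	return strcspn( $name, $controls ) !== $name_length;
}
```

Multi-byte UTF-8 sequences never contain bytes below 0x80, so these byte-level scans cannot produce a false match inside a longer character such as `é`.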
