Commit d6355dc

Docs: Expand valid_unicode function documentation.
The `valid_unicode()` function accepts a limited set of codepoints according to the XML specification. Document the allowed codepoints and link to relevant documentation.

Developed in WordPress#9100.

Props jonsurrell, dmsnell.
See WordPress#6583, #63166.

git-svn-id: https://develop.svn.wordpress.org/trunk@60405 602fd350-edb4-49c9-b593-d223f7449a82
1 parent a5f6b9c commit d6355dc

1 file changed: +24 -4 lines changed

src/wp-includes/kses.php

Lines changed: 24 additions & 4 deletions
@@ -2083,18 +2083,38 @@ function wp_kses_normalize_entities3( $matches ) {
 /**
  * Determines if a Unicode codepoint is valid.
  *
+ * The definition of a valid Unicode codepoint is taken from the XML definition:
+ *
+ * > Characters
+ * >
+ * > …
+ * > Legal characters are tab, carriage return, line feed, and the legal characters of
+ * > Unicode and ISO/IEC 10646.
+ * > …
+ * > Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+ *
  * @since 2.7.0
  *
+ * @see https://www.w3.org/TR/xml/#charsets
+ *
  * @param int $i Unicode codepoint.
  * @return bool Whether or not the codepoint is a valid Unicode codepoint.
  */
 function valid_unicode( $i ) {
 	$i = (int) $i;
 
-	return ( 0x9 === $i || 0xa === $i || 0xd === $i ||
-		( 0x20 <= $i && $i <= 0xd7ff ) ||
-		( 0xe000 <= $i && $i <= 0xfffd ) ||
-		( 0x10000 <= $i && $i <= 0x10ffff )
+	return (
+		0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
+		0xA === $i || // U+000A LINE FEED (LF)
+		0xD === $i || // U+000D CARRIAGE RETURN (CR)
+		/*
+		 * The valid Unicode characters according to the XML specification:
+		 *
+		 * > any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
+		 */
+		( 0x20 <= $i && $i <= 0xD7FF ) ||
+		( 0xE000 <= $i && $i <= 0xFFFD ) ||
+		( 0x10000 <= $i && $i <= 0x10FFFF )
 	);
 }
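
To make the documented ranges concrete, the following is a minimal sketch that exercises the boundaries of the XML `Char` production quoted in the docblock. It assumes WordPress is not loaded, so it uses a stand-in copy of the range check under the hypothetical name `sketch_valid_unicode()`; the list of test codepoints is chosen for illustration and is not part of the commit.

```php
<?php
// Stand-in copy of the range check from this commit, so the example runs
// without loading WordPress. The name sketch_valid_unicode() is hypothetical.
function sketch_valid_unicode( $i ) {
	$i = (int) $i;

	return (
		0x9 === $i ||
		0xA === $i ||
		0xD === $i ||
		( 0x20 <= $i && $i <= 0xD7FF ) ||
		( 0xE000 <= $i && $i <= 0xFFFD ) ||
		( 0x10000 <= $i && $i <= 0x10FFFF )
	);
}

// Boundary codepoints mapped to the result the Char production implies.
$cases = array(
	0x0      => false, // U+0000 is not a legal XML character.
	0x9      => true,  // Tab.
	0xB      => false, // Vertical tab: a C0 control other than HT, LF, CR.
	0x1F     => false, // Last C0 control.
	0x20     => true,  // Space, start of the first range.
	0xD7FF   => true,  // Last codepoint before the surrogate block.
	0xD800   => false, // First surrogate.
	0xDFFF   => false, // Last surrogate.
	0xE000   => true,
	0xFFFD   => true,
	0xFFFE   => false,
	0xFFFF   => false,
	0x10000  => true,
	0x10FFFF => true,  // Highest Unicode codepoint.
	0x110000 => false, // Beyond the Unicode range.
);

foreach ( $cases as $codepoint => $expected ) {
	$actual = sketch_valid_unicode( $codepoint );
	printf(
		"U+%04X %s%s\n",
		$codepoint,
		$actual ? 'valid' : 'invalid',
		$actual === $expected ? '' : ' (unexpected)'
	);
}
```

Note that the check is stricter than "any Unicode scalar value": most C0 control characters and the noncharacters U+FFFE and U+FFFF are rejected because the XML specification excludes them.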
