Commit b91b757

KSES: Prevent normalization from unescaping escaped numeric character references.
Fixes an issue where `wp_kses_normalize_entities` would transform inputs like "&amp;#x27;" into "&#x27;", changing the intended HTML text.

This behavior has been present since the initial version of KSES was introduced in [649]. [2896] applied the normalization to post content for users without the "unfiltered_html" capability.

Developed in #9099.

Props jonsurrell, dmsnell, sirlouen.
Fixes #63630.

git-svn-id: https://develop.svn.wordpress.org/trunk@60616 602fd350-edb4-49c9-b593-d223f7449a82
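The difference matters because the two strings render differently. The following is a minimal sketch, not part of the commit, that approximates what a browser would display by running one decoding pass with PHP's `html_entity_decode`. The sample string `&amp;#x27;` is assumed here as a representative escaped reference, and the variable names are illustrative only.

```php
<?php
// Approximate "what the reader sees" with a single entity-decoding pass.
$flags = ENT_QUOTES | ENT_HTML5;

// The author escaped the ampersand so the literal text "&#x27;" is displayed.
$escaped = '&amp;#x27;';

// Unescaping the ampersand turns that text into a live character reference.
$unescaped = '&#x27;';

$shown_for_escaped   = html_entity_decode( $escaped, $flags, 'UTF-8' );
$shown_for_unescaped = html_entity_decode( $unescaped, $flags, 'UTF-8' );

echo $shown_for_escaped, "\n";   // prints: &#x27;
echo $shown_for_unescaped, "\n"; // prints: '
```

The first string displays a six-character sequence and the second displays a single apostrophe, so a normalizer that rewrites one into the other alters the content rather than just its encoding.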
1 parent 1f7af9a commit b91b757

2 files changed: 41 additions and 3 deletions

src/wp-includes/kses.php

Lines changed: 34 additions & 3 deletions
@@ -1958,14 +1958,45 @@ function wp_kses_normalize_entities( $content, $context = 'html' ) {
 	// Disarm all entities by converting & to &amp;
 	$content = str_replace( '&', '&amp;', $content );
 
-	// Change back the allowed entities in our list of allowed entities.
+	/*
+	 * Decode any character references that are now double-encoded.
+	 *
+	 * It's important that the following normalizations happen in the correct order.
+	 *
+	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
+	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
+	 *
+	 * First, numeric (decimal and hexadecimal) character references must be handled so that
+	 * `&amp;#x09;` becomes `&#x09;`. If the named character references were handled first, there
+	 * would be no way to know whether the double-encoded character reference had been produced
+	 * in this function or was the original input.
+	 *
+	 * Consider the two examples, first with named entity decoding followed by numeric
+	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
+	 * string processing from left to right:
+	 *
+	 * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
+	 * | ------------ | ---------------- | ------------------------- | -------------------------- |
+	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
+	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
+	 *
+	 * Notice in the example above that different inputs result in the same result. The second case
+	 * was not normalized and produced HTML that is semantically different from the input.
+	 *
+	 * | Input        | &-encoded        | Numeric ref double-decoded  | Named ref double-decoded |
+	 * | ------------ | ---------------- | --------------------------- | ------------------------ |
+	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
+	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
+	 *
+	 * Here, each input is normalized to an appropriate output.
+	 */
+	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
+	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
 	if ( 'xml' === $context ) {
 		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
 	} else {
 		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
 	}
-	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
-	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
 
 	return $content;
 }
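The effect of reordering the passes can be reproduced outside WordPress. The sketch below is a simplified model, not the committed code: `sketch_decode_numeric`, `sketch_decode_named`, and `sketch_normalize` are hypothetical stand-ins that only strip the extra `amp;`, and they skip the allow-list lookup, code point validation, and leading-zero handling done by the real callbacks.

```php
<?php
// Hypothetical stand-in for the numeric passes: `&amp;#46;` or `&amp;#x2E;` becomes `&#46;` or `&#x2E;`.
function sketch_decode_numeric( string $s ): string {
	return preg_replace( '/&amp;(#(?:[0-9]{1,7}|[Xx][0-9A-Fa-f]{1,6});)/', '&$1', $s );
}

// Hypothetical stand-in for the named pass: `&amp;name;` becomes `&name;` (no allow-list check here).
function sketch_decode_named( string $s ): string {
	return preg_replace( '/&amp;([A-Za-z]{2,8}[0-9]{0,2};)/', '&$1', $s );
}

// Encode every `&`, then undo the double encoding in the chosen order.
function sketch_normalize( string $input, bool $numeric_first ): string {
	$s = str_replace( '&', '&amp;', $input );
	if ( $numeric_first ) {
		return sketch_decode_named( sketch_decode_numeric( $s ) );
	}
	return sketch_decode_numeric( sketch_decode_named( $s ) );
}

// Old order (named first): two different inputs collapse to the same output.
echo sketch_normalize( '&#x2E;', false ), "\n";     // prints: &#x2E;
echo sketch_normalize( '&amp;#x2E;', false ), "\n"; // prints: &#x2E;

// New order (numeric first): the escaped input keeps its escaping.
echo sketch_normalize( '&#x2E;', true ), "\n";      // prints: &#x2E;
echo sketch_normalize( '&amp;#x2E;', true ), "\n";  // prints: &amp;#x2E;
```

With the numeric pass first, `&amp;amp;#x2E;` contains no `&amp;#` sequence, so the numeric pass leaves it alone and the named pass only removes one level of encoding.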

tests/phpunit/tests/kses.php

Lines changed: 7 additions & 0 deletions
@@ -597,6 +597,12 @@ public static function data_normalize_entities(): array {
 			'Encoded named ref &' => array( '&', '&' ),
 			'Encoded named ref &' => array( '&', '&' ),
 			'Encoded named ref &' => array( '&', '&' ),
+			'Encoded numeric ref '' => array( ''', ''' ),
+			'Encoded numeric ref '' => array( ''', ''' ),
+			'Encoded numeric ref '' => array( ''', ''' ),
+			'Encoded hex ref '' => array( ''', ''' ),
+			'Encoded hex ref '' => array( ''', ''' ),
+			'Encoded hex ref '' => array( ''', ''' ),
 
 			/*
 			 * The codepoint value here is outside of the valid unicode range whose
@@ -609,6 +615,7 @@ public static function data_normalize_entities(): array {
 
 	/**
 	 * @ticket 26290
+	 * @ticket 63630
 	 *
 	 * @dataProvider data_normalize_entities
 	 */
