
Commit 2a29ee2

KSES: Prevent normalization from unescaping escaped numeric character references.
Fixes an issue where `wp_kses_normalize_entities` would transform inputs like "&amp;#x27;" into "&#x27;", changing the intended HTML text.

This behavior has been present since the initial version of KSES was introduced in [649]. [2896] applied the normalization to post content for users without the "unfiltered_html" capability.

Developed in WordPress/wordpress-develop#9099.

Props jonsurrell, dmsnell, sirlouen.
Fixes #63630.

Built from https://develop.svn.wordpress.org/trunk@60616

git-svn-id: http://core.svn.wordpress.org/trunk@59952 1a063a9b-81f0-0310-95a4-ce76da25c4cd
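A minimal sketch of the pre-fix ordering, reduced to the steps that matter, reproduces the reported behavior. It is not the WordPress code: `sketch_normalize_old_order()` is a hypothetical name, `amp` is assumed to be the only allowed named reference, and only hexadecimal references are handled, with no code-point validation or leading-zero trimming.

```php
<?php
// Hypothetical sketch of the pre-fix order in wp_kses_normalize_entities(): named references
// are decoded before numeric ones. Assumptions: `amp` is the only allowed named reference, and
// only hexadecimal numeric references are handled, without validation or zero trimming.
function sketch_normalize_old_order( string $content ): string {
	// Disarm every `&`.
	$content = str_replace( '&', '&amp;', $content );

	// Named references first: a doubled `&amp;amp;` collapses back to `&amp;`.
	$content = preg_replace_callback(
		'/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
		fn ( array $m ): string => 'amp' === $m[1] ? '&amp;' : $m[0],
		$content
	);

	// Numeric references second. By now an input `&amp;#x27;` has become `&amp;#x27;` again,
	// which looks exactly like a disarmed `&#x27;`, so both are turned into `&#x27;`.
	return preg_replace( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', '&#x$1;', $content );
}

echo sketch_normalize_old_order( '&#x27;' ), "\n";     // prints &#x27;
echo sketch_normalize_old_order( '&amp;#x27;' ), "\n"; // prints &#x27; (the escape is lost)
```

Two different inputs produce the same output, which is the change in meaning the commit describes: the second input was meant to display the literal text `&#x27;`, not an apostrophe.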
1 parent cb8f828 commit 2a29ee2


2 files changed (+35, -4 lines)


wp-includes/kses.php

Lines changed: 34 additions & 3 deletions
@@ -1958,14 +1958,45 @@ function wp_kses_normalize_entities( $content, $context = 'html' ) {
 	// Disarm all entities by converting & to &amp;
 	$content = str_replace( '&', '&amp;', $content );
 
-	// Change back the allowed entities in our list of allowed entities.
+	/*
+	 * Decode any character references that are now double-encoded.
+	 *
+	 * It's important that the following normalizations happen in the correct order.
+	 *
+	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
+	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
+	 *
+	 * First, numeric (decimal and hexadecimal) character references must be handled so that
+	 * `&amp;#x9;` becomes `&#x9;`. If the named character references were handled first, there
+	 * would be no way to know whether the double-encoded character reference had been produced
+	 * in this function or was the original input.
+	 *
+	 * Consider the two examples, first with named entity decoding followed by numeric
+	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
+	 * string processing from left to right:
+	 *
+	 * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
+	 * | ------------ | ---------------- | ------------------------- | -------------------------- |
+	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
+	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
+	 *
+	 * Notice in the example above that different inputs result in the same result. The second case
+	 * was not normalized and produced HTML that is semantically different from the input.
+	 *
+	 * | Input        | &-encoded        | Numeric ref double-decoded  | Named ref double-decoded |
+	 * | ------------ | ---------------- | --------------------------- | ------------------------ |
+	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
+	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
+	 *
+	 * Here, each input is normalized to an appropriate output.
+	 */
+	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
+	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
 	if ( 'xml' === $context ) {
 		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
 	} else {
 		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
 	}
-	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
-	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
 
 	return $content;
 }
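The reordered steps in the hunk above can be sketched outside WordPress as follows. `sketch_normalize_fixed_order()` is a hypothetical helper, not part of KSES; it assumes `amp` is the only allowed named reference and replaces the real `wp_kses_normalize_entities2()` / `wp_kses_normalize_entities3()` callbacks with plain substitutions that skip validation, zero trimming, and zero-padding. The inputs are the two from the second table in the added comment.

```php
<?php
// Hypothetical sketch of the fixed order: numeric references are decoded before named ones.
// Simplifications (assumptions, not WordPress behavior): only `amp` is an allowed named
// reference, and numeric references are not validated, trimmed, or zero-padded.
function sketch_normalize_fixed_order( string $content ): string {
	// 1. Disarm every `&`: `&#x2E;` -> `&amp;#x2E;`, `&amp;#x2E;` -> `&amp;amp;#x2E;`.
	$content = str_replace( '&', '&amp;', $content );

	// 2. Numeric references. The patterns need `&amp;#` to be contiguous, so they match only
	//    `&#` sequences present in the input, never the `#` that follows an input `&amp;`.
	$content = preg_replace( '/&amp;#(0*[0-9]{1,7});/', '&#$1;', $content );
	$content = preg_replace( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', '&#x$1;', $content );

	// 3. Named references: a doubled `&amp;amp;` collapses back to `&amp;`.
	return preg_replace_callback(
		'/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
		fn ( array $m ): string => 'amp' === $m[1] ? '&amp;' : $m[0],
		$content
	);
}

foreach ( array( '&#x2E;', '&amp;#x2E;' ) as $input ) {
	echo $input, ' => ', sketch_normalize_fixed_order( $input ), "\n";
}
// &#x2E; => &#x2E;
// &amp;#x2E; => &amp;#x2E;
```

Running the numeric passes first matters because the named pass turns `&amp;amp;#x2E;` into `&amp;#x2E;`, which is indistinguishable from a disarmed `&#x2E;`; once that has happened, the numeric patterns can no longer tell the two inputs apart.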

wp-includes/version.php

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
  *
  * @global string $wp_version
  */
-$wp_version = '6.9-alpha-60615';
+$wp_version = '6.9-alpha-60616';
 
 /**
  * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
