Commit 79b5204

Use fast path in more cases when doing case folding with mb_convert_case
mbstring's Unicode case conversion is table-driven, using Minimal Perfect Hash tables. However, for small codepoint values, we bypass the hashtable lookup and just use hard-coded conversion logic (i.e. adding or subtracting 0x20 from the appropriate ASCII range). For upcasing and downcasing, we had already optimized the conditional which sends execution down this fast path, to use the fast path for as many codepoint values as possible. However, for case folding, this had not been done. This will give a small performance boost for case-folding Unicode text which includes codepoints from 0x80 to 0xB4, such as non-breaking spaces or symbols like ¥, © and ®.
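The fast-path idea can be illustrated with a minimal, self-contained sketch. This is not the mbstring implementation: `fold_sketch`, `slow_fold_lookup` and `check_examples` are hypothetical names, the table lookup is replaced by a tiny stand-in that knows only a few Latin-1 entries, and the encoding-specific check visible in the diff is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the hash-table lookup. The real mbstring tables
 * are not reproduced; only the entries needed for this demonstration exist. */
static unsigned slow_fold_lookup(unsigned code)
{
    if (code == 0xB5) {
        return 0x3BC; /* MICRO SIGN folds to GREEK SMALL LETTER MU */
    }
    if (code >= 0xC0 && code <= 0xDE && code != 0xD7) {
        return code + 0x20; /* Latin-1 capitals; 0xD7 is MULTIPLICATION SIGN */
    }
    return code; /* everything else is left unchanged in this sketch */
}

/* Simplified case folding: hard-coded logic below 0xB5, lookup otherwise. */
static unsigned fold_sketch(unsigned code)
{
    if (code < 0xB5) {
        if (code >= 0x41 && code <= 0x5A) {
            return code + 0x20; /* 'A'..'Z' -> 'a'..'z' */
        }
        return code; /* no case-folded form in this range */
    }
    return slow_fold_lookup(code);
}

/* Spot checks covering both paths. */
static void check_examples(void)
{
    assert(fold_sketch(0x41) == 0x61);  /* 'A' -> 'a' */
    assert(fold_sketch(0x7A) == 0x7A);  /* 'z' is unchanged */
    assert(fold_sketch(0xA0) == 0xA0);  /* NO-BREAK SPACE */
    assert(fold_sketch(0xA5) == 0xA5);  /* YEN SIGN */
    assert(fold_sketch(0xB5) == 0x3BC); /* MICRO SIGN */
    assert(fold_sketch(0xC9) == 0xE9);  /* E WITH ACUTE, capital to small */
}
```

Calling `check_examples()` from a `main` function exercises both branches. With the old `code < 0x80` bound, inputs such as 0xA0 and 0xA5 would have gone through the lookup only to come back unchanged.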
1 parent 51b1aa1 commit 79b5204

1 file changed: +3 -1 lines changed

ext/mbstring/php_unicode.c

Lines changed: 3 additions & 1 deletion
@@ -180,7 +180,9 @@ static unsigned php_unicode_totitle_raw(unsigned code, const mbfl_encoding *enc)
 
 static unsigned php_unicode_tofold_raw(unsigned code, const mbfl_encoding *enc)
 {
-    if (code < 0x80) {
+    /* After the ASCII characters, the first codepoint with an special case-folded version
+     * is 0xB5 (MICRO SIGN) */
+    if (code < 0xB5) {
         /* Fast path for ASCII */
         if (code >= 0x41 && code <= 0x5A) {
             if (UNEXPECTED(enc == &mbfl_encoding_8859_9 && code == 0x49)) {
