Commit 13fe106

bugfix: char class casefold for certain chars
When a code point fits in a single byte (at most 0xff) yet takes more than 1 byte in the current encoding, the case-folding code incorrectly put it in the bit set instead of a code range. As a result, with UTF-8 encoding, case folding works incorrectly for characters in the range \u0080 to \u00ff (Latin-1 Supplement).

Before the fix (input string, then pattern):
* `"\u00c2"` `[\u00e0-\u00e5]` returns false
* `"\u00c2"` `[\u00e2]` returns false
* `"\u00c2"` `\u00e2` returns true
1 parent 005b574 commit 13fe106

1 file changed, +1 -1 lines changed

src/org/joni/ApplyCaseFold.java

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ public void apply(int from, int[]to, int length, Object o) {
 
         if (Config.CASE_FOLD_IS_APPLIED_INSIDE_NEGATIVE_CCLASS) {
             if ((inCC && !cc.isNot()) || (!inCC && cc.isNot())) {
-                if (enc.minLength() > 1 || to[0] >= BitSet.SINGLE_BYTE_SIZE) {
+                if (enc.minLength() > 1 || to[0] >= BitSet.SINGLE_BYTE_SIZE || enc.codeToMbcLength(to[0]) > 1) {
                     cc.addCodeRange(env, to[0], to[0]);
                 } else {
                     /* /(?i:[^A-C])/.match("a") ==> fail. */
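
The effect of the extra condition can be illustrated outside the library. The following is a minimal sketch using only the Java standard library, not joni's own code: `utf8Length` is a hypothetical stand-in for `enc.codeToMbcLength` under UTF-8, `SINGLE_BYTE_SIZE` is assumed to be 256, and the minimum character length of UTF-8 is assumed to be 1.

```java
import java.nio.charset.StandardCharsets;

class CaseFoldBucketSketch {
    // Assumption: mirrors BitSet.SINGLE_BYTE_SIZE (one slot per byte value).
    static final int SINGLE_BYTE_SIZE = 256;

    // Hypothetical stand-in for enc.codeToMbcLength(code) under UTF-8:
    // number of bytes needed to encode the code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Condition before the fix: true means "store as code range".
    static boolean usesCodeRangeBefore(int minLength, int code) {
        return minLength > 1 || code >= SINGLE_BYTE_SIZE;
    }

    // Condition after the fix: also checks the encoded length.
    static boolean usesCodeRangeAfter(int minLength, int code) {
        return minLength > 1 || code >= SINGLE_BYTE_SIZE || utf8Length(code) > 1;
    }

    public static void main(String[] args) {
        int minLength = 1;   // assumed minimum character length for UTF-8
        int folded = 0x00E2; // case-fold target of U+00C2

        System.out.println("UTF-8 length of U+00E2: " + utf8Length(folded));
        System.out.println("code range before fix: " + usesCodeRangeBefore(minLength, folded));
        System.out.println("code range after fix:  " + usesCodeRangeAfter(minLength, folded));
    }
}
```

Under these assumptions, U+00E2 is below 256 but occupies two bytes in UTF-8, so the old condition sends it to the bit set while the new one sends it to a code range. ASCII characters such as `a` still take the bit set path in both versions.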
