Commit 6d8029e

Backport unicode-org#4530 to the 1.4 branch (unicode-org#4537)
For releasing `icu_normalizer` 1.4.1.
1 parent 40b418c commit 6d8029e

5 files changed, 31 additions and 5 deletions
CHANGELOG.md

Lines changed: 5 additions & 1 deletion
@@ -4,11 +4,15 @@
 - [Remove icu_datagen's dep on `fractional`](https://github.com/unicode-org/icu4x/pull/4472)
 
 
+- Fix normalization of character whose decomposition contains more than one starter and ends with a non-starter followed by a non-starter
+with a lower Canonical Combining Class than the last character of the decomposition. (https://github.com/unicode-org/icu4x/pull/4530)
+
+
 ## icu4x 1.4 (Nov 16, 2023)
 
 - General
   - MSRV is now 1.67
-
+
 - Components
   - Compiled data updated to CLDR 44 and ICU 74 (https://github.com/unicode-org/icu4x/pull/4245)
   - `icu_calendar`
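
The changelog entry above concerns canonical ordering: within a run of non-starters (characters whose Canonical Combining Class is non-zero), characters are stably sorted by class. A minimal sketch of that rule follows, independent of `icu_normalizer`. The `ccc`, `canonical_reorder`, and `reordered` names are hypothetical, and `ccc` is hard-coded for the few code points that appear in this commit's tests rather than read from Unicode data.

```rust
/// Hypothetical CCC lookup covering only the code points used in this sketch.
/// A real implementation reads these values from Unicode data.
fn ccc(c: char) -> u8 {
    match c {
        '\u{0334}' => 1,   // COMBINING TILDE OVERLAY
        '\u{0DCA}' => 9,   // SINHALA SIGN AL-LAKUNA
        '\u{0323}' => 220, // COMBINING DOT BELOW
        '\u{030C}' => 230, // COMBINING CARON
        _ => 0,            // everything else is treated as a starter here
    }
}

/// Stably sorts every maximal run of non-starters by CCC, in place.
fn canonical_reorder(chars: &mut [char]) {
    let mut start = 0;
    while start < chars.len() {
        if ccc(chars[start]) == 0 {
            start += 1;
            continue;
        }
        // Find the end of the current run of non-starters.
        let mut end = start;
        while end < chars.len() && ccc(chars[end]) != 0 {
            end += 1;
        }
        // `sort_by_key` is stable, so characters of equal class keep their order.
        chars[start..end].sort_by_key(|&c| ccc(c));
        start = end;
    }
}

/// Convenience wrapper: reorders an already decomposed string.
fn reordered(s: &str) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    canonical_reorder(&mut chars);
    chars.into_iter().collect()
}
```

With this sketch, `reordered("DZ\u{030C}\u{0323}")` yields `"DZ\u{0323}\u{030C}"`: the dot below (class 220) moves ahead of the caron (class 230), which is the ordering the fixed normalizer is expected to produce.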

Cargo.lock

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

components/normalizer/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ name = "icu_normalizer"
 description = "API for normalizing text into Unicode Normalization Forms"
 license-file = "LICENSE"
 
-version.workspace = true
+version = "1.4.1"
 rust-version.workspace = true
 authors.workspace = true
 edition.workspace = true

components/normalizer/src/lib.rs

Lines changed: 2 additions & 2 deletions
@@ -637,7 +637,7 @@ where
     i += 1;
     // Half-width kana and iota subscript don't occur in the tails
     // of these multicharacter decompositions.
-    if decomposition_starts_with_non_starter(trie_value) {
+    if !decomposition_starts_with_non_starter(trie_value) {
         combining_start = i;
     }
 }
@@ -676,7 +676,7 @@ where
     i += 1;
     // Half-width kana and iota subscript don't occur in the tails
     // of these multicharacter decompositions.
-    if decomposition_starts_with_non_starter(trie_value) {
+    if !decomposition_starts_with_non_starter(trie_value) {
         combining_start = i;
     }
 }
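
Both hunks above invert the same condition. One reading of the change, offered as an assumption rather than a description of the crate's internals: `combining_start` marks where the trailing run of non-starters begins in the buffer, so it has to advance past each starter in the decomposition tail, whereas the old condition advanced it past non-starters. The sketch below models that with a hypothetical `push_and_sort` helper and a `fixed` flag that selects the new or old condition; none of these names, nor the hard-coded `ccc` table, come from `icu_normalizer`.

```rust
/// Hypothetical CCC lookup for the code points in this sketch only.
fn ccc(c: char) -> u8 {
    match c {
        '\u{0323}' => 220, // COMBINING DOT BELOW
        '\u{030C}' => 230, // COMBINING CARON
        _ => 0,            // treated as a starter
    }
}

/// Appends a decomposition `tail` and the non-starters that `following` it
/// to a buffer, then stably sorts the buffer from `combining_start` onward.
/// `fixed == true` uses the condition after this commit, `false` the one before.
fn push_and_sort(tail: &[char], following: &[char], fixed: bool) -> String {
    let mut buffer: Vec<char> = Vec::new();
    let mut combining_start = 0;
    let mut i = 0;
    for &c in tail {
        buffer.push(c);
        i += 1;
        let starts_with_non_starter = ccc(c) != 0;
        let advance = if fixed {
            !starts_with_non_starter // new: move past starters
        } else {
            starts_with_non_starter // old: moved past non-starters
        };
        if advance {
            combining_start = i;
        }
    }
    buffer.extend_from_slice(following);
    // Only the part after the last recorded position gets reordered.
    buffer[combining_start..].sort_by_key(|&c| ccc(c));
    buffer.into_iter().collect()
}
```

For the tail `Z`, U+030C followed by U+0323, the fixed condition leaves `combining_start` just after `Z`, so the caron and dot below are sorted together and the result is `"Z\u{0323}\u{030C}"`. The old condition moves `combining_start` past the caron, which excludes it from the sort and leaves `"Z\u{030C}\u{0323}"`.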

components/normalizer/tests/tests.rs

Lines changed: 22 additions & 0 deletions
@@ -1308,6 +1308,28 @@ fn test_utf16_basic() {
     );
 }
 
+#[test]
+fn test_accented_digraph() {
+    let normalizer: DecomposingNormalizer = DecomposingNormalizer::new_nfkd();
+    assert_eq!(
+        normalizer.normalize("\u{01C4}\u{0323}"),
+        "DZ\u{0323}\u{030C}"
+    );
+    assert_eq!(
+        normalizer.normalize("DZ\u{030C}\u{0323}"),
+        "DZ\u{0323}\u{030C}"
+    );
+}
+
+#[test]
+fn test_ddd() {
+    let normalizer: DecomposingNormalizer = DecomposingNormalizer::new_nfd();
+    assert_eq!(
+        normalizer.normalize("\u{0DDD}\u{0334}"),
+        "\u{0DD9}\u{0DCF}\u{0334}\u{0DCA}"
+    );
+}
+
 #[test]
 fn test_is_normalized() {
     let nfd: DecomposingNormalizer = DecomposingNormalizer::new_nfd();
