Commit 211db9b

Update UAX#29 text segmenter data rules to 16.0. (#6367)
The Unicode 16.0.0 changes affect property values only, so this commit updates the data files and test data.
1 parent dcfb305 commit 211db9b

29 files changed: 5858 additions, 6730 deletions

components/segmenter/tests/spec_test.rs

Lines changed: 5 additions & 0 deletions
@@ -353,6 +353,11 @@ fn run_grapheme_break_extra_test() {
     grapheme_break_test(include_str!("testdata/GraphemeBreakExtraTest.txt"));
 }
 
+#[test]
+fn run_grapheme_break_random_test() {
+    grapheme_break_test(include_str!("testdata/GraphemeBreakRandomTest.txt"));
+}
+
 fn sentence_break_test(file: &'static str) {
     let test_iter = TestContentIterator::new(file);
     let segmenter = SentenceSegmenter::new(Default::default());
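
For context, the test data files passed to `grapheme_break_test` follow the UAX #29 conformance-test format: each data line lists hexadecimal code points separated by `÷` (break allowed) or `×` (no break), followed by a `#` comment. The sketch below shows one way such a line could be turned into a string plus expected break offsets. It is not the `TestContentIterator` used in `spec_test.rs`; `parse_break_line` is a hypothetical helper written only for illustration, and the sample line is made up in the same format.

```rust
/// Parse one data line of a UAX #29 break test file.
/// Returns the sample string and the UTF-8 byte offsets where a break (÷) is expected.
/// Hypothetical helper for illustration; not part of ICU4X.
fn parse_break_line(line: &str) -> Option<(String, Vec<usize>)> {
    // Drop the trailing comment; blank and comment-only lines yield None.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut text = String::new();
    let mut breaks = Vec::new();
    for token in data.split_whitespace() {
        match token {
            // A break is allowed at the current byte offset.
            "÷" => breaks.push(text.len()),
            // No break at this position; nothing to record.
            "×" => {}
            // Anything else is expected to be a hexadecimal code point.
            // Malformed hex or non-scalar values make the whole line None.
            hex => {
                let cp = u32::from_str_radix(hex, 16).ok()?;
                text.push(char::from_u32(cp)?);
            }
        }
    }
    Some((text, breaks))
}

fn main() {
    // SPACE, COMBINING DIAERESIS (no break before it), SPACE.
    let line = "÷ 0020 × 0308 ÷ 0020 ÷\t# sample line in the test-file format";
    let (_text, breaks) = parse_break_line(line).expect("data line");
    println!("{:?}", breaks); // prints [0, 3, 4]
}
```

A test harness along these lines would then compare the recorded offsets against the boundaries reported by the segmenter under test.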

components/segmenter/tests/testdata/GraphemeBreakExtraTest.txt

Lines changed: 0 additions & 107 deletions
Large diffs are not rendered by default.

components/segmenter/tests/testdata/GraphemeBreakRandomTest.txt

Lines changed: 103 additions & 0 deletions
Large diffs are not rendered by default.

components/segmenter/tests/testdata/GraphemeBreakTest.txt

Lines changed: 185 additions & 279 deletions
Large diffs are not rendered by default.

components/segmenter/tests/testdata/SentenceBreakRandomTest.txt

Lines changed: 101 additions & 104 deletions
Large diffs are not rendered by default.

components/segmenter/tests/testdata/SentenceBreakTest.txt

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
-# SentenceBreakTest-15.1.0.txt
-# Date: 2023-04-05, 20:41:29 GMT
-# © 2023 Unicode®, Inc.
+# SentenceBreakTest-16.0.0.txt
+# Date: 2024-04-30, 21:48:41 GMT
+# © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
-# For terms of use, see https://www.unicode.org/terms_of_use.html
+# For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
 # For documentation, see https://www.unicode.org/reports/tr44/

components/segmenter/tests/testdata/WordBreakTest.txt

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
-# WordBreakTest-15.1.0.txt
-# Date: 2023-03-31, 14:30:32 GMT
-# © 2023 Unicode®, Inc.
+# WordBreakTest-16.0.0.txt
+# Date: 2024-04-30, 21:48:43 GMT
+# © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
-# For terms of use, see https://www.unicode.org/terms_of_use.html
+# For terms of use and license, see https://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
 # For documentation, see https://www.unicode.org/reports/tr44/

provider/data/segmenter/data/segmenter_break_grapheme_cluster_v1.rs.data

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

provider/data/segmenter/data/segmenter_break_sentence_v1.rs.data

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

provider/data/segmenter/data/segmenter_break_word_v1.rs.data

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.
