Conversation

@eggrobin
Member

@eggrobin eggrobin commented Jan 9, 2026

Checklist

  • Required: Issue filed: ICU-23307
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

ALLOW_MANY_COMMITS=meow

@eggrobin eggrobin marked this pull request as draft January 9, 2026 17:22
@eggrobin
Member Author

eggrobin commented Jan 9, 2026

Somehow this is affecting Han-Latin/Names; Latin-Bopomofo, even though neither appears to contain string literals in UnicodeSets.

@eggrobin
Member Author

eggrobin commented Jan 9, 2026

Latin-NumericPinyin.xml:

($tone) ( [i o n u {o n} {n g}]) → $2 &Pinyin-NumericPinyin($1);

Whyyy??!

@eggrobin
Member Author

eggrobin commented Jan 9, 2026

de-ASCII also does this:

$AE = [Ä {A \u0308}];
$OE = [Ö {O \u0308}];
$UE = [Ü {U \u0308}];

[ä {a \u0308}] → ae;
[ö {o \u0308}] → oe;
[ü {u \u0308}] → ue;

and blt-fonipa-t-blt:

$DIGRAPHS = [{ꪹ  ꪸ} {ꪹ  ꪷ} {ꪹ ꪱ}];

These seem to be the only users of space-insensitivity of string literals.
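To make the dependency concrete, here is a minimal Python sketch of one possible repair for such rules: dropping unescaped whitespace inside {…} string literals that sit inside a [...] set. It assumes that removing the spaces is the intended fix; the actual CLDR change may differ. The function name strip_literal_whitespace is hypothetical, and the sketch ignores quoting and other rule syntax.

```python
def strip_literal_whitespace(rule: str) -> str:
    """Drop unescaped whitespace inside {...} literals within [...] sets.

    Illustrative only: braces outside a set are left alone, backslash
    escapes are copied verbatim, and quoted text is not handled.
    """
    out = []
    set_depth = 0        # nesting depth of [...]
    in_literal = False   # inside a {...} that appears within a set
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\" and i + 1 < len(rule):
            # Copy the backslash and the next character untouched.
            out.append(rule[i:i + 2])
            i += 2
            continue
        if ch == "[":
            set_depth += 1
        elif ch == "]" and set_depth > 0:
            set_depth -= 1
        elif ch == "{" and set_depth > 0:
            in_literal = True
        elif ch == "}":
            in_literal = False
        elif in_literal and ch.isspace():
            i += 1       # skip whitespace inside the literal
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(strip_literal_whitespace(r"[ä {a \u0308}] → ae;"))
# prints: [ä {a\u0308}] → ae;
```

Restricting the rewrite to braces inside a set leaves braces elsewhere in a rule untouched.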

@macchiati
Member

macchiati commented Jan 9, 2026 via email

@eggrobin
Member Author

eggrobin commented Jan 9, 2026

An artifact, that we just never noticed before the change

I added something to the CLDR-design agenda. (Not sure I can make it Monday, I expect this week to be chaotic—I am moving Friday—, but I’m sure you have enough context to discuss this topic.)

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/common/uniset_props.cpp is different
  • icu4c/source/test/intltest/usettest.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@eggrobin
Member Author

@markusicu How do I deal with the transliterator breakages until CLDR fixes its rules? Do I just sit on this PR until then?

@macchiati
Member

macchiati commented Jan 16, 2026 via email

@markusicu
Member

How do I deal with the transliterator breakages until CLDR fixes its rules?

@macchiati please make sure that the translit rules get fixed.
@eggrobin for ICU, I would manually change the .txt files in ICU, assuming that the next CLDR integration will come in with equivalent fixes.

@macchiati
Member

Agreed; Robin shouldn't be blocked on this.

@macchiati
Member

With a quick regex \[[^\]]*\{[^\}]\s, I'm getting the following in transforms, and no other hits in /common, exemplars/, or keyboards/.

So that matches all of the ones Robin found; that will be easy to fix on the CLDR side.

blt-fonipa-t-blt.xml
50: $DIGRAPHS = [{ꪹ  ꪸ} {ꪹ  ꪷ} {ꪹ ꪱ}]; 
de-ASCII.xml (6 matches)
13: $AE = [Ä {A \u0308}]; 
14: $OE = [Ö {O \u0308}]; 
15: $UE = [Ü {U \u0308}]; 
17: [ä {a \u0308}] → ae; 
18: [ö {o \u0308}] → oe; 
19: [ü {u \u0308}] → ue; 
Latin-NumericPinyin.xml
29: ($tone) ( [i o n u {o n} {n g}]) → $2 &Pinyin-NumericPinyin($1); 
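
For anyone who wants to repeat the search, a minimal Python sketch follows. It applies the regex quoted above, unchanged, to a few sample lines. The helper name find_hits and the non-matching sample line are hypothetical; the other sample lines are copied from the hits listed above.

```python
import re

# The regex from the comment above, unchanged.
PATTERN = re.compile(r"\[[^\]]*\{[^\}]\s")


def find_hits(text: str):
    """Return (line_number, line) pairs for lines that match PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


sample = "\n".join([
    r"$AE = [Ä {A \u0308}];",
    "[a-z {ab}] → x;",  # hypothetical rule with no space in the literal
    "($tone) ( [i o n u {o n} {n g}]) → $2 &Pinyin-NumericPinyin($1);",
    "$DIGRAPHS = [{ꪹ  ꪸ} {ꪹ  ꪷ} {ꪹ ꪱ}];",
])

for number, line in find_hits(sample):
    print(f"{number}: {line}")
```

As written, the pattern only looks for whitespace right after the first character of a literal, so a literal such as {ab c} would not be reported.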

@macchiati
Member

The CLDR PR and ticket are at unicode-org/cldr#5297

@eggrobin
Member Author

Thanks for doing that @macchiati. I was busy moving so didn’t have the time to look at this.

@eggrobin eggrobin marked this pull request as ready for review January 20, 2026 00:43
Contributor

@richgillam richgillam left a comment

LGTM

@eggrobin eggrobin merged commit 4ebbe0c into unicode-org:main Jan 21, 2026
100 checks passed