CLDR-10024 Add tests for RBNF#5458

Open
grhoten wants to merge 8 commits into unicode-org:main from grhoten:10024

@grhoten commented Mar 11, 2026

CLDR-10024

This PR adds two new types of tests. Both are much easier to write against the flattened RBNF rules.

  • Conformance tests. These were generated from the existing rules; I didn't check the expected output for accuracy.
  • Roundtrip tests. Every public ruleset in every language is roundtripped with a set of values.

The current set of tests depends on several other open pull requests being merged before these changes. Those include important fixes for Serbian (#5456) and relevant new rules.

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

%spellout-cardinal-masculine:
-x: ناقص >>;
x.x: <%%spellout-numbering-m< فاصل >> ;
x.x: <%%spellout-numbering-m< فاصل >>;

Whitespace typo getting fixed.

1000000000000000000: =#,##0=;
%spellout-ordinal-neuter:
-x: μείον >>;
x.x: =0.#=;

The exact rule doesn't matter here. It only needs to roundtrip approximately.

0: =%spellout-numbering=;
%spellout-construct-masculine:
-x: מינוס >>;
x.x: << נקודה >>;

Oops! The rule was missing.

1000000: <%spellout-cardinal-masculine< $(cardinal,one{milijun}few{milijuna}other{milijuna})$[ >>];
1000000000: <%spellout-cardinal-feminine< $(cardinal,one{milijarda}few{milijarde}other{milijardi})$[ >>];
1000000000000: <%spellout-cardinal-masculine< $(cardinal,one{bilijun}few{bilijuna}other{bilijuna})$[ >>];
1000000000000000: <%spellout-cardinal-feminine< $(cardinal,one{bilijarda}few{bilijarde}other{bilijardi})$[ >>];
@grhoten commented Mar 11, 2026

Known roundtrip issue that is fixed by switching to cardinal plural rules.
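
For readers unfamiliar with the `$(cardinal,one{…}few{…}other{…})$` syntax in the rules above: the word form is picked by the plural category of the count. Below is a minimal sketch of that selection for non-negative integers, assuming the Croatian-style integer rules as I understand them from CLDR (one: ends in 1 but not 11; few: ends in 2 to 4 but not 12 to 14). The class and method names are hypothetical, and the real plural rules data remains the authority.

```java
/** Sketch only: integer cardinal plural selection in the Croatian style. */
final class PluralSketch {
    /** Returns "one", "few", or "other" for a non-negative integer count. */
    static String category(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) {
            return "one";
        }
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) {
            return "few";
        }
        return "other";
    }

    /** Mirrors one{milijun}few{milijuna}other{milijuna} from the rule above. */
    static String millionWord(long count) {
        return category(count).equals("one") ? "milijun" : "milijuna";
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 5, 11, 21, 22}) {
            System.out.println(n + " -> " + category(n) + " -> " + millionWord(n));
        }
    }
}
```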

<rulesetGrouping type="SpelloutRules">
<rbnfRules><![CDATA[
%spellout-numbering-year-latn:
-x: マイナス>>;

Not an important rule, but it's needed for roundtripping.

1000000000: << tỷ[ >%%after-hundred>];
1000000000000000000: =#,##0=;
%spellout-ordinal:
-x: âm >>;

Not an important rule, but it's needed for roundtripping.

Comment on lines +195 to +201
ROUNDTRIP_VALUES.put(
"SpelloutRules",
new Number[] {
-1L, 0L, 0.2, 1L, 1.1, 2L, 3L, 10L, 99L, 100L, 101L, 999L, 1000L, 1001L,
1999L, 2000L, 2001L, 2100L, 2200L, 10000L, 20000L, 100000L, 200000L, 1000000L,
2000000L
});

These are values that we're roundtripping for the spellout rules.
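
To make the idea concrete, here is a rough sketch of what a roundtrip check over such a value list can look like: format each value, parse the text back, and collect the values that do not come back unchanged. The `format` and `parse` functions are hypothetical stand-ins for the formatter under test, and `RoundtripSketch` is not a class in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: generic roundtrip check over a list of values. */
final class RoundtripSketch {
    /** Returns the values whose parsed text differs from the original value. */
    static List<Number> failures(
            Number[] values,
            Function<Number, String> format,
            Function<String, Number> parse) {
        List<Number> failed = new ArrayList<>();
        for (Number value : values) {
            String text = format.apply(value);
            Number parsed = parse.apply(text);
            // Compare as doubles so that Long and Double inputs are handled alike.
            if (parsed == null || parsed.doubleValue() != value.doubleValue()) {
                failed.add(value);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Number[] values = {-1L, 0L, 0.2, 1L, 1.1, 2L};
        // A toy formatter that drops the fraction, so fractional values cannot roundtrip.
        Function<Number, String> lossy = v -> String.valueOf(v.longValue());
        Function<String, Number> parse = s -> Double.valueOf(s);
        System.out.println("Failed to roundtrip: " + failures(values, lossy, parse));
    }
}
```

Including fractional and negative values in the list is what exercises the `x.x` and `-x` rules, which is why several rulesets in this PR gained those rules.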

private static final Set<String> ALIASES =
new TreeSet<>(Arrays.asList("nb", "en_001"));

private static final Number[] ROUNDTRIP_LONG_VALUES = {

These are values that we're roundtripping for the non-spellout rules. Those are normally positive integers.

@AEApple AEApple requested review from echeran and sffc March 11, 2026 23:57

30: aduasa[->%%spellout-cardinal-tens>];
40: adu<<[->%%spellout-cardinal-tens>];
100: ­ɔha[-na-­>>];
100: ɔha[-na->>];
@grhoten commented Mar 12, 2026

The old rule contained unnecessary soft hyphens. I'm not sure why the tests only sometimes failed on this locale, but the soft hyphens shouldn't be present in this scenario anyway. I was using a slightly different version of ICU with these tests.
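
Soft hyphens (U+00AD) are invisible in most editors and diff views, so a small check can help find them. A minimal sketch, with hypothetical helper names that are not part of this PR:

```java
/** Sketch only: detect and strip soft hyphens (U+00AD) in rule text. */
final class SoftHyphenSketch {
    private static final char SOFT_HYPHEN = '\u00AD';

    /** Returns true if the rule text contains at least one soft hyphen. */
    static boolean containsSoftHyphen(String rule) {
        return rule.indexOf(SOFT_HYPHEN) >= 0;
    }

    /** Returns the rule text with every soft hyphen removed. */
    static String stripSoftHyphens(String rule) {
        return rule.replace(String.valueOf(SOFT_HYPHEN), "");
    }

    public static void main(String[] args) {
        String rule = "100: " + SOFT_HYPHEN + "xyz[-na-" + SOFT_HYPHEN + ">>];";
        System.out.println(containsSoftHyphen(rule));
        System.out.println(stripSoftHyphens(rule));
    }
}
```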

return type + ";" + ruleSetName + ";" + numStr + ";" + formatted;
}

public static void main(String[] args) {

If you need to seed the conformance tests, just run this class, and it will automatically generate some sample conformance tests.
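
Based only on the `return` statement shown above, each generated conformance line appears to be four semicolon-separated fields: type, ruleset name, number, and formatted text. The sketch below builds such a line and splits it again. The parsing side is my assumption about how a reader of the file might work, and `ConformanceLineSketch` is a hypothetical name.

```java
/** Sketch only: build and split a semicolon-separated conformance line. */
final class ConformanceLineSketch {
    /** Joins the four fields in the same order as the return statement above. */
    static String toLine(String type, String ruleSetName, String numStr, String formatted) {
        return type + ";" + ruleSetName + ";" + numStr + ";" + formatted;
    }

    /** Splits a line into exactly four fields; the last field keeps any ';' it contains. */
    static String[] fromLine(String line) {
        String[] fields = line.split(";", 4);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = toLine("SpelloutRules", "%spellout-numbering", "21", "twenty-one");
        System.out.println(line);
        System.out.println(fromLine(line)[3]);
    }
}
```

Limiting the split to four parts means a formatted string that itself contains a semicolon still ends up whole in the last field.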

1001: << миң[>%%and>];
2000: << миң[>%%and>];
100000: << миң[>%%commas>];
100000/1000: << миң[>%%commas>];
@grhoten commented Mar 12, 2026

Oops! The original syntax probably wasn't intended, since the word is the same as the one for thousand. Now the 1000 and 100000 rules don't conflict on the value.
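
As I understand the RBNF rule syntax, a descriptor of the form `base/radix:` overrides the default radix of 10, and the divisor used by `<<` and `>>` is the largest power of the radix that does not exceed the base value. The sketch below works through that arithmetic for the two variants of the 100000 rule. It illustrates my reading of the syntax and is not ICU's implementation; `RbnfDivisorSketch` is a hypothetical name.

```java
/** Sketch only: divisor arithmetic for an RBNF rule descriptor. */
final class RbnfDivisorSketch {
    /** Largest power of radix that is <= baseValue (baseValue >= 1, radix >= 2). */
    static long divisor(long baseValue, long radix) {
        long d = 1;
        // d * radix <= baseValue, written with division to avoid overflow.
        while (d <= baseValue / radix) {
            d *= radix;
        }
        return d;
    }

    public static void main(String[] args) {
        long n = 250000;
        long plain = divisor(100000, 10);      // descriptor "100000:"
        long withRadix = divisor(100000, 1000); // descriptor "100000/1000:"
        System.out.println("100000:      << = " + (n / plain) + ", >> = " + (n % plain));
        System.out.println("100000/1000: << = " + (n / withRadix) + ", >> = " + (n % withRadix));
    }
}
```

Under this reading, the `/1000` form makes `<<` the count of thousands, which matches a rule whose literal text is the word for thousand.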

Comment on lines -155 to +134
1000000000000000000: =#,##0='inci;
1000000000000001: <%spellout-numbering< квадриллион[ >>|унчу];
2000000000000000: <%spellout-numbering< квадриллион[ >>|унчу];
1000000000000000000: =#,##0='инчи;
@grhoten commented Mar 12, 2026

Oops! Wrong script. It now matches the digits-ordinal ruleset. Also modernizing the syntax magically fixed the roundtripping issue.

Comment on lines -38 to -45
100: <%%spellout-cardinal-large<ratus[ >>];
1000: <%%spellout-cardinal-large<rebu[ >>];
1000000: <%%spellout-cardinal-large<juta[ >>];
1000000000: <%%spellout-cardinal-large<miliar[ >>];
100: saratus[ >>];
200: << ratus[ >>];
1000: sarebu[ >>];
2000: << rebu[ >>];
1000000: sajuta[ >>];
2000000: << juta[ >>];
1000000000: samiliar[ >>];
2000000000: << miliar[ >>];
1000000000000: =#,##0=;
%%spellout-cardinal-large:
1: sa;
2: =%spellout-cardinal= ;

The extra space at the end was confusing the ICU parsing for some reason. Making it structured more like the other languages fixed the issue.

Comment on lines +221 to +223
private static final Set<String> KNOWN_BROKEN_LOCALES =
new TreeSet<>(Arrays.asList("ga", "lt"));
private static final Set<String> ALIASES = new TreeSet<>(Arrays.asList("nb", "en_001"));

The known broken locales have roundtrip issues. For now, we just warn about them instead of failing on them.

We skip the known aliases: we acknowledge that they exist, but we don't care that they're empty.
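
The policy described above can be sketched as a small decision function. The two sets are copied from the snippet; the `Outcome` enum and `classify` method are hypothetical and only illustrate the intended behavior, not the test code in this PR.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: how a locale's roundtrip result maps to a test outcome. */
final class LocaleOutcomeSketch {
    enum Outcome { PASS, SKIP, WARN, FAIL }

    private static final Set<String> KNOWN_BROKEN_LOCALES =
            new TreeSet<>(Arrays.asList("ga", "lt"));
    private static final Set<String> ALIASES =
            new TreeSet<>(Arrays.asList("nb", "en_001"));

    /** Aliases are skipped; known broken locales warn instead of failing. */
    static Outcome classify(String locale, boolean roundtrips) {
        if (ALIASES.contains(locale)) {
            return Outcome.SKIP;
        }
        if (roundtrips) {
            return Outcome.PASS;
        }
        return KNOWN_BROKEN_LOCALES.contains(locale) ? Outcome.WARN : Outcome.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(classify("nb", false));
        System.out.println(classify("ga", false));
        System.out.println(classify("hr", false));
    }
}
```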

Comment on lines -180 to +161
100: st[ >>];
200: dvest[ >>];
300: trist[ >>];
400: četrist[ >>];
500: petst[ >>];
600: šest[ >>];
700: sedamst[ >>];
800: osamst[ >>];
900: devetst[ >>];
100: stot[ >>];
200: dvestot[ >>];
300: tristot[ >>];
400: četristot[ >>];
500: petstot[ >>];
600: šestot[ >>];
700: sedamstot[ >>];
800: osamstot[ >>];
900: devetstot[ >>];
@grhoten commented Mar 12, 2026

This is a big ol' truncation typo. Number Format Tester confirms that these were wrong.

This reference confirms the correct spelling for 3 of the examples:
https://www.lets-learn.eu/croatian/guide/ordinal-numbers-in-croatian
