Commit 3660793

aphillips, eemeli, and sffc authored
Describe number selection fully (#621)
* Describe number selection fully [NOT FINISHED]

  Update the design document as part of my action item in order to describe number selection completely per last week's call.
* fix typo and some formatting
* Apply suggestions from code review

  Co-authored-by: Eemeli Aro <[email protected]>
* Address comments with a major rewrite
* Tweak operand handling
* Typos
* Improve Czech example
* Update number-selection.md
* Address @macchiati's comments
* Add text for determining exact literal match
* Update number-selection.md
* typo
* Implement numeric literal selection using strings
* Address comment about plural example
* remove "boolean"
* Apply suggestions from code review

  Co-authored-by: Eemeli Aro <[email protected]>
* Fix the operands section
  - Make the format and notes match date/time
  - Make clear that passed literals that match `numeric-literal` work
  - Remove the excess normative statement about non-numerics
* Update exploration/number-selection.md
* Add significant digits to key matching
* Fix `:ordinal` as a formatter
* Changes based on 2024-02-14 call
* Fix implementation defined types
* remove fraction digits from `:integer`
* Address comments, fix useGrouping
* Fix `signDisplay`
* Fix a typo
* Remove signif digits from `:integer`; add note about no min/max defaults
* Fix minimumIntegerDigits
* Update exploration/number-selection.md

  Co-authored-by: Shane F. Carr <[email protected]>
* Update number-selection.md
* only use integer matching
* Update exploration/number-selection.md

  Co-authored-by: Eemeli Aro <[email protected]>
* Update exploration/number-selection.md

  Co-authored-by: Eemeli Aro <[email protected]>
* Update exploration/number-selection.md

  Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Shane F. Carr <[email protected]>
1 parent 1282e6d commit 3660793


1 file changed: +287 -25 lines changed


exploration/number-selection.md

Lines changed: 287 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -7,10 +7,12 @@ Status: **Accepted**
77
<dl>
88
<dt>Contributors</dt>
99
<dd>@eemeli</dd>
10+
<dd>@aphillips</dd>
1011
<dt>First proposed</dt>
1112
<dd>2023-09-06</dd>
1213
<dt>Pull Request</dt>
1314
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/471">#471</a></dd>
15+
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/621">#621</a></dd>
1416
</dl>
1517
</details>
1618

@@ -45,6 +47,7 @@ but ordinal rules use `one` (_1st_, _21st_, etc.), `few` (_2nd_, _22nd_, etc.),
4547
Additionally,
4648
MF1 provides `ChoiceFormat` selection based on a complex rule set
4749
(and which allows determining if a number falls into a specific range).
50+
This capability is not supported by the default functions of MF2.
4851

4952
Both JS and ICU PluralRules implementations provide for determining the plural category
5053
of a range based on its start and end values.
@@ -92,44 +95,303 @@ ICU MF1 messages using `plural` and `selectordinal` should be representable in M
9295
9396
## Proposed Design
9497
95-
Given that we already have a `:number`,
96-
it makes sense to add a `<matchSignature>` to it with an option
98+
### Number Selection
9799
98-
```xml
99-
<option name="select" values="plural ordinal exact" default="plural" />
100-
```
100+
Number selection has three modes:
101+
- `exact` selection matches the operand to explicit numeric keys exactly
102+
- `plural` selection matches the operand to explicit numeric keys exactly
103+
or to plural rule categories if there is no explicit match
104+
- `ordinal` selection matches the operand to explicit numeric keys exactly
105+
or to ordinal rule categories if there is no explicit match
106+
107+
108+
### Functions
109+
110+
The following functions use numeric selection:
111+
112+
The function `:number` is the default selector for numeric values.
113+
114+
The function `:integer` provides a reduced set of options for selecting
115+
and formatting numeric values as integers.
116+
117+
### Operands
118+
119+
The _operand_ of a number function is either an implementation-defined type or
120+
a literal that matches the `number-literal` production in the [ABNF](/main/spec/message.abnf).
121+
All other values produce a _Selection Error_ when evaluated for selection
122+
or a _Formatting Error_ when attempting to format the value.
123+
124+
> For example, in Java, any subclass of `java.lang.Number` plus the primitive
125+
> types (`byte`, `short`, `int`, `long`, `float`, `double`, etc.)
126+
> might be considered as the "implementation-defined numeric types".
127+
> Implementations in other programming languages would define different types
128+
> or classes according to their local needs.
129+
130+
> [!NOTE]
131+
> String values passed as variables in the _formatting context_'s
132+
> _input mapping_ can be formatted as numeric values as long as their
133+
> contents match the `number-literal` production in the [ABNF](/main/spec/message.abnf).
134+
>
135+
> For example, if the value of the variable `num` were the string
136+
> `-1234.567`, it would behave identically to the local
137+
> variable in this example:
138+
> ```
139+
> .local $example = {|-1234.567| :number}
140+
> {{{$num :number} == {$example}}}
141+
> ```
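
The operand rule above can be sketched as a simple check on string values. This is a minimal illustration, not part of the design: the regular expression is a transcription of the JSON number grammar from RFC 8259 (which this document says number literals use) and is assumed to be equivalent to the `number-literal` production; the name `is_number_literal` is hypothetical.

```python
import re

# JSON number grammar (RFC 8259); assumed here to be equivalent to the
# `number-literal` production in the MessageFormat 2 ABNF.
NUMBER_LITERAL = re.compile(
    r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?"
)


def is_number_literal(text: str) -> bool:
    """Return True if the whole string has the shape of a number literal."""
    return NUMBER_LITERAL.fullmatch(text) is not None


print(is_number_literal("-1234.567"))  # True
print(is_number_literal("01"))         # False: leading zeros are not allowed
```

Under this sketch, a string that fails the check would lead to a _Selection Error_ or a _Formatting Error_ rather than being treated as a number.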
101142
102-
The default `plural` value is presumed to be the most common use case,
103-
and it affords the least bad fallback when used incorrectly:
104-
Using "plural" for "exact" still selects exactly matching cases,
105-
whereas using "exact" for "plural" will not select LDML category matches.
106-
This might not be noticeable in the source language,
143+
> [!NOTE]
144+
> Implementations are encouraged to provide support for compound types or data structures
145+
> that provide additional semantic meaning to the formatting of number-like values.
146+
> For example, in ICU4J, the type `com.ibm.icu.util.Measure` can be used to communicate
147+
> a value that includes a unit
148+
> or the type `com.ibm.icu.util.CurrencyAmount` can be used to set the currency and related
149+
> options (such as the number of fraction digits).
150+
151+
152+
### Options
153+
154+
The following options and their values are required in the default registry to be available on the
155+
function `:number`:
156+
- `select`
157+
- `plural` (default)
158+
- `ordinal`
159+
- `exact`
160+
- `compactDisplay` // this option only has meaning when combined with the option `notation=compact`
161+
- `short` (default)
162+
- `long`
163+
- `notation`
164+
- `standard` (default)
165+
- `scientific`
166+
- `engineering`
167+
- `compact`
168+
- `numberingSystem`
169+
- valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
170+
(default is locale-specific)
171+
- `signDisplay`
172+
- `auto` (default)
173+
- `always`
174+
- `exceptZero`
175+
- `negative`
176+
- `never`
177+
- `style`
178+
- `decimal` (default)
179+
- `percent` (see [Percent Style](#percent-style) below)
180+
- `useGrouping`
181+
- `auto` (default)
182+
- `always`
183+
- `never`
184+
- `min2`
185+
- `minimumIntegerDigits`
186+
- (non-negative integer, default: `1`)
187+
188+
> [!NOTE]
189+
> The following options do not have default values because they are only to be used
190+
> as overrides for an existing locale-and-value dependent implementation-defined
191+
> default
192+
193+
- `minimumFractionDigits`
194+
- (non-negative integer)
195+
- `maximumFractionDigits`
196+
- (non-negative integer)
197+
- `minimumSignificantDigits`
198+
- (non-negative integer)
199+
- `maximumSignificantDigits`
200+
- (non-negative integer)
201+
202+
The following options and their values are required in the default registry to be available on the
203+
function `:integer`:
204+
- `select`
205+
- `plural` (default)
206+
- `ordinal`
207+
- `exact`
208+
- `numberingSystem`
209+
- valid [Unicode Number System Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeNumberSystemIdentifier)
210+
(default is locale-specific)
211+
- `signDisplay`
212+
- `auto` (default)
213+
- `always`
214+
- `exceptZero`
215+
- `negative`
216+
- `never`
217+
- `style`
218+
- `decimal` (default)
219+
- `percent` (see [Percent Style](#percent-style) below)
220+
- `useGrouping`
221+
- `auto` (default)
222+
- `true`
223+
- `false`
224+
- `min2`
225+
- `always`
226+
- `minimumIntegerDigits`
227+
- (non-negative integer, default: `1`)
228+
229+
> [!NOTE]
230+
> The following option does not have a default value because it is only to be used
231+
> as an override for an existing locale-and-value dependent implementation-defined
232+
> default
233+
234+
- `maximumSignificantDigits`
235+
- (non-negative integer)
236+
237+
> [!NOTE]
238+
> The following options or option values are being developed during the Technical Preview
239+
> period.
240+
241+
The following values for the option `style` are _not_ part of the default registry.
242+
Implementations SHOULD avoid creating options that conflict with these, but
243+
are encouraged to track development of these options during Tech Preview:
244+
- `currency`
245+
- `unit`
246+
247+
The following options are _not_ part of the default registry.
248+
Implementations SHOULD avoid creating options that conflict with these, but
249+
are encouraged to track development of these options during Tech Preview:
250+
- `currency`
251+
- valid [Unicode Currency Identifier](https://cldr-smoke.unicode.org/spec/main/ldml/tr35.html#UnicodeCurrencyIdentifier)
252+
(no default)
253+
- `currencyDisplay`
254+
- `symbol` (default)
255+
- `narrowSymbol`
256+
- `code`
257+
- `name`
258+
- `currencySign`
259+
- `accounting`
260+
- `standard` (default)
261+
- `unit`
262+
- (anything not empty)
263+
- `unitDisplay`
264+
- `long`
265+
- `short` (default)
266+
- `narrow`
267+
268+
### Default Value of `select` Option
269+
270+
The value `plural` is default for the option `select`
271+
because it is the most common use case for numeric selection.
272+
It can be used for exact value matches but also allows for the grammatical needs of other
273+
languages using CLDR's plural rules.
274+
This might not be noticeable in the source language (particularly English),
107275
but can cause problems in target locales that the original developer is not considering.
108276
109277
> For example, a naive developer might use a special message for the value `1` without
110278
> considering other locales' need for a `one` plural:
111279
>
112280
> ```
113281
> .match {$var}
114-
> [1] {{You have one last chance}}
115-
> [one] {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
116-
> [*] {{You have {$var} chances remaining}}
282+
> 1 {{You have one last chance}}
283+
> one {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
284+
> // such locales typically require other keywords
285+
> // such as two, few, many, and so forth
286+
> * {{You have {$var} chances remaining}}
117287
> ```
118288
119-
Additional options such as `minimumFractionDigits` and others already supported by `:number`
120-
should also be supported.
121289
122-
If PR [#532](https://github.com/unicode-org/message-format-wg/pull/532) is accepted,
123-
also add the following `<alias>` definitions to `<function name="number">`:
290+
### Percent Style
291+
292+
When implementing `style=percent`, the numeric value of the operand
293+
MUST be multiplied by 100 for the purposes of formatting.
294+
295+
### Selection
296+
297+
When implementing [`MatchSelectorKeys`](spec/formatting.md#resolve-preferences),
298+
numeric selectors perform as described below.
299+
300+
- Let `return_value` be a new empty list of strings.
301+
- Let `operand` be the resolved value of the _operand_.
302+
If the `operand` is not a number type, emit a _Selection Error_
303+
and return `return_value`.
304+
- Let `keys` be a list of strings containing keys to match.
305+
(Hint: this list is an argument to `MatchSelectorKeys`)
306+
- For each string `key` in `keys`:
307+
- If the value of `key` matches the production `number-literal`:
308+
- If the parsed value of `key` is an [exact match](#determining-exact-literal-match)
309+
of the value of the `operand`, then `key` matches the selector.
310+
Add `key` to the front of the `return_value` list.
311+
- Else, if the value of `key` is a keyword:
312+
- Let `keyword` be a string which is the result of [rule selection](#rule-selection).
313+
- If `keyword` equals `key`, then `key` matches the selector.
314+
Append `key` to the end of the `return_value` list.
315+
- Else, `key` is invalid;
316+
emit a _Selection Error_.
317+
Do not add `key` to `return_value`.
318+
- Return `return_value`
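
The steps above can be sketched in Python as follows. This is a rough illustration under several assumptions: only integer operands are handled (matching the integer-only requirement for the Technical Preview), errors are collected in a list rather than emitted through any real error channel, and the names `match_numeric_keys`, `rule_selection`, and `english_like` are hypothetical. The regular expression is the JSON number grammar, assumed to be equivalent to `number-literal`.

```python
import re
from typing import Callable, List, Tuple

# JSON number grammar (RFC 8259), assumed equivalent to `number-literal`.
NUMBER_LITERAL = re.compile(
    r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?"
)
PLURAL_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}


def match_numeric_keys(
    operand: object,
    keys: List[str],
    rule_selection: Callable[[int], str],
) -> Tuple[List[str], List[str]]:
    """Return (matching keys in preference order, collected error messages)."""
    return_value: List[str] = []
    errors: List[str] = []
    # Only integers are treated as numbers in this sketch; bool is excluded
    # because Python makes it a subclass of int.
    if isinstance(operand, bool) or not isinstance(operand, int):
        errors.append("Selection Error: operand is not a number")
        return return_value, errors
    # Hypothetical rule selection; returns "" when select=exact.
    keyword = rule_selection(operand)
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            # Exact matches go to the front of the list.
            if str(operand) == key:
                return_value.insert(0, key)
        elif key in PLURAL_KEYWORDS:
            # Keyword matches go to the end of the list.
            if keyword == key:
                return_value.append(key)
        else:
            errors.append(f"Selection Error: invalid key {key!r}")
    return return_value, errors


def english_like(n: int) -> str:
    """Toy stand-in for plural rule selection: `one` for 1, else `other`."""
    return "one" if n == 1 else "other"


print(match_numeric_keys(1, ["one", "1"], english_like))
# (['1', 'one'], [])
```

Placing exact matches at the front means that a variant keyed on `1` is preferred over one keyed on `one` when both match.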
319+
320+
### Plural/Ordinal Keywords
321+
The _plural/ordinal keywords_ are: `zero`, `one`, `two`, `few`, `many`, and
322+
`other`.
323+
324+
### Rule Selection
325+
326+
If the option `select` is set to `exact`, rule-based selection is not used.
327+
Return the empty string.
328+
329+
> [!NOTE]
330+
> Since keys cannot be the empty string in a numeric selector, returning the
331+
> empty string disables keyword selection.
332+
333+
If the option `select` is set to `plural`, selection should be based on CLDR plural rule data
334+
of type `cardinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
335+
for examples.
336+
337+
If the option `select` is set to `ordinal`, selection should be based on CLDR plural rule data
338+
of type `ordinal`. See [charts](https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html)
339+
for examples.
340+
341+
Apply the rules defined by CLDR to the resolved value of the operand and the function options,
342+
and return the resulting keyword.
343+
If no rules match, return `other`.
344+
345+
> **Example.**
346+
> In CLDR 44, the Czech (`cs`) plural rule set can be found
347+
> [here](https://www.unicode.org/cldr/charts/44/supplemental/language_plural_rules.html#cs).
348+
>
349+
> A message in Czech might be:
350+
> ```
351+
> .match {$numDays :number}
352+
> one {{{$numDays} den}}
353+
> few {{{$numDays} dny}}
354+
> many {{{$numDays} dne}}
355+
> * {{{$numDays} dní}}
356+
> ```
357+
> Using the rules found above, the results of various `operand` values might look like:
358+
> | Operand value | Keyword | Formatted Message |
359+
> |---|---|---|
360+
> | 1 | `one` | 1 den |
361+
> | 2 | `few` | 2 dny |
362+
> | 5 | `other` | 5 dní |
363+
> | 22 | `other` | 22 dní |
364+
> | 27 | `other` | 27 dní |
365+
> | 2.4 | `many` | 2,4 dne |
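
As a rough illustration of rule selection for this example, the sketch below hard-codes the Czech cardinal rules as they are understood to appear in CLDR (`one`: i = 1 and v = 0; `few`: i = 2..4 and v = 0; `many`: v != 0; otherwise `other`). A real implementation would read these rules from CLDR data rather than hard-coding them. The function name `cs_cardinal_category` is hypothetical, and the operand is assumed to be a finite `Decimal` so that visible fraction digits are preserved.

```python
from decimal import Decimal


def cs_cardinal_category(value: Decimal) -> str:
    """Return the plural keyword for a finite Decimal under the Czech rules."""
    exponent = value.as_tuple().exponent
    # v: number of visible fraction digits; i: integer digits of the value.
    v = -exponent if exponent < 0 else 0
    i = int(abs(value))
    if v != 0:
        return "many"
    if i == 1:
        return "one"
    if 2 <= i <= 4:
        return "few"
    return "other"


for text in ["1", "2", "5", "2.4"]:
    print(text, cs_cardinal_category(Decimal(text)))
# 1 one
# 2 few
# 5 other
# 2.4 many
```

Using `Decimal` rather than `float` keeps trailing fraction digits visible, so `Decimal("1.0")` selects `many` while `Decimal("1")` selects `one`.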
366+
367+
368+
369+
### Determining Exact Literal Match
370+
371+
> [!IMPORTANT]
372+
> The exact behavior of exact literal match is only defined for non-zero-filled
373+
> integer values.
374+
> Annotations that use fraction digits or significant digits might work in specific
375+
> implementation-defined ways.
376+
> Users should avoid depending on these types of keys in message selection.
377+
378+
379+
Number literals in the MessageFormat 2 syntax use the
380+
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
381+
The resolved value of an `operand` exactly matches a numeric literal `key`
382+
if, when the `operand` is serialized using the format for a JSON number,
383+
the two strings are equal.
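
A minimal sketch of this comparison for integer operands is shown below. It assumes that Python's `json.dumps` is an acceptable serializer for integers, and the name `exact_literal_match` is hypothetical. Fractional operands are deliberately rejected, since their matching behavior is not defined here.

```python
import json


def exact_literal_match(operand: int, key: str) -> bool:
    """Compare an integer operand with a key by its JSON number serialization."""
    # bool is excluded because Python makes it a subclass of int.
    if isinstance(operand, bool) or not isinstance(operand, int):
        raise TypeError("only integer operands are supported in this sketch")
    return json.dumps(operand) == key


print(exact_literal_match(1, "1"))    # True
print(exact_literal_match(1, "1.0"))  # False: the strings differ
```

Because the comparison is made on strings, a key such as `1.0` does not match the integer operand `1` even though the two are numerically equal.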
384+
385+
> [!NOTE]
386+
> Implementations are not expected to implement this exactly as written,
387+
> as there are clearly optimizations that can be applied.
388+
389+
> [!NOTE]
390+
> Only integer matching is required in the Technical Preview.
391+
> Feedback describing use cases for fractional and significant digits-based
392+
> selection would be helpful.
393+
Otherwise, users should avoid using matching with fractional numbers or significant digits.
124394
125-
```xml
126-
<alias name="plural" supports="match">
127-
<setOption name="select" value="plural"/>
128-
</alias>
129-
<alias name="ordinal" supports="match">
130-
<setOption name="select" value="ordinal"/>
131-
</alias>
132-
```
133395
134396
## Alternatives Considered
135397
