[DESIGN] Number selection design refinements #859
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,6 +1,6 @@ | ||||||
| # Selection on Numerical Values | ||||||
|
|
||||||
| Status: **Accepted** | ||||||
| Status: **Accepted** (moving back to **Proposed**) | ||||||
|
|
||||||
| <details> | ||||||
| <summary>Metadata</summary> | ||||||
|
|
@@ -53,6 +53,21 @@ Both JS and ICU PluralRules implementations provide for determining the plural c | |||||
| of a range based on its start and end values. | ||||||
| Range-based selectors are not initially considered here. | ||||||
|
|
||||||
| In <a href="https://github.com/unicode-org/message-format-wg/pull/842">PR #842</a> | ||||||
| @eemeli points out a number of gaps or infelicities in the current specification | ||||||
| and there was extensive discussion of how to address these gaps. | ||||||
|
|
||||||
| The `key` for exact numeric match in a variant has to be a string. | ||||||
| The format of such strings, therefore, has to be specified if messages are to be portable and interoperable. | ||||||
| In LDML45 Tech Preview we selected JSON's number serialization as a source for `key` values. | ||||||
| The JSON serialization is ambiguous, in that a given number value might be serialized validly in more than one way: | ||||||
| ``` | ||||||
| 123 | ||||||
| 123.0 | ||||||
| 1.23E2 | ||||||
| ... etc... | ||||||
| ``` | ||||||
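To illustrate the ambiguity, here is a minimal JavaScript sketch (not part of the design doc; the variable names are made up for illustration). Each of the serializations listed above parses to the same number, yet no two of them are equal as strings.

```javascript
// Three valid JSON serializations of the same numeric value.
const serializations = ['123', '123.0', '1.23E2'];

// JSON.parse accepts all three and yields the same Number.
const parsed = serializations.map((s) => JSON.parse(s));
const allNumericallyEqual = parsed.every((n) => n === 123);

// Compared as strings, every form is distinct.
const distinctStrings = new Set(serializations).size;

console.log(allNumericallyEqual); // true
console.log(distinctStrings); // 3
```

A `key` written in any one of these forms therefore identifies the same value, but only matches under string comparison if the serialization form is pinned down.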
|
|
||||||
| ## Use-Cases | ||||||
|
|
||||||
| As a user, I want to write messages that use the correct plural for | ||||||
|
|
@@ -75,6 +90,64 @@ either plural or ordinal selection in a single message. | |||||
| > * {{You have {$numRemaining} chances remaining (plural)}} | ||||||
| >``` | ||||||
|
|
||||||
| As a user, I want the selector to match the options specified: | ||||||
| ``` | ||||||
| .local $num = {123.456 :number maximumSignificantDigits=2 maximumFractionDigits=2 minimumFractionDigits=2} | ||||||
| .match {$num} | ||||||
| 120.00 {{This matches}} | ||||||
|
||||||
| 120 {{This does not match}} | ||||||
| 123.47 {{This does not match}} | ||||||
| 123.456 {{This does not match}} | ||||||
| 1.2E2 {{Does this match?}} | ||||||
| * {{ ... }} | ||||||
| ``` | ||||||
|
This is an accidentally good example of the difficulty of defining number selection that accounts for formatting options:
const nf = new Intl.NumberFormat('en', {
  maximumSignificantDigits: 2,
  maximumFractionDigits: 2,
  minimumFractionDigits: 2
})
nf.format(123.456) === '120'
In other words, at least in JavaScript it would not make sense for the
I think that defining the combined behaviour of specific number formatting options is, and should be kept, outside the scope of MF2. It is not our place to define how number formatting happens, or how the
As far as I can tell, this leaves us with two realistic options:
I would prefer option 2 (hence my earlier PR), but I'd be fine with option 1 as well (effectively, what the spec currently does). I would not be fine with the excessive complexity required of the spec and implementations if we were to require matching to account for the formatting options in a predefined manner.

Note that the matching behavior does not have to match the formatting behavior. The JS result looks like a bug to me? Maybe it isn't actually a bug, but it looks like one, given how much effort was expended trying to get two fraction digits. The following produces
const nf = new Intl.NumberFormat('en', {
maximumSignificantDigits: 2,
maximumFractionDigits: 2,
minimumFractionDigits: 2,
style: 'currency',
currency: 'USD',
currencyDisplay: 'symbol'
});
console.log(nf.format(123.456));
The existence of this feature/bug does not mean that MF2's definition of numeric matching has to be the same as the formatting output. It might be inconvenient for the implementer in JS, but that doesn't make it the wrong choice. Do you disagree with the logic behind the example? Do you think that multiple keys should match in the example? Is there an example where

Really? You'd be fine with
It's not a bug; this result falls out of the combination of fraction and significant digits depending on the value of
While the implementation complexity does matter, I'm more concerned about the expectations of developers. In the very rare cases where a selection on an exact match on a value like 120 matters, I cannot believe that it makes sense to require a developer to be aware of the specific MF2 selection (but not formatting!) understanding of how the formatting options combine.
Yes. I do not think it's right for us to enforce a specific meaning for number formatting options in MF2, which is required to say that
With this example, if we want to explicitly define which variant is selected, then I think

I expect that the formatting behavior will be locale affected while the selection behavior won't be. What's important is that the message author can look at the message (and only the message) and know how to specify the key for a given exact value. That knowledge, for default registry functions, should be transferable to any message/implementation.
Okay. (I think it is weird, but see that it flows from the implementation of ICU's newer number formatter, which bundles everything up inside of "precision"). Since I think that using significant digits to perform number "trimming" (as I do in the example) is probably undesirable vs. actually mutating the value passed in, we should probably focus on the other details.
Because of the significant digits stuff? Or because you don't think the fraction digits should have any effect? I'm fine with saying Although this is a better example:
I don't agree because it is suboptimal and because I'd have a hard time explaining it. For example, currency conversions often have many decimal places, but I don't want to show the extra precision available. Or I might want to turn the value's fractional part off for display. I shouldn't have to match

I agree. It should match because it needs to match the visible digits (whatever the numberSystem or decimal/grouping separators). This also goes for the integer case, as below.

I agree with this. But I think we find ourselves on different paths starting from the same premise. One consideration that I include in my thinking is that we cannot assume that MF2 authors will remember the details of how formatting options interact with exact variant keys during selection, if and when that behaviour differs from what happens during formatting. And different implementations' formatters will differ in their interpretations of options baskets.
Do we have an example message using non-integer number selection that we could be considering? I continue to be concerned that we're trying to add excessive complexity to solve for an extremely rare edge case.
Because defining exactly which of these matches (as opposed to
If Trying to support some formatting options during selection will not produce predictable results. Trying to support all formatting options during selection is impossible, because not all formatters work the same way. Supporting no formatting options during selection is predictable, and uniform across all implementations. Yes, it does require mutating the input value in some edge cases, but those cases are made predictable.
If we want to ensure that the selected key and the formatted value match, then we must leave exact selection up to each implementation, because different implementations will not always agree in how they format a given value and its bag of options.

I disagree, from the perspective that they already kind of do. The behavior difference isn't that great and is mostly confined to locale-based formatting details (grouping, shaping, separator characters, etc.). Most of these folks are technical enough to know that
How could an input variable have a formatting option pre-attached ( Note that the keys match use cases that the message author is trying to fulfill. Exact match keys need to "exactly" match the perceived value, which, as I think this thread shows, is influenced by formatting options.
This goes back to the transitivity discussion (still pending in the group). I think it's clear that some options are transitive and some not when it comes to formatting. I think the same can be said of selectors. I think that the behavior of a given selector is defined by that selector. We are only talking here about the number selectors in the default registry. For sure other functions might work differently.
Fractional selection is probably most common when working with currencies, percents, and unit values. I can write/unearth some examples later. In the meantime... I'm fearful that we're not capturing this good discussion in the design doc (the purpose of this PR) 🙈

I don't think there's any issue about understanding what a key like
That's enabled by using "an implementation-defined type" as per message-format-wg/spec/registry.md Lines 340 to 341 in 132d9c3
const mf = new Intl.MessageFormat('en', '{$num :number maximumFractionDigits=2}')
const num = { valueOf: () => 123.456, options: { roundingMode: 'floor' } }
mf.format({ num }) // '123.45'
This makes sense for options like
No, they don't. They need to predictably match, so that a developer doesn't need to refer to the MF2 docs every time when they try to use exact matching. When dealing with a corner-case message like one needing to match |
||||||
|
|
||||||
| Note that badly written keys just don't match, but we want users to be able to intuit whether a given set of keys will work or not. | ||||||
|
|
||||||
| ``` | ||||||
| .local $num = {123.456 :integer} | ||||||
| .match {$num} | ||||||
| 123.456 {{Should not match?}} | ||||||
| 123 {{Should match}} | ||||||
| 123.0 {{Should not match?}} | ||||||
| * {{ ... }} | ||||||
| ``` | ||||||
|
|
||||||
| There can be complications, which we might need to define. Consider: | ||||||
|
|
||||||
| ``` | ||||||
| .local $num = {123.002 :number maximumFractionDigits=1 minimumFractionDigits=0} | ||||||
| .match {$num} | ||||||
| 123.002 {{Should not match?}} | ||||||
| 123.0 {{Does minimumFractionDigits make this not match?}} | ||||||
| 123 {{Does minimumFractionDigits make this match?}} | ||||||
| * {{ ... }} | ||||||
| ``` | ||||||
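For comparison, here is what one formatter does with the operands in the two examples above. This sketch uses JavaScript's `Intl.NumberFormat` as a stand-in. Treating `:integer` as `maximumFractionDigits: 0` is an assumption made only for illustration, and other implementations may format differently.

```javascript
// Assumption: :integer behaves like a formatter limited to zero fraction digits.
const integerLike = new Intl.NumberFormat('en', { maximumFractionDigits: 0 });

// Options copied from the second example.
const oneFraction = new Intl.NumberFormat('en', {
  maximumFractionDigits: 1,
  minimumFractionDigits: 0,
});

const formattedInteger = integerLike.format(123.456);
const formattedFraction = oneFraction.format(123.002);

console.log(formattedInteger); // "123"
console.log(formattedFraction); // "123"
```

If exact matching followed this formatted output, the key `123` would match in both examples and `123.0` in neither. Whether selection should follow formatting at all is the open question in this thread.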
|
|
||||||
| As an implementer, I am concerned about the cost of incorporating _options_ into the selector. | ||||||
| This might be accomplished by building a "second formatter". | ||||||
| Some implementations, such as ICU4J's, might use interfaces like `FormattedNumber` to feed the selector. | ||||||
| Implementations might also apply options by modifying the number value of the _operand_ | ||||||
| (or shadowing the options effect on the value) | ||||||
|
|
||||||
| As a user, I want to be able to perform exact match using arbitrary digit numeric types where they are available. | ||||||
| As an implementer, I do **not** want to be required to provide or implement arbitrary precision | ||||||
| numeric types not available in my platform. | ||||||
| Programming/runtime environments vary widely in support of these types. | ||||||
| MF2 should not prevent the implementation of e.g. `BigDecimal` or `BigInt` types | ||||||
| and permit their use in MF2 messages. | ||||||
| MF2 should not _require_ implementations to support such types where they do not exist. | ||||||
| The problem of numeric type precision, | ||||||
| which is implementation dependent, | ||||||
| should not affect how message `key` values are specified. | ||||||
|
|
||||||
| > For example: | ||||||
| >``` | ||||||
| >.local $num = {11111111111111.11111111111111 :number} | ||||||
| >.match {$num} | ||||||
| >11111111111111.11111111111111 {{This works on some implementations.}} | ||||||
| >* {{... but not on others? ...}} | ||||||
| >``` | ||||||
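A minimal sketch of why this example is implementation dependent, assuming an implementation that stores the operand as a JavaScript `Number` (an IEEE 754 double). The variable names are illustrative only.

```javascript
// The literal from the example, kept as a string so no digits are lost.
const literal = '11111111111111.11111111111111';

// Converting to a JavaScript Number (IEEE 754 double) discards precision.
const asNumber = Number(literal);
const roundTripped = String(asNumber);

// The round-tripped text is shorter than, and different from, the key.
console.log(roundTripped === literal); // false
console.log(roundTripped.length < literal.length); // true
```

An implementation backed by an arbitrary-precision decimal type would not lose these digits, which is why the same message could select different variants on different platforms.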
|
|
||||||
| ## Requirements | ||||||
|
|
||||||
|
|
@@ -460,3 +533,21 @@ and they _might_ converge on some overlap that users could safely use across pla | |||||
| #### Cons | ||||||
|
|
||||||
| - No guarantees about interoperability for a relatively core feature. | ||||||
|
|
||||||
| ## Alternatives Considered (`key` matching) | ||||||
|
|
||||||
| ### Standardize the Serialization Forms | ||||||
|
|
||||||
| Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45 | ||||||
| and specify numeric matching by specifying the form of matching `key` values. | ||||||
| Comparison is as-if by string comparison of the serialized forms, just as in LDML45. | ||||||
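A rough sketch of what "as-if by string comparison" could look like. The `serialize` helper and its use of `toFixed` are hypothetical choices for this illustration only; the design would still have to specify the actual serialization form.

```javascript
// Hypothetical serializer: renders the operand with a fixed number of
// fraction digits. The real serialization form is what the design must define.
function serialize(value, fractionDigits) {
  return value.toFixed(fractionDigits);
}

// Exact match succeeds only when the key is character-for-character equal
// to the serialized operand.
function matchesKey(value, fractionDigits, key) {
  return serialize(value, fractionDigits) === key;
}

console.log(matchesKey(120, 2, '120.00')); // true
console.log(matchesKey(120, 2, '120')); // false
console.log(matchesKey(120, 2, '1.2E2')); // false
```

Under this approach a key that denotes the same value in a different form simply does not match, so authors need to know the one accepted form.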
|
||||||
|
|
||||||
| ### Compare numeric values | ||||||
|
|
||||||
| This is the design proposed in #842. | ||||||
|
|
||||||
| This modifies the key-match algorithm to use implementation-defined numeric value exact match: | ||||||
|
|
||||||
| > 1. Let `exact` be the numeric value represented by `key`. | ||||||
| > 1. If `value` and `exact` are numerically equal, then | ||||||
|
|
||||||
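The numeric-comparison steps quoted above can be sketched as follows. This assumes operand values are JavaScript `Number`s and that keys are parsed with `Number()`; both are assumptions for illustration, and `numericKeyMatches` is a made-up name.

```javascript
// Parse the key into a numeric value, then compare numerically.
// Note: Number() accepts more than JSON's number grammar (for example hex
// literals), so a real implementation would validate the key first.
function numericKeyMatches(value, key) {
  const exact = Number(key); // "Let exact be the numeric value represented by key."
  if (Number.isNaN(exact)) return false; // keys that are not numbers never match
  return value === exact;
}

console.log(numericKeyMatches(123, '123')); // true
console.log(numericKeyMatches(123, '123.0')); // true
console.log(numericKeyMatches(123, '1.23E2')); // true
console.log(numericKeyMatches(123, 'one')); // false
```

Here several differently written keys match the same operand, which is the main behavioral difference from comparing serialized strings.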
I don't see how this results from the requirements.
In some cases the key has to be a string, in other cases it is enough to be a number.
So the whole section below is only one option: IF we consider the keys to be strings, then ...
The idea that the key can be a number sometimes is not considered.
But it would be natural to map "...foo {}..." and "...|foo| {}..." in syntax to strings, and "...123 {}..." and "...|123| {}..." in syntax to numbers.

The key has to be a string because the message is a string. The next line addresses this: if the key is a string, then the format of the string has to be clear so that it can be related to a number.

I don't see how that one results from the other.
What says that keys and messages should be the same type?
And even if there is something, nothing stops us from changing it.