Skip to content

Comments

Implement the simplified pattern selection algorithm in the spec (#898)#1080

Merged
aphillips merged 31 commits intounicode-org:mainfrom
catamorphism:alternative-selection
Jun 23, 2025
Merged

Implement the simplified pattern selection algorithm in the spec (#898)#1080
aphillips merged 31 commits intounicode-org:mainfrom
catamorphism:alternative-selection

Conversation

@catamorphism
Copy link
Collaborator

Implement Mark Davis's suggested pattern selection algorithm in the spec. Update the "resolved values" example to use the match() and compare() methods instead of selectKeys().
Rewrite the numeric and :string selector specs accordingly.

@catamorphism catamorphism force-pushed the alternative-selection branch 6 times, most recently from 64da247 to a166fef Compare June 10, 2025 22:39
…code-org#898)

Implement Mark Davis's suggested pattern selection algorithm in the spec.
Update the "resolved values" example to use the `match()` and
`compare()` methods instead of `selectKeys()`.
Rewrite the numeric and `:string` selector specs accordingly.
@catamorphism catamorphism force-pushed the alternative-selection branch from a166fef to eff3bac Compare June 10, 2025 22:42
@catamorphism catamorphism marked this pull request as ready for review June 10, 2025 22:45
@catamorphism catamorphism requested review from aphillips, eemeli, macchiati and mihnita and removed request for aphillips June 10, 2025 22:45
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this preliminary review, and it's looking good.

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes can support BEST, which can significantly improve performance.

1. Set `bestVariant` to `var`.
1. Else:
1. Let `bestVariantKeys` be the keys of `bestVariant`.
1. If SelectorsCompare(res, keys, bestVariantKeys) is `BETTER`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding the BEST value. It can be much more efficient, because it can return a result immediately in many cases.

The modification here is

Suggested change
1. If SelectorsCompare(res, keys, bestVariantKeys) is `BETTER`:
1. If SelectorsCompare(res, keys, bestVariantKeys) is `BEST`, select the pattern of `var` and return
2. If SelectorsCompare(res, keys, bestVariantKeys) is `BETTER`:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eemeli's preference back in October was to not include the optimization in the spec (see #898 (comment) ), to keep it as simple as possible. I think I have the same preference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is an optimization that we should leave out of the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to discuss this further.

My main concern is that if we specify that a MF can only interpret the selection function return values of {WORSE, SAME, BETTER}, then it makes it difficult to upgrade MF and functions in the future if we do add a BEST.

And BEST can give an improvement, since you don't need to scan any further vars once you hit a BEST (for a 1 key v. message), or BEST,BEST for a 2-key, etc.)

If you take Arabic plurals, for example, the very common cases of 0 and 1 are BEST, and can return immediately when hit.

0
1
zero
one
two
few
many
*

Or with string match, same thing. With the following list, a match for blueberry can stop immediately

apple
banana
blueberry
...
zitherfruit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, as a committee I think we've determined that performance is not important for selection. I mean, if we did care about it, we'd be considering the variant order to be a validation issue and using first-match selection, which would automatically terminate on the first match and skip the compare stage altogether.

Copy link
Member

@aphillips aphillips Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't characterize it was "not caring about performance". I would instead say that we prioritized producing the best matching outcome for a given set of keys and given a diversity of potential matching strategies by selectors.

I observe that this editorial effort is not addressing any of our group's pressing needs for v48. I agree that simplifying the example is probably worthwhile, but we are not required to optimize the example and we might go so far as to adopt a deoptimized example algorithm if such an example illustrated matching performance more clearly. Perhaps a discussion of performance and the potential for BEST would make a good article for our beautiful website?

1. Let `k2` be the _resolved value_ of `key2` in Unicode Normalization Form C [\[UAX#15\]](https://www.unicode.org/reports/tr15).
1. Let `sel` be the `i`th element of `selectors`.
1. Set `result` to Compare(sel, k1, k2).
1. If `result` is `SAME`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference here is that if the result is BEST, we increment bestCount (initially 0). In the final return, if bestCount == number_of_keys, then we return BEST.

catamorphism and others added 2 commits June 17, 2025 14:16
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting pretty close to being ready.

catamorphism and others added 9 commits June 19, 2025 14:09
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are the last changes?

catamorphism and others added 3 commits June 23, 2025 09:39
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
@aphillips aphillips added the editorial Issue is non-normative label Jun 23, 2025
@aphillips aphillips merged commit 73fc2cf into unicode-org:main Jun 23, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

editorial Issue is non-normative

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants