-
-
Notifications
You must be signed in to change notification settings - Fork 35
Address name and literal equality #885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 3 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
89050b6
Address name and literal equality
aphillips 8d26f7f
Typo fix
aphillips 46c80bf
Add a note about not requiring implementations to actually normalize
aphillips c1e4982
Implement changes dicussed in 2024-09-16 call.
aphillips 981dd66
Merge branch 'main' into aphillips-name-equality
aphillips 20cbbe7
Update formatting.md to include keys in NFC
aphillips b5eec2a
Address comments
aphillips eb09a95
Update spec/syntax.md
aphillips 94e1246
Update spec/syntax.md
aphillips File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit excessive. While most name comparisons are internal to the spec, AFAIK the only literal comparison in the spec is for duplicate variant key lists, which tbh I'd prefer to be done with normalization.
All other literal value handling is done by functions, which we should not restrict from applying normalization in their internal processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'd go a bit further in encouraging function specifiers to normalize when matching: and we make the standard functions do so. That is, something along the lines of:
An NFC comparison (aka Unicode canonically equivalent comparison) produces the same results as if each string value being compared were converted to the Unicode Normalization Form C (NFC). For example, with an NFC comparison against the literal |U\x{3308}|, the same result is obtained as if the literal were |\x{DC}|. For more examples, see the Unicode Standard.
When determining whether two variant key lists are duplicates, NFC comparison MUST be used for literals.
When a selector function evaluates matches to literal keys, the matches SHOULD use NFC comparison. Moreover, the implementation of the standard selector functions MUST use NFC comparison. Thus the standard :string selector function MUST match a string input parameter of "U\x{3308}" with the literal |\x{DC}|.
BTW: Some selector functions, such as the standard numeric selectors, only match literals with all ASCII characters. ASCII literals never change when converted to NFC, and there are only 3 non ASCII characters that change to ASCII. So selector functions whose literals don't include ";", "`", or "K" don't need to use NFC comparison; that includes our numeric selectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, do we agree that literals MAY be non-normalized/denormalized?
If a literal can be a non-normalized string, then we should define when two literals match inside MF2. Literal comparison is for duplicate key lists, but also for matching between the sorted results of a selector and the keys in the message (the sorting is done by a function, but not the matching after sorting). This text says nothing about what functions do or are allowed to do with (possibly not normalized) literal values. All it says is when MF2 considers two literals to be equal. I could add text allowing functions to have greater restriction on equality. @macchiati suggests requiring it for
:string.This is the opposite of what @eemeli is saying? If we allow normalization (but don't require it) we also allow the lack of it.
By not normalizing literals, we allow non-normalized sequences to be used in expressions, option values, or keys. This has positive impacts (for people who know what they're doing when working with combining marks or certain characters) and negative consequences (when people don't)
Why?
I understand the lack of illustrating a compelling use case here. Most of the time the sets of valid keys should be rational, sane, highly-normalized enumerated values and not just random text... in fact, I have a note cautioning people about this right 👇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is far, far more likely that people will make mistakes with non-NFC literals (or input) than the really, really obscure edge case of someone wanting to match non-normalized text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with pattern text, I agree that we should not require the normalization of literal values.
Regarding the latter, we say this:
message-format-wg/spec/formatting.md
Lines 511 to 514 in 80bec52
That MUST is requiring the processing to not normalise any of the values, even if it did so for its internal processing.
I'd be completely fine with us normalising the keys before they're passed to the function, or at least allowing an implementation to do so.
I'm aligned with @macchiati here. We don't need to normalize key values, but we should do their comparison when checking for duplicate key lists as if they were normalized.