unicode-org · aphillips · Sep 17, 2024 · Sep 13, 2024
diff --git a/spec/formatting.md b/spec/formatting.md
@@ -502,7 +502,7 @@ Next, using `res`, resolve the preferential order for all message keys:
       1. Let `key` be the `var` key at position `i`.
       1. If `key` is not the catch-all key `'*'`:
          1. Assert that `key` is a _literal_.
-         1. Let `ks` be the resolved value of `key`.
+         1. Let `ks` be the resolved value of `key` in Unicode Normalization Form C.
          1. Append `ks` as the last element of the list `keys`.
    1. Let `rv` be the resolved value at index `i` of `res`.
    1. Let `matches` be the result of calling the method MatchSelectorKeys(`rv`, `keys`)
@@ -516,6 +516,9 @@ The returned list MAY be empty.
 The most-preferred key is first,
 with each successive key appearing in order by decreasing preference.
 
+The resolved value of each _key_ MUST be in Unicode Normalization Form C ("NFC"),
+even if the _literal_ for the _key_ is not.
+
 If calling MatchSelectorKeys encounters any error,
 a _Bad Selector_ error is emitted
 and an empty list is returned.

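The `spec/formatting.md` change above requires each _key_ to resolve to NFC even when its _literal_ is not written that way. The Python sketch below illustrates the effect under stated assumptions: `resolve_key` and the sample keys are hypothetical and not part of the spec, while `unicodedata.normalize` comes from the Python standard library.

```python
import unicodedata


def resolve_key(literal: str) -> str:
    """Return the resolved value of a key literal in NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", literal)


# Two spellings of the same key:
# precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The literals differ code point by code point...
literals_differ = composed != decomposed

# ...but their resolved values are identical, so the keys compare equal.
keys_equal = resolve_key(composed) == resolve_key(decomposed)
```

With this behavior, a selector implementation that receives `keys` never has to consider which spelling the message author typed.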
diff --git a/spec/syntax.md b/spec/syntax.md
@@ -444,6 +444,14 @@ A _key_ can be either a _literal_ value or the "catch-all" key `*`.
 The **_<dfn>catch-all key</dfn>_** is a special key, represented by `*`,
 that matches all values for a given _selector_.
 
+The value of each _key_ MUST be treated as if it were in
+[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC").
+When _keys_ are passed during _pattern selection_, the _key_ values MUST
+be normalized into NFC.
+Two _keys_ are considered equal if they are canonically equivalent strings,
+that is, if they consist of the same sequence of Unicode code points after
+Unicode Normalization Form C has been applied to both.
+
 ## Expressions
 
 An **_<dfn>expression</dfn>_** is a part of a _message_ that will be determined
@@ -690,6 +698,20 @@ except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.
 
 All code points are preserved.
 
+> [!IMPORTANT]
+> Most text, including that produced by common keyboards and input methods,
+> is already encoded in the canonical form known as
+> [Unicode Normalization Form C](https://unicode.org/reports/tr15) ("NFC").
+> A few languages, legacy character encoding conversions, or operating environments
+> can result in _literal_ values that are not in this form.
+> Some uses of _literals_ in MessageFormat,
+> notably as the value of _keys_,
+> apply NFC to the _literal_ value during processing or comparison.
+> While there is no requirement that the _literal_ value actually be entered
+> in a normalized form,
+> users are cautioned to employ the same character sequences
+> for equivalent values and, whenever possible, ensure _literals_ are in NFC.
+
 A **_<dfn>quoted literal</dfn>_** begins and ends with U+007C VERTICAL BAR `|`.
 The characters `\` and `|` within a _quoted literal_ MUST be
 escaped as `\\` and `\|`.
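The note in this hunk advises authors to keep _literals_ in NFC where possible. One way a tool might flag literals that are not already normalized is sketched below in Python; `non_nfc_literals` and the sample values are hypothetical, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata


def non_nfc_literals(literals):
    """Return the literals that are not already in NFC (hypothetical lint helper)."""
    return [s for s in literals if not unicodedata.is_normalized("NFC", s)]


samples = [
    "one",          # ASCII is always in NFC
    "caf\u00e9",    # precomposed U+00E9, already in NFC
    "cafe\u0301",   # "e" + U+0301 COMBINING ACUTE ACCENT, not in NFC
    "\u212b",       # U+212B ANGSTROM SIGN, which NFC maps to U+00C5
]

flagged = non_nfc_literals(samples)
```

Here `flagged` contains only the last two samples. A check like this only reports; it does not rewrite the message, which matches the note's statement that normalized input is recommended but not required.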
@@ -714,21 +736,6 @@ number-literal   = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "
 
 ### Names and Identifiers
 
-An **_<dfn>identifier</dfn>_** is a character sequence that
-identifies a _function_, _markup_, or _option_.
-Each _identifier_ consists of a _name_ optionally preceeded by
-a _namespace_. 
-When present, the _namespace_ is separated from the _name_ by a
-U+003A COLON `:`.
-Built-in _functions_ and their _options_ do not have a _namespace_ identifier.
-
-The _namespace_ `u` (U+0075 LATIN SMALL LETTER U)
-is reserved for future standardization.
-
-_Function_ _identifiers_ are prefixed with `:`.
-_Markup_ _identifiers_ are prefixed with `#` or `/`.
-_Option_ _identifiers_ have no prefix.
-
 A **_<dfn>name</dfn>_** is a character sequence used in an _identifier_ 
 or as the name for a _variable_
 or the value of an _unquoted literal_.
@@ -740,6 +747,20 @@ when matching _name_ or _identifier_ strings or _unquoted literal_ values.
 
 _Variable_ _names_ are prefixed with `$`.
 
+Two _names_ are considered equal if they are canonically equivalent strings,
+that is, if they consist of the same sequence of Unicode code points after
+[Unicode Normalization Form C](https://unicode.org/reports/tr15/) ("NFC")
+has been applied to both.
+
+> [!NOTE]
+> Implementations are not required to normalize all _names_.
+> Comparisons of _name_ values only need be done "as-if" normalization
+> has occurred.
+> Since most text in the wild is already in NFC
+> and since checking for NFC is fast and efficient,
+> implementations can often substitute checking for actually applying normalization
+> to _name_ values.
+
 Valid content for _names_ is based on <cite>Namespaces in XML 1.0</cite>'s 
 [NCName](https://www.w3.org/TR/xml-names/#NT-NCName).
 This is different from XML's [Name](https://www.w3.org/TR/xml/#NT-Name)
@@ -751,6 +772,21 @@ Otherwise, the set of characters allowed in a _name_ is large.
 > Such variables cannot be referenced in a _message_,
 > but are not otherwise errors.
 
+An **_<dfn>identifier</dfn>_** is a character sequence that
+identifies a _function_, _markup_, or _option_.
+Each _identifier_ consists of a _name_ optionally preceded by
+a _namespace_.
+When present, the _namespace_ is separated from the _name_ by a
+U+003A COLON `:`.
+Built-in _functions_ and their _options_ do not have a _namespace_ identifier.
+
+The _namespace_ `u` (U+0075 LATIN SMALL LETTER U)
+is reserved for future standardization.
+
+_Function_ _identifiers_ are prefixed with `:`.
+_Markup_ _identifiers_ are prefixed with `#` or `/`.
+_Option_ _identifiers_ have no prefix.
+
 Examples:
 > A variable:
 >```