Commit d534eb0

CLDR-19216 v49 keyboard spec: clarify default backspace normalization (#5436)
1 parent 23eace3 commit d534eb0

2 files changed

+15
-2
lines changed

docs/ldml/tr35-keyboards.md

Lines changed: 14 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -132,6 +132,7 @@ The LDML specification is divided into the following parts:
132132
* [Example Post-reorder transforms](#example-post-reorder-transforms)
133133
* [Reorder and Markers](#reorder-and-markers)
134134
* [Backspace Transforms](#backspace-transforms)
135+
* [Default Backspace Transform](#default-backspace-transform)
135136
* [Invariants](#invariants)
136137
* [Keyboard IDs](#keyboard-ids)
137138
* [Principles for Keyboard IDs](#principles-for-keyboard-ids)
@@ -3122,9 +3123,20 @@ The above example is simplified, and doesn't fully handle the interaction betwee
31223123
31233124
The first three transforms above delete various ligatures with a single keypress. The other transforms handle prebase characters. There are two in this Burmese keyboard. The transforms delete the characters preceding the prebase character up to base which gets replaced with the prebase filler string, which represents a null base. Finally the prebase filler string + prebase is deleted as a unit.
31243125

3125-
If no specified transform among all `transformGroup`s under the `<transforms type="backspace">` element matches, a default will be used instead — an implied final transform that simply deletes the codepoint at the end of the input context. This implied transform is effectively similar to the following code sample, even though the `*` operator is not actually allowed in `from=`. See the documentation for *Match a single Unicode codepoint* under [transform syntax](#regex-like-syntax) and [markers](#markers), above.
3126+
#### Default Backspace Transform
31263127

3127-
It is important that implementations do not by default delete more than one non-marker codepoint at a time, except in the case of emoji clusters. Note that implementations will vary in the emoji handling due to the iterative nature of successive Unicode releases. See [UTS#51 §2.4.2: Emoji Modifiers in Text](https://www.unicode.org/reports/tr51/#Emoji_Modifiers_in_Text)
3128+
If no specified transform among all `transformGroup`s under the `<transforms type="backspace">` element matches, a default will be used instead — an implied final transform that simply deletes a single codepoint at the end of the input context.
3129+
Because the context is in NFD, this default behavior may break apart what the user considers to be one character.
3130+
For example, if at the end of the context is the string `Dü`, in NFD form, this will be the codepoints `D` (U+0044), `u` (U+0075) followed by `¨` (U+0308). Pressing backspace once will delete the U+0308 codepoint, leaving `Du` in the context. Pressing backspace again will leave only `D`.
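The `Dü` example can be reproduced with a minimal Python sketch. This is only an illustration of the behavior described above, not the specification's or any implementation's actual algorithm: the function name `default_backspace` is hypothetical, and the sketch deliberately ignores markers and emoji clusters, which a real implementation would have to handle.

```python
import unicodedata


def default_backspace(context: str) -> str:
    """Hypothetical sketch of the default backspace: drop one trailing codepoint.

    Assumes the context is kept in NFD and contains no markers or emoji
    clusters; both of those cases are outside the scope of this sketch.
    """
    nfd = unicodedata.normalize("NFD", context)
    # A Python str is a sequence of codepoints, so slicing removes exactly one.
    return nfd[:-1]


# "Dü" typed as D + U+00FC decomposes to D, u, U+0308 in NFD.
context = unicodedata.normalize("NFD", "D\u00fc")
codepoints = [f"U+{ord(c):04X}" for c in context]  # U+0044, U+0075, U+0308

after_first = default_backspace(context)       # only U+0308 is removed: "Du"
after_second = default_backspace(after_first)  # "u" is removed: "D"
```

The first backspace removes only the combining diaeresis, so the user sees `Du` rather than `D`, which is the situation a keyboard author can avoid by supplying explicit backspace transforms.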
3131+
3132+
This implied transform is effectively similar to the following code sample, even though the `*` operator is not actually allowed in `from=`.
3133+
See the documentation for *Match a single Unicode codepoint* under [transform syntax](#regex-like-syntax) and [markers](#markers), above.
3134+
3135+
It is important that implementations do not by default delete more than one non-marker codepoint at a time, except in the case of emoji clusters.
3136+
Note that implementations will vary in the emoji handling due to the iterative nature of successive Unicode releases. See [UTS#51 §2.4.2: Emoji Modifiers in Text](https://www.unicode.org/reports/tr51/#Emoji_Modifiers_in_Text)
3137+
3138+
Keyboard authors should almost always include backspace transforms in their keyboards, to ensure that backspacing has intuitive and expected behavior for users.
3139+
The default backspace transform described here may yield unexpected behavior for users.
31283140

31293141
```xml
31303142
<transforms type="backspace">

docs/ldml/tr35-modifications.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -120,6 +120,7 @@ The rules have been converted to a “flat” format, which is easier for client
120120
* [`display`](tr35-keyboards.md#element-display): Noted that a key without output may be indicated by means of the `keyId=` attribute on the display.
121121
* [`layer`](tr35-keyboards.md#element-layer): Noted the use of the `modifiers=` attribute for hardware layouts being used as touch layouts.
122122
* References and links into the section concerning keyboard test data (which was removed prior to spec finalization) were removed.
123+
* Normalization for the default backspace transform was clarified, and authors were encouraged to add backspace transforms to avoid the default.
123124

124125
### Modifications section
125126
