-- The general proposal here is to use the sequence `U+FFFF U+XXXX` to represent marker #XXXX (starting with `U+0001`)
+- Keyman already uses `UC_SENTINEL` `U+FFFF` (noncharacter), with `CODE_DEADKEY` (0x0008)
+- The general proposal here is to use the sequence `U+FFFF U+0008 U+XXXX` to represent marker #XXXX (starting with `U+0001`)
 - `U+FFFF` cannot otherwise occur in text, so it is unique
-- `U+FFFF U+FFFF` to indicate 'any marker' corresponds to `\m{.}`
-- This scheme allows for 65,534 (0xFFFE) unique markers, from `U+FFFF U+0001` through `U+FFFF U+FFFE`
+- `U+FFFF U+0008 U+FFFE` to indicate 'any marker' corresponds to `\m{.}`
+- This scheme allows for 65,533 (0xFFFD) unique markers, from `U+FFFF U+0008 U+0001` through `U+FFFF U+0008 U+FFFD`
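The revised encoding can be sketched in a few lines. This is only an illustration, not Keyman code: the function names `encode_marker` and `encode_any_marker` are hypothetical, and the sketch assumes each `U+XXXX` in the proposal can be modeled as one Python character.

```python
UC_SENTINEL = "\uFFFF"   # noncharacter sentinel named in the proposal
CODE_DEADKEY = "\u0008"  # code following the sentinel (0x0008)
MARKER_MIN = 0x0001      # first marker number
MARKER_MAX = 0xFFFD      # last regular marker number in the revised scheme
MARKER_ANY = 0xFFFE      # reserved value: 'any marker', i.e. \m{.}


def encode_marker(number: int) -> str:
    """Return the three-unit sequence for a 1-based marker number (hypothetical helper)."""
    if not MARKER_MIN <= number <= MARKER_MAX:
        raise ValueError(f"marker number out of range: {number:#06x}")
    return UC_SENTINEL + CODE_DEADKEY + chr(number)


def encode_any_marker() -> str:
    """Return the sequence that stands for 'any marker' (hypothetical helper)."""
    return UC_SENTINEL + CODE_DEADKEY + chr(MARKER_ANY)
```

With these definitions, `encode_marker(0x0123)` yields `U+FFFF U+0008 U+0123`, and the number of usable markers is `MARKER_MAX - MARKER_MIN + 1`, which matches the 65,533 stated above.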
 ## Terminology
 - A marker's "number" is its position in the `markers` list, starting at index 1 (U+0001) being the first element in that list.
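The 1-based numbering can be illustrated with a small lookup. This is a sketch only: `marker_number` and `marker_id` are hypothetical helper names, and the marker ids in the usage note are made-up sample values, not taken from any real keyboard.

```python
def marker_number(markers: list, marker_id_value: str) -> int:
    """Return the 1-based number of a marker id in the markers list."""
    # list.index is 0-based, so add 1 to get the marker number.
    return markers.index(marker_id_value) + 1


def marker_id(markers: list, number: int) -> str:
    """Return the marker id for a 1-based marker number."""
    if not 1 <= number <= len(markers):
        raise IndexError(f"no marker with number {number}")
    return markers[number - 1]
```

For a sample list `["a", "b", "c"]`, `marker_number(markers, "a")` is 1 and `marker_id(markers, 3)` is `"c"`; number 0 is never valid.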
@@ -55,20 +55,20 @@ Note that this is different from other 0-based indices in KMX+. If there are thr
 ### Other sections

 - `string value='\m{…}'` will simply store `\m{…}`, for application when expanded as with other variables.
-- Other emitters, such as `key`, `transform` will include the string `U+FFFF U+XXXX` where XXXX corresponds to the marker's 1-based number.
-- Transforms will need to match against the marker or markers desired, so may need to emit sequences such as `(?:\uFFFF\u0123)` meaning a match to marker #0x0123
-- matching `\m{.}` may then expand to `(?:\uFFFF.)`
+- Other emitters, such as `key`, `transform` will include the string `U+FFFF U+0008 U+XXXX` where XXXX corresponds to the marker's 1-based number.
+- Transforms will need to match against the marker or markers desired, so may need to emit sequences such as `(?:\uFFFF\u0008\u0123)` meaning a match to marker #0x0123
+- matching `\m{.}` may then expand to `(?:\uFFFF\u0008.)` (match a single codepoint after `UC_SENTINEL CODE_DEADKEY`)
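The regex expansion described in the added lines can be tried out with Python's `re` module. This is a sketch under assumptions: `marker_pattern` and `ANY_MARKER_PATTERN` are hypothetical names, and Python's regex dialect stands in for whatever engine the transforms actually use.

```python
import re

# Regex-escaped form of UC_SENTINEL followed by CODE_DEADKEY.
SENTINEL_PREFIX = r"\uFFFF\u0008"


def marker_pattern(number: int) -> str:
    """Return a regex fragment matching one specific marker number."""
    return rf"(?:{SENTINEL_PREFIX}\u{number:04X})"


# Regex fragment for \m{.}: any single code point after the prefix.
ANY_MARKER_PATTERN = rf"(?:{SENTINEL_PREFIX}.)"

text = "a\uffff\u0008\u0123b"
specific = re.search(marker_pattern(0x0123), text)
any_marker = re.search(ANY_MARKER_PATTERN, text, re.DOTALL)
```

One detail specific to this sketch: in Python, `.` does not match a newline unless `re.DOTALL` is set, so marker #0x000A would be missed by `ANY_MARKER_PATTERN` without that flag. Whether the same consideration applies to the engine used for transforms is not stated here.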
 ## Binary (.kmx plus)

 - The `vars.markers` is a pointer into the `list` section with a list (binary order) of the marker names
-- Other strings will be in `U+FFFF U+0123` form etc. as if it was in the original text stream as such.
+- Other strings will be in `U+FFFF U+0008 U+0123` form etc. as if it was in the original text stream as such.
 ## Implementation (core)

-- Core needs to recognize `U+FFFF …` sequences and convert them to markers in the context stream, with `state->context().push_marker(marker_number)`
-- For normal processing, Core does _not_ need to correlate the marker _number_ with a marker _id_, although this would be helpful for a debugging or tracing facility. I.e. `U+FFFF U+0123` corresponding to entry 0x0123 in the `vars.markers` -> `list` table.
-- Core needs to remove `U+FFFF …` sequences before they are passed to the OS.
+- Core needs to recognize `U+FFFF U+0008 …` sequences and convert them to markers in the context stream, with `state->context().push_marker(marker_number)`
+- For normal processing, Core does _not_ need to correlate the marker _number_ with a marker _id_, although this would be helpful for a debugging or tracing facility. I.e. `U+FFFF U+0008 U+0123` corresponding to entry 0x0123 in the `vars.markers` -> `list` table.
+- Core needs to remove `U+FFFF U+0008 …` sequences before they are passed to the OS.

 - Transform processing needs to recognize these markers in the context stream and pass them to the transforms appropriately.
 - User-defined backspace processing `<transform type="backspace"/>` may specifically operate on backspaces in the context stream, just as with other transform processing.
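The two Core responsibilities listed above (turning sentinel sequences into markers in the context, and stripping them before text reaches the OS) can be sketched as follows. This is not the Core implementation: `split_context` and `strip_markers` are hypothetical names, and the list of tuples merely stands in for the context stream that `push_marker` would populate.

```python
UC_SENTINEL = "\uFFFF"
CODE_DEADKEY = "\u0008"


def split_context(text: str) -> list:
    """Split text into ('char', c) and ('marker', number) items."""
    items = []
    i = 0
    while i < len(text):
        is_marker = (
            text[i] == UC_SENTINEL
            and i + 2 < len(text)          # need two more units after the sentinel
            and text[i + 1] == CODE_DEADKEY
        )
        if is_marker:
            # Stand-in for state->context().push_marker(marker_number).
            items.append(("marker", ord(text[i + 2])))
            i += 3
        else:
            # Assumption: anything else, including an incomplete sequence,
            # is kept as an ordinary character in this sketch.
            items.append(("char", text[i]))
            i += 1
    return items


def strip_markers(text: str) -> str:
    """Return the text with complete marker sequences removed."""
    return "".join(value for kind, value in split_context(text) if kind == "char")
```

For the input `"a" U+FFFF U+0008 U+0123 "b"`, `split_context` produces a character, marker 0x0123, and another character, while `strip_markers` returns `"ab"`. How Core should treat a truncated sequence is not specified above, so the pass-through behavior here is only a placeholder.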