
Commit 7cdea8e

eemeli and aphillips authored
Simplify source bidi isolation rules (#781)
* Simplify source bidi isolation rules
* Refactor as added alternative
* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>
1 parent b45b02e commit 7cdea8e


1 file changed: +63 -1 lines changed


exploration/bidi-usability.md

Lines changed: 63 additions & 1 deletion
@@ -7,10 +7,12 @@ Status: **Proposed**
 <dl>
 <dt>Contributors</dt>
 <dd>@aphillips</dd>
+<dd>@eemeli</dd>
 <dt>First proposed</dt>
 <dd>2024-03-27</dd>
 <dt>Pull Requests</dt>
-<dd>#754</dd>
+<dd><a href="https://github.com/unicode-org/message-format-wg/pull/754">#754</a></dd>
+<dd><a href="https://github.com/unicode-org/message-format-wg/pull/781">#781</a></dd>
 </dl>
 </details>

@@ -346,6 +348,66 @@ the results or debug what is wrong with their messages.
By contrast, if users insert too many or the wrong controls using the recommended design,
the _message_ would still be functional and would emit no undesired characters.

### Loose isolation

Apply bidi isolates somewhat differently from the proposed solution.
The main differences are:
1. The open/close isolate characters are not syntactically required to be paired.
This avoids introducing parse errors for missing or unpaired invisible characters,
which would lead to a poor user experience.
2. Rather than patching the `name` rule with an optional trailing LRM/RLM/ALM,
allow `name` to be properly isolated.

Quoted patterns, quoted literals, and names may be isolated by LRI/RLI/FSI...PDI.
For names and quoted literals, the isolate characters are outside the body of the token,
but for quoted patterns, the isolates go between the two braces of the `{{` and `}}` delimiters.
This avoids adding a lookahead requirement for detecting the start of a `complex-message`,
and distinguishes a `quoted-pattern` from a `quoted` `key` in a `variant`.

Expressions and markup may be isolated by LRI...PDI placed immediately inside the enclosing `{` and `}`.

An LRI is also allowed immediately after a newline in whitespace, both outside patterns and within expressions.
This is intended to keep "code" displayed left-to-right
even when a newline is followed by content
that would otherwise cause the paragraph direction to be detected as right-to-left.
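In this alternative, the whitespace rule is `s = 1*( SP / HTAB / CR / LF [LRI] / %x3000 )`. The following minimal Python sketch checks that rule with a regular expression. The `is_whitespace` helper is hypothetical and is not part of any MessageFormat implementation.

```python
import re

LRI = "\u2066"  # U+2066 LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # U+2067 RIGHT-TO-LEFT ISOLATE

# s = 1*( SP / HTAB / CR / LF [LRI] / %x3000 )
# In ABNF, concatenation binds tighter than "/", so `LF [LRI]` is a single
# alternative: an LRI is accepted only directly after a line feed.
S_RE = re.compile(f"(?:[ \t\r\u3000]|\n{LRI}?)+")


def is_whitespace(text: str) -> bool:
    """Return True if all of `text` matches the `s` rule."""
    return S_RE.fullmatch(text) is not None


print(is_whitespace("\n" + LRI + "  "))  # True: the LRI directly follows a LF
print(is_whitespace("  " + LRI))         # False: the LRI does not follow a LF
print(is_whitespace("\n" + RLI))         # False: only LRI is allowed here
```

Restricting this position to LRI, rather than accepting any open isolate, matches the stated goal of keeping code-like content left-to-right.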

```abnf
; Isolates go around a name or quoted literal, but between the braces of a quoted pattern.
name = [open-isolate] name-start *name-char [close-isolate]
quoted = [open-isolate] "|" *(quoted-char / quoted-escape) "|" [close-isolate]
quoted-pattern = "{" [open-isolate] "{" pattern "}" [close-isolate] "}"

; Expressions and markup may start with an LRI just after "{" and end with a PDI just before "}".
literal-expression = "{" [LRI] [s] literal [s annotation] *(s attribute) [s] [close-isolate] "}"
variable-expression = "{" [LRI] [s] variable [s annotation] *(s attribute) [s] [close-isolate] "}"
annotation-expression = "{" [LRI] [s] annotation *(s attribute) [s] [close-isolate] "}"

markup = "{" [LRI] [s] "#" identifier *(s option) *(s attribute) [s] ["/"] [close-isolate] "}"
       / "{" [LRI] [s] "/" identifier *(s option) *(s attribute) [s] [close-isolate] "}"

; Whitespace may contain an LRI immediately after a line feed.
s = 1*( SP / HTAB / CR / LF [LRI] / %x3000 )
LRI = %x2066
open-isolate = %x2066-2068   ; LRI, RLI, or FSI
close-isolate = %x2069       ; PDI
```
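The following Python sketch approximates the `name`, `quoted`, and `quoted-pattern` rules above with regular expressions, to make the isolate placement concrete. It is an illustration only. The `name-start`/`name-char` ranges and the simplified `quoted-char` and `pattern` are assumptions, not the normative MessageFormat 2 definitions, and the `parse` helper is hypothetical. Group 1 of each regex holds the token's parsed value, which excludes the isolates.

```python
import re
from typing import Optional

# Isolate controls from the Unicode Bidirectional Algorithm (UAX #9).
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
OPEN_ISOLATE = "[" + LRI + RLI + FSI + "]"  # open-isolate = %x2066-2068
CLOSE_ISOLATE = PDI                          # close-isolate = %x2069

# Assumption: approximate name-start / name-char ranges. They deliberately
# exclude U+2066..U+2069, so a trailing PDI is never swallowed into a name.
NAME_START = (
    "A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFC\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "0-9\\-.\u00B7\u0300-\u036F\u203F\u2040"

# name = [open-isolate] name-start *name-char [close-isolate]
NAME_RE = re.compile(
    OPEN_ISOLATE + "?([" + NAME_START + "][" + NAME_CHAR + "]*)" + CLOSE_ISOLATE + "?"
)

# quoted = [open-isolate] "|" *(quoted-char / quoted-escape) "|" [close-isolate]
# Assumption: quoted-char is simplified to "any character except \ and |";
# escapes are kept as-is in the parsed value.
QUOTED_RE = re.compile(
    OPEN_ISOLATE + r"?\|((?:[^\\|]|\\[\\|])*)\|" + CLOSE_ISOLATE + "?"
)

# quoted-pattern = "{" [open-isolate] "{" pattern "}" [close-isolate] "}"
# Assumption: pattern is simplified to text without braces.
QUOTED_PATTERN_RE = re.compile(
    r"\{" + OPEN_ISOLATE + r"?\{([^{}]*)\}" + CLOSE_ISOLATE + r"?\}"
)


def parse(regex: "re.Pattern[str]", token: str) -> Optional[str]:
    """Return the token's parsed value (group 1, isolates excluded), or None."""
    match = regex.fullmatch(token)
    return match.group(1) if match else None


print(parse(NAME_RE, RLI + "אחת" + PDI))                            # אחת
print(parse(NAME_RE, RLI + "אחת"))                                  # אחת (unpaired isolate accepted)
print(parse(QUOTED_RE, FSI + "|a b|" + PDI))                        # a b
print(parse(QUOTED_PATTERN_RE, "{" + RLI + "{Hello}" + PDI + "}"))  # Hello
print(parse(QUOTED_PATTERN_RE, RLI + "{{Hello}}" + PDI))            # None
```

The second call shows the first listed difference in practice: a lone open isolate does not cause a parse error. The last call fails because, for quoted patterns, the isolates belong between the braces rather than outside them.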

Isolating `name`, rather than marking it, helps ensure
that its directionality does not spill over into adjacent syntax.
For example, this allows for the proper rendering of the expression
```
{⁦:⁧אחת⁩:⁧שתיים⁩⁩}
```
where "אחת" is the `namespace` of the `identifier`.
Without `name` isolation, this would render as
```
{⁦:אחת:שתיים⁩}
```
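Because the isolate controls are invisible, the two renderings above are hard to compare by eye. This small Python sketch spells out both expressions with explicit escapes and replaces each isolate with a visible tag based on its bidi class from the standard `unicodedata` module. The `annotate` helper is hypothetical. It also shows that the colon has the weak class CS while the Hebrew letters are strong R, which is why, without isolation, the colon's display direction is resolved from its Hebrew neighbors.

```python
import unicodedata

LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"
ISOLATE_CLASSES = {"LRI", "RLI", "FSI", "PDI"}


def annotate(text: str) -> str:
    """Replace each invisible isolate control with a visible <CLASS> tag."""
    return "".join(
        f"<{unicodedata.bidirectional(ch)}>"
        if unicodedata.bidirectional(ch) in ISOLATE_CLASSES
        else ch
        for ch in text
    )


# The expression from the text, with each `name` isolated by RLI...PDI.
isolated = "{" + LRI + ":" + RLI + "אחת" + PDI + ":" + RLI + "שתיים" + PDI + PDI + "}"
# The same expression without `name` isolation.
unisolated = "{" + LRI + ":אחת:שתיים" + PDI + "}"

print(annotate(isolated))    # {<LRI>:<RLI>אחת<PDI>:<RLI>שתיים<PDI><PDI>}
print(annotate(unisolated))  # {<LRI>:אחת:שתיים<PDI>}
# The colon is a weak common separator (CS); Hebrew letters are strong R.
print(unicodedata.bidirectional(":"), unicodedata.bidirectional("א"))  # CS R
```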

In the ABNF, it's much simpler to include the isolates in the `name` rule itself
than to patch every rule where `name` is used.
Either way, the parsed value of a name should not include the open/close isolates,
just as they're not included in the parsed values of quoted literals or quoted patterns.

### Deeper Syntax Changes
We could alter the syntax to make it more "bidi robust",
such as by using strongly directional characters instead of neutrals.
