Merged
64 changes: 63 additions & 1 deletion exploration/bidi-usability.md
@@ -7,10 +7,12 @@ Status: **Proposed**
<dl>
<dt>Contributors</dt>
<dd>@aphillips</dd>
<dd>@eemeli</dd>
<dt>First proposed</dt>
<dd>2024-03-27</dd>
<dt>Pull Requests</dt>
<dd>#754</dd>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/754">#754</a></dd>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/781">#781</a></dd>
</dl>
</details>

@@ -346,6 +348,66 @@ the results or debug what is wrong with their messages.
By contrast, if users insert too many or the wrong controls using the recommended design,
the _message_ would still be functional and would emit no undesired characters.


### Loose isolation

Apply bidi isolates in a slightly different way.
The main differences from the proposed solution are:
1. The open/close isolate characters are not syntactically required to be paired.
This avoids introducing parse errors for missing or required invisible characters,
which would lead to bad user experiences.
2. Rather than patching the `name` rule with an optional trailing LRM/RLM/ALM,
allow for its proper isolation.
Comment on lines +359 to +360 (Member):

I wouldn't call what we did above "patching". The strongly directional marks above let bidi users include them, as they normally would when editing text, so that the string looks okay in a normal text editor. The productions we used don't make these marks part of the token, so they don't affect processing.

Allowing isolation is a separate consideration.


Quoted patterns, quoted literals, and names may be isolated by LRI/RLI/FSI...PDI.
For names and quoted literals, the isolate characters are outside the body of the token,
but for quoted patterns, the isolates are in the middle of the `{{` and `}}` characters.
Comment (Member):

"middle" could mean anywhere inside the pattern quotes.

Suggested change
but for quoted patterns, the isolates are in the middle of the `{{` and `}}` characters.
but for quoted patterns, the isolates are in between the `{` and `}` in the `{{` and `}}` sequences.

This avoids adding a lookahead requirement for detecting a `complex-message` start,
and differentiates a `quoted-pattern` from a `quoted` `key` in a `variant`.
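
A minimal sketch of recognizing this opening sequence, offered only as an illustration. The function name `match_quoted_pattern_open` is hypothetical and not taken from any MessageFormat implementation; the only details drawn from this proposal are the `{`, optional open isolate, `{` ordering and the U+2066 to U+2068 code points.

```python
# Hypothetical helper: code points follow the open-isolate rule (U+2066-2068).
OPEN_ISOLATES = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI


def match_quoted_pattern_open(source: str, pos: int = 0) -> int:
    """Return the index just past `{ [open-isolate] {`, or -1 if there is no match."""
    if pos >= len(source) or source[pos] != "{":
        return -1
    pos += 1
    # The optional isolate sits between the two braces, not before them.
    if pos < len(source) and source[pos] in OPEN_ISOLATES:
        pos += 1
    if pos < len(source) and source[pos] == "{":
        return pos + 1
    return -1
```

With this shape, the first character alone tells the scanner that it is looking at either an expression or a quoted pattern, and at most one optional invisible character separates the two braces.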

Expressions and markup may be isolated by LRI...PDI immediately within the `{` and `}`.

An LRI is allowed immediately after a newline outside patterns and within expressions.
This is intended to allow left-to-right representation for "code"
even if it contains a newline followed by content
that could otherwise prompt the paragraph direction to be detected as right-to-left.
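
The newline rule can be sketched as a regular expression, assuming whitespace is SP, HTAB, CR, LF, or U+3000 as in the `s` rule of this proposal. The names `S_PATTERN` and `match_s` are hypothetical and exist only for this illustration.

```python
import re

# s = 1*( SP / HTAB / CR / LF [LRI] / %x3000 )
# An LRI (U+2066) is accepted only directly after a line feed.
S_PATTERN = re.compile(r"(?:[ \t\r\u3000]|\n\u2066?)+")


def match_s(source: str, pos: int = 0) -> int:
    """Return the index just past a run of whitespace, or -1 if none starts at pos."""
    m = S_PATTERN.match(source, pos)
    return m.end() if m else -1
```

An LRI that does not follow a line feed is not consumed here; it would have to be matched by one of the other isolate positions in the grammar.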

```abnf
name = [open-isolate] name-start *name-char [close-isolate]
quoted = [open-isolate] "|" *(quoted-char / quoted-escape) "|" [close-isolate]
quoted-pattern = "{" [open-isolate] "{" pattern "}" [close-isolate] "}"
; Review comment (Member): This puts the isolate inside the {{ and }}? Asking to be sure I'm reading this right. The above text didn't seem to mean this, although now I see your intention.
; Reply (Collaborator Author): Yes, that's the intent: {\u2066{

literal-expression = "{" [LRI] [s] literal [s annotation] *(s attribute) [s] [close-isolate] "}"
variable-expression = "{" [LRI] [s] variable [s annotation] *(s attribute) [s] [close-isolate] "}"
annotation-expression = "{" [LRI] [s] annotation *(s attribute) [s] [close-isolate] "}"

markup = "{" [LRI] [s] "#" identifier *(s option) *(s attribute) [s] ["/"] [close-isolate] "}"
/ "{" [LRI] [s] "/" identifier *(s option) *(s attribute) [s] [close-isolate] "}"

s = 1*( SP / HTAB / CR / LF [LRI] / %x3000 )
LRI = %x2066
open-isolate = %x2066-2068
close-isolate = %x2069
```

Isolating rather than marking `name` helps ensure
that its directionality does not spill over to adjoining syntax.
For example, this allows for the proper rendering of the expression
```
{⁦:⁧אחת⁩:⁧שתיים⁩⁩}
```
where "אחת" is the `namespace` of the `identifier`.
Without `name` isolation, this would render as
```
{⁦:אחת:שתיים⁩}
```
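
Because the isolate characters are invisible, the two forms are hard to tell apart in source. The sketch below builds an expression of the isolated form with explicit escapes. It assumes the outer isolate is LRI and that each name is wrapped in RLI...PDI, which is one reading of the example rather than something the text states; the helper name `isolate` is hypothetical.

```python
LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"


def isolate(text: str, opener: str = RLI) -> str:
    """Wrap text in an open isolate and a closing PDI (assumed pairing)."""
    return f"{opener}{text}{PDI}"


namespace = "\u05d0\u05d7\u05ea"         # the namespace from the example
name = "\u05e9\u05ea\u05d9\u05d9\u05dd"  # the name from the example

# "{" LRI ":" <isolated namespace> ":" <isolated name> PDI "}"
expression = "{" + LRI + ":" + isolate(namespace) + ":" + isolate(name) + PDI + "}"

# List the invisible control characters by code point, in order of appearance.
controls = [f"U+{ord(ch):04X}" for ch in expression if ch in (LRI, RLI, PDI)]
print(controls)
```

Printing the code points this way is also a practical way to check which controls a message actually contains.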

In the syntax, it's much simpler to include the changes to `name` in that rule,
rather than patching every place where `name` is used.
Either way, the parsed value of the name should not include the open/close isolates,
just as they're not included in the parsed values of quoted literals or quoted patterns.
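
A rough sketch of that behavior, under stated assumptions: the regular expression below is a simplified stand-in for `name-start` and `name-char` (the real rules list specific code point ranges), and `parse_name` is a hypothetical function, not part of any existing parser. Following the loose approach, the open and close isolates are each optional and need not be paired.

```python
import re
from typing import Optional, Tuple

# Simplified stand-in for: [open-isolate] name-start *name-char [close-isolate]
# Group 2 is the name itself; the isolates fall outside it.
NAME_RE = re.compile(r"([\u2066-\u2068])?([^\W\d][\w.\-]*)(\u2069)?")


def parse_name(source: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Return (value, end) with any isolates excluded from value, or None."""
    m = NAME_RE.match(source, pos)
    if m is None:
        return None
    return m.group(2), m.end()
```

The returned end index covers the isolates so that the caller resumes after them, while the returned value contains only the name characters.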


### Deeper Syntax Changes
We could alter the syntax to make it more "bidi robust",
such as by using strongly directional characters instead of neutrals.