Disallow whitespace as the first character of a reserved-body in a reserved-statement. #731

bhaible · 2024-03-15T16:00:26Z

In the 'reserved-statement' nonterminal, there is an ambiguity if there is more than one whitespace character between the 'reserved-keyword' and the first non-whitespace character of the 'reserved-body', because these whitespace characters can be seen as part of the 's' nonterminal or as part of the 'reserved-body' nonterminal.

According to the principles explained in #725 and the proposed resolution of #721, it is not desired that a 'reserved-body' starts with a whitespace character; rather, such a whitespace character is meant to be interpreted as part of the preceding 's' nonterminal.

Test case:

```
.regex   /foo/{xyz}{{hello}}
```

This patch removes the ambiguity by disallowing whitespace as the first character of a 'reserved-body' in a 'reserved-statement'.

It thus fixes the first part of #721.
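To make the ambiguity concrete, here is a small Python sketch (not part of the patch) that counts how many ways the whitespace run after `.regex` in the test case can be divided between `s` and the start of the 'reserved-body'. It models only that whitespace run, and it assumes, as described above, that `s` matches one or more whitespace characters and that before this change the body was allowed to begin with whitespace. The function name `count_splits` is hypothetical.

```python
def count_splits(ws_run: int, body_may_start_with_ws: bool) -> int:
    """Count the ways a run of `ws_run` whitespace characters between the
    keyword and the body can be divided between `s` and `reserved-body`.

    `s` must take at least one character; whatever is left over becomes
    leading whitespace of the body, which only the old rule permits.
    """
    count = 0
    for s_len in range(1, ws_run + 1):  # s = 1*whitespace
        leftover = ws_run - s_len       # whitespace handed to the body
        if leftover == 0 or body_may_start_with_ws:
            count += 1
    return count


test_case = ".regex   /foo/{xyz}{{hello}}"
after_keyword = test_case[len(".regex"):]
ws_run = len(after_keyword) - len(after_keyword.lstrip())  # 3 spaces

print(count_splits(ws_run, body_may_start_with_ws=True))   # 3 (ambiguous)
print(count_splits(ws_run, body_may_start_with_ws=False))  # 1 (unambiguous)
```

With a single space there is only one split under either rule, which matches the observation that the ambiguity needs more than one whitespace character.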

Details:

- In the other occurrences of 'reserved-body' (in a 'reserved-annotation' or 'private-use-annotation'), the leading whitespace is separated out as well. This does not change the set of inputs that the 'reserved-annotation' and 'private-use-annotation' nonterminals can match, but it highlights that the parser should trim off this leading whitespace in these places before entering the 'reserved-body' into the data model.
- Two nonterminals, 'reserved-body-start' and 'reserved-body-part', are introduced, each referenced once. The purpose is clarity and to follow the common `*-start` / `*-part` idiom.
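As a rough illustration of the trimming described in the first item, the following hypothetical Python helper splits the text of a reserved or private-use annotation into its sigil and its body, treating whitespace directly after the sigil as the `s` separator rather than as body content. It assumes the annotation text has already been isolated (no surrounding braces), that `s` consists of space, tab, CR, LF and U+3000, and that the sigil set shown is only an example; the ABNF defines the real start characters.

```python
from typing import NamedTuple, Optional

# Assumed whitespace set for the `s` nonterminal (space, tab, CR, LF, U+3000).
WS = " \t\r\n\u3000"

# Illustrative sigils only; the ABNF defines the actual start characters.
EXAMPLE_SIGILS = set("!%^&")


class ReservedAnnotation(NamedTuple):
    sigil: str
    body: Optional[str]  # None when the annotation has no body


def split_annotation(text: str) -> ReservedAnnotation:
    """Split '<sigil>[[s] body]' into its sigil and its body.

    Whitespace right after the sigil is treated as the `s` separator,
    so it is dropped instead of being stored as part of the body.
    """
    if not text or text[0] not in EXAMPLE_SIGILS:
        raise ValueError(f"not a reserved or private-use annotation: {text!r}")
    body = text[1:].lstrip(WS)  # discard the optional [s]
    return ReservedAnnotation(text[0], body or None)


print(split_annotation("!  foo bar"))  # ReservedAnnotation(sigil='!', body='foo bar')
print(split_annotation("&"))           # ReservedAnnotation(sigil='&', body=None)
```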

eemeli

If we're going to encode in the syntax that the value of reserved-body does not include its leading whitespace, then this seems like a better way of doing so than #730.

Approving, but note my inline suggested simplification, which ought to be applied to syntax.md as well.

This does introduce one concern that should be noted:

If someone introduces a private syntax {&foo} that does not allow a space between the & and foo, a standard MF2 processor that encounters the privately invalid expression {& foo} could reserialize it as {&foo}, a privately valid expression.

Similarly, this change would effectively disallow private and future syntax from requiring a space immediately after the initial sigil, as an MF2.0 processor would likely dismiss the space as insignificant.
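A minimal sketch of this round-trip concern: if a processor stores only the sigil and the trimmed body, as in the toy model below, it cannot distinguish `{& foo}` from `{&foo}` when it serializes the expression again. The `parse_private` and `serialize` names are made up for this illustration and do not correspond to any real MF2 implementation; the whitespace set is likewise an assumption.

```python
from typing import Tuple

WS = " \t\r\n\u3000"  # assumed MF2 whitespace set


def parse_private(expr: str) -> Tuple[str, str]:
    """Parse '{<sigil>[[s] body]}' into (sigil, body), dropping whitespace
    between the sigil and the body as insignificant."""
    inner = expr[1:-1].strip(WS)  # remove braces and surrounding whitespace
    return inner[0], inner[1:].lstrip(WS)


def serialize(sigil: str, body: str) -> str:
    """Write the expression back out with no space after the sigil."""
    return "{" + sigil + body + "}"


# A privately invalid expression comes back privately valid:
print(serialize(*parse_private("{& foo}")))  # {&foo}
```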

I'm okay with accepting these limitations.

eemeli · 2024-03-16T10:01:27Z

spec/message.abnf

```diff
+reserved-body             = reserved-body-start *reserved-body-part
+reserved-body-start       = reserved-char / reserved-escape / quoted
+reserved-body-part        = [s] 1*(reserved-char / reserved-escape / quoted)
```


This can be simplified:

Suggested change

```diff
-reserved-body = reserved-body-start *reserved-body-part
-reserved-body-start = reserved-char / reserved-escape / quoted
-reserved-body-part = [s] 1*(reserved-char / reserved-escape / quoted)
+reserved-body = reserved-body-part *([s] reserved-body-part)
+reserved-body-part = reserved-char / reserved-escape / quoted
```
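One way to gain confidence that the suggested simplification accepts the same inputs is a brute-force comparison on a toy model. In the Python sketch below, `a` stands in for any single `reserved-char / reserved-escape / quoted` unit and a space stands in for whitespace; since `s` is one or more whitespace characters, `[s]` becomes ` *`. Both grammars are encoded as regular expressions and compared on every short string. This is a finite check under those modeling assumptions, not a proof, and it does not model the internals of `reserved-escape` or `quoted`.

```python
import itertools
import re

# Original:   reserved-body = reserved-body-start *reserved-body-part
#             reserved-body-start = unit
#             reserved-body-part  = [s] 1*unit
ORIGINAL = re.compile(r"a(?: *a+)*")

# Simplified: reserved-body = reserved-body-part *([s] reserved-body-part)
#             reserved-body-part = unit
SIMPLIFIED = re.compile(r"a(?: *a)*")


def same_language(max_len: int) -> bool:
    """Return True if both patterns accept exactly the same strings over
    {"a", " "} up to max_len, and neither accepts leading whitespace."""
    for n in range(max_len + 1):
        for chars in itertools.product("a ", repeat=n):
            text = "".join(chars)
            in_original = ORIGINAL.fullmatch(text) is not None
            in_simplified = SIMPLIFIED.fullmatch(text) is not None
            if in_original != in_simplified:
                return False
            if in_simplified and text.startswith(" "):
                return False
    return True


print(same_language(8))  # True
```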

Thanks for the suggestion. I like it, and I'll apply it if there is consensus. (I had introduced the reserved-body-start nonterminal following @aphillips's comment on #730.)

Let's apply this here and below.

catamorphism

The grammar change looks good. It took me ages to understand what this is doing; maybe I'm just slow, but I think it might help future readers to add a note in syntax.md as I suggested.

catamorphism · 2024-03-18T22:01:08Z

spec/syntax.md


````diff
 ```abnf
-reserved-annotation       = reserved-annotation-start reserved-body
+reserved-annotation       = reserved-annotation-start [[s] reserved-body]
````


I think it's worth adding some prose here explaining why the grammar is structured this way. It wasn't immediately obvious to me that it had to be, so it won't be obvious to the reader either. Including the example would be good.

I also think a note should be added to the "Reserved Statements" section of the same doc explaining the limitations on reserved-statement that arise from this change. You could use your example from #721.

Let's use a separate PR for any text changes.

aphillips · 2024-03-18T23:04:23Z

I think the group is settled on this one over #730. Let's close #730 and concentrate on finishing this one off.

Adding a note to syntax.md is probably useful.


bhaible · 2024-03-25T15:44:43Z

The simplification suggested by @eemeli is now included.


* A few proposed tweaks to syntax.md. This assumes that PR #731 is applied. In detail:
  * Since the reserved keywords start with a `.`, we need to talk about the keyword `.match`, not `match`.
  * Saying that "A _reserved annotation_ starts with a reserved character" and "A _reserved annotation_ MAY be empty" is a contradiction. Therefore, we need to distinguish a _reserved annotation_ from a _reserved body_. This distinction is also useful because the _reserved body_ occurs in two other places as well.
  * The statement that a _reserved body_ contains "arbitrary text" is not true, since we have now decided that (unless empty) it must start and end with a non-whitespace character.
  * A nonterminal `reserved-start` does not exist. What is meant is `reserved-annotation-start`.
  * The statement "Implementations MUST NOT remove or alter the contents of a _reserved annotation_." needs to be constrained, because now implementations are SUPPOSED to trim whitespace around the _reserved body_.
* Subtle wording regarding whitespace around a reserved-body.

Co-authored-by: Addison Phillips

bhaible mentioned this pull request Mar 15, 2024

Disallow whitespace as the first character of a reserved-body in a reserved-statement. #730

Closed

eemeli approved these changes Mar 16, 2024


aphillips added the syntax (Issues related with syntax or ABNF) and Agenda+ (Requested for upcoming teleconference) labels Mar 16, 2024

catamorphism mentioned this pull request Mar 18, 2024

syntax.md and message.abnf can get out of sync #734

Closed

catamorphism requested changes Mar 18, 2024


aphillips added the fast-track (Editorial change permitted to use fast-track merge rules) and LDML45 labels and removed the Agenda+ (Requested for upcoming teleconference) label Mar 18, 2024

aphillips approved these changes Mar 20, 2024


bhaible force-pushed the fix-syntax-ambiguity-before-resolve-body-alt branch from a238850 to dcf6f32 March 25, 2024 15:44

bhaible mentioned this pull request Mar 25, 2024

A few proposed tweaks to syntax.md. #745

Merged

aphillips merged commit 20a61b4 into unicode-org:main Mar 25, 2024

eemeli added this to the LDML 45 milestone Jul 23, 2025


4 participants
