
@bhaible (Contributor) commented Mar 15, 2024

In the 'reserved-statement' nonterminal, there is an ambiguity if there is more than one whitespace character between the 'reserved-keyword' and the first non-whitespace character of the 'reserved-body', because these whitespace characters can be seen as part of the 's' nonterminal or as part of the 'reserved-body' nonterminal.

According to the principles explained in #725 and the proposed resolution of #721, it is not desired that a 'reserved-body' starts with a whitespace character; rather, such a whitespace character is meant to be interpreted as part of the preceding 's' nonterminal.

Test case:

.regex   /foo/{xyz}{{hello}}

This patch removes this ambiguity by disallowing whitespace as the first character of a 'reserved-body' in a reserved-statement.
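To make the ambiguity concrete, here is a rough Python sketch (not part of the patch) that lists every way the text between the keyword and the first expression can be split into `s` followed by `reserved-body`. It assumes the pre-patch rule was `reserved-body = *([s] 1*(reserved-char / reserved-escape / quoted))`, approximates `reserved-char` with a simplified character class, ignores `reserved-escape` and `quoted`, and models the patched rule only as "the body may not begin with whitespace". The helper `s_body_splits` is hypothetical.

```python
import re

# Simplified character classes (assumption, not the normative ABNF):
# s is a run of whitespace, and reserved-char is approximated as any
# character that is not whitespace and not one of { } | \ .
# reserved-escape and quoted are ignored.
WS = " \t\r\n\u3000"
S_RE = re.compile(r"[ \t\r\n\u3000]+")
RC = r"[^ \t\r\n\u3000{}|\\]"

# Assumed pre-patch rule, simplified: reserved-body = *([s] 1*reserved-char)
BODY_RE = re.compile(rf"(?:(?:[ \t\r\n\u3000]+)?{RC}+)*")


def s_body_splits(text: str, body_may_start_with_ws: bool) -> list[int]:
    """Return every k such that text[:k] matches s and text[k:] matches reserved-body.

    `text` is the input between the reserved-keyword and the first expression.
    More than one result means the split between s and reserved-body is ambiguous.
    """
    splits = []
    for k in range(1, len(text) + 1):
        head, body = text[:k], text[k:]
        if not S_RE.fullmatch(head):
            break  # s is whitespace only, so no longer head can match either
        if not body_may_start_with_ws and body and body[0] in WS:
            continue  # patched rule: the body must not begin with whitespace
        if BODY_RE.fullmatch(body):
            splits.append(k)
    return splits


# Text between ".regex" and "{xyz}" in the test case.
between = "   /foo/"
print(s_body_splits(between, body_may_start_with_ws=True))   # [1, 2, 3]: ambiguous
print(s_body_splits(between, body_may_start_with_ws=False))  # [3]: unique
```

With three spaces, the unpatched grammar admits three places where `s` can end; requiring the body to start with a non-whitespace character leaves exactly one.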

It thus fixes the first part of #721.

Details:

  • Other occurrences of 'reserved-body' (after a 'reserved-annotation' or 'private-use-annotation') are not affected.
  • A new nonterminal 'reserved-body-part' is introduced, referenced twice.
  • A new nonterminal 'reserved-body-trimmed' is introduced, referenced once. Its purpose is to clarify that the two parts ('reserved-body-part' and 'reserved-body') belong together.


; Reserve additional .keywords for use by future versions of this specification.
-reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
+reserved-statement = reserved-keyword [s reserved-body-trimmed] 1*([s] expression)
Member

The naming here is unlike our usual naming. It would be better to follow the start/part convention and eschew 'trimmed'. Something like:

reserved-body = reserved-body-start [reserved-body-part]
reserved-body-start = *([s] 1*reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quoted

Note that my version makes both an empty body and a 1-character body possible.

Contributor Author

> It would be better to follow the start/part convention

I see what you mean. However, the changes you proposed are a no-op. Let me formulate two alternative proposals.
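The "no-op" point can be checked mechanically on a toy alphabet. The sketch below is a rough model, not the spec: a space stands in for `s`, the letter `a` stands in for any `reserved-body-part`, the pre-patch rule is assumed to be `reserved-body = *([s] 1*reserved-body-part)`, and the names `ORIGINAL`, `PROPOSAL_1` and `same_language` are hypothetical. It compares the assumed original rule with the first proposal on every short string.

```python
import itertools
import re

# Toy model (assumption): " " stands in for whitespace, "a" for any
# reserved-body-part (reserved-char / reserved-escape / quoted).
S = r" +"  # s = 1*whitespace
P = r"a"   # reserved-body-part

# Assumed pre-patch rule: reserved-body = *([s] 1*reserved-body-part)
ORIGINAL = re.compile(rf"(?:(?:{S})?{P}+)*")

# First proposal from the review comment:
#   reserved-body       = reserved-body-start [reserved-body-part]
#   reserved-body-start = *([s] 1*reserved-body-part)
PROPOSAL_1 = re.compile(rf"(?:(?:{S})?{P}+)*(?:{P})?")


def same_language(a: re.Pattern, b: re.Pattern, max_len: int = 8) -> bool:
    """Compare two patterns on every string over {' ', 'a'} up to max_len characters."""
    for n in range(max_len + 1):
        for chars in itertools.product(" a", repeat=n):
            text = "".join(chars)
            if bool(a.fullmatch(text)) != bool(b.fullmatch(text)):
                return False
    return True


print(same_language(ORIGINAL, PROPOSAL_1))  # True
print(bool(PROPOSAL_1.fullmatch("  a")))    # True: leading whitespace is still allowed
```

The bounded check is evidence rather than proof, but it agrees with a short argument: every string the original rule accepts is empty or ends in a body part, and appending one optional body part keeps it in that set. Since the proposal still lets the body begin with whitespace, the ambiguity in 'reserved-statement' would remain.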

@aphillips added the syntax (Issues related with syntax or ABNF) and Agenda+ (Requested for upcoming teleconference) labels Mar 15, 2024
@bhaible force-pushed the fix-syntax-ambiguity-before-resolve-body branch from e418a4e to 43c9264 Compare March 15, 2024 15:38
…served-statement.

In the 'reserved-statement' nonterminal, there is an ambiguity if there is more
than one whitespace character between the 'reserved-keyword' and the first
non-whitespace character of the 'reserved-body', because these whitespace
characters can be seen as part of the 's' nonterminal or as part of the
'reserved-body' nonterminal.

According to the principles explained in unicode-org#725 and the proposed resolution
of unicode-org#721, it is not desired that a 'reserved-body' starts with a whitespace
character; rather, such a whitespace character is meant to be interpreted
as part of the preceding 's' nonterminal.

Test case:
```
.regex   /foo/{xyz}{{hello}}
```

This patch removes this ambiguity by disallowing whitespace as the first
character of a 'reserved-body' in a reserved-statement.

It thus fixes the first part of unicode-org#721.

Details:
  - Other occurrences of 'reserved-body' (after a 'reserved-annotation' or
    'private-use-annotation') are not affected.
  - A new nonterminal 'reserved-body-part' is introduced, referenced twice.
  - A new nonterminal 'reserved-body-in-statement' is introduced, referenced
    once. Its purpose is to clarify that the two parts belong together.
  - A new nonterminal 'reserved-body-in-statement-start' is introduced,
    in order to follow the common *-start / *-part idiom.
@bhaible force-pushed the fix-syntax-ambiguity-before-resolve-body branch from 43c9264 to bbf6880 Compare March 15, 2024 15:51

; Reserve additional .keywords for use by future versions of this specification.
-reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
+reserved-statement = reserved-keyword [s reserved-body-in-statement] 1*([s] expression)
Member

I think we should steer clear of the -in-statement part.

reserved-statement = reserved-keyword [s reserved-body] 1*([s] expression)
reserved-body = reserved-body-start *([s] 1*reserved-body-part)
reserved-body-start = *([s] 1*reserved-body-part)
reserved-body-part = reserved-char / reserved-escape / quoted

I recognize that reserved-body is also used by private-use-annotation. However, the intention there should not be to capture ending whitespace either. private-use-annotation is part of annotation and has the same relationship to e.g. the optional attribute in expression. Trimming private use is thus also desirable to eliminate ambiguity.

@bhaible (Contributor Author) commented Mar 15, 2024

In response to your comment, I am proposing two alternatives: one in this pull request, and one in #731. Pick the one you prefer and discard the other one.

@bhaible (Contributor Author) commented Mar 15, 2024

@aphillips writes:

> the intention there should not be to capture ending whitespace either

The ending whitespace has already been handled through #727. Here we are discussing the whitespace at the start / before 'reserved-body'.
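To illustrate the difference between leading and trailing whitespace, here is a rough sketch of the second proposed rule (reserved-body = reserved-body-start *([s] 1*reserved-body-part)) translated into a Python regular expression on a toy alphabet. A space stands in for `s` and the letter `a` for any `reserved-body-part`; `START` and `PROPOSAL_2` are hypothetical names. Under this model the proposed body never ends in whitespace, but it can still begin with whitespace, which is the case this pull request targets.

```python
import re

# Toy model (assumption): " " stands in for s, "a" for any
# reserved-body-part (reserved-char / reserved-escape / quoted).
S = r" +"
P = r"a"

# reserved-body-start = *([s] 1*reserved-body-part)
START = rf"(?:(?:{S})?{P}+)*"
# reserved-body = reserved-body-start *([s] 1*reserved-body-part)
PROPOSAL_2 = re.compile(rf"{START}(?:(?:{S})?{P}+)*")

print(bool(PROPOSAL_2.fullmatch("  a a")))  # True: the body may still start with whitespace
print(bool(PROPOSAL_2.fullmatch("a a  ")))  # False: trailing whitespace is not part of the body
```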

@aphillips added the resolve-candidate label (This issue appears to have been answered or resolved, and may be closed soon.) and removed the Agenda+ label (Requested for upcoming teleconference) Mar 18, 2024
@bhaible (Contributor Author) commented Mar 20, 2024

Canceling in favour of #731 (group decision from 2024-03-18).

@bhaible closed this Mar 20, 2024

Labels

resolve-candidate (This issue appears to have been answered or resolved, and may be closed soon.), syntax (Issues related with syntax or ABNF)
