RFC: Empty Statement Syntax #146
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| # Empty Statement Syntax | ||
|
|
||
| ## Summary | ||
|
|
||
| Allow for empty statements with semicolons, as introduced in Lua 5.2. | ||
|
|
||
| ```luau | ||
| ;do ; end; | ||
| ``` | ||
| Would be a syntactically valid document equivalent to: | ||
| ```luau | ||
| do end; | ||
| ``` | ||
|
|
||
| ## Motivation | ||
|
|
||
| The semicolon symbol `;` can be used at the end of a statement to end it explicitly. This is useful when a statement is ambiguous (especially as line breaks do not end statements), or for visual separation when several statements are on a single line. It is, however, not valid where the beginning of a statement is expected, and a statement in a block can only be followed by the beginning of another statement or the end of the block. Thus, a semicolon can't be used as an "empty statement" on its own. Users may expect the following to be equivalent to an empty file, but it is instead a parsing error: | ||
|
|
||
| ```luau | ||
| ; | ||
| ``` | ||
| > Error:1: Expected identifier when parsing expression, got ';' | ||
|
|
||
| In human-written code, a deliberate empty statement is seemingly "useless", but it can serve intentional stylistic purposes, mostly as a no-op, as visual separation between code blocks, or as a placeholder for future code. Lua began allowing empty statement syntax in 5.2, so users of Lua may be unpleasantly surprised to find that Luau has not followed suit. | ||
|
|
||
| This syntax is particularly useful for tools that generate Luau code. Take this example: | ||
|
|
||
| ```luau | ||
| print("Hello") | ||
| GLOBAL_EVENT "Foo" | ||
| print("World") | ||
| ``` | ||
|
|
||
| A tool may want to replace the statement `GLOBAL_EVENT "Foo"` with, for example, `(_G.events["Foo"] :: Event):fire()`. Inserting it naively would cause an error due to the ambiguity of the parentheses: the parenthesized expression on the second line can be read as an argument list for the call on the first line, making the first two lines a single statement. Luau currently refuses to parse statements like this. | ||
|
|
||
| ```luau | ||
| print("Hello") | ||
| (_G.events["Foo"] :: Event):fire() | ||
| print("World") | ||
| ``` | ||
| > Error:2: Ambiguous syntax: this looks like an argument list for a function call, but could also be a start of new statement; use ';' to separate statements | ||
|
|
||
| To solve this, the tool will likely want to insert a semicolon at the beginning and, to be safe, at the end of the statement. Doing so unconditionally is also risky, as a syntax error would occur if the statement already had surrounding semicolons or if no statement preceded it in the block. Tools need to check the context to avoid introducing a semicolon where it would cause an error, which adds unnecessary and probably unexpected complexity. If empty statements were allowed, tools could safely surround statements with semicolons, guarding against ambiguity without having to worry about context. | ||
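A minimal sketch of the unconditional wrapping described above, assuming the proposed grammar is in place. The helper name `replaceWithGuardedStatement` and its parameters are hypothetical and only illustrate the idea; this is not an existing tool or Luau API.

```luau
-- Hypothetical helper: replaces every literal occurrence of `marker` in
-- `source` with `stmt` surrounded by semicolons, without inspecting context.
local function replaceWithGuardedStatement(source: string, marker: string, stmt: string): string
	-- Escape punctuation so `marker` is matched literally, not as a pattern.
	local pattern = marker:gsub("%p", "%%%0")
	-- Escape '%' so the replacement text is inserted literally.
	local replacement = (";" .. stmt .. ";"):gsub("%%", "%%%%")
	-- Parentheses drop the match count that gsub returns as a second value.
	return (source:gsub(pattern, replacement))
end

local source = 'print("Hello")\nGLOBAL_EVENT "Foo"\nprint("World")'
print(replaceWithGuardedStatement(source, 'GLOBAL_EVENT "Foo"', '(_G.events["Foo"] :: Event):fire()'))
```

Here the second line becomes `;(_G.events["Foo"] :: Event):fire();`. Under the current grammar this output happens to parse only because a statement precedes the marker; if the marker were the first statement in its block, the leading `;` would be a parse error, which is the case the proposal removes.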
|
|
||
| This is especially useful for simple "insert-replace" tools that don't perform a syntactic analysis of the code, such as search-and-replace operations in code editors, where currently each replacement would need review to avoid syntax errors. | ||
|
Member
The tooling argument is compelling, but this argument is not. Using find-and-replace to make your code a horrible mess sounds like an antipattern, and the fact that you today might get parsing errors to correct intentionally actually sounds much better in comparison.
Author
Writing prettier or uglier code with more or fewer semicolons is up to the user. If they insert unnecessary semicolons, they can manually remove them where they see fit or pass the code through a formatter. "Ugly syntax" that is not ambiguous should not be a parsing error, regardless of how ugly it is.
Author
If unnecessary semicolons are too horrendous to pass normally, Luau's linter could add a "DuplicatedSemicolon" lint when two or more semicolons are used in a row without whitespace in between. But this is merely style and imo should be up to an external tool. And hey, maybe someone wants to use double semicolons to be extra explicit about a no-op. They should be able to without errors or lint warnings.
Author
To make a clearer and more reasonable example: in my code, I might have a specific comment that I want to replace with a series of debugging statements when built for debug. For this, I use a simple CLI tool that replaces occurrences of that comment with my statements. If I want to avoid ambiguity, I'd need to either add a preceding semicolon where appropriate to the comments, or use a more sophisticated and unnecessary solution that performs a syntactic analysis. Both unideal for something theoretically dead simple.
||
|
|
||
| Allowing this syntax does not introduce any ambiguities or conflicts with existing valid ones. | ||
|
|
||
| ## Design | ||
|
|
||
| The parser would be updated to make a semicolon in a block its own [`AstStat`](https://github.com/luau-lang/luau/blob/6b787963bc2b590f9909bf33aced826c43328444/Ast/include/Luau/Ast.h#L239), and to allow any number of semicolons after `laststat`. This would be similar to how Lua 5.2 and later handle them. The EBNF grammar would be updated as follows: | ||
|
|
||
| ```ebnf | ||
| block ::= {stat | ';'} [laststat {';'}] | ||
| ``` | ||
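To illustrate how the updated rule differs in practice, the following sketch checks a simplified token sequence against both rules. It assumes the current rule is `block ::= {stat [';']} [laststat [';']]` and reduces every statement to a single `"stat"` or `"laststat"` token. The function names are hypothetical, and this is not how the Luau parser is implemented.

```luau
-- Assumed current rule: block ::= {stat [';']} [laststat [';']]
local function matchesCurrentBlock(tokens: { string }): boolean
	local i = 1
	while tokens[i] == "stat" do
		i += 1
		-- At most one semicolon may follow each statement.
		if tokens[i] == ";" then
			i += 1
		end
	end
	if tokens[i] == "laststat" then
		i += 1
		if tokens[i] == ";" then
			i += 1
		end
	end
	-- The block matches only if every token was consumed.
	return i == #tokens + 1
end

-- Proposed rule: block ::= {stat | ';'} [laststat {';'}]
local function matchesProposedBlock(tokens: { string }): boolean
	local i = 1
	-- A semicolon is accepted anywhere a statement could start.
	while tokens[i] == "stat" or tokens[i] == ";" do
		i += 1
	end
	if tokens[i] == "laststat" then
		i += 1
		-- Any number of semicolons may follow the last statement.
		while tokens[i] == ";" do
			i += 1
		end
	end
	return i == #tokens + 1
end

print(matchesCurrentBlock({ ";" }), matchesProposedBlock({ ";" }))
print(matchesCurrentBlock({ "stat", ";", "stat" }), matchesProposedBlock({ "stat", ";", "stat" }))
```

Every sequence accepted by the first function is also accepted by the second, which matches the claim that the change only turns previously rejected input into valid input.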
|
|
||
| As is already the case for semicolons today, empty statements have no impact on the generated bytecode. | ||
|
|
||
| ## Drawbacks | ||
|
|
||
| Previously erroring code would now parse. This is not considered to be a problem, as such programs were never valid in the first place. | ||
|
|
||
| This syntax, when used, is not backwards-compatible. However, it can easily be removed without changing the meaning of code (which isn't the case for many other syntax features introduced by Luau). | ||
aatxe marked this conversation as resolved.
|
||
|
|
||
| This syntax addition technically allows for unusual "ugly" code such as: | ||
|
|
||
| ```luau | ||
| ;;; ;; while x do ; ;;;;; print(x);;foo();; ;x += 1;; ; end return 1,2;;; | ||
| ``` | ||
|
|
||
| Code using unnecessary semicolons is likely computer-generated and not meant to be human-readable, where a formatter would be needed (and probably already used). | ||
|
|
||
| Tools parsing Luau code will need to be updated to handle this minor syntax change. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| More generally, similar "void" separators could also be allowed for other list-like syntaxes, such as generic lists, parameters, table constructors, or expression lists: | ||
|
|
||
| ```luau | ||
| local function test<,,,,A,, B,, C...,,,,>(,,a: A,,,, b: B,, ...: C...,,): (,number,,, string,) | ||
| local _ = {1,,, 2;;,;, 3;;;;,} | ||
| return ,,42,,,, "hello",,; | ||
| end | ||
| ``` | ||
|
|
||
| This syntax however, unlike with statement blocks, is considerably confusing and unusual, has an unclear purpose, and is very likely a mistake, with little to no practical use cases. In table constructors, a single trailing separator is already allowed. | ||
|
Member
Why are these obviously a mistake, but the proposed syntax isn't obviously a mistake? That feels like it needs to be articulated much more crisply.
Author
Unlike with statements, an "empty element" in these other list syntaxes does not make sense and needs to either be
Member
I'm not asking you to convince me here, I'm asking you to revise the RFC to be more convincing.
Author
Just stating it here for documentation and amending with any further observations when I can
||
Beginning and end? Why? It's trivial to decide whether a semicolon is necessary by literally just stringifying the next statement, and then see if that string starts with `(`. If it starts with `(`, then add a `;` to the current line, then `\n` and then add the indentation and finally, the statement string in question.
Something like this:
You can avoid an intermediate table if you wish, but I don't care. This argument is weak. Obviously, the specific code I've given you will still generate `;` even when it isn't strictly necessary, but you can just replace `ends_with_semi_colon` with a function that takes some `stmt` and returns a boolean indicating whether the given `stmt` fragment ends with an expression and without a semi-colon. Still trivial.