Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions crates/solidity-v2/inputs/language/src/BREAKING_CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,3 +27,54 @@ We should consider adding validation for these at a later stage if needed:
- They were only enabled after `0.7.0`.
- `HexLiteral` and `YulHexLiteral` and `DecimalLiteral` and `YulDecimalLiteral`:
- It was illegal for them to be followed by `IdentifierStart`. Now we will produce two separate tokens rather than rejecting it.

## Grammar

The following changes modify the language definition to support the new parser and resolve grammar ambiguities.
In some cases we also try to simplify the model.

### AddressKeyword

- Made the `address` keyword reserved in all versions, handling the few cases where it can be used as an identifier separately.
- `IdentifierPathElement` handles the cases where `address` can be used as an `Identifier`, either in an `IdentifierPath` or a `MemberAccessExpression`.

### IdentifierPathElement

New enum added to allow the reserved `address` keyword in identifier paths and member access expressions
(from Solidity 0.6.0):

- Variants: `Identifier` | `AddressKeyword` (enabled from 0.6.0)
- Used in `MemberAccessExpression` and in `IdentifierPath`

### IdentifierPath

Changed from a `Separated` list of `Identifier`, to a list of `IdentifierPathElement`, to capture the reserved
`AddressKeyword` as part of the path.

- **Before**: `Separated(name = IdentifierPath, reference = Identifier, separator = Period)`
- **After**: `Separated(name = IdentifierPath, reference = IdentifierPathElement, separator = Period)`

### TupleDeconstructionStatement

Major restructuring to resolve ambiguities with tuple expressions and assignment expressions:

- **Before**: Single struct with optional `var_keyword`, `open_paren`, `elements: TupleDeconstructionElements`, `close_paren`, `equal`, `expression`, `semicolon`.
- **After**: Split into typed and untyped (var) variants:
- `TupleDeconstructionStatement`: Contains `target: TupleDeconstructionTarget`, `equal`, `expression`, `semicolon`
- `TupleDeconstructionTarget`: Enum with `VarTupleDeconstructionTarget` (till 0.5.0) | `TypedTupleDeconstructionTarget`
- `VarTupleDeconstructionTarget`: For `var (a, b) = ...` syntax (till 0.5.0)
- `TypedTupleDeconstructionTarget`: For `(uint a, uint b) = ...` syntax

This makes certain cases that were allowed before disallowed in V2, in particular having untyped declarations (like `(a, bool b) = ...`)
or having typed together with `var` (like `var (a, bool b) = ...`).
The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For another PR/discussion, we considered merging TupleDeconstructionStatement and VariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different.

I think having this distinction (declaring new vars VS assigning to existing ones) is worth the destinction, both syntactically and semantically. WDYT of having the following structure, if it works with LALR?

  • use VariableDeclarationStatement for any syntax that declares a new name:
    • var x = ... already supported
    • int x = ... already supported
    • change VariableDeclarationStatement::name field to an enum with two variants:
      • name: Identifier -> existing
      • elements -> a struct holding LeftParen + Separated(elements) + RightParen
  • use AssignmentExpression for any syntax that just assigns values to the LHS:
    • x = ....
    • (x, y) = ....
    • (,,,) = ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having this distinction (...) is worth the destinction

I completely agree

The problem I see with the proposed structure is that currently the int and the var in the first 2 cases are captured by the same definition, so you could end up with a language accepting something like int (a, b, c) = ....

But also, since you want to allow for the elements of the tuple to have the type within it, you'd maybe want to make the var/int struct optional, to allow (bool a, uint b) = ..., but that would also parse x = ... as valid.

Coming from a perspective of appeasing LALRPOP, I'd say the distinction has to be a bit stronger, so allowing VariableDeclarationStatement to be an enum over:

  • SingleExplicitDeclaration: int x = ...
  • MultiExplicitDeclaration: (bool a, , int b) = ...
  • ImplicitDeclaration (until 0.5.0): var a = ... and var (a, , b) = ... (so this one would have the enum allowing either a single Identifier or a tuple of Identifier)

This could be simplified when lowering it to an AST

What do you think? I'll try to push a single commit with these changes so you can review them.

an `AssignmentExpression` with a `TupleExpression` on the lhs.
Comment on lines +70 to +71
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can still be a TupleDeconstructionStatement or a an AssignmentExpression

Which one? I wonder if we have existing cst_output tests for this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how about var (,,,,)? this is legal AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one?

This is a tricky question since there's still no parser on V2, I'll try to answer it:

  • Right now the v2 language definition (with these changes) is still ambiguous, so in theory parsing either one of those options is correct.
  • If we choose to do like the V1 and give priority to definitions higher up the definition.rs file, then they'd be parsed as TupleDeconstructionStatement
  • Once the V2 parser is done, we need to handle this ambiguity and choose one, I'd go for those cases being an AssignmentExpression, since they're not declaring a variable at all.
  • Right now there's no way to express "a separated item, where every item is optional but has to appear at least once" in the language definition DSL, so it makes it difficult to express this within the language DSL
    • We could separate it into a prefix of empty tuple (ie (,,,,), then an element that must be there (ie bool a), and then a postfix of possible empty tuple elements (, bool b, , ,)); but that would make a parsing problem make the CST and general API worse.
  • Also, I just checked solc doesn't seem to allow empty tuples at all (ie (,,,) = ... or () = ...) after 0.5.0, so maybe we need to validate this after parsing.

I wonder if we have existing cst_output tests for this case?

We have some cases, I added a few more (they're only testing V1 for now)

Also, how about var (,,,,)? this is legal AFAICT.

This is legal with the new definition as well, since the elements of the Separated (UntypedTupleDeconstructionElement) have an optional identifier as its field:

Struct(
    name = UntypedTupleDeconstructionElement,
    enabled = Till("0.5.0"),
    fields = (name = Optional(reference = Identifier))
),


Removed types: `TupleDeconstructionElements`, `TupleDeconstructionElement`, `TupleMember`, `TypedTupleMember`, `UntypedTupleMember`

Added types: `TupleDeconstructionTarget`, `VarTupleDeconstructionTarget`, `UntypedTupleDeconstructionElements`, `UntypedTupleDeconstructionElement`, `TypedTupleDeconstructionTarget`, `TypedTupleDeconstructionElements`, `TypedTupleDeconstructionElement`, `TypedTupleDeconstructionMember`

### NamedArgumentsDeclaration

- Changed `arguments` field from `Optional(NamedArgumentGroup)` to `Required(NamedArgumentGroup)`.
- This avoids ambiguity with empty argument lists `()` which could be either positional or named.
100 changes: 70 additions & 30 deletions crates/solidity-v2/inputs/language/src/definition.rs
Original file line number Diff line number Diff line change
Expand Up @@ -492,10 +492,13 @@ language_v2_macros::compile!(Language(
definitions = [KeywordDefinition(value = Atom("abstract"))]
),
Keyword(
// `address` is a reserved keyword, but it can still be used as an identifier in some contexts,
// in particular as a member access (e.g., `myPayload.address`) or as an identifier
// path
// See `IdentifierPathElement` for details
name = AddressKeyword,
identifier = Identifier,
definitions =
[KeywordDefinition(reserved = Never, value = Atom("address"))]
definitions = [KeywordDefinition(value = Atom("address"))]
),
Keyword(
name = AfterKeyword,
Expand Down Expand Up @@ -2925,49 +2928,75 @@ language_v2_macros::compile!(Language(
items = [
Struct(
name = TupleDeconstructionStatement,
error_recovery = FieldsErrorRecovery(terminator = semicolon),
fields = (
target = Required(TupleDeconstructionTarget),
equal = Required(Equal),
expression = Required(Expression),
semicolon = Required(Semicolon)
)
),
Enum(
name = TupleDeconstructionTarget,
variants = [
EnumVariant(
reference = VarTupleDeconstructionTarget,
enabled = Till("0.5.0")
),
EnumVariant(reference = TypedTupleDeconstructionTarget)
]
),
Struct(
name = VarTupleDeconstructionTarget,
enabled = Till("0.5.0"),
error_recovery = FieldsErrorRecovery(
terminator = semicolon,
delimiters =
FieldDelimiters(open = open_paren, close = close_paren)
),
fields = (
var_keyword =
Optional(reference = VarKeyword, enabled = Till("0.5.0")),
var_keyword = Required(VarKeyword),
open_paren = Required(OpenParen),
elements = Required(TupleDeconstructionElements),
close_paren = Required(CloseParen),
equal = Required(Equal),
expression = Required(Expression),
semicolon = Required(Semicolon)
elements = Required(UntypedTupleDeconstructionElements),
close_paren = Required(CloseParen)
)
),
Separated(
name = TupleDeconstructionElements,
reference = TupleDeconstructionElement,
separator = Comma
name = UntypedTupleDeconstructionElements,
reference = UntypedTupleDeconstructionElement,
separator = Comma,
enabled = Till("0.5.0")
),
Struct(
name = TupleDeconstructionElement,
fields = (member = Optional(reference = TupleMember))
),
Enum(
name = TupleMember,
variants = [
EnumVariant(reference = TypedTupleMember),
EnumVariant(reference = UntypedTupleMember)
]
name = UntypedTupleDeconstructionElement,
enabled = Till("0.5.0"),
fields = (name = Optional(reference = Identifier))
),
Struct(
name = TypedTupleMember,
name = TypedTupleDeconstructionTarget,
error_recovery = FieldsErrorRecovery(
delimiters =
FieldDelimiters(open = open_paren, close = close_paren)
),
fields = (
type_name = Required(TypeName),
storage_location = Optional(reference = StorageLocation),
name = Required(Identifier)
open_paren = Required(OpenParen),
elements = Required(TypedTupleDeconstructionElements),
close_paren = Required(CloseParen)
)
),
Separated(
name = TypedTupleDeconstructionElements,
reference = TypedTupleDeconstructionElement,
separator = Comma
),
Struct(
name = UntypedTupleMember,
name = TypedTupleDeconstructionElement,
fields =
(member = Optional(reference = TypedTupleDeconstructionMember))
),
Struct(
name = TypedTupleDeconstructionMember,
fields = (
type_name = Required(TypeName),
storage_location = Optional(reference = StorageLocation),
name = Required(Identifier)
)
Expand Down Expand Up @@ -3498,7 +3527,7 @@ language_v2_macros::compile!(Language(
model = Postfix,
fields = (
period = Required(Period),
member = Required(Identifier)
member = Required(IdentifierPathElement)
)
)]
),
Expand Down Expand Up @@ -3590,7 +3619,7 @@ language_v2_macros::compile!(Language(
),
fields = (
open_paren = Required(OpenParen),
arguments = Optional(reference = NamedArgumentGroup),
arguments = Required(NamedArgumentGroup),
close_paren = Required(CloseParen)
)
),
Expand Down Expand Up @@ -3985,11 +4014,22 @@ language_v2_macros::compile!(Language(
title = "Identifiers",
lexical_context = Solidity,
items = [
// Since an identifier path can include the reserved keyword `address` as parth of the path
// we use `IdentifierPathElement` to represent each element, instead of `Identifier`.
Separated(
name = IdentifierPath,
reference = Identifier,
reference = IdentifierPathElement,
separator = Period
),
Enum(
// An element of an identifier path can be either an identifier or the reserved `address` keyword
// Note: This is also used on `MemberAccessExpression`
name = IdentifierPathElement,
variants = [
EnumVariant(reference = Identifier),
EnumVariant(reference = AddressKeyword, enabled = From("0.6.0"))
]
),
Token(
name = Identifier,
definitions = [TokenDefinition(Sequence([
Expand Down
Loading