Commit e55e238

Big list of AST difference between Expr & GreenNode (#246)
Copy this list from issue #88 to the docs.
1 parent bc8bd78 commit e55e238

1 file changed: +69 -19 lines changed

README.md

Lines changed: 69 additions & 19 deletions
@@ -366,10 +366,75 @@ Expr(:ncat)
366366

367367
## Tree differences between GreenNode and Expr
368368

369-
Wherever possible, the tree structure of `GreenNode`/`SyntaxNode` is 1:1 with
370-
`Expr`. There are, however, some exceptions. First, `GreenNode` inherently
371-
stores source position, so there's no need for the `LineNumberNode`s used by
372-
`Expr`. There's also a small number of other differences
369+
The tree structure of `GreenNode`/`SyntaxNode` is similar to Julia's `Expr`
370+
data structure, but there are various differences:
371+
372+
### Source ordered children
373+
374+
The children of our trees are strictly in source order. This has many
375+
consequences in places where `Expr` reorders child expressions.
376+
377+
* Infix and postfix operator calls have the operator name in the *second* child position. `a + b` is parsed as `(call-i a + b)` - where the infix `-i` flag indicates infix child position - rather than `Expr(:call, :+, :a, :b)`.
378+
* Flattened generators are represented in source order
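
For comparison, the `Expr` side of the reordering can be checked with Base's `Meta.parse` alone. This is a minimal sketch that only inspects `Expr`; it does not call JuliaSyntax, and the variable names are illustrative.

```julia
# In Expr, the operator is the first argument of the call,
# even though it appears second in the source text.
infix = Meta.parse("a + b")
@assert infix == Expr(:call, :+, :a, :b)

# Multi-`for` generators are wrapped in a :flatten head in Expr.
gen = Meta.parse("(x for a in xs for b in ys)")
@assert gen.head === :flatten
```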
379+
380+
### No `LineNumberNode`s
381+
382+
Our syntax nodes inherently store source position, so there's no need for the
383+
`LineNumberNode`s used by `Expr`.
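
As an illustration of what is being avoided, the following sketch (Base only, no JuliaSyntax calls) shows the `LineNumberNode`s that `Expr` interleaves with the statements of a block.

```julia
block = Meta.parse("begin\n    x\n    y\nend")

# Expr stores one LineNumberNode before each statement in the block.
line_nodes = filter(arg -> arg isa LineNumberNode, block.args)
@assert length(line_nodes) == 2

# Stripping them leaves only the statements themselves.
stripped = Base.remove_linenums!(copy(block))
@assert stripped == Expr(:block, :x, :y)
```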
384+
385+
### More consistent / less redundant `block`s
386+
387+
Sometimes `Expr` needs redundant block constructs to store `LineNumberNode`s,
388+
but we don't need these. Also, in cases which do use blocks, we try to use them
389+
consistently.
390+
391+
* No block is used on the right hand side of short form function syntax
392+
* No block is used for the conditional in `elseif`
393+
* No block is used for the body of anonymous functions after the `->`
394+
* `let` argument lists always use a block regardless of number or form of bindings
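
The `Expr` behaviour that these points contrast with can be observed with `Meta.parse`. This is a sketch of the `Expr` side only, with illustrative variable names.

```julia
# Expr wraps the right hand side of a short form function in a block.
short = Meta.parse("f(x) = x + 1")
@assert short.args[2].head === :block

# Likewise for the body of an anonymous function after `->`.
lambda = Meta.parse("x -> x + 1")
@assert lambda.args[2].head === :block

# Expr uses a block for `let` bindings only when there are several.
one = Meta.parse("let x = 1\n    x\nend")
two = Meta.parse("let x = 1, y = 2\n    x\nend")
@assert one.args[1] == :(x = 1)
@assert two.args[1].head === :block
```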
395+
396+
### Faithful representation of the source text / avoid premature lowering
397+
398+
Some cases of "premature lowering" have been removed, preferring to represent
399+
the source text more closely.
400+
401+
* `K"macrocall"` nodes let users easily distinguish macro calls with parentheses from those without them (#218)
402+
* Grouping parentheses are represented with a node of kind `K"parens"` (#222)
403+
* Ternary syntax is not immediately lowered to an `if` node: `a ? b : c` parses as `(? a b c)` rather than `Expr(:if, :a, :b, :c)` (#85)
404+
* `global const` and `const global` are not normalized by the parser. This is done in `Expr` conversion (#130)
405+
* The AST for `do` is flatter and not lowered to a lambda by the parser: `f(x) do y ; body end` is parsed as `(do (call f x) (tuple y) (block body))` (#98)
406+
* `@.` is not lowered to `@__dot__` inside the parser (#146)
407+
* Docstrings use the `K"doc"` kind, and are not lowered to `Core.@doc` until later (#217)
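
A few of these lowerings are visible in `Expr` form using only `Meta.parse`. The sketch below shows the `Expr` results that the bullets above contrast with; it makes no claims about the JuliaSyntax API.

```julia
# The ternary is indistinguishable from `if` once it is an Expr.
ternary = Meta.parse("a ? b : c")
@assert ternary == Expr(:if, :a, :b, :c)

# `@.` already appears under its lowered name in Expr.
dot = Meta.parse("@. x + y")
@assert dot.args[1] === Symbol("@__dot__")

# `do` carries a lambda as its second child in Expr.
doblock = Meta.parse("f(x) do y\n    body\nend")
@assert doblock.head === :do
@assert doblock.args[2].head === :->
```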
408+
409+
### Containers for string-like constructs
410+
411+
String-like constructs always come within a container node, not as a single
412+
token. These are useful for tooling which works with the tokens of the source
413+
text. Also, separating the delimiters from the text they delimit removes a whole
414+
class of tokenization errors and lets the parser deal with them.
415+
416+
* Strings are always wrapped in `K"string"`, even when they only contain a single string chunk (#94)
417+
* Char literals are wrapped in the `K"char"` kind, containing the character literal string along with its delimiters (#121)
418+
* Backticks use the `K"cmdstring"` kind
419+
* `var""` syntax uses `K"var"` as the head (#127)
420+
* The parser splits triple quoted strings into string chunks interspersed with whitespace trivia
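
By contrast, `Expr` only uses a container when a string has interpolations. A minimal sketch with `Meta.parse` (Base only):

```julia
# A plain string literal parses to a bare String, with no wrapper node.
plain = Meta.parse("\"hello\"")
@assert plain == "hello"

# Only interpolated strings get an Expr(:string, ...) container.
interp = Meta.parse("\"a \$x\"")
@assert interp == Expr(:string, "a ", :x)

# A char literal parses to a bare Char.
ch = Meta.parse("'c'")
@assert ch === 'c'
```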
421+
422+
### Improvements for AST inconsistencies
423+
424+
* Dotted call syntax like `f.(a,b)` and `a .+ b` has been made consistent with the `K"dotcall"` head (#90)
425+
* Standalone dotted operators are always parsed as `(. op)`. For example `.*(x,y)` is parsed as `(call (. *) x y)` (#240)
426+
* The `K"="` kind is used for keyword syntax rather than `kw`, to avoid various inconsistencies and ambiguities (#103)
427+
* Unadorned postfix adjoint is parsed as `call` rather than as a syntactic operator for consistency with suffixed versions like `x'ᵀ` (#124)
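
The inconsistencies being addressed can be seen on the `Expr` side. In this sketch, which uses `Meta.parse` only, the two dotted call syntaxes produce differently shaped trees, and keyword arguments use a separate `kw` head.

```julia
# Dotted function call: a `.` head with a tuple of arguments.
dotted_fn = Meta.parse("f.(a, b)")
@assert dotted_fn == Expr(:., :f, Expr(:tuple, :a, :b))

# Dotted operator call: an ordinary call to the symbol `.+`.
dotted_op = Meta.parse("a .+ b")
@assert dotted_op == Expr(:call, Symbol(".+"), :a, :b)

# Keyword arguments use the `kw` head rather than `=`.
kwcall = Meta.parse("f(a=1)")
@assert kwcall == Expr(:call, :f, Expr(:kw, :a, 1))
```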
428+
429+
### Improvements to awkward AST forms
430+
431+
* Frankentuples with multiple parameter blocks like `(a=1, b=2; c=3; d=4)` are flattened into the parent tuple instead of using nested `K"parameters"` nodes (#133)
432+
* `try catch else finally end` is parsed with `K"catch"`, `K"else"` and `K"finally"` children to avoid the awkwardness of the optional child nodes in the `Expr` representation (#234)
433+
* The dotted import path syntax as in `import A.b.c` is parsed with a `K"importpath"` kind rather than `K"."`, because a bare `A.b.c` has a very different nested/quoted expression representation (#244)
434+
* We use flags rather than child nodes to represent the difference between `struct` and `mutable struct`, `module` and `baremodule` (#220)
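
The awkward `Expr` forms referred to above can be inspected with `Meta.parse`. This sketch covers the `Expr` side only.

```julia
# Expr distinguishes these with a leading Bool child rather than a flag.
@assert Meta.parse("struct A end").args[1] === false
@assert Meta.parse("mutable struct A end").args[1] === true
@assert Meta.parse("module A end").args[1] === true
@assert Meta.parse("baremodule A end").args[1] === false

# A `catch` without a variable leaves a `false` placeholder child.
tc = Meta.parse("try\n    a\ncatch\n    b\nend")
@assert tc.args[2] === false

# The same dotted path has two very different Expr shapes.
@assert Meta.parse("import A.b.c") == Expr(:import, Expr(:., :A, :b, :c))
@assert Meta.parse("A.b.c") == Expr(:., Expr(:., :A, QuoteNode(:b)), QuoteNode(:c))
```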
435+
436+
437+
## More detail on tree differences
373438

374439
### Flattened generators
375440

@@ -460,21 +525,6 @@ julia> text = "x = \"\"\"\n \$a\n b\"\"\""
460525
21:23 │ """ "\"\"\""
461526
```
462527

463-
### Less redundant `block`s
464-
465-
Sometimes `Expr` needs to contain redundant block constructs in order to have a
466-
place to store `LineNumberNode`s, but we don't need these and avoid adding them
467-
in several cases:
468-
* The right hand side of short form function syntax
469-
* The conditional in `elseif`
470-
* The body of anonymous functions after the `->`
471-
472-
### Distinct conditional ternary expression
473-
474-
The syntax `a ? b : c` is the same as `if a b else c` in `Expr` so macros can't
475-
distinguish these cases. Instead, we use a distinct expression head `K"?"` and
476-
lower to `Expr(:if)` during `Expr` conversion.
477-
478528
### String nodes always wrapped in `K"string"` or `K"cmdstring"`
479529

480530
All strings are surrounded by a node of kind `K"string"`, even non-interpolated
