Merged
Changes from 6 commits
2 changes: 0 additions & 2 deletions Project.toml
@@ -7,8 +7,6 @@ version = "1.0.2"
Serialization = "1.0"
julia = "1.0"

[deps]

[extras]
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Serialization = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
38 changes: 21 additions & 17 deletions docs/src/design.md
@@ -56,43 +56,47 @@ We use a hand-written lexer (a heavily modified version of
The main parser innovation is the `ParseStream` interface which provides a
stream-like I/O interface for writing the parser. The parser does not
depend on or produce any concrete tree data structure as part of the parsing
phase but the output spans can be post-processed into various tree data
phase but the output nodes can be post-processed into various tree data
structures as required. This is like the design of rust-analyzer though with a
simpler implementation.

Parsing proceeds by recursive descent;

* The parser consumes a flat list of lexed tokens as *input* using `peek()` to
examine tokens and `bump()` to consume them.
* The parser produces a flat list of text spans as *output* using `bump()` to
transfer tokens to the output and `position()`/`emit()` for nonterminal ranges.
* The parser produces a flat list of `RawGreenNode`s as *output* using `bump()` to
transfer tokens to the output and `position()`/`emit()` for nonterminal nodes.
* Diagnostics are emitted as separate text spans
* Whitespace and comments are automatically `bump()`ed and don't need to be
handled explicitly. The exception is syntactically relevant newlines in space
sensitive mode.
* Parser modes are passed down the call tree using `ParseState`.
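
The bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStream`, `ToyNode`, `peek_kind`, `bump!`, `mark` and `emit!` are hypothetical stand-ins for `ParseStream`, its output nodes, `peek()`, `bump()`, `position()` and `emit()`, and they do not reproduce the real JuliaSyntax definitions. In particular, `node_span` here is assumed to count every earlier output entry a nonterminal covers.

```julia
# Toy model only: hypothetical types, not JuliaSyntax's real definitions.
struct ToyNode
    kind::Symbol       # syntax kind tag
    byte_span::Int     # number of bytes of source text covered
    is_terminal::Bool
    node_span::Int     # nonterminals: number of earlier output entries covered
end

mutable struct ToyStream
    tokens::Vector{Tuple{Symbol,Int}}  # (kind, byte length) of each lexed token
    next::Int                          # index of the next unconsumed token
    output::Vector{ToyNode}            # flat output list
end

ToyStream(tokens) = ToyStream(tokens, 1, ToyNode[])

# Examine the next token's kind without consuming it.
peek_kind(s::ToyStream) = s.next <= length(s.tokens) ? s.tokens[s.next][1] : :EndMarker

# Transfer the next token to the output as a terminal node.
function bump!(s::ToyStream)
    kind, nbytes = s.tokens[s.next]
    push!(s.output, ToyNode(kind, nbytes, true, 0))
    s.next += 1
    return nothing
end

# Record where a nonterminal starts: the index of the next output entry.
mark(s::ToyStream) = length(s.output) + 1

# Emit a nonterminal covering every output entry produced since `start`.
function emit!(s::ToyStream, start::Int, kind::Symbol)
    covered = s.output[start:end]
    nbytes = 0
    for n in covered
        # Count only terminals so nested nonterminals are not double counted.
        n.is_terminal && (nbytes += n.byte_span)
    end
    push!(s.output, ToyNode(kind, nbytes, false, length(covered)))
    return nothing
end

# Recursive descent for `atom (Plus atom)*`, left-associative.
function parse_sum!(s::ToyStream)
    start = mark(s)
    bump!(s)                      # first operand
    while peek_kind(s) == :Plus
        bump!(s)                  # operator
        bump!(s)                  # right operand
        emit!(s, start, :call)    # wraps everything emitted since `start`
    end
    return nothing
end

s = ToyStream([(:Identifier, 1), (:Plus, 1), (:Identifier, 1)])
parse_sum!(s)
kinds = [n.kind for n in s.output]
```

In this sketch the parent `:call` entry is appended after its three children, so the flat output is already in post-order.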

The output spans track the byte range, a syntax "kind" stored as an integer
tag, and some flags. The kind tag makes the spans a [sum
type](https://blog.waleedkhan.name/union-vs-sum-types/) but where the type is
tracked explicitly outside of Julia's type system.
The output nodes track the byte range, a syntax "kind" stored as an integer
tag, and some flags. Each node also stores either the number of child nodes
(for non-terminals) or the original token kind (for terminals). The kind tag
makes the nodes a [sum type](https://blog.waleedkhan.name/union-vs-sum-types/)
but where the type is tracked explicitly outside of Julia's type system.

For lossless parsing the output spans must cover the entire input text. Using
For lossless parsing the output nodes must cover the entire input text. Using
`bump()`, `position()` and `emit()` in a natural way also ensures that:
* Spans are cleanly nested with children contained entirely within their parents
* Siblings spans are emitted in source order
* Parent spans are emitted after all their children.
* Nodes are cleanly nested with children contained entirely within their parents
* Sibling nodes are emitted in source order
* Parent nodes are emitted after all their children.

These properties make the output spans naturally isomorphic to a
These properties make the output nodes a post-order traversal of a
["green tree"](#raw-syntax-tree--green-tree)
in the terminology of C#'s Roslyn compiler.
in the terminology of C#'s Roslyn compiler, with the tree structure
implicit in the node spans.

### Tree construction

The `build_tree` function performs a depth-first traversal of the `ParseStream`
output spans allowing it to be assembled into a concrete tree data structure,
for example using the `GreenNode` data type. We further build on top of this to
define `build_tree` for the AST type `SyntaxNode` and for normal Julia `Expr`.
The `build_tree` function uses the implicit tree structure in the `ParseStream`
output to assemble concrete tree data structures. Since the output is already
a post-order traversal of `RawGreenNode`s with node spans encoding parent-child
relationships, tree construction is straightforward. We build on top of this to
define `build_tree` for various tree types including `GreenNode`, the AST type
`SyntaxNode`, and for normal Julia `Expr`.
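
As a rough illustration of why a post-order list makes tree construction straightforward, the sketch below rebuilds a nested tree from a flat list. `FlatNode`, `Tree` and `build_toy_tree` are hypothetical names used only for this example and are not the real `RawGreenNode`, `GreenNode` or `build_tree`. It assumes each nonterminal records how many earlier entries it covers, with `0` for terminals.

```julia
# Toy model only: hypothetical types, not JuliaSyntax's real definitions.
struct FlatNode
    kind::Symbol
    node_span::Int   # 0 for terminals; otherwise number of earlier entries covered
end

struct Tree
    kind::Symbol
    children::Vector{Tree}
end

# Build the tree rooted at `output[idx]` by walking backwards over the entries
# it covers. Each child is located by skipping over that child's own subtree.
function build_toy_tree(output::Vector{FlatNode}, idx::Int=length(output))
    node = output[idx]
    children = Tree[]
    first_covered = idx - node.node_span
    i = idx - 1
    while i >= first_covered
        pushfirst!(children, build_toy_tree(output, i))
        i -= output[i].node_span + 1   # jump past this child and its descendants
    end
    return Tree(node.kind, children)
end

# Post-order output for something shaped like `a + b + c`:
# the inner call covers 3 entries, the outer call covers all 6 before it.
flat = [
    FlatNode(:Identifier, 0), FlatNode(:Plus, 0), FlatNode(:Identifier, 0),
    FlatNode(:call, 3),
    FlatNode(:Plus, 0), FlatNode(:Identifier, 0),
    FlatNode(:call, 6),
]

tree = build_toy_tree(flat)
```

Because the root is always the last entry, no separate parent pointers or child indices are needed. The spans alone determine the nesting.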

### Error recovery

5 changes: 3 additions & 2 deletions src/JuliaSyntax.jl
@@ -73,7 +73,7 @@ export @K_str, kind

export SyntaxNode

@_public GreenNode,
@_public GreenNode, RedTreeCursor, GreenTreeCursor,
span

# Helper utilities
@@ -95,7 +95,8 @@ include("parser_api.jl")
include("literal_parsing.jl")

# Tree data structures
include("green_tree.jl")
include("tree_cursors.jl")
include("green_node.jl")
include("syntax_tree.jl")
include("expr.jl")
