@@ -63,10 +63,10 @@ line:col│ byte_range │ tree │ file_name
```

Internally this has a full representation of all syntax trivia (whitespace and
- comments) as can be seen with the more raw "green tree" representation with
- `GreenNode`. Here ranges on the left are byte ranges, and `✔` flags nontrivia
- tokens. Note that the parentheses are trivia in the tree representation,
- despite being important for parsing.
+ comments) as can be seen with the more raw ["green tree"](#raw-syntax-tree--green-tree)
+ representation with `GreenNode`. Here ranges on the left are byte ranges, and
+ `✔` flags nontrivia tokens. Note that the parentheses are trivia in the tree
+ representation, despite being important for parsing.

```julia
julia> text = "(x + y)*z"
@@ -211,7 +211,7 @@ For lossless parsing the output spans must cover the entire input text. Using
* Parent spans are emitted after all their children.

These properties make the output spans naturally isomorphic to a
- ["green tree"](https://ericlippert.com/2012/06/08/red-green-trees/)
+ ["green tree"](#raw-syntax-tree--green-tree)
in the terminology of C#'s Roslyn compiler.

### Tree construction
@@ -533,6 +533,8 @@ There's arguably a few downsides:

# Differences from the flisp parser

+ _See also the [§ Comparisons to other packages](#comparisons-to-other-packages) section._
+
Practically the flisp parser is not quite a classic [recursive descent
parser](https://en.wikipedia.org/wiki/Recursive_descent_parser), because it
often looks back and modifies the output tree it has already produced. We've
@@ -767,6 +769,8 @@ parsing `key=val` pairs inside parentheses.

### Official Julia compiler

+ _See also the [§ Differences from the flisp parser](#differences-from-the-flisp-parser) section._
+
The official Julia compiler frontend lives in the Julia source tree. It's
mostly contained in just a few files:
* The parser in [src/julia-parser.scm](https://github.com/JuliaLang/julia/blob/9c4b75d7f63d01d12b67aaf7ce8bb4a078825b52/src/julia-parser.scm)
@@ -793,41 +797,42 @@ structures and FFI is complex and inefficient.
### JuliaParser.jl

[JuliaParser.jl](https://github.com/JuliaLang/JuliaParser.jl)
- was a direct port of Julia's flisp reference parser but was abandoned around
- Julia 0.5 or so. However it doesn't support lossless parsing and doing so would
- amount to a full rewrite. Given the divergence with the flisp reference parser
- since Julia-0.5, it seemed better just to start with the reference parser
- instead.
+ was a direct port of Julia's flisp reference parser, but was abandoned around
+ Julia 0.5 or so. Furthermore, it doesn't support lossless parsing, and adding
+ that feature would amount to a full rewrite. Given its divergence with the flisp
+ reference parser since Julia-0.5, it seemed better just to start anew from the
+ reference parser instead.

### Tokenize.jl

[Tokenize.jl](https://github.com/JuliaLang/Tokenize.jl)
is a fast lexer for Julia code. The code from Tokenize has been
imported and used in JuliaSyntax, with some major modifications as discussed in
- the lexer implementation section.
+ the [lexer implementation](#lexing) section.

### CSTParser.jl

[CSTParser.jl](https://github.com/julia-vscode/CSTParser.jl)
is a ([mostly?](https://github.com/domluna/JuliaFormatter.jl/issues/52#issuecomment-529945126))
- lossless parser with goals quite similar to JuliaParser and used extensively in
- the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very useful
- but I do find the implementation hard to understand and I wanted to try a fresh
- approach with a focus on:
+ lossless parser with goals quite similar to JuliaParser. It is used extensively
+ in the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very
+ useful, but I do find the implementation hard to understand, and I wanted to try
+ a fresh approach with a focus on:

- * "Production readyness": Good docs, tests, diagnostics and maximum similarity
+ * "Production readiness": Good docs, tests, diagnostics and maximum similarity
with the flisp parser, with the goal of getting the new parser into `Core`.
* Learning from the latest ideas about composable parsing and data structures
from outside Julia. In particular the implementation of `rust-analyzer` is
- very clean, well documented, and a great source of inspiration.
+ very clean, well documented, and was a great source of inspiration.
* Composability of tree data structures — I feel like the trees should be
- layered somehow with a really lightweight green tree at the most basic level,
- similar to Roslyn or rust-analyzer. In comparison CSTParser uses a more heavy
- weight non-layered data structure. Alternatively or additionally, have a
- common tree API with many concrete task-specific implementations.
+ layered somehow with a really lightweight [green tree](#raw-syntax-tree--green-tree)
+ at the most basic level, similar to Roslyn or rust-analyzer. In comparison,
+ CSTParser uses a more heavyweight non-layered data structure. Alternatively or
+ additionally, have a common tree API with many concrete task-specific
+ implementations.

A big benefit of the JuliaSyntax parser is that it separates the parser code
- from the tree data structures entirely which should give a lot of flexibility
+ from the tree data structures entirely, which should give a lot of flexibility
in experimenting with various tree representations.

I also want JuliaSyntax to tackle macro expansion and other lowering steps, and
@@ -840,12 +845,12 @@ Using a modern production-ready parser generator like `tree-sitter` is an
interesting option and some progress has already been made in
[tree-sitter-julia](https://github.com/tree-sitter/tree-sitter-julia).
But I feel like the grammars for parser generators are only marginally more
- expressive than writing the parser by hand after accounting for the effort
+ expressive than writing the parser by hand, after accounting for the effort
spent on the weird edge cases of a real language and writing the parser's tests
and "supporting code".

- On the other hand a hand-written parser is completely flexible and can be
- mutually understood with the reference implementation so I chose that approach
+ On the other hand, a hand-written parser is completely flexible and can be
+ mutually understood with the reference implementation, so I chose that approach
for JuliaSyntax.

# Resources
# Resources
@@ -1020,7 +1025,8 @@ work flows:

### Raw syntax tree / Green tree

- Raw syntax tree (or "Green tree" in the terminology from Roslyn)
+ Raw syntax tree (or ["Green tree"](https://ericlippert.com/2012/06/08/red-green-trees/)
+ in the terminology from Roslyn)

We want GreenNode to be
* *structurally minimal* — For efficiency and generality