Commit 5e46a0d

Various improvements to the README (#195)
- Slightly reword some passages for clarity
- Use punctuation to help reveal sentence structure
- Use section links to provide additional context and make the document more cohesive
1 parent 947359c commit 5e46a0d

1 file changed: +32 -26 lines changed

README.md

Lines changed: 32 additions & 26 deletions
@@ -63,10 +63,10 @@ line:col│ byte_range │ tree │ file_name
 ```
 
 Internally this has a full representation of all syntax trivia (whitespace and
-comments) as can be seen with the more raw "green tree" representation with
-`GreenNode`. Here ranges on the left are byte ranges, and `✔` flags nontrivia
-tokens. Note that the parentheses are trivia in the tree representation,
-despite being important for parsing.
+comments) as can be seen with the more raw ["green tree"](#raw-syntax-tree--green-tree)
+representation with `GreenNode`. Here ranges on the left are byte ranges, and
+`✔` flags nontrivia tokens. Note that the parentheses are trivia in the tree
+representation, despite being important for parsing.
 
 ```julia
 julia> text = "(x + y)*z"
@@ -211,7 +211,7 @@ For lossless parsing the output spans must cover the entire input text. Using
 * Parent spans are emitted after all their children.
 
 These properties make the output spans naturally isomorphic to a
-["green tree"](https://ericlippert.com/2012/06/08/red-green-trees/)
+["green tree"](#raw-syntax-tree--green-tree)
 in the terminology of C#'s Roslyn compiler.
 
 ### Tree construction
@@ -533,6 +533,8 @@ There's arguably a few downsides:
 
 # Differences from the flisp parser
 
+_See also the [§ Comparisons to other packages](#comparisons-to-other-packages) section._
+
 Practically the flisp parser is not quite a classic [recursive descent
 parser](https://en.wikipedia.org/wiki/Recursive_descent_parser), because it
 often looks back and modifies the output tree it has already produced. We've
@@ -767,6 +769,8 @@ parsing `key=val` pairs inside parentheses.
 
 ### Official Julia compiler
 
+_See also the [§ Differences from the flisp parser](#differences-from-the-flisp-parser) section._
+
 The official Julia compiler frontend lives in the Julia source tree. It's
 mostly contained in just a few files:
 * The parser in [src/julia-parser.scm](https://github.com/JuliaLang/julia/blob/9c4b75d7f63d01d12b67aaf7ce8bb4a078825b52/src/julia-parser.scm)
@@ -793,41 +797,42 @@ structures and FFI is complex and inefficient.
 ### JuliaParser.jl
 
 [JuliaParser.jl](https://github.com/JuliaLang/JuliaParser.jl)
-was a direct port of Julia's flisp reference parser but was abandoned around
-Julia 0.5 or so. However it doesn't support lossless parsing and doing so would
-amount to a full rewrite. Given the divergence with the flisp reference parser
-since Julia-0.5, it seemed better just to start with the reference parser
-instead.
+was a direct port of Julia's flisp reference parser, but was abandoned around
+Julia 0.5 or so. Furthermore, it doesn't support lossless parsing, and adding
+that feature would amount to a full rewrite. Given its divergence with the flisp
+reference parser since Julia-0.5, it seemed better just to start anew from the
+reference parser instead.
 
 ### Tokenize.jl
 
 [Tokenize.jl](https://github.com/JuliaLang/Tokenize.jl)
 is a fast lexer for Julia code. The code from Tokenize has been
 imported and used in JuliaSyntax, with some major modifications as discussed in
-the lexer implementation section.
+the [lexer implementation](#lexing) section.
 
 ### CSTParser.jl
 
 [CSTParser.jl](https://github.com/julia-vscode/CSTParser.jl)
 is a ([mostly?](https://github.com/domluna/JuliaFormatter.jl/issues/52#issuecomment-529945126))
-lossless parser with goals quite similar to JuliaParser and used extensively in
-the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very useful
-but I do find the implementation hard to understand and I wanted to try a fresh
-approach with a focus on:
+lossless parser with goals quite similar to JuliaParser. It is used extensively
+in the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very
+useful, but I do find the implementation hard to understand, and I wanted to try
+a fresh approach with a focus on:
 
-* "Production readyness": Good docs, tests, diagnostics and maximum similarity
+* "Production readiness": Good docs, tests, diagnostics and maximum similarity
 with the flisp parser, with the goal of getting the new parser into `Core`.
 * Learning from the latest ideas about composable parsing and data structures
 from outside Julia. In particular the implementation of `rust-analyzer` is
-very clean, well documented, and a great source of inspiration.
+very clean, well documented, and was a great source of inspiration.
 * Composability of tree data structures — I feel like the trees should be
-layered somehow with a really lightweight green tree at the most basic level,
-similar to Roslyn or rust-analyzer. In comparison CSTParser uses a more heavy
-weight non-layered data structure. Alternatively or additionally, have a
-common tree API with many concrete task-specific implementations.
+layered somehow with a really lightweight [green tree](#raw-syntax-tree--green-tree)
+at the most basic level, similar to Roslyn or rust-analyzer. In comparison,
+CSTParser uses a more heavyweight non-layered data structure. Alternatively or
+additionally, have a common tree API with many concrete task-specific
+implementations.
 
 A big benefit of the JuliaSyntax parser is that it separates the parser code
-from the tree data structures entirely which should give a lot of flexibility
+from the tree data structures entirely, which should give a lot of flexibility
 in experimenting with various tree representations.
 
 I also want JuliaSyntax to tackle macro expansion and other lowering steps, and
@@ -840,12 +845,12 @@ Using a modern production-ready parser generator like `tree-sitter` is an
 interesting option and some progress has already been made in
 [tree-sitter-julia](https://github.com/tree-sitter/tree-sitter-julia).
 But I feel like the grammars for parser generators are only marginally more
-expressive than writing the parser by hand after accounting for the effort
+expressive than writing the parser by hand, after accounting for the effort
 spent on the weird edge cases of a real language and writing the parser's tests
 and "supporting code".
 
-On the other hand a hand-written parser is completely flexible and can be
-mutually understood with the reference implementation so I chose that approach
+On the other hand, a hand-written parser is completely flexible and can be
+mutually understood with the reference implementation, so I chose that approach
 for JuliaSyntax.
 
 # Resources
@@ -1020,7 +1025,8 @@ work flows:
 
 ### Raw syntax tree / Green tree
 
-Raw syntax tree (or "Green tree" in the terminology from Roslyn)
+Raw syntax tree (or ["Green tree"](https://ericlippert.com/2012/06/08/red-green-trees/)
+in the terminology from Roslyn)
 
 We want GreenNode to be
 * *structurally minimal* — For efficiency and generality
