
Commit b1d1201

README: Add comparisons to other packages
1 parent e87e8a4 commit b1d1201

1 file changed, +88 -15 lines changed


README.md

Lines changed: 88 additions & 15 deletions

```diff
@@ -13,6 +13,7 @@ A Julia frontend, written in Julia.
 * "Compilation as an API" to support all sorts of tooling
 * Grow to encompass the rest of the compiler frontend: macro expansion,
 desugaring and other lowering steps.
+* Once mature, replace Julia's flisp-based reference frontend in `Core`
 
 ### Design Opinions
 
```

```diff
@@ -24,6 +25,13 @@ A Julia frontend, written in Julia.
 * Fancy parser generators still seem marginal for production compilers. We use
 a boring but flexible recursive descent parser.
 
+### Status
+
+The library is at a pre-0.1 stage, but parses all of Base correctly with only a
+handful of failures remaining in the Base tests and standard library.
+The tree data structures should be somewhat usable but will evolve as we try
+out various use cases.
+
 # Examples
 
 Here's what parsing of a small piece of code currently looks like in various
```

```diff
@@ -325,9 +333,9 @@ DSLs this is fine and good but some such allowed syntaxes don't seem very
 useful, even for DSLs:
 
 * `macro (x) end` is allowed but there are no anonymous macros.
-* `abstract type A < B end` and other subtypes comparisons are allowed, but
+* `abstract type A < B end` and other subtype comparisons are allowed, but
 only `A <: B` makes sense.
-* `x where {S T}` produces `(where x (bracescat (row S T)))`
+* `x where {S T}` produces `(where x (bracescat (row S T)))`. This seems pretty weird!
 
 ### `kw` and `=` inconsistencies
 
```

```diff
@@ -421,19 +429,80 @@ seems to be to flatten the generators:
 * `import A..` produces `(import (. A .))` which is arguably nonsensical, as `.`
 can't be a normal identifier.
 
-* The raw string escaping rules are *super* confusing for backslashes near vs
-at the end of the string: `raw"\\\\ "` contains four backslashes, whereas
-`raw"\\\\"` contains only two. It's unclear whether anything can be done
-about this, however.
+* The raw string escaping rules are *super* confusing for backslashes near
+the end of the string: `raw"\\\\ "` contains four backslashes, whereas
+`raw"\\\\"` contains only two. However, this was an intentional feature to
+allow all strings to be represented, and it's unclear whether the situation
+can be improved.
 
 * In braces after macrocall, `@S{a b}` is invalid but both `@S{a,b}` and
 `@S {a b}` parse. Conversely, `@S[a b]` parses.
 
+# Comparisons to other packages
+
+### JuliaParser.jl
+
+[JuliaParser.jl](https://github.com/JuliaLang/JuliaParser.jl)
+was a direct port of Julia's flisp reference parser, but was abandoned around
+Julia 0.5 or so. It also doesn't support lossless parsing, and adding that would
+amount to a full rewrite. Given the divergence from the flisp reference parser
+since Julia 0.5, it seemed better just to start with the reference parser
+instead.
+
+### Tokenize.jl
+
+[Tokenize.jl](https://github.com/JuliaLang/Tokenize.jl)
+is a fast lexer for Julia code. The code from Tokenize has been
+imported and used in JuliaSyntax, with some major modifications as discussed in
+the lexer implementation section.
+
+### CSTParser.jl
+
+[CSTParser.jl](https://github.com/julia-vscode/CSTParser.jl)
+is a ([mostly?](https://github.com/domluna/JuliaFormatter.jl/issues/52#issuecomment-529945126))
+lossless parser with goals quite similar to JuliaParser and used extensively in
+the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very useful,
+but I do find the implementation hard to understand, and I wanted to try a fresh
+approach with a focus on:
+
+* "Production readiness": Good docs, tests, diagnostics and maximum similarity
+with the flisp parser, with the goal of getting the new parser into `Core`.
+* Learning from the latest ideas about composable parsing and data structures
+from outside Julia. In particular, the implementation of `rust-analyzer` is
+very clean, well documented, and a great source of inspiration.
+* Composability of tree data structures: I feel like the trees should be
+layered somehow with a really lightweight green tree at the most basic level,
+similar to Roslyn or rust-analyzer. In comparison, CSTParser uses a more
+heavyweight, non-layered data structure. Alternatively or additionally, have a
+common tree API with many concrete task-specific implementations.
+
+A big benefit of the JuliaSyntax parser is that it separates the parser code
+from the tree data structures entirely, which should give a lot of flexibility
+in experimenting with various tree representations.
+
+I also want JuliaSyntax to tackle macro expansion and other lowering steps, and
+provide APIs for this which can be used by both the core language and the
+editor tooling.
+
+### tree-sitter-julia
+
+Using a modern production-ready parser generator like `tree-sitter` is an
+interesting option, and some progress has already been made in
+[tree-sitter-julia](https://github.com/tree-sitter/tree-sitter-julia).
+But I feel like the grammars for parser generators are only marginally more
+expressive than writing the parser by hand, after accounting for the effort
+spent on the weird edge cases of a real language and writing the parser's tests
+and "supporting code".
+
+On the other hand, a hand-written parser is completely flexible and can be
+understood side by side with the reference implementation, so I chose that
+approach for JuliaSyntax.
+
 # Resources
 
 ## Julia issues
 
-Here's a few links to relevant Julia issues. No doubt there's many more.
+Here are a few links to relevant Julia issues.
 
 #### Macro expansion
 
```
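
The raw string bullet in the hunk above is easier to follow with the rule written out. Julia's documentation for `@raw_str` states that in a raw string, backslashes only act as escapes when a run of them immediately precedes a quote: 2n backslashes followed by a quote encode n backslashes and the end of the literal, while 2n+1 backslashes followed by a quote encode n backslashes and a literal quote. The following is a minimal Python sketch of that rule, not JuliaSyntax's lexer. `raw_string_value` is a hypothetical helper, and it assumes a lexer has already located the closing delimiter, so `body` is exactly the text between the quotes.

```python
def raw_string_value(body: str) -> str:
    """Decode the body of a Julia raw string literal raw"...".

    Hypothetical helper. `body` is the source text between the opening and
    closing quotes, as already delimited by a lexer.
    """
    out = []
    i, n = 0, len(body)
    while i < n:
        if body[i] != "\\":
            # Characters other than backslash are copied verbatim.
            out.append(body[i])
            i += 1
            continue
        # Measure the whole run of consecutive backslashes.
        j = i
        while j < n and body[j] == "\\":
            j += 1
        run = j - i
        at_end = j == n  # the run sits right before the closing quote
        if at_end or body[j] == '"':
            # A run touching a quote is halved; an odd run escapes that quote.
            if at_end and run % 2 == 1:
                raise ValueError("odd run would escape the closing quote")
            if not at_end and run % 2 == 0:
                raise ValueError("unescaped quote inside the body")
            out.append("\\" * (run // 2))
        else:
            # Backslashes not followed by a quote are literal.
            out.append("\\" * run)
        i = j
    return "".join(out)
```

Applied to the two examples from the bullet, the body `\\\\ ` (four backslashes and a space) decodes to four backslashes and a space because the run does not touch a quote, while the body `\\\\` touches the closing quote and is halved to two backslashes. Halving only the runs that touch a quote is what lets any string, including one that ends in a backslash, be written as a raw literal.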

```diff
@@ -760,12 +829,16 @@ f(a,
 
 # Fun research questions
 
-* Given source and syntax tree, can we regress/learn a generative model of
-indentation from the syntax tree? Source formatting involves a big pile of
-heuristics to get something which "looks nice"... and ML systems have become
-very good at heuristics. Also, we've got huge piles of training data — just
-choose some high quality, tastefully hand-formatted libraries.
+### Formatting
+
+Given source and syntax tree, can we regress/learn a generative model of
+indentation from the syntax tree? Source formatting involves a big pile of
+heuristics to get something which "looks nice"... and ML systems have become
+very good at heuristics. Also, we've got huge piles of training data: just
+choose some high-quality, tastefully hand-formatted libraries.
+
+### Parser Recovery
 
-* Similarly, can we learn fast and reasonably accurate recovery heuristics for
-when the parser encounters broken syntax rather than hand-coding these? How
-do we set the parser up so that training works and inference is nonintrusive?
+Similarly, can we learn fast and reasonably accurate recovery heuristics for
+when the parser encounters broken syntax rather than hand-coding these? How
+do we set the parser up so that training works and inference is nonintrusive?
```
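
The CSTParser.jl comparison in this commit names two design goals: a lightweight green tree as the bottom layer, as in Roslyn and rust-analyzer, and a parser that is fully separated from the tree data structures. A minimal Python sketch of how those two ideas can fit together follows. It is an illustration under assumptions, not JuliaSyntax's implementation: the event tuples, `GreenToken`, `GreenNode`, `build_green_tree`, `source_text` and `spans` are all hypothetical names. The parser side is assumed to emit only flat `("start", kind)`, `("token", kind, text)` and `("finish",)` events, and a separate builder folds them into immutable green nodes that record kind, children and text width but no absolute positions. `spans` stands in for the position-aware layer (Roslyn's "red" nodes, rust-analyzer's syntax nodes) by computing offsets on demand during a walk.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass(frozen=True)
class GreenToken:
    """Leaf: a token kind plus its exact source text (trivia included)."""
    kind: str
    text: str

    @property
    def width(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class GreenNode:
    """Interior node: kind and children only, no absolute positions."""
    kind: str
    children: Tuple[Union["GreenNode", GreenToken], ...]

    @property
    def width(self) -> int:
        # Width is derived from the children's widths.
        return sum(child.width for child in self.children)


def build_green_tree(events: List[tuple]) -> GreenNode:
    """Fold a flat parser event stream into a green tree.

    Hypothetical event shapes: ("start", kind), ("token", kind, text), ("finish",).
    """
    stack: List[Tuple[str, list]] = []
    root = None
    for event in events:
        tag = event[0]
        if tag == "start":
            stack.append((event[1], []))
        elif tag == "token":
            stack[-1][1].append(GreenToken(event[1], event[2]))
        elif tag == "finish":
            kind, children = stack.pop()
            node = GreenNode(kind, tuple(children))
            if stack:
                stack[-1][1].append(node)
            else:
                root = node
        else:
            raise ValueError(f"unknown event {event!r}")
    if root is None or stack:
        raise ValueError("unbalanced event stream")
    return root


def source_text(node: Union[GreenNode, GreenToken]) -> str:
    """Lossless round trip: concatenating token text reproduces the input."""
    if isinstance(node, GreenToken):
        return node.text
    return "".join(source_text(child) for child in node.children)


def spans(node: Union[GreenNode, GreenToken], offset: int = 0) -> Iterator[Tuple[str, int, int]]:
    """Yield (kind, start, end) for every node and token, computing offsets on the fly."""
    yield (node.kind, offset, offset + node.width)
    if isinstance(node, GreenNode):
        for child in node.children:
            yield from spans(child, offset)
            offset += child.width


if __name__ == "__main__":
    # Events a parser might emit for `a + b`, whitespace tokens included.
    events = [
        ("start", "call"),
        ("token", "Identifier", "a"),
        ("token", "Whitespace", " "),
        ("token", "+", "+"),
        ("token", "Whitespace", " "),
        ("token", "Identifier", "b"),
        ("finish",),
    ]
    tree = build_green_tree(events)
    print(source_text(tree))        # a + b
    print(list(spans(tree))[:2])    # [('call', 0, 5), ('Identifier', 0, 1)]
```

Because a green node stores only relative widths, identical subtrees can in principle be shared and reused, and a different builder could consume the same event stream to produce an entirely different tree type without touching the parser.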
