@@ -13,6 +13,7 @@ A Julia frontend, written in Julia.
* "Compilation as an API" to support all sorts of tooling
* Grow to encompass the rest of the compiler frontend: macro expansion,
desugaring and other lowering steps.
+ * Once mature, replace Julia's flisp-based reference frontend in `Core`

### Design Opinions
@@ -24,6 +25,13 @@ A Julia frontend, written in Julia.
* Fancy parser generators still seem marginal for production compilers. We use
a boring but flexible recursive descent parser.

+ ### Status
+
+ The library is at a pre-0.1 stage, but parses all of Base correctly with only a
+ handful of failures remaining in the Base tests and standard library.
+ The tree data structures should be somewhat usable but will evolve as we try
+ out various use cases.
+
# Examples

Here's what parsing of a small piece of code currently looks like in various
@@ -325,9 +333,9 @@ DSLs this is fine and good but some such allowed syntaxes don't seem very
useful, even for DSLs:

* `macro (x) end` is allowed but there are no anonymous macros.
- * ` abstract type A < B end ` and other subtypes comparisons are allowed, but
+ * `abstract type A < B end` and other subtype comparisons are allowed, but
only `A <: B` makes sense.
- * ` x where {S T} ` produces ` (where x (bracescat (row S T))) `
+ * `x where {S T}` produces `(where x (bracescat (row S T)))`. This seems pretty weird!
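
For anyone who wants to poke at the `x where {S T}` oddity, here is a minimal sketch. It assumes Base's `Meta.parse` (not the JuliaSyntax API) and that the resulting `Expr` mirrors the S-expression quoted above; the exact `dump` output may vary between Julia versions, so none is shown.

```julia
# Assumption: this uses Base's Meta.parse, whichever parser backs it in your
# Julia version. It is not a JuliaSyntax API call.
ex = Meta.parse("x where {S T}")

# The outer expression is a `where` with `x` as its first argument.
println(ex.head)
println(ex.args[1])

# Inspect the rest of the tree, including whatever the `{S T}` part became.
dump(ex)
```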

### `kw` and `=` inconsistencies

@@ -421,19 +429,80 @@ seems to be to flatten the generators:
* `import A..` produces `(import (. A .))` which is arguably nonsensical, as `.`
can't be a normal identifier.

- * The raw string escaping rules are *super* confusing for backslashes near vs
- at the end of the string: `raw"\\\\ "` contains four backslashes, whereas
- `raw"\\\\"` contains only two. It's unclear whether anything can be done
- about this, however.
+ * The raw string escaping rules are *super* confusing for backslashes near
+ the end of the string: `raw"\\\\ "` contains four backslashes, whereas
+ `raw"\\\\"` contains only two. However, this was an intentional feature to
+ allow all strings to be represented, and it's unclear whether the situation
+ can be improved.

* In braces after macrocall, `@S{a b}` is invalid but both `@S{a,b}` and
`@S {a b}` parse. Conversely, `@S[a b]` parses.
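
The raw string rule from the list above is easier to see by counting characters. This is a small sketch using only Base; `count_backslashes` is a helper made up for this illustration.

```julia
# Hypothetical helper for this example: count the backslashes in a string.
count_backslashes(s) = count(c -> c == '\\', s)

# Backslashes that are not directly followed by a quote are kept literally.
near_end = raw"\\\\ "

# A run of backslashes directly before the closing quote is halved.
at_end = raw"\\\\"

println(count_backslashes(near_end))  # 4
println(count_backslashes(at_end))    # 2

# The halving exists so that a quote can be escaped: this is a one-character
# string containing only `"`.
with_quote = raw"\""
println(length(with_quote))           # 1
```

The last case is the reason for the rule: without it, a raw string could not contain a backslash followed by a quote.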

+ # Comparisons to other packages
+
+ ### JuliaParser.jl
+
+ [JuliaParser.jl](https://github.com/JuliaLang/JuliaParser.jl)
+ was a direct port of Julia's flisp reference parser but was abandoned around
+ Julia 0.5 or so. It doesn't support lossless parsing, and adding that would
+ amount to a full rewrite. Given how far the flisp reference parser has diverged
+ since Julia 0.5, it seemed better just to start from the reference parser
+ instead.
+
+ ### Tokenize.jl
+
+ [Tokenize.jl](https://github.com/JuliaLang/Tokenize.jl)
+ is a fast lexer for Julia code. The code from Tokenize has been
+ imported and used in JuliaSyntax, with some major modifications as discussed in
+ the lexer implementation section.
+
+ ### CSTParser.jl
+
+ [CSTParser.jl](https://github.com/julia-vscode/CSTParser.jl)
+ is a ([mostly?](https://github.com/domluna/JuliaFormatter.jl/issues/52#issuecomment-529945126))
+ lossless parser with goals quite similar to JuliaParser and used extensively in
+ the VSCode / LanguageServer / JuliaFormatter ecosystem. CSTParser is very useful
+ but I do find the implementation hard to understand and I wanted to try a fresh
+ approach with a focus on:
+
+ * "Production readiness": Good docs, tests, diagnostics and maximum similarity
+ with the flisp parser, with the goal of getting the new parser into `Core`.
+ * Learning from the latest ideas about composable parsing and data structures
+ from outside Julia. In particular the implementation of `rust-analyzer` is
+ very clean, well documented, and a great source of inspiration.
+ * Composability of tree data structures: I feel like the trees should be
+ layered somehow with a really lightweight green tree at the most basic level,
+ similar to Roslyn or rust-analyzer. In comparison CSTParser uses a more
+ heavyweight non-layered data structure. Alternatively or additionally, have a
+ common tree API with many concrete task-specific implementations.
+
+ A big benefit of the JuliaSyntax parser is that it separates the parser code
+ from the tree data structures entirely, which should give a lot of flexibility
+ in experimenting with various tree representations.
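
To make the layering idea a bit more concrete, here is a rough sketch of a green tree with a positioned layer on top. All names (`Green`, `Positioned`, `leaf`, `node`, `child_nodes`, `source_text`) are hypothetical and chosen for this illustration; this is not the JuliaSyntax, Roslyn or rust-analyzer API, and it assumes ASCII source so that byte offsets can be used as string indices.

```julia
# A green node stores only its kind, its width in bytes and its children.
# There is no parent pointer and no absolute position.
struct Green
    kind::Symbol
    width::Int
    children::Vector{Green}
end

leaf(kind, width) = Green(kind, width, Green[])
node(kind, kids...) = Green(kind, sum(k.width for k in kids), collect(Green, kids))

# A second layer adds the absolute offset, computed on demand from the widths.
struct Positioned
    green::Green
    offset::Int  # 0-based byte offset of the node's first byte
end

function child_nodes(p::Positioned)
    out = Positioned[]
    offset = p.offset
    for c in p.green.children
        push!(out, Positioned(c, offset))
        offset += c.width
    end
    return out
end

# Text is recovered by slicing the source, so whitespace is never lost.
source_text(src, p::Positioned) = src[p.offset + 1 : p.offset + p.green.width]

src = "a + b"
tree = node(:call,
            leaf(:Identifier, 1), leaf(:Whitespace, 1), leaf(:Operator, 1),
            leaf(:Whitespace, 1), leaf(:Identifier, 1))
kids = child_nodes(Positioned(tree, 0))
println(source_text(src, kids[3]))
```

Because the green layer knows nothing about positions, other task-specific layers could wrap the same nodes without the parser needing to change.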
+
+ I also want JuliaSyntax to tackle macro expansion and other lowering steps, and
+ provide APIs for this which can be used by both the core language and the
+ editor tooling.
+
+ ### tree-sitter-julia
+
+ Using a modern production-ready parser generator like `tree-sitter` is an
+ interesting option and some progress has already been made in
+ [tree-sitter-julia](https://github.com/tree-sitter/tree-sitter-julia).
+ But I feel like the grammars for parser generators are only marginally more
+ expressive than writing the parser by hand after accounting for the effort
+ spent on the weird edge cases of a real language and writing the parser's tests
+ and "supporting code".
+
+ On the other hand, a hand-written parser is completely flexible and can be read
+ side by side with the reference implementation, so I chose that approach for
+ JuliaSyntax.
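
For readers who haven't written one, here is roughly what "hand-written recursive descent" means in practice: one function per grammar rule, each calling the rules below it. This is a toy sketch, not the JuliaSyntax parser; the names (`Parser`, `peekchar`, `nextchar!`, `parse_source`) are made up for the example, atoms are single characters, and the output is a plain S-expression string.

```julia
mutable struct Parser
    chars::Vector{Char}
    pos::Int
end

# Whitespace is dropped up front to keep the toy small.
Parser(src::AbstractString) = Parser(Char[c for c in src if !isspace(c)], 1)

peekchar(p::Parser) = p.pos <= length(p.chars) ? p.chars[p.pos] : '\0'

function nextchar!(p::Parser)
    c = peekchar(p)
    p.pos += 1
    return c
end

# expr := term ('+' term)*
function parse_expr(p::Parser)
    ex = parse_term(p)
    while peekchar(p) == '+'
        nextchar!(p)
        rhs = parse_term(p)
        ex = "(call + $ex $rhs)"
    end
    return ex
end

# term := atom ('*' atom)*
function parse_term(p::Parser)
    ex = parse_atom(p)
    while peekchar(p) == '*'
        nextchar!(p)
        rhs = parse_atom(p)
        ex = "(call * $ex $rhs)"
    end
    return ex
end

# atom := '(' expr ')' | single letter or digit
function parse_atom(p::Parser)
    c = nextchar!(p)
    if c == '('
        ex = parse_expr(p)
        nextchar!(p) == ')' || error("expected closing parenthesis")
        return ex
    elseif isletter(c) || isdigit(c)
        return string(c)
    else
        error("unexpected character $(repr(c))")
    end
end

function parse_source(src)
    p = Parser(src)
    ex = parse_expr(p)
    p.pos > length(p.chars) || error("unexpected trailing input")
    return ex
end

println(parse_source("a + b * c"))  # (call + a (call * b c))
```

Operator precedence falls out of which rule calls which, and any weird edge case can be handled with an ordinary `if` in the relevant function.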
+
# Resources

## Julia issues

- Here's a few links to relevant Julia issues. No doubt there's many more.
+ Here are a few links to relevant Julia issues.

#### Macro expansion

@@ -760,12 +829,16 @@ f(a,

# Fun research questions

- * Given source and syntax tree, can we regress/learn a generative model of
- indentation from the syntax tree? Source formatting involves a big pile of
- heuristics to get something which "looks nice"... and ML systems have become
- very good at heuristics. Also, we've got huge piles of training data — just
- choose some high quality, tastefully hand-formatted libraries.
+ ### Formatting
+
+ Given source and syntax tree, can we regress/learn a generative model of
+ indentation from the syntax tree? Source formatting involves a big pile of
+ heuristics to get something which "looks nice"... and ML systems have become
+ very good at heuristics. Also, we've got huge piles of training data: just
+ choose some high-quality, tastefully hand-formatted libraries.
+
+ ### Parser Recovery

- * Similarly, can we learn fast and reasonably accurate recovery heuristics for
- when the parser encounters broken syntax rather than hand-coding these? How
- do we set the parser up so that training works and inference is nonintrusive?
+ Similarly, can we learn fast and reasonably accurate recovery heuristics for
+ when the parser encounters broken syntax rather than hand-coding these? How
+ do we set the parser up so that training works and inference is nonintrusive?