@@ -519,9 +519,9 @@ expressive than writing the parser by hand after accounting for the effort
 spent on the weird edge cases of a real language and writing the parser's tests
 and "supporting code".
 
-On the other hand a hand-written parser completely flexible and can be mutually
-understood with the reference implementation so I chose that approach for
-JuliaSyntax.
+On the other hand a hand-written parser is completely flexible and can be
+mutually understood with the reference implementation so I chose that approach
+for JuliaSyntax.
 
 # Resources
 
@@ -673,6 +673,7 @@ work flows:
 * Syntax transformations
   - Choose some macros to implement. This is a basic test of mixing source
     trees from different files while preserving precise source locations.
+    (Done in <test/syntax_interpolation.jl>.)
 * Formatting
   - Re-indent a file. This tests the handling of syntax trivia.
 * Refactoring
@@ -704,7 +705,6 @@ The simplest idea possible is to have:
 * Leaf nodes are a single token
 * Children are in source order
 
-
 Call represents a challenge for the AST vs Green tree in terms of node
 placement / iteration for infix operators vs normal prefix function calls.
 
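The green tree shape just described (leaf nodes are single tokens, children kept in source order, spans summed from children) can be sketched minimally. This is an illustrative sketch only: `GreenNode`, `leaf`, and `node` are invented names, not JuliaSyntax's actual API.

```julia
# Minimal green tree sketch: nodes store only widths, not absolute positions,
# so absolute byte offsets can be recomputed on demand by walking the tree.
struct GreenNode
    kind::Symbol
    span::Int                    # width in bytes
    children::Vector{GreenNode}  # in source order; empty for leaf tokens
end

# A leaf is a single token of a given width.
leaf(kind, span) = GreenNode(kind, span, GreenNode[])

# An interior node's span is the sum of its children's spans.
node(kind, children...) =
    GreenNode(kind, sum(c.span for c in children), collect(children))

# `x + 1` as a green tree, with whitespace trivia kept as children.
ex = node(:call,
          leaf(:Identifier, 1),   # x
          leaf(:Whitespace, 1),
          leaf(:Operator, 1),     # +
          leaf(:Whitespace, 1),
          leaf(:Integer, 1))      # 1
```

Keeping trivia in the child list is what makes full-fidelity re-printing of the source possible from the tree alone.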
@@ -730,10 +730,10 @@ There seems to be a few ways forward:
 * We can use the existing `Expr` during macro expansion and try to recover
   source information after macro expansion using heuristics. Likely the
   presence of correct hygiene can help with this.
-* Introducing a new AST would be possible if it were opt-in for new-style
-  macros only. Fixing hygiene should go along with this. Design challenge: How
-  do we make manipulating expressions reasonable when literals need to carry
-  source location?
+* Introducing a new AST would be possible if it were opt-in for some
+  hypothetical "new-style macros" only. Fixing hygiene should go along with
+  this. Design challenge: How do we make manipulating expressions reasonable
+  when literals need to carry source location?
 
 One option which may help bridge between locationless ASTs and something new
 may be to have wrappers for the small number of literal types we need to cover.
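One way to picture such literal wrappers is a small struct carrying a source range alongside the value, unwrapping transparently when code only cares about the value. `SourceLit`, its fields, and `litvalue` are hypothetical names for illustration, not an existing JuliaSyntax type.

```julia
# Hypothetical sketch: wrap the few literal types that need to carry
# source location through macro expansion.
struct SourceLit{T}
    value::T
    file::Symbol
    byterange::UnitRange{Int}
end

# Code that only cares about the value can unwrap uniformly;
# plain (unwrapped) literals pass through unchanged.
litvalue(x::SourceLit) = x.value
litvalue(x) = x

a = SourceLit(42, Symbol("foo.jl"), 10:11)
```

The design question from the text remains: every consumer that pattern-matches on literals would need to call something like `litvalue`, which is the cost of making locations survive expression manipulation.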
@@ -762,11 +762,11 @@ Some disorganized musings about error recovery
 Different types of errors seem to occur...
 
 * Disallowed syntax (such as lack of spaces in conditional expressions)
-  where we can reasonably just continue parsing the production and emit the
-  node with an error flag which is otherwise fully formed. In some cases like
-  parsing infix expressions with a missing tail, emitting a zero width error
-  token can lead to a fully formed parse tree without the productions up the
-  stack needing to participate in recovery.
+  where we can reasonably just continue parsing and emit the node with an error
+  flag which is otherwise fully formed. In some cases like parsing infix
+  expressions with a missing tail, emitting a zero width error token can lead
+  to a fully formed parse tree without the productions up the stack needing to
+  participate in recovery.
 * A token which is disallowed in current context. Eg, `=` in parse_atom, or a
   closing token inside an infix expression. Here we can emit a `K"error"`, but
   we can't descend further into the parse tree; we must pop several recursive
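The zero-width error token recovery described above can be sketched as a toy parsing step for `a +` with a missing right operand. `Token` and `parse_infix!` are invented names for illustration, not JuliaSyntax functions.

```julia
# Toy sketch: recover from a missing infix tail by emitting a zero-width
# error token, so the infix production still closes with three children
# and productions further up the stack need not participate in recovery.
struct Token
    kind::Symbol
    text::String   # empty string => zero-width token
end

function parse_infix!(tokens::Vector{Token})
    out = Token[]
    push!(out, popfirst!(tokens))       # left operand
    push!(out, popfirst!(tokens))       # infix operator
    if isempty(tokens)
        push!(out, Token(:error, ""))   # zero-width error token as the tail
    else
        push!(out, popfirst!(tokens))   # right operand
    end
    return out
end

nodes = parse_infix!([Token(:Identifier, "a"), Token(:Operator, "+")])
```

Because the error token consumes no source bytes, the spans of the surrounding tree still add up to the input length.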
@@ -820,9 +820,6 @@ example:
   - Restart parsing
   - Somehow make sure all of this can't result in infinite recursion 😅
 
-For this kind of recovery it sure would be good if we could reify the program
-stack into a parser state object...
-
 Missing commas or closing brackets in nested structures also present the
 existing parser with a problem.
 
@@ -848,12 +845,25 @@ But not always!
 ```julia
 f(a,
   g(b,
-    c # -- missing closing `)` ?
-    d)
+    c # -- missing closing `,` ?
+    d))
 ```
 
+Another particularly difficult problem for diagnostics in the current system is
+broken parentheses or double quotes in string interpolations, especially when
+nested.
+
 # Fun research questions
 
+### Parser Recovery
+
+Can we learn fast and reasonably accurate recovery heuristics for when the
+parser encounters broken syntax, rather than hand-coding these? How would we
+set the parser up so that training works and injecting the model is
+nonintrusive? If the model is embedded in and works together with the parser,
+can it be made compact enough that training is fast and the model itself is
+tiny?
+
 ### Formatting
 
 Given source and syntax tree, can we regress/learn a generative model of
@@ -862,8 +872,3 @@ heuristics to get something which "looks nice"... and ML systems have become
 very good at heuristics. Also, we've got huge piles of training data — just
 choose some high quality, tastefully hand-formatted libraries.
 
-### Parser Recovery
-
-Similarly, can we learn fast and reasonably accurate recovery heuristics for
-when the parser encounters broken syntax rather than hand-coding these? How
-do we set the parser up so that training works and inference is nonintrusive?