Commit ff99b65

Minor README tweaks
1 parent af28f0c commit ff99b65

1 file changed: +28 -23 lines changed

README.md

Lines changed: 28 additions & 23 deletions
@@ -519,9 +519,9 @@ expressive than writing the parser by hand after accounting for the effort
 spent on the weird edge cases of a real language and writing the parser's tests
 and "supporting code".

-On the other hand a hand-written parser completely flexible and can be mutually
-understood with the reference implementation so I chose that approach for
-JuliaSyntax.
+On the other hand a hand-written parser is completely flexible and can be
+mutually understood with the reference implementation so I chose that approach
+for JuliaSyntax.

 # Resources

@@ -673,6 +673,7 @@ work flows:
 * Syntax transformations
   - Choose some macros to implement. This is a basic test of mixing source
     trees from different files while preserving precise source locations.
+    (Done in <test/syntax_interpolation.jl>.)
 * Formatting
   - Re-indent a file. This tests the handling of syntax trivia.
 * Refactoring
@@ -704,7 +705,6 @@ The simplest idea possible is to have:
 * Leaf nodes are a single token
 * Children are in source order

-
 Call represents a challenge for the AST vs Green tree in terms of node
 placement / iteration for infix operators vs normal prefix function calls.

@@ -730,10 +730,10 @@ There seems to be a few ways forward:
 * We can use the existing `Expr` during macro expansion and try to recover
   source information after macro expansion using heuristics. Likely the
   presence of correct hygiene can help with this.
-* Introducing a new AST would be possible if it were opt-in for new-style
-  macros only. Fixing hygiene should go along with this. Design challenge: How
-  do we make manipulating expressions reasonable when literals need to carry
-  source location?
+* Introducing a new AST would be possible if it were opt-in for some
+  hypothetical "new-style macros" only. Fixing hygiene should go along with
+  this. Design challenge: How do we make manipulating expressions reasonable
+  when literals need to carry source location?

 One option which may help bridge between locationless ASTs and something new
 may be to have wrappers for the small number of literal types we need to cover.
@@ -762,11 +762,11 @@ Some disorganized musings about error recovery
 Different types of errors seem to occur...

 * Disallowed syntax (such as lack of spaces in conditional expressions)
-  where we can reasonably just continue parsing the production and emit the
-  node with an error flag which is otherwise fully formed. In some cases like
-  parsing infix expressions with a missing tail, emitting a zero width error
-  token can lead to a fully formed parse tree without the productions up the
-  stack needing to participate in recovery.
+  where we can reasonably just continue parsing and emit the node with an error
+  flag which is otherwise fully formed. In some cases like parsing infix
+  expressions with a missing tail, emitting a zero width error token can lead
+  to a fully formed parse tree without the productions up the stack needing to
+  participate in recovery.
 * A token which is disallowed in current context. Eg, `=` in parse_atom, or a
   closing token inside an infix expression. Here we can emit a `K"error"`, but
   we can't descend further into the parse tree; we must pop several recursive
@@ -820,9 +820,6 @@ example:
 - Restart parsing
 - Somehow make sure all of this can't result in infinite recursion 😅

-For this kind of recovery it sure would be good if we could reify the program
-stack into a parser state object...
-
 Missing commas or closing brackets in nested structures also present the
 existing parser with a problem.

@@ -848,12 +845,25 @@ But not always!
 ```julia
 f(a,
 g(b,
-c # -- missing closing `)` ?
-d)
+c # -- missing closing `,` ?
+d))
 ```

+Another particularly difficult problem for diagnostics in the current system is
+broken parentheses or double quotes in string interpolations, especially when
+nested.
+
 # Fun research questions

+### Parser Recovery
+
+Can we learn fast and reasonably accurate recovery heuristics for when the
+parser encounters broken syntax, rather than hand-coding these? How would we
+set the parser up so that training works and injecting the model is
+nonintrusive? If the model is embedded in and works together with the parser,
+can it be made compact enough that training is fast and the model itself is
+tiny?
+
 ### Formatting

 Given source and syntax tree, can we regress/learn a generative model of
@@ -862,8 +872,3 @@ heuristics to get something which "looks nice"... and ML systems have become
 very good at heuristics. Also, we've got huge piles of training data — just
 choose some high quality, tastefully hand-formatted libraries.

-### Parser Recovery
-
-Similarly, can we learn fast and reasonably accurate recovery heuristics for
-when the parser encounters broken syntax rather than hand-coding these? How
-do we set the parser up so that training works and inference is nonintrusive?
