Commit 820b646

AST: Use a single kind K"op=" for updating assignments (#530)
Make all updating assignment operators like `+=` be represented with a single `K"op="` head, with the operator itself in infix position. For example, `x += y` is now parsed as

    [op=]
      x :: Identifier
      + :: Identifier
      y :: Identifier

This greatly reduces the number of distinct forms here from a rather big list (`$=` `%=` `&=` `*=` `+=` `-=` `//=` `/=` `<<=` `>>=` `>>>=` `\=` `^=` `|=` `÷=` `⊻=`) and makes the operator itself appear in the AST as kind `K"Identifier"`, as it should. It also makes it possible to add further unicode updating operators while keeping the AST stable.

The need for this was highlighted when working on JuliaLowering. When using `K"+="` as a head, one needs to look up the appropriate operator from the list of updating operators or use string munging on the Kind itself. This is quite awkward, especially as it needs special rules for inferring the macro scope of the `+` identifier. In addition, having a single head for this form means update operator semantics only need to be dealt with in one place.
1 parent 55e3d69 commit 820b646

9 files changed, with 105 additions and 67 deletions.

docs/src/reference.md

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ class of tokenization errors and lets the parser deal with them.
 * We use flags rather than child nodes to represent the difference between `struct` and `mutable struct`, `module` and `baremodule` (#220)
 * Iterations are represented with the `iteration` and `in` heads rather than `=` within the header of a `for`. Thus `for i=is ; body end` parses to `(for (iteration (in i is)) (block body))`. Cartesian iteration as in `for a=as, b=bs body end` is represented with a nested `(iteration (in a as) (in b bs))` rather than a `block` containing `=` because these lists of iterators are neither semantically nor syntactically a sequence of statements, unlike other uses of `block`. Generators also use the `iteration` head - see information on that below.
 * Short form functions like `f(x) = x + 1` are represented with the `function` head rather than the `=` head. In this case the `SHORT_FORM_FUNCTION_FLAG` flag is set to allow the surface syntactic form to be easily distinguished from long form functions.
+* All kinds of updating assignment operators like `+=` are represented with a single `K"op="` head, with the operator itself in infix position. For example, `x += 1` is `(op= x + 1)`, where the plus token is of kind `K"Identifier"`. This greatly reduces the number of distinct forms here from a rather big list (`$=` `%=` `&=` `*=` `+=` `-=` `//=` `/=` `<<=` `>>=` `>>>=` `\=` `^=` `|=` `÷=` `⊻=`) and makes the operator itself appear in the AST as kind `K"Identifier"`, as it should. It also makes it possible to add further unicode updating operators while keeping the AST stable.
 
 ## More detail on tree differences
 
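As a rough illustration of why a single head helps downstream tools, the sketch below expands any non-broadcast updating assignment with one rule. `UpdateAssign` and `expand_update` are hypothetical names invented for this example; they are not part of JuliaSyntax or JuliaLowering, and the toy struct only stands in for a real `K"op="` node. Dotted forms such as `.+=` are deliberately left out.

```julia
# Toy stand-in for an `op=` node with children (lhs, op, rhs).
# Hypothetical type for illustration only, not JuliaSyntax's node type.
struct UpdateAssign
    lhs::Symbol
    op::Symbol
    rhs::Any
end

# Expand `lhs op= rhs` into `lhs = op(lhs, rhs)`.
# Because the operator is an ordinary child, one rule covers every operator.
expand_update(n::UpdateAssign) =
    Expr(:(=), n.lhs, Expr(:call, n.op, n.lhs, n.rhs))

plus_form = expand_update(UpdateAssign(:x, :+, 1))    # same Expr as :(x = x + 1)
div_form  = expand_update(UpdateAssign(:x, :÷, :y))   # same Expr as :(x = x ÷ y)
```

With one kind per operator, the same expansion would first need a table or string manipulation to recover `+` from `+=`.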

src/expr.jl

Lines changed: 10 additions & 0 deletions
@@ -232,6 +232,16 @@ function _internal_node_to_Expr(source, srcrange, head, childranges, childheads,
 
     if k == K"?"
         headsym = :if
+    elseif k == K"op=" && length(args) == 3
+        lhs = args[1]
+        op = args[2]
+        rhs = args[3]
+        headstr = string(args[2], '=')
+        if is_dotted(head)
+            headstr = '.'*headstr
+        end
+        headsym = Symbol(headstr)
+        args = Any[lhs, rhs]
     elseif k == K"macrocall"
         if length(args) >= 2
             a2 = args[2]
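The new branch rebuilds the traditional `Expr` head from the operator child, so the `Expr` output stays the same as before. The standalone sketch below mirrors that string construction outside the parser; `update_head` and `update_expr` are hypothetical helper names, and the `dotted` argument stands in for `is_dotted(head)`.

```julia
# Hypothetical helper: build the Expr head symbol from the operator child.
function update_head(op::Symbol, dotted::Bool)
    headstr = string(op, '=')
    if dotted
        headstr = '.' * headstr
    end
    return Symbol(headstr)
end

# Hypothetical helper: drop the operator child and keep only lhs and rhs,
# as the conversion to Expr does.
update_expr(lhs, op::Symbol, rhs; dotted::Bool=false) =
    Expr(update_head(op, dotted), lhs, rhs)

plain  = update_expr(:x, :+, :y)                 # head is Symbol("+=")
dotted = update_expr(:x, :+, :y; dotted=true)    # head is Symbol(".+=")
```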

src/kinds.jl

Lines changed: 1 addition & 16 deletions
@@ -293,23 +293,8 @@ register_kinds!(JuliaSyntax, 0, [
     "BEGIN_ASSIGNMENTS"
         "BEGIN_SYNTACTIC_ASSIGNMENTS"
             "="
-            "+="
-            "-="   # Also used for "−="
-            "*="
-            "/="
-            "//="
-            "|="
-            "^="
-            "÷="
-            "%="
-            "<<="
-            ">>="
-            ">>>="
-            "\\="
-            "&="
+            "op="  # Updating assignment operator ( $= %= &= *= += -= //= /= <<= >>= >>>= \= ^= |= ÷= ⊻= )
             ":="
-            "\$="
-            "⊻="
         "END_SYNTACTIC_ASSIGNMENTS"
         "~"
         "≔"

src/parse_stream.jl

Lines changed: 6 additions & 3 deletions
@@ -871,8 +871,9 @@ end
 Bump the next token, splitting it into several pieces
 
 Tokens are defined by a number of `token_spec` of shape `(nbyte, kind, flags)`.
-The number of input bytes of the last spec is taken from the remaining bytes of
-the input token, with the associated `nbyte` ignored.
+If all `nbyte` are positive, the sum must equal the token length. If one
+`nbyte` is negative, that token is given `tok_len + nbyte` bytes and the sum of
+all `nbyte` must equal zero.
 
 This is a hack which helps resolve the occasional lexing ambiguity. For
 example
@@ -887,12 +888,14 @@ function bump_split(stream::ParseStream, split_spec::Vararg{Any, N}) where {N}
     tok = stream.lookahead[stream.lookahead_index]
     stream.lookahead_index += 1
     b = _next_byte(stream)
+    toklen = tok.next_byte - b
     for (i, (nbyte, k, f)) in enumerate(split_spec)
         h = SyntaxHead(k, f)
-        b = (i == length(split_spec)) ? tok.next_byte : b + nbyte
+        b += nbyte < 0 ? (toklen + nbyte) : nbyte
         orig_k = k == K"." ? K"." : kind(tok)
         push!(stream.tokens, SyntaxToken(h, orig_k, false, b))
     end
+    @assert tok.next_byte == b
     stream.peek_count = 0
     return position(stream)
 end
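The byte arithmetic in the new `bump_split` can be checked in isolation. The sketch below reproduces only the offset computation, assuming a token that starts at byte offset `start` and is `toklen` bytes long; `split_ends` is a hypothetical function written for this example and does not touch a `ParseStream`.

```julia
# Hypothetical helper: compute the end offset of each piece of a split token.
# A negative `nbyte` means "token length plus nbyte", i.e. whatever is left
# after the fixed-size pieces have been accounted for.
function split_ends(start::Int, toklen::Int, nbytes)
    b = start
    ends = Int[]
    for nbyte in nbytes
        b += nbyte < 0 ? (toklen + nbyte) : nbyte
        push!(ends, b)
    end
    # Same invariant as the new assertion: the pieces cover the whole token.
    @assert b == start + toklen "split pieces must cover the whole token"
    return ends
end

ends_plus_eq   = split_ends(0, 2, (-1, 1))      # `+=`    -> [1, 2]
ends_dot_plus  = split_ends(0, 3, (1, -2, 1))   # `.+=`   -> [1, 2, 3]
ends_dot_shift = split_ends(0, 5, (1, -2, 1))   # `.>>>=` -> [1, 4, 5]
```

In each spec the `nbyte` values sum to zero, which is the condition the updated docstring states for specs containing a negative entry.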

src/parser.jl

Lines changed: 28 additions & 8 deletions
@@ -340,7 +340,7 @@ function bump_dotsplit(ps, flags=EMPTY_FLAGS;
     bump_trivia(ps)
     mark = position(ps)
     k = remap_kind != K"None" ? remap_kind : kind(t)
-    pos = bump_split(ps, (1, K".", TRIVIA_FLAG), (0, k, flags))
+    pos = bump_split(ps, (1, K".", TRIVIA_FLAG), (-1, k, flags))
     if emit_dot_node
         pos = emit(ps, mark, K".")
     end
@@ -626,7 +626,22 @@ function parse_assignment_with_initial_ex(ps::ParseState, mark, down::T) where {
     # a += b ==> (+= a b)
     # a .= b ==> (.= a b)
     is_short_form_func = k == K"=" && !is_dotted(t) && was_eventually_call(ps)
-    bump(ps, TRIVIA_FLAG)
+    if k == K"op="
+        # x += y ==> (op= x + y)
+        # x .+= y ==> (.op= x + y)
+        bump_trivia(ps)
+        if is_dotted(t)
+            bump_split(ps, (1, K".", TRIVIA_FLAG),
+                       (-2, K"Identifier", EMPTY_FLAGS), # op
+                       (1, K"=", TRIVIA_FLAG))
+        else
+            bump_split(ps,
+                       (-1, K"Identifier", EMPTY_FLAGS), # op
+                       (1, K"=", TRIVIA_FLAG))
+        end
+    else
+        bump(ps, TRIVIA_FLAG)
+    end
     bump_trivia(ps)
     # Syntax Edition TODO: We'd like to call `down` here when
     # is_short_form_func is true, to prevent `f() = 1 = 2` from parsing.
@@ -1843,7 +1858,7 @@ function parse_resword(ps::ParseState)
             # let x::1 ; end ==> (let (block (::-i x 1)) (block))
             # let x ; end ==> (let (block x) (block))
             # let x=1,y=2 ; end ==> (let (block (= x 1) (= y 2)) (block))
-            # let x+=1 ; end ==> (let (block (+= x 1)) (block))
+            # let x+=1 ; end ==> (let (block (op= x + 1)) (block))
             parse_comma_separated(ps, parse_eq_star)
         end
         emit(ps, m, K"block")
@@ -2571,7 +2586,7 @@ function parse_import_path(ps::ParseState)
         # Modules with operator symbol names
         # import .⋆ ==> (import (importpath . ⋆))
         bump_trivia(ps)
-        bump_split(ps, (1,K".",EMPTY_FLAGS), (1,peek(ps),EMPTY_FLAGS))
+        bump_split(ps, (1,K".",EMPTY_FLAGS), (-1,peek(ps),EMPTY_FLAGS))
     else
         # import @x ==> (import (importpath @x))
         # import $A ==> (import (importpath ($ A)))
@@ -2599,7 +2614,12 @@ function parse_import_path(ps::ParseState)
                      warning="space between dots in import path")
             end
             bump_trivia(ps)
-            bump_split(ps, (1,K".",TRIVIA_FLAG), (1,k,EMPTY_FLAGS))
+            m = position(ps)
+            bump_split(ps, (1,K".",TRIVIA_FLAG), (-1,k,EMPTY_FLAGS))
+            if is_syntactic_operator(k)
+                # import A.= ==> (import (importpath A (error =)))
+                emit(ps, m, K"error", error="syntactic operators not allowed in import")
+            end
         elseif k == K"..."
             # Import the .. operator
             # import A... ==> (import (importpath A ..))
@@ -3550,13 +3570,13 @@ function parse_atom(ps::ParseState, check_identifiers=true)
         bump_dotsplit(ps, emit_dot_node=true, remap_kind=
                       is_syntactic_operator(leading_kind) ? leading_kind : K"Identifier")
         if check_identifiers && !is_valid_identifier(leading_kind)
-            # += ==> (error +=)
+            # += ==> (error (op= +))
             # ? ==> (error ?)
-            # .+= ==> (error (. +=))
+            # .+= ==> (error (. (op= +)))
             emit(ps, mark, K"error", error="invalid identifier")
         else
             # Quoted syntactic operators allowed
-            # :+= ==> (quote-: +=)
+            # :+= ==> (quote-: (op= +))
         end
     elseif is_keyword(leading_kind)
         if leading_kind == K"var" && (t = peek_token(ps,2);
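The negative `nbyte` values in these split specs matter because the operator part of an updating assignment has no fixed byte length: `+` is one byte, `>>>` is three, and `÷` is two bytes in UTF-8. A minimal sketch of the same split done on plain strings follows; `split_update_op` is a hypothetical helper for illustration and is not how the parser itself represents tokens.

```julia
# Hypothetical helper: split the text of an updating-assignment token into
# (dotted, operator). The operator is whatever remains after removing an
# optional leading '.' and the trailing '=', so its length can vary.
function split_update_op(tok::AbstractString)
    @assert endswith(tok, '=') "expected an updating assignment token"
    dotted = startswith(tok, '.')
    body = dotted ? chop(tok; head=1, tail=1) : chop(tok; tail=1)
    return (dotted, String(body))
end

split_update_op("+=")      # (false, "+")
split_update_op(".>>>=")   # (true, ">>>")
split_update_op("÷=")      # (false, "÷"), where "÷" occupies 2 bytes
```

A fixed positive `nbyte` for the operator piece would be wrong for at least one of these inputs, which is presumably why the spec counts bytes from the end instead.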

src/tokenize.jl

Lines changed: 21 additions & 20 deletions
@@ -93,6 +93,7 @@ function _nondot_symbolic_operator_kinds()
         K"isa"
         K"in"
         K".'"
+        K"op="
     ])
 end
 
@@ -527,14 +528,14 @@ function _next_token(l::Lexer, c)
     elseif c == '-'
         return lex_minus(l);
     elseif c == '−' # \minus '−' treated as hyphen '-'
-        return emit(l, accept(l, '=') ? K"-=" : K"-")
+        return emit(l, accept(l, '=') ? K"op=" : K"-")
     elseif c == '`'
         return lex_backtick(l);
     elseif is_identifier_start_char(c)
         return lex_identifier(l, c)
     elseif isdigit(c)
         return lex_digit(l, K"Integer")
-    elseif (k = get(_unicode_ops, c, K"error")) != K"error"
+    elseif (k = get(_unicode_ops, c, K"None")) != K"None"
         return emit(l, k)
     else
         emit(l,
@@ -797,12 +798,12 @@ function lex_greater(l::Lexer)
     if accept(l, '>')
         if accept(l, '>')
             if accept(l, '=')
-                return emit(l, K">>>=")
+                return emit(l, K"op=")
             else # >>>?, ? not a =
                 return emit(l, K">>>")
             end
         elseif accept(l, '=')
-            return emit(l, K">>=")
+            return emit(l, K"op=")
         else
             return emit(l, K">>")
         end
@@ -819,7 +820,7 @@ end
 function lex_less(l::Lexer)
     if accept(l, '<')
         if accept(l, '=')
-            return emit(l, K"<<=")
+            return emit(l, K"op=")
         else # '<<?', ? not =, ' '
             return emit(l, K"<<")
         end
@@ -888,15 +889,15 @@ end
 
 function lex_percent(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"%=")
+        return emit(l, K"op=")
     else
         return emit(l, K"%")
     end
 end
 
 function lex_bar(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"|=")
+        return emit(l, K"op=")
     elseif accept(l, '>')
         return emit(l, K"|>")
     elseif accept(l, '|')
@@ -910,7 +911,7 @@ function lex_plus(l::Lexer)
     if accept(l, '+')
         return emit(l, K"++")
     elseif accept(l, '=')
-        return emit(l, K"+=")
+        return emit(l, K"op=")
     end
     return emit(l, K"+")
 end
@@ -925,7 +926,7 @@ function lex_minus(l::Lexer)
     elseif !l.dotop && accept(l, '>')
         return emit(l, K"->")
     elseif accept(l, '=')
-        return emit(l, K"-=")
+        return emit(l, K"op=")
     end
     return emit(l, K"-")
 end
@@ -934,35 +935,35 @@ function lex_star(l::Lexer)
     if accept(l, '*')
         return emit(l, K"Error**") # "**" is an invalid operator use ^
     elseif accept(l, '=')
-        return emit(l, K"*=")
+        return emit(l, K"op=")
     end
     return emit(l, K"*")
 end
 
 function lex_circumflex(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"^=")
+        return emit(l, K"op=")
     end
     return emit(l, K"^")
 end
 
 function lex_division(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"÷=")
+        return emit(l, K"op=")
     end
     return emit(l, K"÷")
 end
 
 function lex_dollar(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"$=")
+        return emit(l, K"op=")
     end
     return emit(l, K"$")
 end
 
 function lex_xor(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"⊻=")
+        return emit(l, K"op=")
     end
     return emit(l, K"⊻")
 end
@@ -1110,7 +1111,7 @@ function lex_amper(l::Lexer)
     if accept(l, '&')
         return emit(l, K"&&")
     elseif accept(l, '=')
-        return emit(l, K"&=")
+        return emit(l, K"op=")
     else
         return emit(l, K"&")
     end
@@ -1148,20 +1149,20 @@ end
 function lex_forwardslash(l::Lexer)
     if accept(l, '/')
         if accept(l, '=')
-            return emit(l, K"//=")
+            return emit(l, K"op=")
         else
             return emit(l, K"//")
         end
     elseif accept(l, '=')
-        return emit(l, K"/=")
+        return emit(l, K"op=")
     else
         return emit(l, K"/")
     end
 end
 
 function lex_backslash(l::Lexer)
     if accept(l, '=')
-        return emit(l, K"\=")
+        return emit(l, K"op=")
     end
     return emit(l, K"\\")
 end
@@ -1193,7 +1194,7 @@ function lex_dot(l::Lexer)
     elseif pc == '−'
         l.dotop = true
         readchar(l)
-        return emit(l, accept(l, '=') ? K"-=" : K"-")
+        return emit(l, accept(l, '=') ? K"op=" : K"-")
     elseif pc =='*'
         l.dotop = true
         readchar(l)
@@ -1222,7 +1223,7 @@ function lex_dot(l::Lexer)
         l.dotop = true
         readchar(l)
         if accept(l, '=')
-            return emit(l, K"&=")
+            return emit(l, K"op=")
         else
             if accept(l, '&')
                 return emit(l, K"&&")

test/expr.jl

Lines changed: 10 additions & 0 deletions
@@ -501,6 +501,16 @@
         @test parsestmt("./x", ignore_errors=true) == Expr(:call, Expr(:error, Expr(:., :/)), :x)
     end
 
+    @testset "syntactic update-assignment operators" begin
+        @test parsestmt("x += y") == Expr(:(+=), :x, :y)
+        @test parsestmt("x .+= y") == Expr(:(.+=), :x, :y)
+        @test parsestmt(":+=") == QuoteNode(Symbol("+="))
+        @test parsestmt(":(+=)") == QuoteNode(Symbol("+="))
+        @test parsestmt(":.+=") == QuoteNode(Symbol(".+="))
+        @test parsestmt(":(.+=)") == QuoteNode(Symbol(".+="))
+        @test parsestmt("x \u2212= y") == Expr(:(-=), :x, :y)
+    end
+
     @testset "let" begin
         @test parsestmt("let x=1\n end") ==
             Expr(:let, Expr(:(=), :x, 1), Expr(:block, LineNumberNode(2)))
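These tests pin down that the `Expr` form is unaffected by the new `K"op="` kind. As a small cross-check that does not need JuliaSyntax loaded, the same two-argument shape can be observed with Base's `Meta.parse`; this sketch assumes only a standard Julia installation, and the variable names are illustrative.

```julia
# The Expr form keeps the operator fused into the head, with two arguments.
plain_update  = Meta.parse("x += y")
dotted_update = Meta.parse("x .+= y")

plain_update.head    # Symbol("+=")
plain_update.args    # Any[:x, :y]
dotted_update.head   # Symbol(".+=")
```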
