Skip to content

Commit 8098cb0

Browse files
authored
Fix combined cartesian / flattened generators (#263)
The combined cartesian/flattened generator syntax is a very unusual use case found in only one or two packages in the general registry. It looks like this ((a,b,c,d) for a=as, b=bs for c=cs, d=ds) Practically all such generators end up as a flat list (though ... should they?) but the ordering of tuples (a,b,c,d) in the list depends on which parts are cartesian (separated by commas) and which parts are flattened (separated by for's). Here we fix the `SyntaxNode` representation of these, preferring a single `generator` head rather than combined `flatten` and `generator`, and instead grouping the sets of cartesian iterations into `cartesian_iterator` groups. This keeps the tree structure flat and closer to the source text while also faithfully representing these unusual forms. For example, the above parses as (generator (tuple a b c d) (cartesian_iterator (= a as) (= b bs)) (cartesian_iterator (= c cs) (= d ds))) In addition, we also make use of the `cartesian_iterator` head in `for` loops, replacing the `block` which was used previously (but is not semantically or syntactically a block!). Thus `for i=is, j=js body end` now parses as (for (cartesian_iterator (= i is) (= j js)) body) This also improves Expr conversion, removing some special cases which were necessary when processing `block's` in that situation.
1 parent 4d9c50a commit 8098cb0

File tree

6 files changed

+191
-118
lines changed

6 files changed

+191
-118
lines changed

README.md

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -375,7 +375,7 @@ The children of our trees are strictly in source order. This has many
375375
consequences in places where `Expr` reorders child expressions.
376376

377377
* Infix and postfix operator calls have the operator name in the *second* child position. `a + b` is parsed as `(call-i a + b)` - where the infix `-i` flag indicates infix child position - rather than `Expr(:call, :+, :a, :b)`.
378-
* Flattened generators are represented in source order
378+
* Generators are represented in source order as a single node rather than multiple nested flatten and generator expressions.
379379

380380
### No `LineNumberNode`s
381381

@@ -435,15 +435,16 @@ class of tokenization errors and lets the parser deal with them.
435435
* Using `try catch else finally end` is parsed with `K"catch"` `K"else"` and `K"finally"` children to avoid the awkwardness of the optional child nodes in the `Expr` representation (#234)
436436
* The dotted import path syntax as in `import A.b.c` is parsed with a `K"importpath"` kind rather than `K"."`, because a bare `A.b.c` has a very different nested/quoted expression representation (#244)
437437
* We use flags rather than child nodes to represent the difference between `struct` and `mutable struct`, `module` and `baremodule` (#220)
438+
* Multiple iterations within the header of a `for`, as in `for a=as, b=bs body end` are represented with a `cartesian_iterator` head rather than a `block`, as these lists of iterators are neither semantically nor syntactically a sequence of statements. Unlike other uses of `block` (see also generators).
438439

439440

440441
## More detail on tree differences
441442

442-
### Flattened generators
443+
### Generators
443444

444445
Flattened generators are uniquely problematic because the Julia AST doesn't
445446
respect a key rule we normally expect: that the children of an AST node are a
446-
*contiguous* range in the source text. This is because the `for`s in
447+
*contiguous* range in the source text. For example, the `for`s in
447448
`[xy for x in xs for y in ys]` are parsed in the normal order of a for loop to
448449
mean
449450

@@ -470,16 +471,30 @@ however, note that if this tree were flattened, the order would be
470471
source order.
471472

472473
However, our green tree is strictly source-ordered, so we must deviate from the
473-
Julia AST. The natural representation seems to be to remove the generators and
474-
use a flattened structure:
474+
Julia AST. We deal with this by grouping cartesian products of iterators
475+
(separated by commas) within `cartesian_iterator` blocks as in `for` loops, and
476+
use the presence of multiple iterator blocks rather than the `flatten` head to
477+
distinguish flattened iterators. The nested flattens and generators of `Expr`
478+
forms are reconstructed later. In this form the tree structure resembles the
479+
source much more closely. For example, `(xy for x in xs for y in ys)` is parsed as
475480

476481
```
477-
(flatten
482+
(generator
478483
xy
479484
(= x xs)
480485
(= y ys))
481486
```
482487

488+
And the cartesian iteration `(xy for x in xs, y in ys)` is parsed as
489+
490+
```
491+
(generator
492+
xy
493+
(cartesian_iterator
494+
(= x xs)
495+
(= y ys)))
496+
```
497+
483498
### Whitespace trivia inside strings
484499

485500
For triple quoted strings, the indentation isn't part of the string data so

src/expr.jl

Lines changed: 54 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -126,41 +126,34 @@ function _to_expr(node::SyntaxNode; iteration_spec=false, need_linenodes=true,
126126
# Convert children
127127
insert_linenums = (headsym === :block || headsym === :toplevel) && need_linenodes
128128
args = Vector{Any}(undef, length(node_args)*(insert_linenums ? 2 : 1))
129-
if headsym === :for && length(node_args) == 2
130-
# No line numbers in for loop iteration spec
131-
args[1] = _to_expr(node_args[1], iteration_spec=true, need_linenodes=false)
132-
args[2] = _to_expr(node_args[2])
133-
elseif headsym === :let && length(node_args) == 2
134-
# No line numbers in let statement binding list
135-
args[1] = _to_expr(node_args[1], need_linenodes=false)
136-
args[2] = _to_expr(node_args[2])
129+
eq_to_kw_in_call =
130+
((headsym === :call || headsym === :dotcall) && is_prefix_call(node)) ||
131+
headsym === :ref
132+
eq_to_kw_all = (headsym === :parameters && !map_kw_in_params) ||
133+
(headsym === :parens && eq_to_kw)
134+
in_vcbr = headsym === :vect || headsym === :curly || headsym === :braces || headsym === :ref
135+
if insert_linenums && isempty(node_args)
136+
push!(args, source_location(LineNumberNode, node.source, node.position))
137137
else
138-
eq_to_kw_in_call =
139-
((headsym === :call || headsym === :dotcall) && is_prefix_call(node)) ||
140-
headsym === :ref
141-
eq_to_kw_all = (headsym === :parameters && !map_kw_in_params) ||
142-
(headsym === :parens && eq_to_kw)
143-
in_vcbr = headsym === :vect || headsym === :curly || headsym === :braces || headsym === :ref
144-
if insert_linenums && isempty(node_args)
145-
push!(args, source_location(LineNumberNode, node.source, node.position))
146-
else
147-
for i in 1:length(node_args)
148-
n = node_args[i]
149-
if insert_linenums
150-
args[2*i-1] = source_location(LineNumberNode, n.source, n.position)
151-
end
152-
eq_to_kw = eq_to_kw_in_call && i > 1 || eq_to_kw_all
153-
coalesce_dot_with_ops = i==1 &&
154-
(nodekind in KSet"call dotcall curly" ||
155-
nodekind == K"quote" && flags(node) == COLON_QUOTE)
156-
args[insert_linenums ? 2*i : i] =
157-
_to_expr(n, eq_to_kw=eq_to_kw,
158-
map_kw_in_params=in_vcbr,
159-
coalesce_dot=coalesce_dot_with_ops)
160-
end
161-
if nodekind == K"block" && has_flags(node, PARENS_FLAG)
162-
popfirst!(args)
138+
for i in 1:length(node_args)
139+
n = node_args[i]
140+
if insert_linenums
141+
args[2*i-1] = source_location(LineNumberNode, n.source, n.position)
163142
end
143+
eq_to_kw = eq_to_kw_in_call && i > 1 || eq_to_kw_all
144+
coalesce_dot_with_ops = i==1 &&
145+
(nodekind in KSet"call dotcall curly" ||
146+
nodekind == K"quote" && flags(node) == COLON_QUOTE)
147+
args[insert_linenums ? 2*i : i] =
148+
_to_expr(n, eq_to_kw=eq_to_kw,
149+
map_kw_in_params=in_vcbr,
150+
coalesce_dot=coalesce_dot_with_ops,
151+
iteration_spec=(headsym == :for && i == 1),
152+
need_linenodes=!(headsym == :let && i == 1)
153+
)
154+
end
155+
if nodekind == K"block" && has_flags(node, PARENS_FLAG)
156+
popfirst!(args)
164157
end
165158
end
166159

@@ -202,6 +195,10 @@ function _to_expr(node::SyntaxNode; iteration_spec=false, need_linenodes=true,
202195
elseif headsym in (:ref, :curly)
203196
# Move parameters blocks to args[2]
204197
reorder_parameters!(args, 2)
198+
elseif headsym === :for
199+
if Meta.isexpr(args[1], :cartesian_iterator)
200+
args[1] = Expr(:block, args[1].args...)
201+
end
205202
elseif headsym in (:tuple, :vect, :braces)
206203
# Move parameters blocks to args[1]
207204
reorder_parameters!(args, 1)
@@ -262,18 +259,32 @@ function _to_expr(node::SyntaxNode; iteration_spec=false, need_linenodes=true,
262259
push!(args, else_)
263260
end
264261
end
265-
elseif headsym === :filter
266-
pushfirst!(args, last(args))
267-
pop!(args)
268-
elseif headsym === :flatten
269-
# The order of nodes inside the generators in Julia's flatten AST
270-
# is noncontiguous in the source text, so need to reconstruct
271-
# Julia's AST here from our alternative `flatten` expression.
272-
gen = Expr(:generator, args[1], args[end])
273-
for i in length(args)-1:-1:2
274-
gen = Expr(:flatten, Expr(:generator, gen, args[i]))
262+
elseif headsym === :generator
263+
# Reconstruct the nested Expr form for generator from our flatter
264+
# source-ordered `generator` format.
265+
gen = args[1]
266+
for j = length(args):-1:2
267+
if Meta.isexpr(args[j], :cartesian_iterator)
268+
gen = Expr(:generator, gen, args[j].args...)
269+
else
270+
gen = Expr(:generator, gen, args[j])
271+
end
272+
if j < length(args)
273+
# Additional `for`s flatten the inner generator
274+
gen = Expr(:flatten, gen)
275+
end
275276
end
276277
return gen
278+
elseif headsym == :filter
279+
@assert length(args) == 2
280+
iterspec = args[1]
281+
outargs = Any[args[2]]
282+
if Meta.isexpr(iterspec, :cartesian_iterator)
283+
append!(outargs, iterspec.args)
284+
else
285+
push!(outargs, iterspec)
286+
end
287+
args = outargs
277288
elseif headsym in (:nrow, :ncat)
278289
# For lack of a better place, the dimension argument to nrow/ncat
279290
# is stored in the flags

src/kinds.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -905,7 +905,7 @@ const _kind_names =
905905
# Comprehensions
906906
"generator"
907907
"filter"
908-
"flatten"
908+
"cartesian_iterator"
909909
"comprehension"
910910
"typed_comprehension"
911911
"END_SYNTAX_KINDS"

src/parser.jl

Lines changed: 36 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -1783,13 +1783,9 @@ function parse_resword(ps::ParseState)
17831783
emit(ps, mark, K"while")
17841784
elseif word == K"for"
17851785
# for x in xs end ==> (for (= x xs) (block))
1786-
# for x in xs, y in ys \n a \n end ==> (for (block (= x xs) (= y ys)) (block a))
1786+
# for x in xs, y in ys \n a \n end ==> (for (cartesian_iterator (= x xs) (= y ys)) (block a))
17871787
bump(ps, TRIVIA_FLAG)
1788-
m = position(ps)
1789-
n_subexprs = parse_comma_separated(ps, parse_iteration_spec)
1790-
if n_subexprs > 1
1791-
emit(ps, m, K"block")
1792-
end
1788+
parse_iteration_specs(ps)
17931789
parse_block(ps)
17941790
bump_closing_token(ps, K"end")
17951791
emit(ps, mark, K"for")
@@ -2625,6 +2621,16 @@ function parse_iteration_spec(ps::ParseState)
26252621
emit(ps, mark, K"=")
26262622
end
26272623

2624+
# Parse an iteration spec, or a comma separate list of such for for loops and
2625+
# generators
2626+
function parse_iteration_specs(ps::ParseState)
2627+
mark = position(ps)
2628+
n_iters = parse_comma_separated(ps, parse_iteration_spec)
2629+
if n_iters > 1
2630+
emit(ps, mark, K"cartesian_iterator")
2631+
end
2632+
end
2633+
26282634
# flisp: parse-space-separated-exprs
26292635
function parse_space_separated_exprs(ps::ParseState)
26302636
ps = with_space_sensitive(ps)
@@ -2669,61 +2675,41 @@ function parse_vect(ps::ParseState, closer)
26692675
return (K"vect", EMPTY_FLAGS)
26702676
end
26712677

2672-
# Flattened generators are hard because the Julia AST doesn't respect a key
2673-
# rule we normally expect: that the children of an AST node are a contiguous
2674-
# range in the source text. This is because the `for`s in
2675-
# `[xy for x in xs for y in ys]` are parsed in the normal order of a for as
2678+
# Parse generators
26762679
#
2677-
# (flatten
2678-
# (generator
2679-
# (generator
2680-
# xy
2681-
# y in ys)
2682-
# x in xs))
2680+
# We represent generators quite differently from `Expr`:
2681+
# * Cartesian products of iterators are grouped within cartesian_iterator
2682+
# nodes, as in the short form of `for` loops.
2683+
# * The `generator` kind is used for both cartesian and flattened generators
26832684
#
2684-
# We deal with this by only emitting the flatten:
2685-
#
2686-
# (flatten xy (= x xs) (= y ys))
2687-
#
2688-
# then reconstructing the nested flattens and generators when converting to Expr.
2689-
#
2690-
# [x for a = as for b = bs if cond1 for c = cs if cond2] ==> (comprehension (flatten x (= a as) (filter (= b bs) cond1) (filter (= c cs) cond2)))
2691-
# [x for a = as if begin cond2 end] => (comprehension (generator x (filter (= a as) (block cond2))))
2685+
# (x for a in as for b in bs) ==> (parens (generator x (= a as) (= b bs)))
2686+
# (x for a in as, b in bs) ==> (parens (generator x (cartesian_iterator (= a as) (= b bs))))
2687+
# (x for a in as, b in bs if z) ==> (parens (generator x (filter (cartesian_iterator (= a as) (= b bs)) z)))
26922688
#
26932689
# flisp: parse-generator
2694-
function parse_generator(ps::ParseState, mark, flatten=false)
2695-
t = peek_token(ps)
2696-
if !preceding_whitespace(t)
2697-
# [(x)for x in xs] ==> (comprehension (generator (parens x) (error) (= x xs)))
2698-
bump_invisible(ps, K"error", TRIVIA_FLAG,
2699-
error="Expected space before `for` in generator")
2700-
end
2701-
@check kind(t) == K"for"
2702-
bump(ps, TRIVIA_FLAG)
2703-
filter_mark = position(ps)
2704-
parse_comma_separated(ps, parse_iteration_spec)
2705-
if peek(ps) == K"if"
2706-
# (a for x in xs if cond) ==> (parens (generator a (filter (= x xs) cond)))
2690+
function parse_generator(ps::ParseState, mark)
2691+
while (t = peek_token(ps); kind(t) == K"for")
2692+
if !preceding_whitespace(t)
2693+
# ((x)for x in xs) ==> (parens (generator (parens x) (error) (= x xs)))
2694+
bump_invisible(ps, K"error", TRIVIA_FLAG,
2695+
error="Expected space before `for` in generator")
2696+
end
27072697
bump(ps, TRIVIA_FLAG)
2708-
parse_cond(ps)
2709-
emit(ps, filter_mark, K"filter")
2710-
end
2711-
t = peek_token(ps)
2712-
if kind(t) == K"for"
2713-
# (xy for x in xs for y in ys) ==> (parens (flatten xy (= x xs) (= y ys)))
2714-
# (xy for x in xs for y in ys for z in zs) ==> (parens (flatten xy (= x xs) (= y ys) (= z zs)))
2715-
parse_generator(ps, mark, true)
2716-
if !flatten
2717-
emit(ps, mark, K"flatten")
2698+
iter_mark = position(ps)
2699+
parse_iteration_specs(ps)
2700+
if peek(ps) == K"if"
2701+
# (x for a in as if z) ==> (parens (generator x (filter (= a as) z)))
2702+
bump(ps, TRIVIA_FLAG)
2703+
parse_cond(ps)
2704+
emit(ps, iter_mark, K"filter")
27182705
end
2719-
elseif !flatten
2720-
# (x for a in as) ==> (parens (generator x (= a as)))
2721-
emit(ps, mark, K"generator")
27222706
end
2707+
emit(ps, mark, K"generator")
27232708
end
27242709

27252710
# flisp: parse-comprehension
27262711
function parse_comprehension(ps::ParseState, mark, closer)
2712+
# [x for a in as] ==> (comprehension (generator x a in as))
27272713
ps = ParseState(ps, whitespace_newline=true,
27282714
space_sensitive=false,
27292715
end_symbol=false)

0 commit comments

Comments
 (0)