Skip to content

Commit c49e461

Browse files
authored
Merge pull request #731 from wado-lang/claude/fix-cst-group-afN1p
Implement CST Group support for Gale parser generator
2 parents eae653b + f5116d7 commit c49e461

File tree

8 files changed

+1029
-1181
lines changed

8 files changed

+1029
-1181
lines changed

package-gale/TODO.md

Lines changed: 37 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,35 +1,49 @@
11
# Gale TODO
22

3-
## CST Group Fix
3+
## Remove `store` parameter from `gen_element`
44

5-
CST Group support (storing `(rule_a | rule_b)*` results in the parse tree) is implemented but blocked by a compiler limitation.
5+
`gen_element` has a `store: bool` parameter that controls generated variable naming:
66

7-
### What's done
7+
- `store=true`: `let field_name = parse_xxx(p)?;` (meaningful name for struct field assignment)
8+
- `store=false`: `let tok = parse_xxx(p)?;` (throwaway name)
89

9-
- **generator.wado**: Group variant type generation (`SqlStmtListOrErrorGroup` etc.)
10-
- **parser_gen.wado**: Parser stores Group results in variant fields
11-
- **visitor_gen.wado**: Walker generates match dispatch for Group variants
12-
- **All code is tested and works** when compiled standalone
10+
`store=false` is a workaround for a naming collision problem in `dedup_name`. The underlying issue is that `dedup_name` assumes sequential variable creation, but branching code (optionals, groups, prediction trees) creates variables in different scopes:
1311

14-
### What's blocked
12+
```
13+
// gen_optional_with_lookahead generates:
14+
if condition {
15+
let k_limit = p.expect(...)?;
16+
let expr = parse_expr(p)?; // dedup count 0
17+
if grp_kind == TK_K_OFFSET { ... }
18+
let expr = parse_expr(p)?; // dedup count 0 again → collision!
19+
}
20+
```
1521

16-
The walker generates `for let item of &node.group_list` which iterates `&Array<GroupVariant>`. This requires `IntoIterator for &Array<T>``ArrayRefIter<T>``Iterator` trait impl.
22+
With `store=false`, both become `let tok = ...` / `let tok_2 = ...` (separate namespace), avoiding the collision. But this is a hack — all generated variables should use meaningful names.
1723

18-
`ArrayRefIter<T>` now implements `Iterator` (e2e tests pass), but the monomorphizer **eagerly instantiates all default methods** (collect, map, filter, fold, etc.) for every `T`, causing OOM when compiling large programs like gale itself.
24+
### Root cause
1925

20-
### Fix path
26+
`dedup_name` tracks a flat counter per name. Optional/group paths increment this counter even though their variables are scoped to if-blocks. When mandatory elements follow, `gen_field_assignments` expects names at count N but the parsing code produced count N+K (shifted by optional paths).
2127

22-
```
23-
1. Lazy monomorphization of trait default methods
24-
(only instantiate methods that are actually called)
25-
26-
2. Gale compiles without OOM (ArrayRefIter<T> already implements Iterator)
27-
28-
3. CST Group fix can use `for let item of &node.group_list`
29-
30-
4. SQLite parser's to_tree() test passes
31-
```
28+
### Fix approach
29+
30+
1. **Scope-aware naming**: Track which variables are in which scope. Variables inside if/else blocks don't affect the parent scope's counter.
31+
2. **Or**: Make `gen_field_assignments` use the same `name_counts` instance that the parsing code used, instead of replaying from scratch.
32+
3. **Or**: Wrap all non-field parsing (groups, optionals, prediction branches) in labeled scope blocks (`__scope: { ... }`) and use fresh `name_counts` inside. This isolates variable names to their scope.
33+
34+
Approach 3 was partially implemented but incomplete — optional paths that share elements with the enclosing alt (e.g., `LIMIT expr ((OFFSET|',') expr)?` where `expr` appears both in the alt and inside the optional) need the parent's dedup counter to produce `expr_2` for the second occurrence.
35+
36+
A complete fix likely requires combining approaches: scope blocks for isolation + parent counter awareness for shared element types.
37+
38+
## Incomplete CST walker coverage
39+
40+
The XML unparse output is missing tokens/nodes for several patterns due to `store=false` and non-simple groups:
41+
42+
- **Repeated separators**: `(',' result_column)*` — the `','` and subsequent `result_column` are not stored, so `SELECT a, b` only shows `a`.
43+
- **Token-only groups**: `(K_INSERT | K_REPLACE)` — groups with only TokenRef alternatives are not `is_simple_cst_group` (requires RuleRef alternatives), so `INSERT` keyword is missing from `insert_stmt`.
44+
- **Multi-element single-alt groups**: `(';'+ sql_stmt)*` — the Star inner group has two elements (Plus and RuleRef), not a simple CST group, so second statements are not stored.
3245

33-
### Workaround (if lazy monomorphization is too large)
46+
Fixing these requires either:
3447

35-
Change the walker to use value iteration (`for let item of node.group_list` without `&`). This copies array elements but avoids the `&Array<T>` IntoIterator issue. The walker already works with value semantics for non-Group fields.
48+
1. Extending `is_simple_cst_group` to handle token-only and mixed groups (generating variant types for tokens too).
49+
2. Or making `gen_repeat` for Star/Plus store all inner elements (requires struct types for group alternatives with multiple elements).

package-gale/src/gen_util.wado

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
use { Alternative } from "./ir.wado";
2+
13
pub fn to_pascal_case(name: &String) -> String {
24
let mut result = String::with_capacity(name.len());
35
let mut capitalize_next = true;
@@ -195,6 +197,49 @@ pub fn strip_suffix(s: &String, suffix: &String) -> String {
195197
return result;
196198
}
197199

200+
pub fn is_simple_cst_group(alts: &Array<Alternative>) -> bool {
201+
if alts.len() < 2 { return false; }
202+
for let alt of alts {
203+
if alt.elements.len() != 1 { return false; }
204+
if let RuleRef(_) = alt.elements[0] {
205+
// ok
206+
} else {
207+
return false;
208+
}
209+
}
210+
return true;
211+
}
212+
213+
pub fn group_variant_type_name(alts: &Array<Alternative>) -> String {
214+
let mut result = String::with_capacity(64);
215+
for let mut i = 0; i < alts.len(); i += 1 {
216+
if i > 0 { result.append("Or"); }
217+
if let RuleRef(name) = &alts[i].elements[0] {
218+
result.append(&to_pascal_case(name));
219+
}
220+
}
221+
result.append("Group");
222+
return result;
223+
}
224+
225+
pub fn group_field_base_name(alts: &Array<Alternative>) -> String {
226+
let mut result = String::with_capacity(64);
227+
for let mut i = 0; i < alts.len(); i += 1 {
228+
if i > 0 { result.append("_or_"); }
229+
if let RuleRef(name) = &alts[i].elements[0] {
230+
result.append(&to_snake_case(name));
231+
}
232+
}
233+
return result;
234+
}
235+
236+
pub fn group_case_name(alt: &Alternative) -> String {
237+
if let RuleRef(name) = &alt.elements[0] {
238+
return to_pascal_case(name);
239+
}
240+
return "Unknown";
241+
}
242+
198243
pub fn array_contains_str(arr: &Array<String>, value: &String) -> bool {
199244
for let item of arr {
200245
if *item == *value {

package-gale/src/generator.wado

Lines changed: 64 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
use { Grammar, ParserRule, Alternative, Element, RepeatElement, LabelElement, RepeatKind, LexerRule } from "./ir.wado";
2-
use { to_pascal_case, to_snake_case, to_lower, rule_type_name, literal_field_name, strip_suffix } from "./gen_util.wado";
2+
use { to_pascal_case, to_snake_case, to_lower, rule_type_name, literal_field_name, strip_suffix, is_simple_cst_group, group_variant_type_name, group_field_base_name, group_case_name } from "./gen_util.wado";
33
use { collect_literal_tokens, gen_token_constants, gen_lexer } from "./lexer_gen.wado";
44
use { gen_parser } from "./parser_gen.wado";
55
use { gen_visitor } from "./visitor_gen.wado";
@@ -78,6 +78,7 @@ fn gen_parser_rule_type(w: &mut CodeWriter, rule: &ParserRule) {
7878
}
7979

8080
fn gen_struct_for_alt(w: &mut CodeWriter, type_name: &String, alt: &Alternative) {
81+
gen_group_variant_types(w, &alt.elements);
8182
let fields = collect_fields(&alt.elements);
8283
let mut s = StructSpec::new(type_name);
8384
s.set_pub();
@@ -89,6 +90,57 @@ fn gen_struct_for_alt(w: &mut CodeWriter, type_name: &String, alt: &Alternative)
8990
w.blank();
9091
}
9192

93+
fn repeat_inner_field(elem: &Element) -> Option<Field> {
94+
if let Group(alts) = *elem {
95+
if is_simple_cst_group(&alts) {
96+
return Option::<Field>::Some(Field {
97+
name: group_field_base_name(&alts),
98+
type_str: group_variant_type_name(&alts),
99+
});
100+
}
101+
return null;
102+
}
103+
return element_to_field(elem);
104+
}
105+
106+
fn gen_group_variant_types(w: &mut CodeWriter, elements: &Array<Element>) {
107+
for let elem of elements {
108+
let alts = extract_group_alts(&elem);
109+
if let Some(group_alts) = alts {
110+
if is_simple_cst_group(group_alts) {
111+
gen_group_variant_type(w, group_alts);
112+
}
113+
}
114+
}
115+
}
116+
117+
fn extract_group_alts(elem: &Element) -> Option<&Array<Alternative>> {
118+
if let Group(alts) = elem {
119+
return Option::Some(alts);
120+
}
121+
if let Repeat(rep) = elem {
122+
if let Group(alts) = &rep.element {
123+
return Option::Some(alts);
124+
}
125+
}
126+
return null;
127+
}
128+
129+
fn gen_group_variant_type(w: &mut CodeWriter, alts: &Array<Alternative>) {
130+
let type_name = group_variant_type_name(alts);
131+
let mut v = VariantSpec::new(&type_name);
132+
v.set_pub();
133+
for let alt of alts {
134+
if let RuleRef(name) = &alt.elements[0] {
135+
let case_name = to_pascal_case(name);
136+
let payload = rule_type_name(name);
137+
v.add_case_with_payload(&case_name, &payload);
138+
}
139+
}
140+
v.emit(w);
141+
w.blank();
142+
}
143+
92144
struct Field {
93145
name: String,
94146
type_str: String,
@@ -148,7 +200,8 @@ fn element_to_field(elem: &Element) -> Option<Field> {
148200
});
149201
}
150202
if let Repeat(rep) = *elem {
151-
if let Some(inner) = element_to_field(&rep.element) {
203+
let inner_field = repeat_inner_field(&rep.element);
204+
if let Some(inner) = inner_field {
152205
let field = match rep.kind {
153206
Star | Plus => Field {
154207
name: `{inner.name}_list`,
@@ -172,6 +225,15 @@ fn element_to_field(elem: &Element) -> Option<Field> {
172225
}
173226
return null;
174227
}
228+
if let Group(alts) = *elem {
229+
if is_simple_cst_group(&alts) {
230+
return Option::<Field>::Some(Field {
231+
name: group_field_base_name(&alts),
232+
type_str: `Option<{group_variant_type_name(&alts)}>`,
233+
});
234+
}
235+
return null;
236+
}
175237
return null;
176238
}
177239

package-gale/src/generator_test.wado

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -180,6 +180,68 @@ test "generate duplicate field names" {
180180
assert str_contains(&output, " pub item_2: ItemNode,");
181181
}
182182

183+
test "generate group variant type for repeated group" {
184+
let grammar = Grammar {
185+
name: "Test",
186+
parser_rules: [
187+
ParserRule {
188+
name: "parse",
189+
alternatives: [
190+
Alternative {
191+
label: null,
192+
elements: [
193+
Element::Repeat(RepeatElement {
194+
kind: RepeatKind::Star,
195+
element: Element::Group([
196+
Alternative { label: null, elements: [Element::RuleRef("stmt")] },
197+
Alternative { label: null, elements: [Element::RuleRef("error")] },
198+
]),
199+
non_greedy: false,
200+
}),
201+
Element::TokenRef("EOF"),
202+
],
203+
},
204+
],
205+
},
206+
],
207+
lexer_rules: [],
208+
};
209+
let output = generate(&grammar);
210+
assert str_contains(&output, "pub variant StmtOrErrorGroup {");
211+
assert str_contains(&output, " Stmt(StmtNode),");
212+
assert str_contains(&output, " Error(ErrorNode),");
213+
assert str_contains(&output, " pub stmt_or_error_list: Array<StmtOrErrorGroup>,");
214+
}
215+
216+
test "generate standalone group as optional field" {
217+
let grammar = Grammar {
218+
name: "Test",
219+
parser_rules: [
220+
ParserRule {
221+
name: "cmd",
222+
alternatives: [
223+
Alternative {
224+
label: null,
225+
elements: [
226+
Element::Group([
227+
Alternative { label: null, elements: [Element::RuleRef("foo")] },
228+
Alternative { label: null, elements: [Element::RuleRef("bar")] },
229+
]),
230+
Element::TokenRef("EOF"),
231+
],
232+
},
233+
],
234+
},
235+
],
236+
lexer_rules: [],
237+
};
238+
let output = generate(&grammar);
239+
assert str_contains(&output, "pub variant FooOrBarGroup {");
240+
assert str_contains(&output, " Foo(FooNode),");
241+
assert str_contains(&output, " Bar(BarNode),");
242+
assert str_contains(&output, " pub foo_or_bar: Option<FooOrBarGroup>,");
243+
}
244+
183245
fn str_contains(haystack: &String, needle: &String) -> bool {
184246
let h_len = haystack.len();
185247
let n_len = needle.len();

0 commit comments

Comments
 (0)