Add benchmark SQL tests, remove dead expansion code, document failed approaches

claude · claude · commit adf55f676a48 · 2026-03-28T04:37:06.000Z
- Copy 18 complex SQL queries from benchmark/sqlite_parse/queries.sql to driver_sqlite_test.wado for better parser regression coverage (JOINs, recursive CTEs, correlated subqueries, CASE, set operations, etc.)
- Remove dead code from parser_gen.wado: sll_expand_rule_ref, try_expand_opaque, strip_all_consume (not called, caused correctness bugs when active)
- Keep zero-overhead return stack infrastructure (SllReturn, push_return, pop_return, return-stack-aware sll_config_first/sll_advance_inner)
- Document the RuleRef expansion approach and its 3 failure modes in package-gale/CLAUDE.md to prevent repeating the same mistakes

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C
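
Reviewer note: the "depth-mixed Dispatch" failure documented in this commit can be illustrated with a toy model. The Python sketch below is not the Wado implementation; the grammar and the names `RULES`, `ALTS`, `tokens_at_depth`, and `group_by_token` are hypothetical and chosen only for illustration. It assumes two alternatives that both begin with a `with_clause` reference, and it shows that grouping by tokens found inside that shared rule never narrows the set of candidate alternatives.

```python
# Toy grammar (hypothetical): one sub-rule with two expansions.
RULES = {
    "with_clause": [
        ["K_WITH", "K_RECURSIVE", "IDENT"],
        ["K_WITH", "IDENT"],
    ],
}

# Two alternatives at the decision point share the same prefix rule.
ALTS = {
    0: ["with_clause", "select_core"],
    1: ["with_clause", "insert_core"],
}


def tokens_at_depth(alt_elements, depth):
    """Tokens visible at lookahead `depth` when the leading RuleRef
    of an alternative is expanded one level."""
    head = alt_elements[0]
    seen = set()
    # A head that is not a known rule is treated as a single terminal.
    for expansion in RULES.get(head, [[head]]):
        if depth < len(expansion):
            seen.add(expansion[depth])
    return seen


def group_by_token(alts, depth):
    """Map each lookahead token to the set of alternatives that can see it."""
    groups = {}
    for alt_index, elements in alts.items():
        for tok in tokens_at_depth(elements, depth):
            groups.setdefault(tok, set()).add(alt_index)
    return groups


if __name__ == "__main__":
    for depth in (0, 1):
        print(depth, group_by_token(ALTS, depth))
```

In this model every token group at depths 0 and 1 maps to both alternatives, so a Dispatch built on those tokens cannot select one of them. That is consistent with the lesson recorded in the docs that tokens from inside expanded sub-rules cannot drive prediction at the decision point.
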
diff --git a/package-gale/AGENTS.md b/package-gale/AGENTS.md
@@ -101,6 +101,30 @@ Commit the updated golden files.
 
 - **No backtracking in new code.** Use static k-token lookahead prediction to disambiguate alternatives. If prediction cannot resolve within depth 5, file an issue rather than adding backtracking. Existing backtracking sites are being migrated to prediction; do not add new ones.
 
+## Failed Approaches (Do Not Repeat)
+
+### RuleRef Expansion via Return Stack (2026-03)
+
+**Goal:** Expand multi-token RuleRefs during SLL prediction to reduce backtracking.
+
+**What was tried:** Added `return_stack` to `SllConfig` to track continuation points when entering a referenced rule. `sll_expand_rule_ref` pushed return frames and advanced inside sub-rules. `try_expand_opaque` called expansion when `build_sll_node` would otherwise produce `Backtrack`.
+
+**Why it failed (3 distinct bugs):**
+
+1. **Consume node corruption:** `build_sll_node` emits `Consume(element, child)` when all configs share a common terminal. For expanded configs inside a sub-rule, this emits `p.expect(K_FROM)` at the _decision point_, consuming a token that belongs to the referenced rule (e.g., `delete_stmt`). Fix attempted: `strip_all_consume` — but this loses disambiguation information.
+
+2. **Depth-mixed Dispatch:** Expanded configs produce Dispatch branches for tokens _inside_ sub-rules (e.g., `K_RECURSIVE` from `with_clause`). When multiple alternatives share the same prefix rule (`with_clause`), these dispatches are meaningless — every alternative sees the same tokens. The generated parser enters wrong branches and fails or times out.
+
+3. **Dedup false resolution:** `sll_dedup_by_alt` keeps one config per `alt_index`. When two alternatives expand to configs with identical FIRST sets (e.g., `join_clause` and `table_or_subquery` both start with `table_or_subquery`), dedup merges them into a single alt. The prediction then emits a `Leaf` for the wrong alternative, silently dropping the other.
+
+**What remains:** The `return_stack` field on `SllConfig`, `push_return`, `pop_return`, and return-stack-aware `sll_config_first` / `sll_advance_inner` are committed as zero-overhead infrastructure. They don't affect generated output.
+
+**Lessons:**
+
+- Tokens from inside expanded sub-rules cannot be used for prediction at the decision point level
+- To use expansion correctly, the prediction must map expanded tokens back to the decision point's lookahead depth (essentially an ATN simulator)
+- `sll_dedup_by_alt` is too aggressive for expanded configs — alternatives sharing sub-rules get merged
+
 ## On-Task-Done
 
 When completing a task, run from the project root:
diff --git a/package-gale/src/parser_gen.wado b/package-gale/src/parser_gen.wado
@@ -77,25 +77,6 @@ fn strip_dead_consume(node: PredictionNode) -> PredictionNode {
     return node;
 }
 
-/// Strip ALL Consume nodes from a prediction tree.
-/// Used for expanded RuleRef predictions where Consume would incorrectly
-/// consume tokens belonging to the sub-rule at the decision point.
-fn strip_all_consume(node: PredictionNode) -> PredictionNode {
-    if let Consume(c) = node {
-        return strip_all_consume(c.child);
-    }
-    if let Dispatch(d) = node {
-        let mut new_branches: Array<PredictionBranch> = [];
-        for let b of d.branches {
-            new_branches.append(PredictionBranch {
-                tokens: b.tokens,
-                child: strip_all_consume(b.child),
-            });
-        }
-        return PredictionNode::Dispatch(PredictionDispatch { depth: d.depth, branches: new_branches });
-    }
-    return node;
-}
 
 struct SllReturn {
     elements: Array<Element>,
@@ -184,52 +165,6 @@ fn sll_advance(c: &SllConfig, token: &String, all_rules: &Array<ParserRule>, lit
     return sll_advance_inner(c, token, all_rules, lit_tokens, 0);
 }
 
-fn sll_expand_rule_ref(name: &String, c: &SllConfig, continuation_pos: i32, token: &String, all_rules: &Array<ParserRule>, lit_tokens: &Array<LitToken>, inline_depth: i32) -> Array<SllConfig> {
-    // Depth guard: only 1 level of RuleRef expansion to prevent blowup.
-    if c.return_stack.len() >= 1 {
-        return [SllConfig { alt_index: c.alt_index, elements: c.elements, pos: -1, return_stack: [] }];
-    }
-    // Only expand rules with few alternatives to avoid combinatorial explosion.
-    let mut rule_alt_count = 0;
-    for let rule of all_rules {
-        if rule.name == *name {
-            rule_alt_count = rule.alternatives.len();
-            break;
-        }
-    }
-    if rule_alt_count > 5 {
-        return [SllConfig { alt_index: c.alt_index, elements: c.elements, pos: -1, return_stack: [] }];
-    }
-    let new_stack = push_return(
-        &c.return_stack,
-        c.elements,
-        continuation_pos,
-    );
-    let mut results: Array<SllConfig> = [];
-    for let rule of all_rules {
-        if rule.name != *name {
-            continue;
-        }
-        for let alt of rule.alternatives {
-            let expanded = SllConfig {
-                alt_index: c.alt_index,
-                elements: alt.elements,
-                pos: 0,
-                return_stack: new_stack,
-            };
-            let advanced = sll_advance_inner(&expanded, token, all_rules, lit_tokens, inline_depth + 1);
-            for let a of advanced {
-                results.append(a);
-            }
-        }
-        break;
-    }
-    if results.is_empty() {
-        return [];
-    }
-    return results;
-}
-
 fn sll_advance_inner(c: &SllConfig, token: &String, all_rules: &Array<ParserRule>, lit_tokens: &Array<LitToken>, inline_depth: i32) -> Array<SllConfig> {
     // Limit inline expansion depth to prevent explosion from recursive rules.
     if inline_depth > 2 {
@@ -544,136 +479,6 @@ fn build_sll_node(configs: &Array<SllConfig>, depth: i32, max_depth: i32, all_ru
     return PredictionNode::Dispatch(PredictionDispatch { depth, branches });
 }
 
-/// Try to resolve opaque configs by expanding their multi-token RuleRefs.
-/// Returns Some(node) if expansion produces a better prediction tree, None otherwise.
-/// `non_opaque_configs` are the already-advanced transparent configs for this token.
-fn try_expand_opaque(original_configs: &Array<SllConfig>, token: &String, non_opaque_configs: &Array<SllConfig>, opaque_alts: Array<i32>, depth: i32, max_depth: i32, all_rules: &Array<ParserRule>, lit_tokens: &Array<LitToken>) -> Option<PredictionNode> {
-    // Only expand at shallow depths (up to 2 lookahead levels) to limit cost.
-    if depth > 2 || depth >= max_depth {
-        return null;
-    }
-    // Rule diversity check: only expand when opaque configs reference different rules.
-    // If all opaque configs point to the same RuleRef, expansion can't distinguish them.
-    let mut rule_refs: Array<String> = [];
-    for let c of original_configs {
-        if !array_contains_i32(&opaque_alts, c.alt_index) {
-            continue;
-        }
-        if c.pos < 0 || c.pos >= c.elements.len() {
-            continue;
-        }
-        if let RuleRef(name) = c.elements[c.pos] {
-            if !array_contains_str(&rule_refs, &name) {
-                rule_refs.append(name);
-            }
-        }
-    }
-    if rule_refs.len() <= 1 {
-        return null;
-    }
-    // Re-advance opaque configs by expanding their RuleRefs.
-    let mut expanded_configs: Array<SllConfig> = [];
-    for let c of original_configs {
-        if !array_contains_i32(&opaque_alts, c.alt_index) {
-            continue;
-        }
-        if c.pos < 0 || c.pos >= c.elements.len() {
-            continue;
-        }
-        let elem = &c.elements[c.pos];
-        if let RuleRef(name) = *elem {
-            let result = sll_expand_rule_ref(&name, c, c.pos + 1, token, all_rules, lit_tokens, 0);
-            for let r of result {
-                if r.pos != -1 {
-                    expanded_configs.append(r);
-                }
-            }
-        }
-    }
-    if expanded_configs.is_empty() {
-        return null;
-    }
-    // Combine non-opaque (already advanced) with newly expanded configs.
-    let mut all_configs: Array<SllConfig> = [];
-    for let c of non_opaque_configs {
-        all_configs.append(sll_config_clone(c));
-    }
-    for let ec of expanded_configs {
-        all_configs.append(ec);
-    }
-    // Build a flat one-level Dispatch from expanded configs.
-    // Group configs by FIRST token, then check if each token group resolves to one alt.
-    let mut all_tokens: Array<String> = [];
-    let mut config_firsts: Array<Array<String>> = [];
-    for let c of all_configs {
-        let first = sll_config_first(&c, all_rules, lit_tokens);
-        config_firsts.append(first);
-        for let tk of first {
-            if !array_contains_str(&all_tokens, &tk) {
-                all_tokens.append(tk);
-            }
-        }
-    }
-    let mut branches: Array<PredictionBranch> = [];
-    let mut has_improvement = false;
-    for let mut t = 0; t < all_tokens.len(); t += 1 {
-        let tk = &all_tokens[t];
-        let mut token_alts: Array<i32> = [];
-        for let mut i = 0; i < all_configs.len(); i += 1 {
-            if array_contains_str(&config_firsts[i], tk) {
-                if !array_contains_i32(&token_alts, all_configs[i].alt_index) {
-                    token_alts.append(all_configs[i].alt_index);
-                }
-            }
-        }
-        if token_alts.len() == 1 {
-            has_improvement = true;
-            let mut merged = false;
-            for let mut b = 0; b < branches.len(); b += 1 {
-                if let Leaf(idx) = branches[b].child {
-                    if idx == token_alts[0] {
-                        branches[b].tokens.append(*tk);
-                        merged = true;
-                    }
-                }
-            }
-            if !merged {
-                branches.append(PredictionBranch {
-                    tokens: [*tk],
-                    child: PredictionNode::Leaf(token_alts[0]),
-                });
-            }
-        } else {
-            // Still ambiguous for this token — fall back to Backtrack.
-            let sorted = token_alts.sorted();
-            let child = PredictionNode::Backtrack(sorted);
-            let mut merged = false;
-            for let mut b = 0; b < branches.len(); b += 1 {
-                if prediction_node_eq(&branches[b].child, &child) {
-                    branches[b].tokens.append(*tk);
-                    merged = true;
-                }
-            }
-            if !merged {
-                branches.append(PredictionBranch {
-                    tokens: [*tk],
-                    child,
-                });
-            }
-        }
-    }
-    if !has_improvement || branches.is_empty() {
-        return null;
-    }
-    // Only use if ALL branches are Leaf — any Backtrack branch means the expansion
-    // didn't fully resolve, which can cause incorrect dispatching on sub-rule tokens.
-    for let b of branches {
-        if let Backtrack(_) = b.child {
-            return null;
-        }
-    }
-    return Option::<PredictionNode>::Some(PredictionNode::Dispatch(PredictionDispatch { depth, branches }));
-}
 
 /// Check if all configs are at the same terminal element (for Consume).
 fn sll_find_common_terminal(configs: &Array<SllConfig>) -> Option<Element> {
diff --git a/package-gale/tests/driver_sqlite_test.wado b/package-gale/tests/driver_sqlite_test.wado