Reduce generated parser backtracking via ATN-style RuleRef expansion by gfx · Pull Request #713 · wado-lang/wado

gfx · 2026-03-28T03:38:17Z

Summary

Reduce SQLite grammar backtracking sites from 298 → 253 (-15%) by expanding multi-token RuleRefs during SLL prediction
Add ATN-style try_expand_opaque: when the prediction engine encounters opaque RuleRefs, enter the referenced rules and compute FIRST sets at the decision point's lookahead level to build a flat Dispatch
Modify sll_advance_inner to enter nullable multi-token RuleRefs (e.g., with_clause?) via return stack instead of treating them as single-token consumers
Add return stack infrastructure (SllReturn, push_return, pop_return) to SllConfig for tracking continuation points during rule expansion
Copy 18 complex SQL queries from benchmark/sqlite_parse/queries.sql to driver_sqlite_test.wado for better parser regression coverage
Document failed expansion approaches and their failure modes in package-gale/CLAUDE.md
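
The return stack bookkeeping named above (SllReturn, push_return, pop_return) can be pictured with a minimal Python sketch. This is not the Wado implementation: the field names (`rule`, `rule_alt`, `pos`) and the immutable copy-on-push style are assumptions made for illustration. Only the three identifiers and the idea of a continuation point come from the PR text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SllReturn:
    """Continuation point: where to resume in the caller (hypothetical fields)."""
    rule: str
    rule_alt: int
    pos: int


@dataclass
class SllConfig:
    """Prediction config; return_stack is empty unless a rule was entered."""
    rule: str
    rule_alt: int
    pos: int
    return_stack: List[SllReturn] = field(default_factory=list)


def push_return(cfg: SllConfig, callee: str, callee_alt: int) -> SllConfig:
    """Enter `callee`, remembering the element after the RuleRef in the caller."""
    continuation = SllReturn(cfg.rule, cfg.rule_alt, cfg.pos + 1)
    return SllConfig(callee, callee_alt, 0, cfg.return_stack + [continuation])


def pop_return(cfg: SllConfig) -> Optional[SllConfig]:
    """Resume the caller after the callee is exhausted; None if nothing to resume."""
    if not cfg.return_stack:
        return None
    *rest, top = cfg.return_stack
    return SllConfig(top.rule, top.rule_alt, top.pos, rest)
```

A config that never enters a rule keeps an empty list, which is in the spirit of the "zero overhead when unused" remark in the commit messages.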

Key design decisions

Flat Dispatch only: Expanded configs are never passed to build_sll_node. FIRST sets are computed manually and grouped by alt_index to avoid Consume corruption, depth-mixed Dispatch, and dedup false resolution bugs.

All-Leaf guard: try_expand_opaque only returns a Dispatch when every branch resolves to a single alternative (Leaf). Ambiguous branches cause the entire expansion to be rejected, falling back to the original Backtrack.
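
The combination of "flat Dispatch" and the All-Leaf guard can be sketched as follows. This is a Python illustration under stated assumptions, not the code in parser_gen.wado: the function name `build_flat_dispatch`, the dictionary representation of a Dispatch, and the token names in the usage note are all hypothetical.

```python
from typing import Dict, Optional, Set


def build_flat_dispatch(first_by_alt: Dict[int, Set[str]]) -> Optional[Dict[str, int]]:
    """Map each lookahead token to exactly one alt_index, or reject entirely.

    `first_by_alt` holds the FIRST set computed for each alternative at the
    decision point's lookahead level (assumed to be computed elsewhere).
    """
    token_to_alts: Dict[str, Set[int]] = {}
    for alt_index, tokens in first_by_alt.items():
        for token in tokens:
            token_to_alts.setdefault(token, set()).add(alt_index)

    dispatch: Dict[str, int] = {}
    for token, alts in token_to_alts.items():
        if len(alts) != 1:
            # All-Leaf guard: one ambiguous branch rejects the whole expansion,
            # so the caller keeps its original Backtrack node.
            return None
        dispatch[token] = next(iter(alts))
    return dispatch
```

With made-up FIRST sets, `build_flat_dispatch({0: {"TABLE"}, 1: {"INDEX", "UNIQUE"}})` yields a one-level mapping from token to alternative, while any token shared by two alternatives makes the function return `None`.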

Nullable entering: sll_advance_inner enters nullable Repeat(Optional/Star, RuleRef) elements via return stack (depth < 1, alt count <= 8) so that the prediction engine sees tokens at the correct input depth.

Test plan

All 1301 tests pass (mise run on-task-done)
SQLite driver tests: 111 passed, 0 failed (including 18 new benchmark queries)
No correctness regressions (CTE, JOIN, subquery, recursive CTE all pass)
Small grammars (JSON, sexpression, calculator) unchanged

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C

…expansion

Adds SllReturn/return_stack to SllConfig and the sll_expand_rule_ref helper, enabling the prediction engine to enter multi-token RuleRef alternatives. Expansion is currently disabled (guarded) pending resolution of the Consume node correctness issue discovered during testing.

Key findings:
- Return stack infrastructure has zero overhead when unused (empty arrays)
- RuleRef expansion reduces SQLite backtracking by 31% (298→205 sites)
- But Consume nodes from expanded rules incorrectly emit parse code at the decision point, causing test failures
- Next step: limit expansion results to Leaf-only (approach B)

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C

Add return stack to SllConfig, enabling the prediction engine to expand multi-token RuleRefs by entering referenced rules during SLL advancement. This tracks continuation points so the engine can return to the caller after advancing through a sub-rule.

Infrastructure added (disabled, zero runtime overhead):
- SllReturn struct and return_stack field on SllConfig
- push_return/pop_return helpers for stack management
- sll_expand_rule_ref: expands multi-token RuleRefs with depth/alt guards
- try_expand_opaque: attempts to resolve opaque prediction groups
- strip_all_consume: removes Consume nodes from expanded prediction trees

The expansion is currently disabled (try_expand_opaque is not called) because dispatching on tokens from inside expanded sub-rules can produce incorrect prediction branches. Specifically:
- Consume nodes from sub-rules incorrectly consume tokens at the decision point
- Dispatch branches mix tokens from different rule depths
- Rules sharing prefixes (e.g., with_clause) create false disambiguation

The infrastructure is ready for activation once a correct dispatch strategy is implemented (e.g., computing FIRST sets at the decision point level rather than at the expanded position level).

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C

…approaches

- Copy 18 complex SQL queries from benchmark/sqlite_parse/queries.sql to driver_sqlite_test.wado for better parser regression coverage (JOINs, recursive CTEs, correlated subqueries, CASE, set operations, etc.)
- Remove dead code from parser_gen.wado: sll_expand_rule_ref, try_expand_opaque, strip_all_consume (not called, caused correctness bugs when active)
- Keep zero-overhead return stack infrastructure (SllReturn, push_return, pop_return, return-stack-aware sll_config_first/sll_advance_inner)
- Document the RuleRef expansion approach and its 3 failure modes in package-gale/CLAUDE.md to prevent repeating the same mistakes

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C

Implement try_expand_opaque: when the SLL prediction engine encounters opaque multi-token RuleRefs that would produce a Backtrack node, expand them by entering the referenced rules and computing FIRST sets at the decision point's lookahead level.

Key design: build a flat Dispatch manually from expanded FIRST sets, never passing expanded configs to build_sll_node. This avoids the 3 bugs from the previous approach (Consume corruption, depth-mixed Dispatch, dedup false resolution).

Safety guards:
- Rule diversity check: skip if all opaque alts reference the same rule
- Alt count limit (<=8): prevent combinatorial explosion
- Nullable-start guard: skip rules starting with nullable elements (e.g., with_clause?) to prevent depth mismatch in sll_advance
- FIRST pre-filter: skip rule alternatives that can't match the token
- Coverage verification: reject if any original alt is lost

Results for SQLite grammar: 298 → 275 backtracking sites (-8%). Primarily resolves CREATE (5→0) and DROP (4→0) groups where alternatives start with different terminal sequences.

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C
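
Two of the safety guards listed in this commit message, the rule diversity check and coverage verification, are simple set conditions. The sketch below shows them in Python under assumptions: the function names, the `alt -> rule name` mapping, and the `token -> alt` dispatch dictionary are hypothetical representations, and the rule and token names used to exercise them are made up.

```python
from typing import Dict, Set


def passes_rule_diversity(opaque_alt_rules: Dict[int, str]) -> bool:
    """Expansion is only worth trying if the opaque alts reference different rules.

    If every alternative enters the same rule, their FIRST sets are identical
    and expansion cannot separate them.
    """
    return len(set(opaque_alt_rules.values())) > 1


def covers_all_alts(original_alts: Set[int], dispatch: Dict[str, int]) -> bool:
    """Coverage verification: every original alternative must stay reachable.

    If some alt_index no longer appears as a dispatch target, the expansion
    has lost an alternative and must be rejected.
    """
    return original_alts <= set(dispatch.values())
```

Both checks are cheap relative to the expansion itself, so in a design like this they would plausibly run before (diversity) and after (coverage) the FIRST-set computation.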

Modify sll_advance_inner to enter nullable elements containing multi-token RuleRefs (e.g., with_clause?) via the return stack, instead of treating them as single-token consumers. This fixes the depth mismatch that caused try_expand_opaque to skip rules starting with nullable elements.

When sll_advance encounters a nullable Repeat(Optional/Star, RuleRef):
- If the RuleRef is single-token: advance past it (unchanged)
- If multi-token and return_stack depth < 1: push continuation, enter the rule's alternatives, advance inside
- Otherwise: fall back to pos+1 (legacy behavior)

Guards: return_stack depth < 1, alt count <= 8, FIRST pre-filter.

Results for SQLite: 298 → 253 backtracking sites (-15%). All 1301 tests pass. No correctness regressions.

https://claude.ai/code/session_01ACVN5Rr7waUZWXtv8MFN2C
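
The three-way decision described in this commit message can be written as a small classifier. This is a Python sketch, not the body of sll_advance_inner: the enum, the function name, and the parameters are hypothetical. It also assumes that exceeding the alt count limit falls into the "otherwise" branch, and it leaves out the per-alternative FIRST pre-filter.

```python
from enum import Enum

MAX_RETURN_DEPTH = 1  # "return_stack depth < 1" in the commit message
MAX_ALT_COUNT = 8     # "alt count <= 8" in the commit message


class Advance(Enum):
    ADVANCE_PAST = "single-token RuleRef: advance past it"
    ENTER_RULE = "push continuation and advance inside the rule"
    LEGACY_SKIP = "fall back to pos+1"


def classify_nullable_rule_ref(is_single_token: bool,
                               return_depth: int,
                               alt_count: int) -> Advance:
    """Decide how to advance over a nullable Repeat(Optional/Star, RuleRef)."""
    if is_single_token:
        return Advance.ADVANCE_PAST
    if return_depth < MAX_RETURN_DEPTH and alt_count <= MAX_ALT_COUNT:
        return Advance.ENTER_RULE
    # Assumption: any failed guard lands here, including too many alternatives.
    return Advance.LEGACY_SKIP
```

Capping the return stack depth at one level keeps the expansion to a single rule entry, which matches the stated goal of seeing tokens at the correct input depth without unbounded recursion.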

claude added 5 commits March 28, 2026 02:49

gfx changed the title ~~Reduce backtracking in package-gale parser~~ Reduce generated parser backtracking via ATN-style RuleRef expansion Mar 28, 2026

gfx merged commit b2e0f13 into main Mar 28, 2026
9 of 10 checks passed

gfx deleted the claude/reduce-backtracking-kyBvr branch March 28, 2026 06:18

gfx merged 5 commits into main from claude/reduce-backtracking-kyBvr
