
Remove backtracking from package-gale parser #711

Merged
gfx merged 8 commits into main from claude/remove-parser-backtracking-ogyIs
Mar 27, 2026
@gfx gfx commented Mar 27, 2026

claude added 6 commits March 27, 2026 13:33
…n engine

Add Consume variant to PredictionNode that factors out shared fixed-token
prefixes across alternatives. When all alternatives in a group start with
the same TokenRef or Literal, the prediction engine now emits a Consume
node that advances the parser past the shared prefix without spending
lookahead depth budget, then recurses on the suffixes.

This eliminates backtracking in cases like JSON obj/arr rules where
'{' pair (',' pair)* '}' and '{' '}' share the '{' prefix — the
generated parser now consumes '{' then dispatches on the next token.
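
A minimal sketch of the shared-prefix factoring described above, for illustration only. The `Element` and `PredictionNode` types here are simplified stand-ins (Gale's real definitions are not shown in this PR text), and `factor_prefix` and `Dispatch` are hypothetical names:

```rust
// Hypothetical, simplified grammar element; not Gale's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Literal(&'static str),
    TokenRef(&'static str),
    RuleRef(&'static str),
}

// Hypothetical, simplified prediction tree with only two node kinds.
#[derive(Debug, PartialEq)]
enum PredictionNode {
    /// Consume a fixed token shared by every alternative, then continue.
    Consume(Element, Box<PredictionNode>),
    /// Decide among the remaining (alt_index, suffix) pairs.
    Dispatch(Vec<(usize, Vec<Element>)>),
}

fn is_fixed_token(e: &Element) -> bool {
    matches!(e, Element::Literal(_) | Element::TokenRef(_))
}

/// Factor out leading fixed tokens that all alternatives have in common.
fn factor_prefix(alts: Vec<(usize, Vec<Element>)>) -> PredictionNode {
    let shared = match alts.first().and_then(|(_, els)| els.first()) {
        Some(e)
            if alts.len() > 1
                && is_fixed_token(e)
                && alts.iter().all(|(_, els)| els.first() == Some(e)) =>
        {
            Some(e.clone())
        }
        _ => None,
    };
    match shared {
        Some(tok) => {
            // Every alternative starts with `tok`, so drop it from each one
            // and recurse on the suffixes.
            let suffixes = alts
                .into_iter()
                .map(|(i, els)| (i, els[1..].to_vec()))
                .collect();
            PredictionNode::Consume(tok, Box::new(factor_prefix(suffixes)))
        }
        None => PredictionNode::Dispatch(alts),
    }
}
```

For two alternatives shaped like `'{' pair '}'` and `'{' '}'`, this yields a `Consume('{')` node whose child dispatches between the suffixes `pair '}'` and `'}'`.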

For SQLite, the number of backtracking sites drops from 122 to 109. The
eliminated sites are inline groups with shared keyword prefixes, such as
INSERT OR REPLACE / INSERT OR ROLLBACK.

https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
The prediction engine now detects when all alternatives in a group
share the same leading RuleRef (e.g. parse_name), consumes it via
a parse call, and then dispatches on the suffix. This eliminates
backtracking for cases like SQLite's compound_operator rule where
alternatives share a common RuleRef prefix.

SQLite backtracking sites: 109 → 105 (down from 122 originally).

https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
Implement rule_is_single_token() that checks whether a parser rule
always consumes exactly one token (all alternatives have exactly one
non-nullable terminal or single-token RuleRef). When compute_deeper_first
encounters such a RuleRef, it safely advances past it (consuming 1 depth
unit), enabling lookahead into what follows the rule reference.
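
A rough sketch of what such a check could look like, assuming a grammar stored as a map from rule name to alternatives. Only the name `rule_is_single_token` comes from this commit. The `Grammar` alias, the `Element` enum, the helper `single_token_inner`, and the choice to treat any recursive reference as "not single-token" are assumptions made for the example:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified grammar element; not Gale's actual type.
#[derive(Debug, Clone)]
enum Element {
    Token(&'static str),
    RuleRef(&'static str),
    Optional(Box<Element>),
}

// Assumed representation: rule name -> list of alternatives.
type Grammar = HashMap<&'static str, Vec<Vec<Element>>>;

/// True when every alternative of `rule` consumes exactly one token.
fn rule_is_single_token(g: &Grammar, rule: &str) -> bool {
    let mut visiting = HashSet::new();
    single_token_inner(g, rule, &mut visiting)
}

fn single_token_inner<'a>(
    g: &'a Grammar,
    rule: &'a str,
    visiting: &mut HashSet<&'a str>,
) -> bool {
    // A rule reached again while it is still being checked is recursive;
    // conservatively report it as not single-token.
    if !visiting.insert(rule) {
        return false;
    }
    let result = match g.get(rule) {
        Some(alts) => {
            !alts.is_empty()
                && alts.iter().all(|alt| match alt.as_slice() {
                    [Element::Token(_)] => true,
                    [Element::RuleRef(name)] => single_token_inner(g, name, visiting),
                    // Nullable or multi-element alternatives do not qualify.
                    _ => false,
                })
        }
        None => false,
    };
    visiting.remove(rule);
    result
}
```

With rules shaped like `name → IDENTIFIER | keyword`, the check succeeds as long as `keyword` only lists single tokens. A rule with an alternative like `'(' any_name ')'` fails it because that alternative has three elements.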

This is the static equivalent of ANTLR4's SLL prediction: for rules like
SQLite's `name` → `IDENTIFIER | keyword`, the prediction engine can now
look past the rule reference to disambiguate alternatives (e.g.,
`function_name '('` vs `column_name` at depth 2).

No golden file changes because SQLite's `any_name` rule includes a
recursive `'(' any_name ')'` alternative, making it non-single-token.
The infrastructure is in place for grammars with pure single-token rules.

https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
Replace the old depth-based prediction engine with an SLL-style DFA
that tracks element positions independently per alternative. Each
alternative maintains its own cursor into its element sequence, enabling
disambiguation when different alternatives are at different positions
after consuming a shared token.

Key changes:
- SllConfig struct tracks (alt_index, elements, pos) per alternative
- build_sll_node simulates token consumption across all configs
- Single-token RuleRefs advance position; multi-token RuleRefs produce
  opaque configs that force Backtrack (safe approximation)
- strip_dead_consume removes Consume nodes that lead to Backtrack
- Opaque configs are correctly included in all dispatch branches
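
To illustrate the per-alternative cursor idea, here is a simplified sketch. The `SllConfig` field names follow the list above. The `Step` enum and the functions `advance` and `advance_all` are hypothetical, and this version treats every RuleRef as opaque, which is a coarser approximation than the single-token handling the commit describes:

```rust
// Hypothetical, simplified grammar element; not Gale's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Token(&'static str),
    RuleRef(&'static str),
}

/// One configuration per alternative: which alternative, its elements,
/// and how far into them the simulation has progressed.
#[derive(Debug, Clone, PartialEq)]
struct SllConfig {
    alt_index: usize,
    elements: Vec<Element>,
    pos: usize,
}

#[derive(Debug)]
enum Step {
    /// The token matched; the cursor moved forward by one element.
    Advanced(SllConfig),
    /// The cursor sits on a rule reference we cannot see through.
    Opaque(SllConfig),
    /// The token cannot start the rest of this alternative.
    Dead,
}

fn advance(cfg: &SllConfig, token: &str) -> Step {
    match cfg.elements.get(cfg.pos) {
        Some(Element::Token(t)) if *t == token => Step::Advanced(SllConfig {
            pos: cfg.pos + 1,
            ..cfg.clone()
        }),
        Some(Element::Token(_)) | None => Step::Dead,
        // Assumption for this sketch: all RuleRefs are opaque.
        Some(Element::RuleRef(_)) => Step::Opaque(cfg.clone()),
    }
}

/// Simulate consuming `token` across all configs. Opaque configs are kept
/// in every branch, since they can only be resolved by backtracking.
fn advance_all(configs: &[SllConfig], token: &str) -> Vec<SllConfig> {
    configs
        .iter()
        .filter_map(|c| match advance(c, token) {
            Step::Advanced(next) | Step::Opaque(next) => Some(next),
            Step::Dead => None,
        })
        .collect()
}
```

Because each alternative carries its own `pos`, two alternatives that share a first token can sit at different elements after it is consumed, which is what lets the next token separate them.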

Results: SQLite backtracking 122 → 98 (-20%), S-expression 1 → 1,
JSON/Calculator 0 → 0. All 92 Gale unit tests pass.

2 SQLite integration tests fail (EXISTS subquery) due to backtrack
trial order — will be fixed in a follow-up.

Also fixes unrelated clippy warnings (if_same_then_else, ptr_arg).

https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
…iction

Fix a pre-existing bug where EXISTS/NOT EXISTS subqueries were parsed
as function calls instead of subquery expressions. The root cause was
twofold:

1. sll_closure skipped nullable elements (e.g., Optional(K_NOT)),
   causing their FIRST tokens to be lost from the prediction. This
   meant bt_21 ((NOT)? EXISTS '(' select_stmt ')') wasn't included
   in the EXISTS token dispatch branch.

2. Backtrack trial order: alternatives starting with fixed tokens
   (like EXISTS) should be tried before alternatives starting with
   RuleRefs (like function_name) since they are more specific.
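
The first root cause can be illustrated with a small FIRST-set computation over an element sequence. The function name `first_of_elements_from` appears in this commit, but the signature, the `Element` enum, and the returned nullable flag below are assumptions made for the sketch:

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified grammar element; not Gale's actual type.
#[derive(Debug, Clone)]
enum Element {
    Token(&'static str),
    /// An optional group, e.g. (NOT)?
    Optional(Vec<Element>),
}

/// FIRST set of `elements[pos..]`, plus whether that suffix is nullable.
/// Nullable elements contribute their tokens and are then looked past,
/// instead of being skipped without contributing anything.
fn first_of_elements_from(
    elements: &[Element],
    pos: usize,
) -> (BTreeSet<&'static str>, bool) {
    let mut first = BTreeSet::new();
    for el in elements.iter().skip(pos) {
        match el {
            Element::Token(t) => {
                first.insert(*t);
                // A required token ends the scan.
                return (first, false);
            }
            Element::Optional(inner) => {
                let (inner_first, _) = first_of_elements_from(inner, 0);
                first.extend(inner_first);
                // Optional groups can match nothing, so keep scanning.
            }
        }
    }
    (first, true)
}
```

For a sequence shaped like `(NOT)? EXISTS '('`, the FIRST set contains both `K_NOT` and `K_EXISTS`, so the alternative is reachable from the EXISTS dispatch branch.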

Changes:
- Remove nullable skipping from sll_closure; use first_of_elements_from
  in sll_config_first to naturally include nullable tokens in FIRST sets
- Add explicit nullable handling in sll_advance_inner: when at a nullable
  element, try both matching it and skipping past all consecutive nullables
  (non-recursive to avoid explosion)
- Raise alt_sort_priority for multi-element alternatives starting with
  fixed tokens (priority 4) over those starting with RuleRefs (priority 2)
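
The trial-order change might look roughly like the sketch below. Only the name `alt_sort_priority` and the values 4 and 2 come from this commit. The `Element` enum, the helper `trial_order`, the fallback priority of 0, and the assumption that higher priorities are tried first are all illustrative:

```rust
use std::cmp::Reverse;

// Hypothetical, simplified grammar element; not Gale's actual type.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Token(&'static str),
    RuleRef(&'static str),
}

/// Priority used to order backtrack trials.
fn alt_sort_priority(alt: &[Element]) -> u8 {
    match alt {
        // Multi-element alternative starting with a fixed token.
        [Element::Token(_), _, ..] => 4,
        // Alternative starting with a rule reference.
        [Element::RuleRef(_), ..] => 2,
        // Assumption: everything else gets the lowest priority here.
        _ => 0,
    }
}

/// Indices of `alts` in the order they would be tried. The sort is stable,
/// so alternatives with equal priority keep their grammar order.
fn trial_order(alts: &[Vec<Element>]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..alts.len()).collect();
    order.sort_by_key(|&i| Reverse(alt_sort_priority(&alts[i])));
    order
}
```

With a `function_name '(' ')'` alternative listed before an `EXISTS '(' select_stmt ')'` alternative, the EXISTS form is tried first under this ordering.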

Also fixes unrelated clippy warnings (if_same_then_else, ptr_arg).

Results: 1283 tests pass (0 failures), SQLite backtracking 98 → 104
(slight increase due to nullable configs producing more opaque paths).

https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
@gfx gfx enabled auto-merge March 27, 2026 23:22
@gfx gfx merged commit f332d0e into main Mar 27, 2026
9 of 10 checks passed
@gfx gfx deleted the claude/remove-parser-backtracking-ogyIs branch March 27, 2026 23:50
2 participants