Remove backtracking from package-gale parser #711
Merged
…n engine
Add Consume variant to PredictionNode that factors out shared fixed-token
prefixes across alternatives. When all alternatives in a group start with
the same TokenRef or Literal, the prediction engine now emits a Consume
node that advances the parser past the shared prefix without spending
lookahead depth budget, then recurses on the suffixes.
This eliminates backtracking in cases like JSON obj/arr rules where
'{' pair (',' pair)* '}' and '{' '}' share the '{' prefix — the
generated parser now consumes '{' then dispatches on the next token.
For SQLite, the number of backtracking sites drops from 122 to 109 (inline
groups with shared keyword prefixes such as INSERT OR REPLACE/ROLLBACK/etc.).
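The prefix factoring described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Element and PredictionNode shapes, the Dispatch variant, and the factor_prefix function are all assumptions made for the example.

```rust
// Hypothetical, simplified types; the real PredictionNode is not shown in this PR text.
#[derive(Clone, Debug, PartialEq)]
enum Element {
    Token(&'static str), // stands in for TokenRef / Literal
    Rule(&'static str),  // stands in for RuleRef
}

#[derive(Debug, PartialEq)]
enum PredictionNode {
    /// Advance past a token shared by every alternative, then continue.
    Consume { token: &'static str, next: Box<PredictionNode> },
    /// Choose among the remaining (alternative index, remaining elements) pairs.
    Dispatch(Vec<(usize, Vec<Element>)>),
}

/// Factor out the longest fixed-token prefix shared by all alternatives.
fn factor_prefix(alts: Vec<(usize, Vec<Element>)>) -> PredictionNode {
    // Candidate: the first element of the first alternative, if it is a token.
    let shared = match alts.first().and_then(|(_, e)| e.first()) {
        Some(Element::Token(t)) if alts.len() > 1 => Some(*t),
        _ => None,
    };
    if let Some(tok) = shared {
        // Only factor when every alternative starts with that same token.
        if alts.iter().all(|(_, e)| e.first() == Some(&Element::Token(tok))) {
            let suffixes: Vec<_> = alts
                .into_iter()
                .map(|(i, e)| (i, e[1..].to_vec()))
                .collect();
            return PredictionNode::Consume {
                token: tok,
                next: Box::new(factor_prefix(suffixes)),
            };
        }
    }
    PredictionNode::Dispatch(alts)
}
```

For a simplified JSON object rule with alternatives '{' pair '}' and '{' '}', this yields a Consume of '{' followed by a dispatch between the two suffixes.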
https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
The prediction engine now detects when all alternatives in a group share
the same leading RuleRef (e.g. parse_name), consumes it via a parse call,
and then dispatches on the suffix.
This eliminates backtracking for cases like SQLite's compound_operator
rule where alternatives share a common RuleRef prefix.
SQLite backtracking: 109 → 105 (further reduction from previous 122).
https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
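As a rough picture of what a parser for such a group might look like after this change, here is a hand-written sketch for a made-up group `name '(' ')' | name '.' name`. The Parser struct, the Node type, and the grammar are assumptions for illustration and are not taken from Gale's generated output.

```rust
// Hypothetical hand-written parser; it only mirrors the "parse shared RuleRef once,
// then dispatch on the next token" shape described in the commit message.
struct Parser<'a> {
    toks: &'a [&'a str],
    pos: usize,
}

#[derive(Debug, PartialEq)]
enum Node {
    Call(String),
    Qualified(String, String),
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.toks.get(self.pos).copied()
    }

    fn expect(&mut self, t: &str) -> Option<()> {
        if self.peek() == Some(t) {
            self.pos += 1;
            Some(())
        } else {
            None
        }
    }

    /// Toy `name` rule: a single alphabetic token.
    fn parse_name(&mut self) -> Option<String> {
        let t = self.peek()?;
        if !t.is_empty() && t.chars().all(|c| c.is_ascii_alphabetic()) {
            self.pos += 1;
            Some(t.to_string())
        } else {
            None
        }
    }

    /// Group: name '(' ')'  |  name '.' name
    fn parse_group(&mut self) -> Option<Node> {
        // The shared leading rule is parsed exactly once, with no saved position to rewind to.
        let head = self.parse_name()?;
        // The suffixes start with different tokens, so one token of lookahead decides.
        match self.peek()? {
            "(" => {
                self.pos += 1;
                self.expect(")")?;
                Some(Node::Call(head))
            }
            "." => {
                self.pos += 1;
                let tail = self.parse_name()?;
                Some(Node::Qualified(head, tail))
            }
            _ => None,
        }
    }
}
```

Without the factoring, each alternative would call parse_name itself and the parser would have to rewind after a failed first attempt.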
Implement rule_is_single_token() that checks whether a parser rule
always consumes exactly one token (all alternatives have exactly one
non-nullable terminal or single-token RuleRef). When compute_deeper_first
encounters such a RuleRef, it safely advances past it (consuming 1 depth
unit), enabling lookahead into what follows the rule reference.
This is a static (generation-time) analogue of ANTLR4's SLL prediction: for rules like
SQLite's `name` → `IDENTIFIER | keyword`, the prediction engine can now
look past the rule reference to disambiguate alternatives (e.g.,
`function_name '('` vs `column_name` at depth 2).
No golden file changes because SQLite's `any_name` rule includes a
recursive `'(' any_name ')'` alternative, making it non-single-token.
The infrastructure is in place for grammars with pure single-token rules.
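A minimal sketch of such a check, under simplifying assumptions: the Grammar map and Element enum are invented for the example, only one-element alternatives count as single-token (a stricter reading than the real rule_is_single_token may use), and any rule reached again during its own analysis is treated as not single-token.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical grammar representation: rule name -> list of alternatives.
#[derive(Clone, Debug)]
enum Element {
    Token(&'static str),
    Rule(&'static str),
    Optional(Box<Element>),
}

type Grammar = HashMap<&'static str, Vec<Vec<Element>>>;

/// Returns true when every alternative of `rule` is exactly one terminal or
/// one reference to another single-token rule.
fn rule_is_single_token(
    g: &Grammar,
    rule: &'static str,
    visiting: &mut HashSet<&'static str>,
) -> bool {
    // Seeing a rule again while it is still being analysed means recursion.
    if !visiting.insert(rule) {
        return false;
    }
    let result = match g.get(rule) {
        Some(alts) => {
            !alts.is_empty()
                && alts.iter().all(|alt| match alt.as_slice() {
                    [Element::Token(_)] => true,
                    [Element::Rule(r)] => rule_is_single_token(g, *r, visiting),
                    // Multi-element or nullable alternatives disqualify the rule.
                    _ => false,
                })
        }
        None => false,
    };
    visiting.remove(rule);
    result
}
```

With a `name` rule of the form IDENTIFIER | keyword the check succeeds, while an `any_name` rule with a parenthesised recursive alternative fails it, matching the behaviour described above.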
https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
Replace the old depth-based prediction engine with an SLL-style DFA that
tracks element positions independently per alternative. Each alternative
maintains its own cursor into its element sequence, enabling
disambiguation when different alternatives are at different positions
after consuming a shared token.
Key changes:
- SllConfig struct tracks (alt_index, elements, pos) per alternative
- build_sll_node simulates token consumption across all configs
- Single-token RuleRefs advance position; multi-token RuleRefs produce
opaque configs that force Backtrack (safe approximation)
- strip_dead_consume removes Consume nodes that lead to Backtrack
- Opaque configs are correctly included in all dispatch branches
Results: SQLite backtracking 122 → 98 (-20%), S-expression 1 → 1,
JSON/Calculator 0 → 0. All 92 Gale unit tests pass. 2 SQLite integration
tests fail (EXISTS subquery) due to backtrack trial order; this will be
fixed in a follow-up.
Also fixes unrelated clippy warnings (if_same_then_else, ptr_arg).
https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs
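The per-alternative cursor idea can be sketched as below. The SllConfig field names follow the commit message; everything else (the Node enum, build_sll_node_sketch, the depth limit) is assumed for illustration. The sketch is also coarser than the commit describes: any config sitting on a RuleRef or at the end of its alternative makes the whole node a Backtrack, instead of being carried into each dispatch branch, and single-token RuleRefs and nullable elements are not handled.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Element {
    Token(&'static str),
    Rule(&'static str),
}

/// One simulated parser position: which alternative, and how far into it.
#[derive(Clone, Debug)]
struct SllConfig<'g> {
    alt_index: usize,
    elements: &'g [Element],
    pos: usize,
}

#[derive(Debug, PartialEq)]
enum Node {
    Predict(usize),
    Dispatch(BTreeMap<&'static str, Node>),
    Backtrack(Vec<usize>),
}

fn build_sll_node_sketch(configs: Vec<SllConfig<'_>>, depth: usize) -> Node {
    let mut alts: Vec<usize> = configs.iter().map(|c| c.alt_index).collect();
    alts.sort_unstable();
    alts.dedup();
    if alts.len() == 1 {
        return Node::Predict(alts[0]); // only one alternative is still viable
    }
    // Simplification: a config not positioned at a fixed token is "opaque".
    let all_fixed = configs
        .iter()
        .all(|c| matches!(c.elements.get(c.pos), Some(Element::Token(_))));
    if depth == 0 || !all_fixed {
        return Node::Backtrack(alts);
    }
    // Group configs by the token they expect next, advancing each cursor by one.
    let mut by_token: BTreeMap<&'static str, Vec<SllConfig<'_>>> = BTreeMap::new();
    for c in configs {
        if let Some(Element::Token(t)) = c.elements.get(c.pos) {
            let advanced = SllConfig {
                alt_index: c.alt_index,
                elements: c.elements,
                pos: c.pos + 1,
            };
            by_token.entry(*t).or_default().push(advanced);
        }
    }
    Node::Dispatch(
        by_token
            .into_iter()
            .map(|(t, cs)| (t, build_sll_node_sketch(cs, depth - 1)))
            .collect(),
    )
}
```

Given alternatives A B C, A B D and A E, the sketch dispatches on A, then on B or E, and only needs a third token to separate the first two alternatives.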
…iction
Fix a pre-existing bug where EXISTS/NOT EXISTS subqueries were parsed
as function calls instead of subquery expressions. The root cause was
twofold:
1. sll_closure skipped nullable elements (e.g., Optional(K_NOT)),
causing their FIRST tokens to be lost from the prediction. This
meant bt_21 ((NOT)? EXISTS '(' select_stmt ')') wasn't included
in the EXISTS token dispatch branch.
2. Backtrack trial order: alternatives starting with fixed tokens
(like EXISTS) should be tried before alternatives starting with
RuleRefs (like function_name) since they are more specific.
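The trial-order idea in point 2 can be illustrated with a small stable sort. The priority values 4 and 2 are the ones this commit's change list names; the fallback value 0, the Element enum, and the trial_order helper are assumptions, and the function below is only a stand-in for the real alt_sort_priority.

```rust
#[derive(Debug)]
enum Element {
    Token(&'static str),
    Rule(&'static str),
}

/// Stand-in priority: multi-element alternatives led by a fixed token rank
/// above alternatives led by a RuleRef.
fn alt_sort_priority(alt: &[Element]) -> u8 {
    match alt.first() {
        Some(Element::Token(_)) if alt.len() > 1 => 4,
        Some(Element::Rule(_)) => 2,
        _ => 0, // assumption: everything else is tried last in this sketch
    }
}

/// Order in which a backtracking site would try its alternatives.
fn trial_order(alts: &[Vec<Element>]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..alts.len()).collect();
    // Stable sort, highest priority first; ties keep grammar order.
    order.sort_by_key(|&i| std::cmp::Reverse(alt_sort_priority(&alts[i])));
    order
}
```

With a function-call alternative listed before an EXISTS subquery alternative, the subquery alternative is tried first, so an input starting with EXISTS is no longer claimed by the function-call path.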
Changes:
- Remove nullable skipping from sll_closure; use first_of_elements_from
in sll_config_first to naturally include nullable tokens in FIRST sets
- Add explicit nullable handling in sll_advance_inner: when at a nullable
element, try both matching it and skipping past all consecutive nullables
(non-recursive to avoid explosion)
- Raise alt_sort_priority for multi-element alternatives starting with
fixed tokens (priority 4) over those starting with RuleRefs (priority 2)
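How a FIRST computation can include tokens behind nullable elements is sketched below. The function name first_of_elements_from comes from the change list, but its signature and body here are assumptions, as is the two-variant Element enum (Optional is the only nullable form modelled).

```rust
use std::collections::BTreeSet;

#[derive(Clone, Debug)]
enum Element {
    Token(&'static str),
    Optional(Box<Element>),
}

fn is_nullable(e: &Element) -> bool {
    matches!(e, Element::Optional(_))
}

/// Tokens that can start a single element.
fn first_of(e: &Element, out: &mut BTreeSet<&'static str>) {
    match e {
        Element::Token(t) => {
            out.insert(*t);
        }
        Element::Optional(inner) => first_of(inner, out),
    }
}

/// FIRST set of `elements[pos..]`: collect through nullable elements and
/// stop after the first non-nullable one.
fn first_of_elements_from(elements: &[Element], pos: usize) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    for e in elements.iter().skip(pos) {
        first_of(e, &mut out);
        if !is_nullable(e) {
            break;
        }
    }
    out
}
```

For a sequence shaped like (NOT)? EXISTS '(' this gives both NOT and EXISTS as possible first tokens, so the alternative appears in the EXISTS dispatch branch as well as the NOT branch.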
Also fixes unrelated clippy warnings (if_same_then_else, ptr_arg).
Results: 1283 tests pass (0 failures), SQLite backtracking 98 → 104
(slight increase due to nullable configs producing more opaque paths).
https://claude.ai/code/session_014FPnMv8SbHtLtKwvuUs8gs