Benchmarking-based performance improvements #198
Benchmark: main vs benchmark-improvements
Summary
All times are medians (lower is better). A negative % means the branch is faster.
Specific Improvements
• perf: choose expression by index in traverse to avoid Vec allocation (generate_dna)
• perf: make InputRange Copy to avoid explicit .clone() calls in Predict and match_term (Earley); a Copy value can be passed as a plain memcpy or in registers. Reverting to .clone() regresses generate_dna by +8.7%, parse_infinite_nullable_* by +11–12%, and reuse_parser_100 by +4.4%
• perf: use sorted Vec instead of BTreeSet for CompletionMap TraversalId sets; faster for small sets, no node allocations (generate_dna ~4%, parse_infinite_nullable_* ~6%)
• perf: traverse LHS lookup without allocating—compare by &str, allocate only when returning missing nonterminal (generate_dna ~11% faster)
• perf: fast path when grammar has no extended syntax—parse via plain_grammar when no '(' or '[' outside string literals (parse_postal ~8% faster)
• perf: production boundary lookahead for BNF: detect the "next production" by its format-specific start char instead of running the full prod_lhs (parse_postal ~8% faster)
• perf: single-pass whitespace and comments in parse_postal/build_—replace multispace0 + opt(preceded(...)) loop with trim/skip-to-newline (parse_postal ~51% faster)
• perf: reserve capacity in validate_nonterminals (ParseGrammar::new) for NonterminalSets (build_postal_parser ~9%, build_polish_parser ~6%)
• perf: inline InputRange::next() in Scan hot path (#[inline(always)])
• perf: pre-allocate RHS Vec in parse_tree (next_parse_tree)—Vec::with_capacity + push instead of map+collect (parse_infinite_nullable_ ~12%, reuse_parser/per_input ~2%)
• perf: reserve capacity for CompletionMap at parse start—with_capacity(32, 32) to reduce rehashing (parse_polish_with_parser ~5.5%)
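
As an illustration of the sorted-Vec item above, here is a minimal sketch of a small id set backed by a sorted `Vec` rather than a `BTreeSet`. The `SortedIdSet` type, its methods, and the use of `u32` ids are hypothetical stand-ins chosen for this example; they are not taken from the PR's actual `CompletionMap` or `TraversalId` code.

```rust
/// Hypothetical small set of ids kept in one contiguous, sorted buffer.
/// For a handful of elements this avoids the per-node allocations of a tree set.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct SortedIdSet {
    ids: Vec<u32>,
}

impl SortedIdSet {
    /// Reserves space up front so the first few inserts do not reallocate.
    fn with_capacity(cap: usize) -> Self {
        Self { ids: Vec::with_capacity(cap) }
    }

    /// Inserts `id` at its sorted position; returns true if it was not present.
    fn insert(&mut self, id: u32) -> bool {
        match self.ids.binary_search(&id) {
            Ok(_) => false, // already present, keep the set unique
            Err(pos) => {
                self.ids.insert(pos, id);
                true
            }
        }
    }

    /// Membership test by binary search over the sorted buffer.
    fn contains(&self, id: u32) -> bool {
        self.ids.binary_search(&id).is_ok()
    }

    /// Iterates in ascending order, matching BTreeSet iteration order.
    fn iter(&self) -> impl Iterator<Item = u32> + '_ {
        self.ids.iter().copied()
    }
}

fn main() {
    let mut set = SortedIdSet::with_capacity(4);
    for id in [7, 3, 7, 1] {
        set.insert(id);
    }
    let ordered: Vec<u32> = set.iter().collect();
    println!("{:?}", ordered);
}
```

Insertion shifts elements and is O(n), so this trade-off only tends to pay off while the sets stay small; whether it does for a given workload is something the benchmarks have to confirm.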