@CrockAgile (Collaborator)

Benchmark: main vs benchmark-improvements

Summary

  • parse_postal has the largest win: ~56% faster (29.4 µs → 13.0 µs), from single-pass whitespace/comment handling, production-boundary lookahead, and a fast path when the grammar has no extended syntax.
  • generate_dna and parse_infinite_nullable_* are ~19–21% faster from traverse/InputRange/CompletionMap/parse_tree changes.
  • build_postal_parser ~12% faster from reserved capacity in validate_nonterminals.
  • Other parser/build benchmarks are ~4–8% faster; no regressions.

All times are medians (lower is better); a negative Δ means the branch is faster.

Benchmark                               main       branch     Δ %
examples
  generate_dna                          564.3 ns   450.3 ns   −20%
  parse_infinite_nullable_grammar       89.21 µs   70.59 µs   −21%
  parse_polish_calculator               5.293 µs   4.987 µs   −6%
  parse_postal                          29.41 µs   12.96 µs   −56%
  parse_postal_input                    212.8 µs   201.9 µs   −5%
parser_api
  build_polish_parser                   793.1 ns   733.6 ns   −8%
  build_postal_parser                   5.387 µs   4.763 µs   −12%
  parse_infinite_nullable_with_parser   86.25 µs   70.14 µs   −19%
  parse_polish_with_parser              4.475 µs   4.294 µs   −4%
  parse_postal_with_parser              207.6 µs   197.4 µs   −5%
  per_input_100                         13.01 ms   12.27 ms   −6%
  reuse_parser_100                      12.98 ms   12.18 ms   −6%

Specific Improvements

• perf: choose expression by index in traverse to avoid Vec allocation (generate_dna) (see sketch 1 below)
• perf: make InputRange Copy to avoid clone in Predict and match_term (Earley); compiler can optimize (memcpy, registers). Reverting to .clone() regresses generate_dna +8.7%, parse_infinite_nullable_* +11–12%, reuse_parser_100 +4.4% (see sketch 2 below)
• perf: use sorted Vec instead of BTreeSet for CompletionMap TraversalId sets; faster for small sets, no node allocations (generate_dna ~4%, parse_infinite_nullable_* ~6%) (see sketch 3 below)
• perf: traverse LHS lookup without allocating—compare by &str, allocate only when returning missing nonterminal (generate_dna ~11% faster) (see sketch 4 below)
• perf: fast path when grammar has no extended syntax—parse via plain_grammar when no '(' or '[' outside string literals (parse_postal ~8% faster) (see sketch 5 below)
• perf: production boundary lookahead for BNF—format-specific start char for "next production" instead of full prod_lhs (parse_postal ~8% faster) (see sketch 6 below)
• perf: single-pass whitespace and comments in parse_postal/build_*—replace multispace0 + opt(preceded(...)) loop with trim/skip-to-newline (parse_postal ~51% faster) (see sketch 7 below)
• perf: reserve capacity in validate_nonterminals (ParseGrammar::new) for NonterminalSets (build_postal_parser ~9%, build_polish_parser ~6%)
• perf: inline InputRange::next() in Scan hot path (#[inline(always)])
• perf: pre-allocate RHS Vec in parse_tree (next_parse_tree)—Vec::with_capacity + push instead of map+collect (parse_infinite_nullable_* ~12%, reuse_parser/per_input ~2%) (see sketch 8 below)
• perf: reserve capacity for CompletionMap at parse start—with_capacity(32, 32) to reduce rehashing (parse_polish_with_parser ~5.5%)
• docs: mark match_iter (traversal tree walk) and Scan input_range.next() as done in TRACING_TODO; path cache tried and reverted (regressed parse_infinite_nullable_* ~70%)
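
The sketches below are minimal, hedged illustrations of the changes above; every type, field, and function name in them is a stand-in unless the corresponding bullet names it explicitly.

Sketch 1 (choose expression by index): stepping the iterator to the chosen alternative instead of collecting all alternatives into a temporary Vec first.

```rust
// Illustrative stand-ins, not bnf's traverse code: choosing one RHS
// alternative during generation.
struct Expression(&'static str);

// Old shape: collect every alternative into a Vec just to index it once.
fn choose_old(alts: impl Iterator<Item = Expression>, idx: usize) -> Option<Expression> {
    let all: Vec<Expression> = alts.collect(); // per-call Vec allocation
    all.into_iter().nth(idx)
}

// New shape: step the iterator directly to the chosen index.
fn choose_new(mut alts: impl Iterator<Item = Expression>, idx: usize) -> Option<Expression> {
    alts.nth(idx) // no intermediate Vec
}

fn main() {
    let alts = || ["'a'", "'t'", "'g'", "'c'"].into_iter().map(Expression);
    assert_eq!(choose_old(alts(), 2).map(|e| e.0), choose_new(alts(), 2).map(|e| e.0));
}
```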
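
Sketch 2 (InputRange as Copy): a cut-down range type (field layout assumed for the sketch) showing why a pointer-plus-offsets struct can derive Copy, letting hot paths like Predict and match_term pass it by value with no .clone() calls.

```rust
// Cut-down stand-in for InputRange; the fields here are an assumption for
// the sketch. A shared &str plus two usize offsets is only a few machine
// words, so deriving Copy is free: callers get implicit bitwise copies the
// optimizer can keep in registers or lower to a cheap memcpy.
#[derive(Clone, Copy)]
struct InputRange<'g> {
    input: &'g str, // &str is itself Copy, so the derive is allowed
    start: usize,
    len: usize,
}

impl<'g> InputRange<'g> {
    // Takes `self` by value; with Copy this is as cheap as a borrow.
    fn advance_by(self, step: usize) -> Self {
        InputRange { len: self.len + step, ..self }
    }
}

fn main() {
    let range = InputRange { input: "GATTACA", start: 0, len: 0 };
    let scanned = range.advance_by(3);
    // `range` is still usable after being passed by value: it was copied, not moved.
    println!("{:?} / {:?}",
        &range.input[range.start..range.start + range.len],
        &scanned.input[scanned.start..scanned.start + scanned.len]);
}
```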
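
Sketch 3 (sorted Vec as a set): the sorted-Vec replacement for a BTreeSet of TraversalIds. For small sets a contiguous, sorted Vec avoids per-node allocations and stays cache-friendly. Names are illustrative, not CompletionMap's actual shape.

```rust
// Illustrative set-on-a-Vec; binary_search keeps lookups O(log n) and the
// backing storage is one contiguous allocation instead of one node per entry.
type TraversalId = usize;

#[derive(Default)]
struct SortedIdSet {
    ids: Vec<TraversalId>,
}

impl SortedIdSet {
    /// Insert while keeping `ids` sorted and deduplicated.
    /// Returns true if the id was newly inserted.
    fn insert(&mut self, id: TraversalId) -> bool {
        match self.ids.binary_search(&id) {
            Ok(_) => false, // already present
            Err(pos) => {
                self.ids.insert(pos, id);
                true
            }
        }
    }

    fn contains(&self, id: TraversalId) -> bool {
        self.ids.binary_search(&id).is_ok()
    }
}

fn main() {
    let mut set = SortedIdSet::default();
    assert!(set.insert(3));
    assert!(set.insert(1));
    assert!(!set.insert(3)); // duplicate rejected
    assert!(set.contains(1));
    println!("{:?}", set.ids); // [1, 3]
}
```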
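
Sketch 4 (borrowed-key lookup): the allocate-only-on-miss pattern behind the traverse LHS lookup. The happy path compares &str; only the error path builds a String.

```rust
// Simplified stand-ins for the internals; only the allocation pattern is
// the point. The success path compares borrowed &str keys, and the String
// for the error message is built only when the nonterminal is missing.
struct Production {
    lhs: String,
}

fn find_production<'a>(
    productions: &'a [Production],
    wanted: &str,
) -> Result<&'a Production, String> {
    productions
        .iter()
        .find(|p| p.lhs == wanted) // &str comparison, no allocation
        .ok_or_else(|| format!("missing nonterminal: <{wanted}>")) // allocate only on miss
}

fn main() {
    let prods = [Production { lhs: "dna".to_string() }];
    assert!(find_production(&prods, "dna").is_ok());
    assert!(find_production(&prods, "rna").is_err()); // only this path allocates
}
```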
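
Sketch 5 (extended-syntax probe): a hypothetical helper in the spirit of the fast-path check, scanning for '(' or '[' outside string literals; escape sequences are ignored for brevity.

```rust
// Hypothetical helper (not bnf's actual function name): if the grammar text
// contains no '(' or '[' outside quoted terminals, the simpler plain-BNF
// parser can handle the whole input.
fn has_extended_syntax(grammar: &str) -> bool {
    let mut in_string: Option<char> = None;
    for c in grammar.chars() {
        match in_string {
            Some(quote) if c == quote => in_string = None, // closing quote
            Some(_) => {} // inside a string literal: brackets don't count
            None => match c {
                '"' | '\'' => in_string = Some(c),
                '(' | '[' => return true,
                _ => {}
            },
        }
    }
    false
}

fn main() {
    assert!(!has_extended_syntax(r#"<a> ::= "b" | "(literal)""#));
    assert!(has_extended_syntax("<a> ::= [<b>] <c>"));
}
```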
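
Sketch 6 (production boundary lookahead): a cheap single-character pre-filter in place of speculatively parsing a full prod_lhs; in BNF every LHS opens with '<'. Illustrative, not the PR's exact code.

```rust
// A one-character peek replaces a full speculative LHS parse whose result
// would be thrown away. Real boundary detection can still confirm with the
// full parser once this cheap check passes.
fn starts_next_production(rest: &str) -> bool {
    rest.trim_start().starts_with('<')
}

fn main() {
    assert!(starts_next_production("  <subject> ::= 'dna'"));
    assert!(!starts_next_production("| 'another alternative'"));
}
```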
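
Sketch 7 (single-pass whitespace and comments): a hypothetical replacement for the multispace0 + opt(preceded(...)) loop, assuming ';' starts a line comment.

```rust
// Hypothetical helper in the spirit of the change: one pass trims leading
// whitespace and, on a comment marker, skips to the end of the line, instead
// of round-tripping through separate nom combinators per iteration.
fn skip_ws_and_comments(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        if let Some(rest) = trimmed.strip_prefix(';') {
            // comment: drop everything up to and including the newline
            input = match rest.find('\n') {
                Some(nl) => &rest[nl + 1..],
                None => "",
            };
        } else {
            return trimmed;
        }
    }
}

fn main() {
    let src = "  ; a comment\n  <a> ::= 'b'";
    assert_eq!(skip_ws_and_comments(src), "<a> ::= 'b'");
}
```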
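
Sketch 8 (with_capacity + push): simplified shapes, not the real parse-tree types. In this toy both versions allocate once, because a slice iterator reports an exact length; the point is that with_capacity + push guarantees a single right-sized allocation even when the source iterator (as, presumably, in next_parse_tree) cannot report its length.

```rust
// `collect` can pre-size its Vec only when the iterator reports an exact
// length; an explicit with_capacity + push loop is guaranteed one
// right-sized allocation regardless of what the iterator reports.
fn build_rhs_old(terms: &[&str]) -> Vec<String> {
    terms.iter().map(|t| t.to_string()).collect()
}

fn build_rhs_new(terms: &[&str]) -> Vec<String> {
    let mut rhs = Vec::with_capacity(terms.len()); // one exact allocation
    for t in terms {
        rhs.push(t.to_string());
    }
    rhs
}

fn main() {
    let terms = ["<base>", "<dna>"];
    assert_eq!(build_rhs_old(&terms), build_rhs_new(&terms));
}
```
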
@CrockAgile force-pushed the benchmark-improvements branch from e85e28d to 40c70ba on February 2, 2026 02:08
@coveralls

Coverage Status

coverage: 98.104% (-0.004%) from 98.108% when pulling 24ef207 on benchmark-improvements into b2adee3 on main.

@CrockAgile self-assigned this on Feb 2, 2026
@CrockAgile requested a review from shnewto on February 2, 2026 02:49
@CrockAgile marked this pull request as ready for review on February 2, 2026 02:49