Benchmarking-based performance improvements #198

CrockAgile · 2026-02-02T02:08:33Z

Benchmark: main vs benchmark-improvements

Summary

parse_postal has the largest win: ~56% faster (29.4 µs → 13.0 µs), from single-pass whitespace/comment skipping, production-boundary lookahead, and a fast path for grammars with no extended syntax.
generate_dna and parse_infinite_nullable_* are ~19–21% faster from traverse/InputRange/CompletionMap/parse_tree changes.
build_postal_parser ~12% faster from reserved capacity in validate_nonterminals.
Other parser/build benchmarks are ~4–8% faster; no regressions.

All times are medians (lower is better). A negative Δ % means the branch is faster.

Benchmark	main	branch	Δ %
examples
generate_dna	564.3 ns	450.3 ns	−20%
parse_infinite_nullable_grammar	89.21 µs	70.59 µs	−21%
parse_polish_calculator	5.293 µs	4.987 µs	−6%
parse_postal	29.41 µs	12.96 µs	−56%
parse_postal_input	212.8 µs	201.9 µs	−5%
parser_api
build_polish_parser	793.1 ns	733.6 ns	−8%
build_postal_parser	5.387 µs	4.763 µs	−12%
parse_infinite_nullable_with_parser	86.25 µs	70.14 µs	−19%
parse_polish_with_parser	4.475 µs	4.294 µs	−4%
parse_postal_with_parser	207.6 µs	197.4 µs	−5%
per_input_100	13.01 ms	12.27 ms	−6%
reuse_parser_100	12.98 ms	12.18 ms	−6%
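
For reference, the Δ % column is consistent with a plain relative change of the branch median against the main median. A minimal sketch, assuming both medians are given in the same unit; `delta_percent` is a hypothetical helper name, not part of the benchmark harness:

```rust
/// Relative change of `branch` against `main`, in percent.
/// Both medians must use the same unit (e.g. both in µs).
/// A negative result means the branch is faster (lower median time).
fn delta_percent(main: f64, branch: f64) -> f64 {
    (branch - main) / main * 100.0
}
```

Feeding in the parse_postal row (29.41 µs vs 12.96 µs) gives roughly −55.9, which the table rounds to −56%.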

Specific Improvements

• perf: choose expression by index in traverse to avoid Vec allocation (generate_dna)
• perf: make InputRange Copy to avoid clone in Predict and match_term (Earley); compiler can optimize (memcpy, registers). Reverting to .clone() regresses generate_dna +8.7%, parse_infinite_nullable_* +11–12%, reuse_parser_100 +4.4%
• perf: use sorted Vec instead of BTreeSet for CompletionMap TraversalId sets; faster for small sets, no node allocations (generate_dna ~4%, parse_infinite_nullable_* ~6%)
• perf: traverse LHS lookup without allocating—compare by &str, allocate only when returning missing nonterminal (generate_dna ~11% faster)
• perf: fast path when grammar has no extended syntax—parse via plain_grammar when no '(' or '[' outside string literals (parse_postal ~8% faster)
• perf: production boundary lookahead for BNF—format-specific start char for "next production" instead of full prod_lhs (parse_postal ~8% faster)
• perf: single-pass whitespace and comments in parse_postal/build_*: replace multispace0 + opt(preceded(...)) loop with trim/skip-to-newline (parse_postal ~51% faster)
• perf: reserve capacity in validate_nonterminals (ParseGrammar::new) for NonterminalSets (build_postal_parser ~9%, build_polish_parser ~6%)
• perf: inline InputRange::next() in Scan hot path (#[inline(always)])
• perf: pre-allocate RHS Vec in parse_tree (next_parse_tree): Vec::with_capacity + push instead of map+collect (parse_infinite_nullable_* ~12%, reuse_parser/per_input ~2%)
• perf: reserve capacity for CompletionMap at parse start—with_capacity(32, 32) to reduce rehashing (parse_polish_with_parser ~5.5%)
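
To illustrate the "sorted Vec instead of BTreeSet" item, here is a minimal sketch of a small set kept as a sorted, duplicate-free Vec. It is not the code from this branch: `TraversalId` is stood in for by `usize`, and `insert_sorted` / `contains_sorted` are hypothetical helper names.

```rust
/// Hypothetical stand-in for the crate's TraversalId type.
type TraversalId = usize;

/// Insert `id` into a sorted, duplicate-free Vec.
/// Returns true if the id was newly inserted, false if it was already present.
fn insert_sorted(set: &mut Vec<TraversalId>, id: TraversalId) -> bool {
    match set.binary_search(&id) {
        // Already present: keep the Vec duplicate-free.
        Ok(_) => false,
        // Not present: `pos` is the index that keeps the Vec sorted.
        Err(pos) => {
            set.insert(pos, id);
            true
        }
    }
}

/// Membership test by binary search over the sorted slice.
fn contains_sorted(set: &[TraversalId], id: TraversalId) -> bool {
    set.binary_search(&id).is_ok()
}
```

For small sets the elements sit in one contiguous allocation, so there are no per-node allocations as with BTreeSet. The trade-off is that inserting into the middle shifts the tail of the Vec, which is why this tends to pay off only while the sets stay small.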

• docs: mark match_iter (traversal tree walk) and Scan input_range.next() as done in TRACING_TODO; path cache tried and reverted (regressed parse_infinite_nullable_* ~70%)
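
The single-pass whitespace/comment change can be sketched roughly as follows. This is an illustration under assumptions, not the code in this branch: the function name is hypothetical, and it assumes line comments start with `;` and run to the end of the line.

```rust
/// Skip leading whitespace and line comments in one loop,
/// returning the remaining input.
/// Assumption for illustration: a comment starts with ';' and ends at '\n'.
fn skip_whitespace_and_comments(mut input: &str) -> &str {
    loop {
        // Drop any run of leading whitespace (spaces, tabs, newlines).
        input = input.trim_start();
        match input.strip_prefix(';') {
            // Comment: drop everything up to and including the next newline,
            // or the rest of the input if there is no newline.
            Some(rest) => {
                input = match rest.find('\n') {
                    Some(idx) => &rest[idx + 1..],
                    None => "",
                };
            }
            // Neither whitespace nor a comment: real content starts here.
            None => return input,
        }
    }
}
```

The idea is to work directly on the string slice with `trim_start` and `find` rather than running a chain of parser combinators at every position where whitespace may occur.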

coveralls · 2026-02-02T02:41:35Z

coverage: 98.104% (-0.004%) from 98.108%
when pulling 24ef207 on benchmark-improvements
into b2adee3 on main.

CrockAgile force-pushed the benchmark-improvements branch from e85e28d to 40c70ba · February 2, 2026 02:08

shorten tracing macros

24ef207

CrockAgile self-assigned this Feb 2, 2026

CrockAgile requested a review from shnewto February 2, 2026 02:49

CrockAgile marked this pull request as ready for review February 2, 2026 02:49
