> `Defer` now grabs a zeroed struct from the pool, fills it, and `Apply` returns each struct to the pool after invoking `setField`. No other behavior changes: branches still copy `apply` slices, and errors propagate the same way.
Would love to get your input on whether it makes sense to keep looking for these low-hanging-fruit performance improvements. I don't have the time to go through the deep stuff, but using the exact same technique (building a `sync.Pool`), and compared to the last run in this PR (~98 KB/op for runtime-built, ~94 KB/op for generated), pooling the lexer buffer would trim another ~40 KB/op and ~60–70 allocs/op. That would need to come in a separate PR though, as it needs deeper analysis from you.
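For illustration, a pooled lexer buffer along the lines suggested above might look like the sketch below. This is a hypothetical standalone example, not participle's actual lexer API; `lexIdent` and `bufPool` are invented names. The key point is that the pooled `bytes.Buffer` never escapes: the token text is copied out with `String()` before the buffer is returned.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles scratch buffers so each lex call doesn't allocate one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// lexIdent reads characters up to the first space into a pooled buffer.
func lexIdent(input string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()             // pooled buffers may hold stale bytes
	defer bufPool.Put(buf)  // hand the buffer back when done
	for i := 0; i < len(input) && input[i] != ' '; i++ {
		buf.WriteByte(input[i])
	}
	return buf.String() // copies out, so the pooled buffer never escapes
}

func main() {
	fmt.Println(lexIdent("hello world")) // prints "hello"
}
```

Pooling a `*bytes.Buffer` (rather than a raw `[]byte`) also avoids the allocation that boxing a slice into `any` would cost on every `Put`.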
I don't mind the idea of performance improvements in principle, though I am a bit concerned about readability dropping. I think some abstractions on top of the pool might help here, so that the low-level pool casting etc. isn't spread throughout the code. There's possibly something to be done here with generics, though there might be a performance hit. Also, would you mind running these through …
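The generics abstraction suggested above could be sketched roughly like this. `Pool[T]` and its methods are illustrative names, not an existing API; the idea is simply that callers never see the raw `sync.Pool` or its `any` casts, and every value comes back zeroed.

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a typed wrapper over sync.Pool that hides the interface casts.
type Pool[T any] struct {
	p sync.Pool
}

func NewPool[T any]() *Pool[T] {
	return &Pool[T]{p: sync.Pool{New: func() any { return new(T) }}}
}

// Get returns a zeroed *T from the pool.
func (pl *Pool[T]) Get() *T {
	return pl.p.Get().(*T)
}

// Put zeroes the value before recycling it, so the next Get is clean.
func (pl *Pool[T]) Put(v *T) {
	var zero T
	*v = zero
	pl.p.Put(v)
}

type contextFieldSet struct {
	field string
	value int
}

func main() {
	pool := NewPool[contextFieldSet]()
	c := pool.Get()
	c.field, c.value = "Name", 42
	fmt.Println(c.field, c.value) // prints "Name 42"
	pool.Put(c)
	fmt.Println(pool.Get().field == "") // prints "true": values come back zeroed
}
```

Because `*T` is already a pointer, storing it in the underlying `sync.Pool` avoids the boxing allocation a value type would incur, so the generic wrapper shouldn't cost much over the raw pool.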
Problem & Rationale
Parsing defers field assignments until the winning branch is known. Each `Defer` call allocates a fresh `contextFieldSet` so the captured values survive branch backtracking. Benchmarks showed `parseContext.Defer`/`Branch` accounting for nearly half of total allocations (pprof: `Branch` ~25%, `Defer` ~19%) even on tiny inputs. These structs are short-lived, small, and have a fixed shape, so recycling them avoids steady heap pressure and reduces GC work without touching parser semantics.

Fix
This change adds a `sync.Pool` of `contextFieldSet` objects. `Defer` now grabs a zeroed struct from the pool, fills it, and `Apply` returns each struct to the pool after invoking `setField`. No other behaviour changes: branches still copy `apply` slices, and errors propagate the same way.

Benchmark
Both participle variants improved by about 6–7% in wall time and shed roughly 30 KB and ~150 allocations per parse (compared with the pre-change baselines of 127 µs / 172 KB / 2053 allocs for the runtime-built parser and 78 µs / 167 KB / 1817 allocs for the generated one).
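The pooling pattern described under Fix can be sketched roughly as below. The real participle types carry more state; `parseContext`, `Defer`, and `Apply` here are pared down to show only the pool round-trip (get a zeroed struct, fill it, return it after `setField` runs).

```go
package main

import (
	"fmt"
	"sync"
)

// contextFieldSet stands in for the deferred capture record.
type contextFieldSet struct {
	field string
	value string
}

var fieldSetPool = sync.Pool{
	New: func() any { return &contextFieldSet{} },
}

type parseContext struct {
	apply []*contextFieldSet
}

// Defer grabs a recycled struct instead of allocating a fresh one.
func (p *parseContext) Defer(field, value string) {
	c := fieldSetPool.Get().(*contextFieldSet)
	c.field, c.value = field, value
	p.apply = append(p.apply, c)
}

// Apply runs the captured assignments, then returns each struct to the pool.
func (p *parseContext) Apply(setField func(field, value string)) {
	for _, c := range p.apply {
		setField(c.field, c.value)
		*c = contextFieldSet{} // zero before recycling
		fieldSetPool.Put(c)
	}
	p.apply = p.apply[:0]
}

func main() {
	ctx := &parseContext{}
	ctx.Defer("Name", "thrift")
	ctx.Apply(func(f, v string) { fmt.Printf("%s=%s\n", f, v) }) // prints "Name=thrift"
}
```

Zeroing before `Put` keeps stale pointers from pinning memory and guarantees that `Defer` always starts from a clean struct, matching the "grabs a zeroed struct" behaviour described above.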
Extending the Technique
Hoping that this technique is sound: even after pooling `contextFieldSet`, profiling the Thrift benchmark still showed `parseContext.Branch` dominating allocations. Every speculative branch clones an entire `parseContext`, and failed branches keep their deferred captures alive until GC. `go tool pprof -alloc_space` attributed ~25% of bytes to `Branch` and ~19% to `Defer`, so eliminating those short-lived context copies promised another allocation drop.

Extending the fix
This adds a `sync.Pool` for `parseContext` instances (`context.go:37-118`) plus small helpers:

- `discardDeferred` zeroes and returns any unused capture records, and `recycle` hands the whole context back to the pool. `Accept` now recycles the accepted branch automatically.
- Failure paths (`nodes.go:263-512`) now explicitly call `branch.recycle(false)` when a branch fails, ensuring both the context and any deferred captures are released immediately.
- `Stop`, `Accept`, and error tracking all behave exactly as before; only raw allocations were swapped for pooled scratch structs.

Benchmark (second round)
With both optimisations:
Compared to the prior (already pooled captures) run at ~119 µs/op with 140 kB / 1902 allocs, the new branch pooling holds throughput steady while cutting another ~40% of heap use (98 kB, 1638 allocs) for the runtime-built parser; the generated parser sees a similar improvement (from 136 kB / 1666 allocs down to 94 kB / 1402 allocs). Go-thrift remains the same, so participle now wins clearly on allocation footprint while matching its earlier speed.
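For illustration, the branch lifecycle described under "Extending the fix" might be modelled roughly as follows. This is a deliberately pared-down sketch, not the real implementation: the fields, `Branch`, `Accept`, `recycle`, and `discardDeferred` shown here are simplifications with illustrative bodies.

```go
package main

import (
	"fmt"
	"sync"
)

type parseContext struct {
	pos   int
	apply []string // stands in for deferred capture records
}

var ctxPool = sync.Pool{New: func() any { return new(parseContext) }}

// Branch clones the context from the pool instead of allocating.
func (p *parseContext) Branch() *parseContext {
	b := ctxPool.Get().(*parseContext)
	b.pos = p.pos
	b.apply = append(b.apply[:0], p.apply...) // branches still copy apply slices
	return b
}

// discardDeferred drops captures a failed branch accumulated.
func (p *parseContext) discardDeferred() { p.apply = p.apply[:0] }

// recycle returns the context to the pool; failed branches discard
// their captures first so nothing stays alive until GC.
func (p *parseContext) recycle(accepted bool) {
	if !accepted {
		p.discardDeferred()
	}
	p.pos = 0
	p.apply = p.apply[:0]
	ctxPool.Put(p)
}

// Accept copies the winning branch back into the parent and recycles it.
func (p *parseContext) Accept(b *parseContext) {
	p.pos = b.pos
	p.apply = append(p.apply[:0], b.apply...)
	b.recycle(true)
}

func main() {
	root := &parseContext{}

	ok := root.Branch()
	ok.pos = 7
	ok.apply = append(ok.apply, "Name=x")
	root.Accept(ok) // winner merged, context recycled automatically

	bad := root.Branch()
	bad.apply = append(bad.apply, "junk")
	bad.recycle(false) // failed branch: captures and context released at once

	fmt.Println(root.pos, len(root.apply)) // prints "7 1"
}
```

The explicit `recycle(false)` on failure paths is what removes the "failed branches keep their deferred captures alive until GC" problem: the scratch state is reclaimed the moment a branch loses.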