Implemented buffer pooling for Brandes' algorithm in pkg/analysis/betweenness_approx.go.
This optimization targets the dominant allocation hotspot identified through profiling.
| Metric | Before | After | Improvement |
|---|---|---|---|
| allocs/op | 199,548 | 149,447 | 25% |
| B/op | 29,574,627 | 11,895,089 | 60% |
| ns/op | 13,420,876 | 5,929,813 | 56% |
| Metric | Before | After | Improvement |
|---|---|---|---|
| allocs/op | 100,757 | 74,896 | 26% |
| B/op | 14,842,128 | 5,677,960 | 62% |
| ns/op | 5,512,771 | 3,176,322 | 42% |
| Metric | Before | After | Improvement |
|---|---|---|---|
| allocs/op | 499,557 | 499,557 | 0% (expected) |
| B/op | 34,110,015 | 34,110,051 | 0% (expected) |
| ns/op | 68,866,948 | 69,911,721 | 0% (expected) |
Note: The "Exact" benchmark falls through to network.Betweenness(g) (gonum's implementation)
when sampleSize >= nodeCount, so buffer pooling has no effect on it.
| Function | Before | After | Change |
|---|---|---|---|
| runtime.gcDrain | 36.40% | 21.02% | -15.4pp |
| singleSourceBetweenness (flat) | 1.47% | 2.93% | +1.5pp |
| singleSourceBetweenness (cum) | 50.05% | 64.67% | +14.6pp |
| runtime.mapassign_fast64 | 26.45% | 19.69% | -6.8pp |
| (*brandesBuffers).reset | N/A | 14.10% | new |
Analysis: GC overhead dropped significantly (36% → 21%), confirming reduced allocation pressure. The reset method adds ~14% overhead but this is amortized across all BFS traversals.
| Allocator | Before (flat) | After (flat) | Change |
|---|---|---|---|
| singleSourceBetweenness | 74.80% | 18.93% | -55.9pp |
| gonum iterator.newMapIterEdges | 10.32% | 25.61% | +15.3pp |
| gonum iterator.(*mapIter).next | 7.67% | 19.76% | +12.1pp |
| (*brandesBuffers).reset | N/A | 12.17% | new |
Analysis: The optimization successfully eliminated singleSourceBetweenness as the dominant allocator. The remaining allocations are now dominated by gonum iterator overhead (45% combined), which is addressed in Round 2 via cached adjacency lists.
brandesBuffersstruct (lines 26-34): Holds reusable maps and slices for BFSbrandesPoolsync.Pool (lines 46-58): Thread-safe buffer reuse with 256-entry pre-allocationreset()method (lines 74-105): Clears buffers while retaining capacitysingleSourceBetweenness(lines 264-333): Modified to use pooled buffers
- Maps cleared via
clear()builtin (Go 1.21+) when grown >2x expected size - Slices reset via
[:0]to retain backing array - Predecessor slices retain capacity for reuse
Each goroutine in ApproxBetweenness gets its own buffer from the pool. Buffers are returned
after results are written to localBC, ensuring thread safety.
- All unit tests:
go test ./pkg/analysis/... - Race detector:
go test -race ./pkg/analysis/... - E2E integration:
./scripts/test_buffer_pooling_e2e.sh
Results are identical with same seed across multiple runs. The TestApproxBetweenness_Determinism
test confirms this with 1e-10 epsilon tolerance.
TestResetEquivalentToFreshAllocation verifies that pooled buffer output matches fresh allocation
output for various graph topologies (sparse, dense, chain).
The original plan predicted 80% allocation reduction. We achieved 25% reduction. Key differences:
- localBC map not pooled: Each goroutine still creates a fresh
localBCmap - nodes slice:
graph.NodesOf()allocates on each call - gonum iterator overhead: ~50% of allocations come from gonum iterators
These are candidates for Round 2 optimization.
Based on post-pooling profiles, priority targets:
- Cache adjacency lists - Eliminate gonum iterator allocations (45% of memory)
- Pool localBC maps - Per-goroutine result maps
- Pool nodes slice - Reuse node list across samples
Expected Round 2 improvement: Additional 50-60% allocation reduction.
- Baseline capture:
chore: capture pre-optimization baseline benchmarks - Implementation:
perf: implement buffer pooling for Brandes' algorithm - Unit tests:
test: add comprehensive buffer pool unit and race tests - E2E tests:
test: add E2E integration test script for buffer pooling
pkg/analysis/betweenness_approx.go- Buffer pooling implementationpkg/analysis/buffer_pool_test.go- Unit and race condition tests (new)scripts/test_buffer_pooling_e2e.sh- E2E integration tests (new)benchmarks/baseline_round1_*.txt- Baseline metricsbenchmarks/profile_comparison.txt- Post-optimization profiles