Commit c4ebbb8

test: memory profiling
Signed-off-by: Chris Gianelloni <wolf31o2@blinklabs.io>
1 parent 78bc3ca commit c4ebbb8

15 files changed: +4826 -0 lines changed
.github/workflows/benchmark.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
name: Benchmarks

on:
  push:
    branches:
      - main
  workflow_dispatch: # Allow manual triggering

permissions:
  contents: read

jobs:
  benchmark:
    name: benchmark
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 https://github.com/actions/checkout/releases/tag/v6.0.2
        with:
          fetch-depth: 0
          submodules: true

      - uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0 https://github.com/actions/setup-go/releases/tag/v6.2.0
        with:
          go-version: '1.24'

      - name: Run benchmarks
        run: |
          # Without pipefail the step reports tee's exit status and hides go test failures
          set -o pipefail
          go test -run='^$' -bench=. -benchmem -count=5 -timeout=30m \
            ./vrf/... \
            ./kes/... \
            ./consensus/... \
            ./cbor/... \
            ./pipeline/... \
            ./ledger/... \
            2>&1 | tee benchmark.txt

      - name: Upload benchmark results
        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0 https://github.com/actions/upload-artifact/releases/tag/v6.0.0
        with:
          name: benchmark-${{ github.sha }}-${{ github.run_number }}
          path: benchmark.txt
          retention-days: 30

internal/bench/BASELINES.md

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
# Memory Allocation Baselines

**Last Updated**: 2026-02-10
**Go Version**: 1.24+
**Platform**: linux/arm64

## Overview

This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:

1. **Regression Detection**: CI fails if allocation counts exceed their baselines by more than 50%
2. **Optimization Tracking**: Measure impact of performance improvements
3. **Contributor Guidance**: Set expectations for new code

All baseline values were captured after the completion of optimization work in PRs #1496-1503 and #1529.

---

## Current Baselines

### VRF Operations (`vrf/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| VRF KeyGen | 2 | 64 B | ~75us | Seed processing only |
| VRF Prove | 11 | 736 B | ~760us | Scalar multiplication |
| VRF Verify | 11 | 816 B | ~950us | Full verification |
| VRF VerifyAndHash | 11 | 816 B | ~955us | Verify + hash extraction |
| VRF ProofToHash | 2 | 224 B | ~38us | Hash extraction only |
| MkInputVrf | 3 | 464 B | ~1.4us | VRF input creation |

### KES Operations (`kes/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| KES KeyGen (depth=6) | 255 | 8800 B | ~5ms | Cardano standard depth |
| KES Sign (depth=6) | 1 | 448 B | ~180us | Single allocation |
| KES Update (depth=6) | 3 | 736 B | ~78us | Key evolution |
| KES Verify (depth=6) | 6 | 192 B | ~243us | Signature verification |
| KES VerifySignedKES | 12 | 616 B | ~265us | Full verification path |
| KES NewSumKesFromBytes (depth=6) | 6 | 424 B | ~1.1us | Signature deserialization |
| KES HashPair | 1 | 32 B | ~843ns | Blake2b hash |

### Block Validation (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Time | Notes |
|-----------|-----|--------|-------|------|-------|
| Block Validation | Shelley | 461 | 101 KB | ~2.4ms | Full validation |
| Block Validation | Allegra | 1092 | 246 KB | ~3.1ms | |
| Block Validation | Mary | 1136 | 236 KB | ~3.0ms | |
| Block Validation | Alonzo | 1382 | 365 KB | ~3.5ms | |
| Block Validation | Babbage | 5709 | 1014 KB | ~7.0ms | Largest blocks |
| Block Validation | Conway | 2672 | 487 KB | ~4.4ms | |
| Block Validation (pre-parsed) | All Eras | 20 | 1.5 KB | ~0.9ms | Skip decode |
| VRF Verification | All Eras | 10 | 608 B | ~0.9ms | Block VRF check |
| KES Verification | All Eras | 12 | 616 B | ~270us | Block KES check |
| Body Hash | Shelley | 15 | 10 KB | ~42us | |
| Body Hash | Babbage | 19 | 83 KB | ~328us | Largest body |
| Body Hash | Conway | 18 | 39 KB | ~153us | |

### Block Decode (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Throughput | Notes |
|-----------|-----|--------|-------|------------|-------|
| CBOR Decode | Byron | 500 | 89 KB | 5.5 MB/s | |
| CBOR Decode | Shelley | 441 | 100 KB | 8.2 MB/s | |
| CBOR Decode | Allegra | 1072 | 245 KB | 6.7 MB/s | |
| CBOR Decode | Mary | 1116 | 235 KB | 5.2 MB/s | |
| CBOR Decode | Alonzo | 1362 | 363 KB | 6.4 MB/s | |
| CBOR Decode | Babbage | 5689 | 1014 KB | 3.5 MB/s | |
| CBOR Decode | Conway | 2652 | 485 KB | 3.5 MB/s | |
| Parallel Decode | Byron | 500 | 89 KB | 19.6 MB/s | |
| Parallel Decode | Shelley | 441 | 100 KB | 35.8 MB/s | |
| Parallel Decode | Babbage | 5690 | 1005 KB | 25.2 MB/s | |
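
The Throughput column is the kind of figure Go's benchmark harness prints once a benchmark calls `b.SetBytes`. The following is a minimal sketch of that mechanism, not the project's decode benchmark: `decode`, `measureDecode`, and `throughputMBps` are hypothetical names, and the workload is a placeholder checksum rather than CBOR decoding.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result reachable so the loop is not optimized away.
var sink uint32

// decode is a hypothetical stand-in for a block decoder: it only folds the
// input into a checksum so the example stays self-contained.
func decode(data []byte) uint32 {
	var sum uint32
	for _, b := range data {
		sum = sum*31 + uint32(b)
	}
	return sum
}

// throughputMBps derives MB/s (1 MB = 1e6 bytes) from a benchmark result:
// bytes per operation times iterations, divided by elapsed seconds.
func throughputMBps(r testing.BenchmarkResult) float64 {
	if r.T <= 0 {
		return 0
	}
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

// measureDecode benchmarks decode over an input of the given size.
func measureDecode(size int) testing.BenchmarkResult {
	data := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(data))) // records bytes processed per operation
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = decode(data)
		}
	})
}

func main() {
	r := measureDecode(64 * 1024)
	fmt.Printf("%.1f MB/s, %d B/op, %d allocs/op\n",
		throughputMBps(r), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

Because throughput depends on elapsed time, it varies with hardware far more than the allocation columns do.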

### Transaction Validation (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Time | Notes |
|-----------|-----|--------|-------|------|-------|
| Tx Validation | Shelley | 64 | 5.3 KB | ~605us | Simple tx |
| Tx Validation | Allegra | 32 | 3.7 KB | ~600us | |
| Tx Validation | Mary | 42 | 4.5 KB | ~553us | |
| Tx Validation | Alonzo | 44 | 5.3 KB | ~371us | |
| Tx Validation | Babbage | 310 | 22.1 KB | ~1.4ms | |
| Tx Validation | Conway | 220 | 18.7 KB | ~1.8ms | |
| Value Balance | Shelley | 9 | 216 B | ~1.2us | |
| Value Balance | Alonzo | 21 | 624 B | ~3.0us | |
| Witness Validation | Shelley | 13 | 624 B | ~3.1us | |
| Witness Validation | Alonzo | 28 | 6.5 KB | ~33us | |

### Consensus / Leader Election (`internal/bench/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| CertifiedNatThreshold | 1221-1224 | 163-168 KB | ~4.1ms | big.Rat arithmetic |
| VrfLeaderValue | 4 | 528 B | ~1.6us | Blake2b hash |
| VRFOutputToInt | 1 | 64 B | ~139ns | big.Int conversion |
| IsSlotLeader | 1240-1244 | 165-169 KB | ~5.3ms | Full leader check |
| IsVRFOutputBelowThreshold | 5 | 592 B | ~1.8us | Threshold comparison |
| Full Leader Election Workflow | 1242 | 167 KB | ~5.4ms | Complete flow |

---

## How to Update Baselines

### Run All Benchmarks

```bash
# VRF benchmarks
go test -bench=. -benchmem ./vrf/... -run=^$ 2>&1 | tee vrf_bench.txt

# KES benchmarks
go test -bench=. -benchmem ./kes/... -run=^$ 2>&1 | tee kes_bench.txt

# Internal benchmarks (block, tx, consensus, CBOR)
go test -bench=. -benchmem ./internal/bench/... -run=^$ 2>&1 | tee internal_bench.txt
```
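
The Allocs and Bytes columns in this file correspond to the `allocs/op` and `B/op` figures that `-benchmem` prints. As a rough illustration of where those numbers come from, the sketch below runs a hypothetical `hashPair` workload (SHA-256 based, and not the gouroboros KES implementation) through `testing.Benchmark`, which lets it run as an ordinary program instead of under `go test`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the work away.
var sink []byte

// hashPair is a hypothetical stand-in workload: it hashes two byte slices
// and returns a freshly allocated 32-byte digest.
func hashPair(left, right []byte) []byte {
	h := sha256.New()
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// measure runs the workload under the standard benchmark harness and
// returns the per-operation figures that -benchmem would print.
func measure() testing.BenchmarkResult {
	left, right := make([]byte, 32), make([]byte, 32)
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // per-benchmark equivalent of the -benchmem flag
		for i := 0; i < b.N; i++ {
			sink = hashPair(left, right)
		}
	})
}

func main() {
	r := measure()
	fmt.Printf("%d ns/op\t%d B/op\t%d allocs/op\n",
		r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

Allocation counts are usually stable across machines, while `ns/op` is not, which is why the Time values in the tables are approximate.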

### Compare Against Previous Run

```bash
# Install benchstat if needed
go install golang.org/x/perf/cmd/benchstat@latest

# Compare old vs new
benchstat old_bench.txt new_bench.txt
```

### Generate CPU and Memory Profiles

```bash
# CPU profile (-run=^$ skips unit tests so only the benchmark is profiled)
go test -bench=BenchmarkBlockValidation -run=^$ -cpuprofile=cpu.prof ./internal/bench/...

# Memory profile
go test -bench=BenchmarkBlockValidation -run=^$ -memprofile=mem.prof ./internal/bench/...

# Analyze
go tool pprof -http=:8080 mem.prof
```

### Extract Specific Values

```bash
# Get allocation counts for VRF Verify
go test -bench='BenchmarkVerify/Valid' -benchmem ./vrf/... -run=^$ | grep allocs

# Get allocation counts for block validation
go test -bench='BenchmarkBlockValidation/Era_Conway' -benchmem ./internal/bench/... -run=^$
```
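
When the values need to be extracted programmatically rather than with `grep`, the result lines can be parsed directly: each one is a benchmark name, an iteration count, then value/unit pairs. The sketch below assumes that standard layout. `Result` and `parseLine` are hypothetical helpers, and the sample line in `main` is illustrative, not captured output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result holds the per-operation metrics from one benchmark output line.
type Result struct {
	Name        string
	NsPerOp     float64
	BytesPerOp  int64
	AllocsPerOp int64
}

// parseLine parses one line of `go test -bench -benchmem` output.
// It returns ok=false for lines that are not benchmark results.
func parseLine(line string) (Result, bool) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return Result{}, false
	}
	r := Result{Name: fields[0]}
	// fields[1] is the iteration count; value/unit pairs follow it.
	for i := 2; i+1 < len(fields); i += 2 {
		value, unit := fields[i], fields[i+1]
		var err error
		switch unit {
		case "ns/op":
			r.NsPerOp, err = strconv.ParseFloat(value, 64)
		case "B/op":
			r.BytesPerOp, err = strconv.ParseInt(value, 10, 64)
		case "allocs/op":
			r.AllocsPerOp, err = strconv.ParseInt(value, 10, 64)
		}
		if err != nil {
			return Result{}, false
		}
	}
	return r, true
}

func main() {
	// Illustrative line; the "-8" suffix and iteration count are made up.
	line := "BenchmarkVerify/Valid-8 1256 950123 ns/op 816 B/op 11 allocs/op"
	if r, ok := parseLine(line); ok {
		fmt.Printf("%s: %d allocs/op, %d B/op\n", r.Name, r.AllocsPerOp, r.BytesPerOp)
	}
}
```

Units the parser does not recognize, such as `MB/s`, are skipped, so lines from throughput benchmarks parse the same way.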

---

## Optimization History

### Merged PRs (2026-01)

| PR | Focus Area | Impact |
|----|------------|--------|
| #1496 | KES optimizations | Reduced allocs in key operations |
| #1497 | VRF scalar ops | Improved scalar multiplication |
| #1498 | Block body prealloc | Reduced body decode allocs |
| #1499 | Byron merkle buffers | Fixed buffer reuse in merkle tree |
| #1500 | Fixed nonce buffers | Reduced MkInputVrf allocs |
| #1501 | Plutus context prealloc | Reduced Plutus context building |
| #1502 | VRF leader value | Optimized leader value computation |
| #1503 | big.Rat reuse | Reduced threshold calculation allocs |
| #1529 | CBOR EncMode/DecMode cache | 46-49% faster encode/decode |

### Pre-Optimization Estimates (for reference)

| Operation | Est. Before | Current | Reduction |
|-----------|-------------|---------|-----------|
| VRF Verify | ~15 allocs | 11 allocs | ~27% |
| KES Verify (depth=6) | ~12 allocs | 6 allocs | ~50% |
| MkInputVrf | ~5 allocs | 3 allocs | ~40% |
| Threshold Calc | ~2000 allocs | ~1220 allocs | ~39% |

---

## Regression Thresholds

### CI Failure Criteria

The benchmark CI workflow fails a PR if any metric exceeds its baseline by more than the threshold below:

| Metric | Threshold | Rationale |
|--------|-----------|-----------|
| Allocation Count | +50% | Catches allocation leaks |
| Bytes Allocated | +100% | Allows some flexibility for features |
| Time (ns/op) | +50% | Catches performance regressions |
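
The comparison these thresholds describe can be sketched as follows. How CI actually performs the check is not shown in this file, so `Metrics`, `increase`, and `regressions` are hypothetical names, and treating a result exactly at the threshold as passing is an assumption. The baseline in `main` uses the VRF Verify figures from this document; the "current" run is invented.

```go
package main

import "fmt"

// Metrics holds per-operation benchmark figures.
type Metrics struct {
	Allocs  float64
	Bytes   float64
	NsPerOp float64
}

// Maximum allowed increase over the baseline, as fractions (0.50 = +50%).
const (
	maxAllocIncrease = 0.50
	maxBytesIncrease = 1.00
	maxTimeIncrease  = 0.50
)

// increase returns the relative change from baseline to current.
// A non-positive baseline is treated as "no change" in this sketch.
func increase(baseline, current float64) float64 {
	if baseline <= 0 {
		return 0
	}
	return (current - baseline) / baseline
}

// regressions lists every metric whose increase exceeds its threshold.
func regressions(base, cur Metrics) []string {
	var out []string
	check := func(metric string, b, c, limit float64) {
		if inc := increase(b, c); inc > limit {
			out = append(out, fmt.Sprintf("%s: +%.1f%% (limit +%.0f%%)",
				metric, inc*100, limit*100))
		}
	}
	check("allocs/op", base.Allocs, cur.Allocs, maxAllocIncrease)
	check("B/op", base.Bytes, cur.Bytes, maxBytesIncrease)
	check("ns/op", base.NsPerOp, cur.NsPerOp, maxTimeIncrease)
	return out
}

func main() {
	base := Metrics{Allocs: 11, Bytes: 816, NsPerOp: 950_000} // VRF Verify baseline
	cur := Metrics{Allocs: 17, Bytes: 816, NsPerOp: 950_000}  // hypothetical new run
	for _, msg := range regressions(base, cur) {
		fmt.Println(msg)
	}
}
```

With an 11-alloc baseline, 16 allocs (+45%) passes and 17 allocs (+55%) fails the allocation-count check.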

### Critical Paths

These operations are performance-critical and have stricter monitoring:

| Operation | Max Allocs | Rationale |
|-----------|------------|-----------|
| VRF Verify | 15 | Block validation hot path |
| KES VerifySignedKES | 15 | Block validation hot path |
| MkInputVrf | 5 | Called for every slot check |
| Body Hash | 25 | Called for every block |
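
An absolute cap like those in the table can be enforced with `testing.AllocsPerRun` from the standard library. The sketch below assumes a hypothetical `buildInput` function standing in for a hot-path call (it is not the real `MkInputVrf`) and a hypothetical `checkMaxAllocs` helper. In a real test file the same check would typically report through `t.Errorf` instead of returning an error.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// sink keeps results reachable so the allocation is not optimized away.
var sink []byte

// buildInput is a hypothetical stand-in for a hot-path function: it
// concatenates a slot number and a nonce into one freshly allocated buffer.
func buildInput(slot uint64, nonce []byte) []byte {
	buf := make([]byte, 8+len(nonce))
	binary.BigEndian.PutUint64(buf, slot)
	copy(buf[8:], nonce)
	return buf
}

// checkMaxAllocs returns an error if f averages more than limit
// allocations per call over 100 runs.
func checkMaxAllocs(name string, limit float64, f func()) error {
	got := testing.AllocsPerRun(100, f)
	if got > limit {
		return fmt.Errorf("%s: %.0f allocs/op exceeds limit of %.0f", name, got, limit)
	}
	return nil
}

func main() {
	nonce := make([]byte, 32)
	err := checkMaxAllocs("buildInput", 5, func() {
		sink = buildInput(42, nonce)
	})
	fmt.Println("within limit:", err == nil)
}
```

Unlike the percentage thresholds, this check needs no stored baseline, so it can live next to the code it protects.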

### How to Request Threshold Increase

If a PR legitimately increases allocations:

1. Document the reason in the PR description
2. Update this file with new baseline values
3. Request reviewer approval for threshold increase

---

## Benchmark Environment Notes

- **CPU Scaling**: Disable CPU frequency scaling for consistent results
- **Parallel Tests**: Use `-p=1` so test binaries for different packages do not run concurrently and contend for CPU
- **Warmup**: Run benchmarks twice; use the second run for baselines
- **Count**: Use `-count=5` and benchstat for statistical significance

```bash
# Recommended benchmark command for baselines
go test -bench=. -benchmem -count=5 -p=1 -run=^$ ./internal/bench/... 2>&1 | tee bench.txt
```

---

## Related Documentation

- [README.md](README.md) - Benchmark package documentation
- [PROFILING.md](PROFILING.md) - Profiling guide
