|
| 1 | +# Memory Allocation Baselines |
| 2 | + |
| 3 | +**Last Updated**: 2026-02-10 |
| 4 | +**Go Version**: 1.24+ |
| 5 | +**Platform**: linux/arm64 |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes: |
| 10 | + |
| 11 | +1. **Regression Detection**: CI fails if allocation counts or time per operation exceed these baselines by more than 50%, or bytes allocated by more than 100% |
| 12 | +2. **Optimization Tracking**: Measure impact of performance improvements |
| 13 | +3. **Contributor Guidance**: Set expectations for new code |
| 14 | + |
| 15 | +All baselines were captured after the optimization work in PRs #1496-1503 and #1529 was merged. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Current Baselines |
| 20 | + |
| 21 | +### VRF Operations (`vrf/`) |
| 22 | + |
| 23 | +| Operation | Allocs | Bytes | Time | Notes | |
| 24 | +|-----------|--------|-------|------|-------| |
| 25 | +| VRF KeyGen | 2 | 64 B | ~75us | Seed processing only | |
| 26 | +| VRF Prove | 11 | 736 B | ~760us | Scalar multiplication | |
| 27 | +| VRF Verify | 11 | 816 B | ~950us | Full verification | |
| 28 | +| VRF VerifyAndHash | 11 | 816 B | ~955us | Verify + hash extraction | |
| 29 | +| VRF ProofToHash | 2 | 224 B | ~38us | Hash extraction only | |
| 30 | +| MkInputVrf | 3 | 464 B | ~1.4us | VRF input creation | |
| 31 | + |
| 32 | +### KES Operations (`kes/`) |
| 33 | + |
| 34 | +| Operation | Allocs | Bytes | Time | Notes | |
| 35 | +|-----------|--------|-------|------|-------| |
| 36 | +| KES KeyGen (depth=6) | 255 | 8800 B | ~5ms | Cardano standard depth | |
| 37 | +| KES Sign (depth=6) | 1 | 448 B | ~180us | Single allocation | |
| 38 | +| KES Update (depth=6) | 3 | 736 B | ~78us | Key evolution | |
| 39 | +| KES Verify (depth=6) | 6 | 192 B | ~243us | Signature verification | |
| 40 | +| KES VerifySignedKES | 12 | 616 B | ~265us | Full verification path | |
| 41 | +| KES NewSumKesFromBytes (depth=6) | 6 | 424 B | ~1.1us | Signature deserialization | |
| 42 | +| KES HashPair | 1 | 32 B | ~843ns | Blake2b hash | |
| 43 | + |
| 44 | +### Block Validation (`internal/bench/`) |
| 45 | + |
| 46 | +| Operation | Era | Allocs | Bytes | Time | Notes | |
| 47 | +|-----------|-----|--------|-------|------|-------| |
| 48 | +| Block Validation | Shelley | 461 | 101 KB | ~2.4ms | Full validation | |
| 49 | +| Block Validation | Allegra | 1092 | 246 KB | ~3.1ms | | |
| 50 | +| Block Validation | Mary | 1136 | 236 KB | ~3.0ms | | |
| 51 | +| Block Validation | Alonzo | 1382 | 365 KB | ~3.5ms | | |
| 52 | +| Block Validation | Babbage | 5709 | 1014 KB | ~7.0ms | Largest blocks | |
| 53 | +| Block Validation | Conway | 2672 | 487 KB | ~4.4ms | | |
| 54 | +| Block Validation (pre-parsed) | All Eras | 20 | 1.5 KB | ~0.9ms | Skip decode | |
| 55 | +| VRF Verification | All Eras | 10 | 608 B | ~0.9ms | Block VRF check | |
| 56 | +| KES Verification | All Eras | 12 | 616 B | ~270us | Block KES check | |
| 57 | +| Body Hash | Shelley | 15 | 10 KB | ~42us | | |
| 58 | +| Body Hash | Babbage | 19 | 83 KB | ~328us | Largest body | |
| 59 | +| Body Hash | Conway | 18 | 39 KB | ~153us | | |
| 60 | + |
| 61 | +### Block Decode (`internal/bench/`) |
| 62 | + |
| 63 | +| Operation | Era | Allocs | Bytes | Throughput | Notes | |
| 64 | +|-----------|-----|--------|-------|------------|-------| |
| 65 | +| CBOR Decode | Byron | 500 | 89 KB | 5.5 MB/s | | |
| 66 | +| CBOR Decode | Shelley | 441 | 100 KB | 8.2 MB/s | | |
| 67 | +| CBOR Decode | Allegra | 1072 | 245 KB | 6.7 MB/s | | |
| 68 | +| CBOR Decode | Mary | 1116 | 235 KB | 5.2 MB/s | | |
| 69 | +| CBOR Decode | Alonzo | 1362 | 363 KB | 6.4 MB/s | | |
| 70 | +| CBOR Decode | Babbage | 5689 | 1014 KB | 3.5 MB/s | | |
| 71 | +| CBOR Decode | Conway | 2652 | 485 KB | 3.5 MB/s | | |
| 72 | +| Parallel Decode | Byron | 500 | 89 KB | 19.6 MB/s | | |
| 73 | +| Parallel Decode | Shelley | 441 | 100 KB | 35.8 MB/s | | |
| 74 | +| Parallel Decode | Babbage | 5690 | 1005 KB | 25.2 MB/s | | |
| 75 | + |
| 76 | +### Transaction Validation (`internal/bench/`) |
| 77 | + |
| 78 | +| Operation | Era | Allocs | Bytes | Time | Notes | |
| 79 | +|-----------|-----|--------|-------|------|-------| |
| 80 | +| Tx Validation | Shelley | 64 | 5.3 KB | ~605us | Simple tx | |
| 81 | +| Tx Validation | Allegra | 32 | 3.7 KB | ~600us | | |
| 82 | +| Tx Validation | Mary | 42 | 4.5 KB | ~553us | | |
| 83 | +| Tx Validation | Alonzo | 44 | 5.3 KB | ~371us | | |
| 84 | +| Tx Validation | Babbage | 310 | 22.1 KB | ~1.4ms | | |
| 85 | +| Tx Validation | Conway | 220 | 18.7 KB | ~1.8ms | | |
| 86 | +| Value Balance | Shelley | 9 | 216 B | ~1.2us | | |
| 87 | +| Value Balance | Alonzo | 21 | 624 B | ~3.0us | | |
| 88 | +| Witness Validation | Shelley | 13 | 624 B | ~3.1us | | |
| 89 | +| Witness Validation | Alonzo | 28 | 6.5 KB | ~33us | | |
| 90 | + |
| 91 | +### Consensus / Leader Election (`internal/bench/`) |
| 92 | + |
| 93 | +| Operation | Allocs | Bytes | Time | Notes | |
| 94 | +|-----------|--------|-------|------|-------| |
| 95 | +| CertifiedNatThreshold | 1221-1224 | 163-168 KB | ~4.1ms | big.Rat arithmetic | |
| 96 | +| VrfLeaderValue | 4 | 528 B | ~1.6us | Blake2b hash | |
| 97 | +| VRFOutputToInt | 1 | 64 B | ~139ns | big.Int conversion | |
| 98 | +| IsSlotLeader | 1240-1244 | 165-169 KB | ~5.3ms | Full leader check | |
| 99 | +| IsVRFOutputBelowThreshold | 5 | 592 B | ~1.8us | Threshold comparison | |
| 100 | +| Full Leader Election Workflow | 1242 | 167 KB | ~5.4ms | Complete flow | |
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +## How to Update Baselines |
| 105 | + |
| 106 | +### Run All Benchmarks |
| 107 | + |
| 108 | +```bash |
| 109 | +# VRF benchmarks |
| 110 | +go test -bench=. -benchmem ./vrf/... -run=^$ 2>&1 | tee vrf_bench.txt |
| 111 | + |
| 112 | +# KES benchmarks |
| 113 | +go test -bench=. -benchmem ./kes/... -run=^$ 2>&1 | tee kes_bench.txt |
| 114 | + |
| 115 | +# Internal benchmarks (block, tx, consensus, CBOR) |
| 116 | +go test -bench=. -benchmem ./internal/bench/... -run=^$ 2>&1 | tee internal_bench.txt |
| 117 | +``` |
| 118 | + |
| 119 | +### Compare Against Previous Run |
| 120 | + |
| 121 | +```bash |
| 122 | +# Install benchstat if needed |
| 123 | +go install golang.org/x/perf/cmd/benchstat@latest |
| 124 | + |
| 125 | +# Compare old vs new |
| 126 | +benchstat old_bench.txt new_bench.txt |
| 127 | +``` |
| 128 | + |
| 129 | +### Generate CPU and Memory Profiles |
| 130 | + |
| 131 | +```bash |
| 132 | +# CPU profile |
| 133 | +go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/... -run=^$ |
| 134 | + |
| 135 | +# Memory profile |
| 136 | +go test -bench=BenchmarkBlockValidation -memprofile=mem.prof ./internal/bench/... -run=^$ |
| 137 | + |
| 138 | +# Analyze |
| 139 | +go tool pprof -http=:8080 mem.prof |
| 140 | +``` |
| 141 | + |
| 142 | +### Extract Specific Values |
| 143 | + |
| 144 | +```bash |
| 145 | +# Get allocation counts for VRF Verify |
| 146 | +go test -bench='BenchmarkVerify/Valid' -benchmem ./vrf/... -run=^$ | grep allocs |
| 147 | + |
| 148 | +# Get allocation counts for block validation |
| 149 | +go test -bench='BenchmarkBlockValidation/Era_Conway' -benchmem ./internal/bench/... -run=^$ |
| 150 | +``` |
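When the values need to be consumed by a script rather than read by eye, the standard `go test -bench -benchmem` result line can be split into value/unit pairs. The following is a minimal sketch, not part of the repository: the `Result` type and `ParseLine` helper are hypothetical names, and the sample line reuses the VRF Verify figures from the table above with a made-up iteration count and `-8` suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result holds the per-operation metrics from one benchmark output line.
// (Hypothetical type, introduced only for this example.)
type Result struct {
	Name    string
	NsPerOp float64
	Bytes   float64
	Allocs  float64
}

// ParseLine extracts metrics from a line such as
//
//	BenchmarkVerify/Valid-8  1264  950000 ns/op  816 B/op  11 allocs/op
//
// It returns false for lines that are not benchmark results.
func ParseLine(line string) (Result, bool) {
	f := strings.Fields(line)
	if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
		return Result{}, false
	}
	r := Result{Name: f[0]}
	// After the name and iteration count, fields come in value/unit pairs.
	for i := 2; i+1 < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return Result{}, false
		}
		switch f[i+1] {
		case "ns/op":
			r.NsPerOp = v
		case "B/op":
			r.Bytes = v
		case "allocs/op":
			r.Allocs = v
		}
	}
	return r, true
}

func main() {
	// Illustrative input; the iteration count and "-8" suffix are assumptions.
	line := "BenchmarkVerify/Valid-8   1264   950000 ns/op   816 B/op   11 allocs/op"
	if r, ok := ParseLine(line); ok {
		fmt.Printf("%s: %.0f allocs/op, %.0f B/op\n", r.Name, r.Allocs, r.Bytes)
	}
}
```

For comparing whole runs, `benchstat` remains the better tool; a parser like this is only useful for pulling a single number into another check.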
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## Optimization History |
| 155 | + |
| 156 | +### Merged PRs (2026-01) |
| 157 | + |
| 158 | +| PR | Focus Area | Impact | |
| 159 | +|----|------------|--------| |
| 160 | +| #1496 | KES optimizations | Reduced allocs in key operations | |
| 161 | +| #1497 | VRF scalar ops | Improved scalar multiplication | |
| 162 | +| #1498 | Block body prealloc | Reduced body decode allocs | |
| 163 | +| #1499 | Byron merkle buffers | Fixed buffer reuse in merkle tree | |
| 164 | +| #1500 | Fixed nonce buffers | Reduced MkInputVrf allocs | |
| 165 | +| #1501 | Plutus context prealloc | Reduced Plutus context building | |
| 166 | +| #1502 | VRF leader value | Optimized leader value computation | |
| 167 | +| #1503 | big.Rat reuse | Reduced threshold calculation allocs | |
| 168 | +| #1529 | CBOR EncMode/DecMode cache | 46-49% faster encode/decode | |
| 169 | + |
| 170 | +### Pre-Optimization Estimates (for reference) |
| 171 | + |
| 172 | +| Operation | Est. Before | Current | Reduction | |
| 173 | +|-----------|-------------|---------|-----------| |
| 174 | +| VRF Verify | ~15 allocs | 11 allocs | ~27% | |
| 175 | +| KES Verify (depth=6) | ~12 allocs | 6 allocs | ~50% | |
| 176 | +| MkInputVrf | ~5 allocs | 3 allocs | ~40% | |
| 177 | +| Threshold Calc | ~2000 allocs | ~1220 allocs | ~39% | |
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Regression Thresholds |
| 182 | + |
| 183 | +### CI Failure Criteria |
| 184 | + |
| 185 | +The benchmark CI workflow fails a PR if any of these thresholds are exceeded: |
| 186 | + |
| 187 | +| Metric | Threshold | Rationale | |
| 188 | +|--------|-----------|-----------| |
| 189 | +| Allocation Count | +50% | Catches allocation leaks | |
| 190 | +| Bytes Allocated | +100% | Allows some flexibility for features | |
| 191 | +| Time (ns/op) | +50% | Catches performance regressions | |
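The relative thresholds in the table can be expressed as a small comparison. This is a sketch under stated assumptions rather than the actual CI implementation: the `Metrics` type and `Regressions` function are hypothetical, and it assumes a threshold counts as exceeded only when the new value is strictly greater than the baseline plus the allowed increase.

```go
package main

import "fmt"

// Metrics holds the three values reported per benchmark.
// (Hypothetical type, introduced only for this example.)
type Metrics struct {
	Allocs  float64 // allocs/op
	Bytes   float64 // B/op
	NsPerOp float64 // ns/op
}

// Allowed fractional increases, taken from the table above.
const (
	allocLimit = 0.50
	bytesLimit = 1.00
	timeLimit  = 0.50
)

// exceeds reports whether current is more than limit above baseline.
// Assumption: the comparison is strict, so exactly +50% still passes.
func exceeds(baseline, current, limit float64) bool {
	return baseline > 0 && current > baseline*(1+limit)
}

// Regressions returns the names of the metrics that breach their threshold.
func Regressions(base, cur Metrics) []string {
	var failed []string
	if exceeds(base.Allocs, cur.Allocs, allocLimit) {
		failed = append(failed, "allocs/op")
	}
	if exceeds(base.Bytes, cur.Bytes, bytesLimit) {
		failed = append(failed, "B/op")
	}
	if exceeds(base.NsPerOp, cur.NsPerOp, timeLimit) {
		failed = append(failed, "ns/op")
	}
	return failed
}

func main() {
	// Baseline: Shelley block validation from the table in this document.
	base := Metrics{Allocs: 461, Bytes: 101 * 1024, NsPerOp: 2.4e6}
	// Hypothetical PR result, made up for illustration.
	cur := Metrics{Allocs: 700, Bytes: 150 * 1024, NsPerOp: 2.5e6}
	fmt.Println(Regressions(base, cur))
}
```

With the Shelley baseline of 461 allocations, the +50% limit works out to 691.5, so the hypothetical 700 allocations would be flagged while the byte and time figures would pass.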
| 192 | + |
| 193 | +### Critical Paths |
| 194 | + |
| 195 | +These operations are performance-critical and have stricter monitoring: |
| 196 | + |
| 197 | +| Operation | Max Allocs | Rationale | |
| 198 | +|-----------|------------|-----------| |
| 199 | +| VRF Verify | 15 | Block validation hot path | |
| 200 | +| KES VerifySignedKES | 15 | Block validation hot path | |
| 201 | +| MkInputVrf | 5 | Called for every slot check | |
| 202 | +| Body Hash | 25 | Called for every block | |
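An absolute cap like the ones above can also be checked locally with `testing.AllocsPerRun` from the standard library, which returns the average number of heap allocations per call. The sketch below is illustrative only: `hotPath` is a hypothetical stand-in for a real critical-path function, and `WithinBudget` is not an existing helper in the repository.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable so the buffers below escape to the heap
// and are counted as allocations.
var sink []byte

// hotPath is a hypothetical stand-in for a critical-path call.
// It allocates a fresh buffer on each of three loop iterations.
func hotPath() {
	for i := 0; i < 3; i++ {
		sink = make([]byte, 64)
	}
}

// WithinBudget reports whether f averages at most maxAllocs heap
// allocations per call, measured over 100 runs.
func WithinBudget(maxAllocs float64, f func()) bool {
	return testing.AllocsPerRun(100, f) <= maxAllocs
}

func main() {
	// 5 matches the MkInputVrf cap in the table above; hotPath is not MkInputVrf.
	fmt.Println(WithinBudget(5, hotPath))
}
```

In a real test file the same check would typically live in a `TestXxx` function and call `t.Errorf` when the budget is exceeded, which gives faster feedback than waiting for the benchmark CI job.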
| 203 | + |
| 204 | +### How to Request Threshold Increase |
| 205 | + |
| 206 | +If a PR legitimately increases allocations: |
| 207 | + |
| 208 | +1. Document the reason in the PR description |
| 209 | +2. Update this file with new baseline values |
| 210 | +3. Request reviewer approval for threshold increase |
| 211 | + |
| 212 | +--- |
| 213 | + |
| 214 | +## Benchmark Environment Notes |
| 215 | + |
| 216 | +- **CPU Scaling**: Disable CPU frequency scaling for consistent results |
| 217 | +- **Parallel Tests**: Use `-p 1` so that only one package's test binary runs at a time, which avoids contention between benchmarks |
| 218 | +- **Warmup**: Run benchmarks twice; use second run for baselines |
| 219 | +- **Count**: Use `-count=5` and benchstat for statistical significance |
| 220 | + |
| 221 | +```bash |
| 222 | +# Recommended benchmark command for baselines |
| 223 | +go test -bench=. -benchmem -count=5 -p=1 ./internal/bench/... -run=^$ 2>&1 | tee bench.txt |
| 224 | +``` |
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## Related Documentation |
| 229 | + |
| 230 | +- [README.md](README.md) - Benchmark package documentation |
| 231 | +- [PROFILING.md](PROFILING.md) - Profiling guide |