Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
name: Benchmarks

on:
push:
branches:
- main
workflow_dispatch: # Allow manual triggering

permissions:
contents: read

jobs:
benchmark:
name: benchmark
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 https://github.com/actions/checkout/releases/tag/v6.0.2
with:
fetch-depth: 0
submodules: true

- uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0 https://github.com/actions/setup-go/releases/tag/v6.2.0
with:
go-version: '1.24'

- name: Run benchmarks
run: |
set -o pipefail
go test -bench=. -benchmem -count=5 -timeout=30m -run='^$' \
./vrf/... \
./kes/... \
./consensus/... \
./cbor/... \
./pipeline/... \
./ledger/... \
./internal/bench/... \
2>&1 | tee benchmark.txt
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The pipeline exits with tee’s status, so go test failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 35:

<comment>The pipeline exits with `tee`’s status, so `go test` failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.</comment>

<file context>
@@ -0,0 +1,42 @@
+            ./cbor/... \
+            ./pipeline/... \
+            ./ledger/... \
+            2>&1 | tee benchmark.txt
+
+      - name: Upload benchmark results
</file context>
Fix with Cubic


- name: Upload benchmark results
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0 https://github.com/actions/upload-artifact/releases/tag/v6.0.0
with:
name: benchmark-${{ github.sha }}-${{ github.run_number }}
path: benchmark.txt
retention-days: 30
233 changes: 233 additions & 0 deletions internal/bench/BASELINES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,233 @@
# Memory Allocation Baselines

**Last Updated**: 2026-02-22
**Go Version**: 1.24+
**Platform**: linux/amd64

## Overview

This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:

1. **Regression Detection**: Allocation regressions are caught by `TestAllocationRegression` tests; benchmark thresholds (>20% allocs, >20% time, >50% bytes) are aspirational targets for future CI enforcement (see [Regression Thresholds](#regression-thresholds))
2. **Optimization Tracking**: Measure impact of performance improvements
3. **Contributor Guidance**: Set expectations for new code

All baseline values were captured after the completion of optimization work in PRs #1496-1503 and #1529.

---

## Current Baselines

### VRF Operations (`vrf/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| VRF KeyGen | 2 | 64 B | ~75us | Seed processing only |
| VRF Prove | 11 | 736 B | ~760us | Scalar multiplication |
| VRF Verify | 11 | 816 B | ~950us | Full verification |
| VRF VerifyAndHash | 11 | 816 B | ~955us | Verify + hash extraction |
| VRF ProofToHash | 2 | 224 B | ~38us | Hash extraction only |
| MkInputVrf | 3 | 464 B | ~1.4us | VRF input creation |

### KES Operations (`kes/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| KES KeyGen (depth=6) | 255 | 8800 B | ~5ms | Cardano standard depth |
| KES Sign (depth=6) | 1 | 448 B | ~180us | Single allocation |
| KES Update (depth=6) | 3 | 736 B | ~78us | Key evolution |
| KES Verify (depth=6) | 6 | 192 B | ~243us | Signature verification |
| KES VerifySignedKES | 12 | 616 B | ~265us | Full verification path |
| KES NewSumKesFromBytes (depth=6) | 6 | 424 B | ~1.1us | Signature deserialization |
| KES HashPair | 1 | 32 B | ~843ns | Blake2b hash |

### Block Validation (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Time | Notes |
|-----------|-----|--------|-------|------|-------|
| Block Validation | Shelley | 461 | 101 KB | ~2.4ms | Full validation |
| Block Validation | Allegra | 1092 | 246 KB | ~3.1ms | |
| Block Validation | Mary | 1136 | 236 KB | ~3.0ms | |
| Block Validation | Alonzo | 1382 | 365 KB | ~3.5ms | |
| Block Validation | Babbage | 5709 | 1014 KB | ~7.0ms | Largest blocks |
| Block Validation | Conway | 2672 | 487 KB | ~4.4ms | |
| Block Validation (pre-parsed) | All Eras | 20 | 1.5 KB | ~0.9ms | Skip decode |
| VRF Verification | All Eras | 10 | 608 B | ~0.9ms | Block VRF check |
| KES Verification | All Eras | 12 | 616 B | ~270us | Block KES check |
| Body Hash | Shelley | 15 | 10 KB | ~42us | |
| Body Hash | Babbage | 19 | 83 KB | ~328us | Largest body |
| Body Hash | Conway | 18 | 39 KB | ~153us | |

### Block Decode (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Throughput | Notes |
|-----------|-----|--------|-------|------------|-------|
| CBOR Decode | Byron | 500 | 89 KB | 5.5 MB/s | |
| CBOR Decode | Shelley | 441 | 100 KB | 8.2 MB/s | |
| CBOR Decode | Allegra | 1072 | 245 KB | 6.7 MB/s | |
| CBOR Decode | Mary | 1116 | 235 KB | 5.2 MB/s | |
| CBOR Decode | Alonzo | 1362 | 363 KB | 6.4 MB/s | |
| CBOR Decode | Babbage | 5689 | 1014 KB | 3.5 MB/s | |
| CBOR Decode | Conway | 2652 | 485 KB | 3.5 MB/s | |
| Parallel Decode | Byron | 500 | 89 KB | 19.6 MB/s | |
| Parallel Decode | Shelley | 441 | 100 KB | 35.8 MB/s | |
| Parallel Decode | Babbage | 5690 | 1005 KB | 25.2 MB/s | |

### Transaction Validation (`internal/bench/`)

| Operation | Era | Allocs | Bytes | Time | Notes |
|-----------|-----|--------|-------|------|-------|
| Tx Validation | Shelley | 64 | 5.3 KB | ~605us | Simple tx |
| Tx Validation | Allegra | 32 | 3.7 KB | ~600us | |
| Tx Validation | Mary | 42 | 4.5 KB | ~553us | |
| Tx Validation | Alonzo | 44 | 5.3 KB | ~371us | |
| Tx Validation | Babbage | 310 | 22.1 KB | ~1.4ms | |
| Tx Validation | Conway | 220 | 18.7 KB | ~1.8ms | |
| Value Balance | Shelley | 9 | 216 B | ~1.2us | |
| Value Balance | Alonzo | 21 | 624 B | ~3.0us | |
| Witness Validation | Shelley | 13 | 624 B | ~3.1us | |
| Witness Validation | Alonzo | 28 | 6.5 KB | ~33us | |

### Consensus / Leader Election (`internal/bench/`)

| Operation | Allocs | Bytes | Time | Notes |
|-----------|--------|-------|------|-------|
| CertifiedNatThreshold | 1221-1224 | 163-168 KB | ~4.1ms | big.Rat arithmetic |
| VrfLeaderValue | 4 | 528 B | ~1.6us | Blake2b hash |
| VRFOutputToInt | 1 | 64 B | ~139ns | big.Int conversion |
| IsSlotLeader | 1240-1244 | 165-169 KB | ~5.3ms | Full leader check |
| IsVRFOutputBelowThreshold | 5 | 592 B | ~1.8us | Threshold comparison |
| Full Leader Election Workflow | 1242 | 167 KB | ~5.4ms | Complete flow |

---

## How to Update Baselines

### Run All Benchmarks

```bash
# VRF benchmarks
go test -bench=. -benchmem ./vrf/... -run=^$ 2>&1 | tee vrf_bench.txt

# KES benchmarks
go test -bench=. -benchmem ./kes/... -run=^$ 2>&1 | tee kes_bench.txt

# Internal benchmarks (block, tx, consensus, CBOR)
go test -bench=. -benchmem ./internal/bench/... -run=^$ 2>&1 | tee internal_bench.txt
```

### Compare Against Previous Run

```bash
# Install benchstat if needed
go install golang.org/x/perf/cmd/benchstat@latest

# Compare old vs new
benchstat old_bench.txt new_bench.txt
```

### Generate Memory Profile

```bash
# CPU profile
go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/...

# Memory profile
go test -bench=BenchmarkBlockValidation -memprofile=mem.prof ./internal/bench/...

# Analyze
go tool pprof -http=localhost:8080 mem.prof
```

### Extract Specific Values

```bash
# Get allocation counts for VRF Verify
go test -bench='BenchmarkVerify/Valid' -benchmem ./vrf/... -run=^$ | grep allocs

# Get allocation counts for block validation
go test -bench='BenchmarkBlockValidation/Era_Conway' -benchmem ./internal/bench/... -run=^$
```

---

## Optimization History

### Merged PRs (2026-01)

| PR | Focus Area | Impact |
|----|------------|--------|
| #1496 | KES optimizations | Reduced allocs in key operations |
| #1497 | VRF scalar ops | Improved scalar multiplication |
| #1498 | Block body prealloc | Reduced body decode allocs |
| #1499 | Byron merkle buffers | Fixed buffer reuse in merkle tree |
| #1500 | Fixed nonce buffers | Reduced MkInputVrf allocs |
| #1501 | Plutus context prealloc | Reduced Plutus context building |
| #1502 | VRF leader value | Optimized leader value computation |
| #1503 | big.Rat reuse | Reduced threshold calculation allocs |
| #1529 | CBOR EncMode/DecMode cache | 46-49% faster encode/decode |

### Pre-Optimization Estimates (for reference)

| Operation | Est. Before | Current | Reduction |
|-----------|-------------|---------|-----------|
| VRF Verify | ~15 allocs | 11 allocs | ~27% |
| KES Verify (depth=6) | ~12 allocs | 6 allocs | ~50% |
| MkInputVrf | ~5 allocs | 3 allocs | ~40% |
| Threshold Calc | ~2000 allocs | ~1220 allocs | ~39% |

---

## Regression Thresholds

### Aspirational Regression Thresholds

These thresholds guide manual review and future CI enforcement.
Currently, regression detection is handled by `TestAllocationRegression`
tests in `regression_test.go`, not by the benchmark CI workflow.

| Metric | Threshold | Rationale |
|--------|-----------|-----------|
| Allocation Count | +20% | Catches allocation leaks |
| Bytes Allocated | +50% | Allows some flexibility for features |
| Time (ns/op) | +20% | Catches performance regressions |

### Critical Paths

These operations are performance-critical and have stricter monitoring:

| Operation | Max Allocs | Rationale |
|-----------|------------|-----------|
| VRF Verify | 15 | Block validation hot path |
| KES VerifySignedKES | 15 | Block validation hot path |
| MkInputVrf | 5 | Called for every slot check |
| Body Hash | 25 | Called for every block |

### How to Request Threshold Increase

If a PR legitimately increases allocations:

1. Document the reason in the PR description
2. Update this file with new baseline values
3. Request reviewer approval for threshold increase

---

## Benchmark Environment Notes

- **CPU Scaling**: Disable CPU frequency scaling for consistent results
- **Parallel Tests**: Use `-p 1` to avoid contention in parallel benchmarks
- **Warmup**: Run benchmarks twice; use second run for baselines
- **Count**: Use `-count=5` and benchstat for statistical significance

```bash
# Recommended benchmark command for baselines
go test -bench=. -benchmem -count=5 -p=1 ./internal/bench/... 2>&1 | tee bench.txt
```

---

## Related Documentation

- [README.md](README.md) - Benchmark package documentation
- [PROFILING.md](PROFILING.md) - Profiling guide
Loading