test: memory profiling #1531
Open
wolf31o2 wants to merge 1 commit into main from feat/memory-profiling
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| name: Benchmarks | ||
|
|
||
| on: | ||
| push: | ||
| branches: | ||
| - main | ||
| workflow_dispatch: # Allow manual triggering | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| benchmark: | ||
| name: benchmark | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 https://github.com/actions/checkout/releases/tag/v6.0.2 | ||
| with: | ||
| fetch-depth: 0 | ||
| submodules: true | ||
|
|
||
| - uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0 https://github.com/actions/setup-go/releases/tag/v6.2.0 | ||
| with: | ||
| go-version: '1.24' | ||
|
|
||
| - name: Run benchmarks | ||
| run: | | ||
| set -o pipefail | ||
| go test -bench=. -benchmem -count=5 -timeout=30m -run='^$' \ | ||
| ./vrf/... \ | ||
| ./kes/... \ | ||
| ./consensus/... \ | ||
| ./cbor/... \ | ||
| ./pipeline/... \ | ||
| ./ledger/... \ | ||
| ./internal/bench/... \ | ||
| 2>&1 | tee benchmark.txt | ||
|
|
||
| - name: Upload benchmark results | ||
| uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0 https://github.com/actions/upload-artifact/releases/tag/v6.0.0 | ||
| with: | ||
| name: benchmark-${{ github.sha }}-${{ github.run_number }} | ||
| path: benchmark.txt | ||
| retention-days: 30 | ||
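
The workflow above writes several samples per benchmark (`-count=5`) to `benchmark.txt`. For a quick look at the artifact when `benchstat` is not at hand, the repeated lines can be averaged per benchmark. The following is a minimal sketch only: the `mean_allocs` helper is hypothetical, and the sample lines use made-up benchmark names and values shaped like `go test -benchmem` output.

```bash
# Illustrative sample in Go's benchmark output format (names and values are made up).
sample='BenchmarkExample-8   1000   1200 ns/op   64 B/op   2 allocs/op
BenchmarkExample-8   1000   1180 ns/op   64 B/op   2 allocs/op
BenchmarkOther-8     500    2400 ns/op   128 B/op  4 allocs/op
BenchmarkOther-8     500    2500 ns/op   128 B/op  6 allocs/op'

# mean_allocs: read benchmark output on stdin and print the mean allocs/op
# for each benchmark name (hypothetical helper, not part of the repository).
mean_allocs() {
  awk '
    /^Benchmark/ {
      # The value sits in the field just before the "allocs/op" unit.
      for (i = 2; i <= NF; i++)
        if ($i == "allocs/op") { sum[$1] += $(i - 1); n[$1]++ }
    }
    END { for (name in sum) printf "%s %.1f\n", name, sum[name] / n[name] }
  ' | sort
}

summary=$(printf '%s\n' "$sample" | mean_allocs)
printf '%s\n' "$summary"
```

Against a real artifact the same helper would be fed with `mean_allocs < benchmark.txt`. A plain mean hides variance, so `benchstat` remains the better tool for comparing runs.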
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,233 @@ | ||
| # Memory Allocation Baselines | ||
|
|
||
| **Last Updated**: 2026-02-22 | ||
| **Go Version**: 1.24+ | ||
| **Platform**: linux/amd64 | ||
|
|
||
| ## Overview | ||
|
|
||
| This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes: | ||
|
|
||
| 1. **Regression Detection**: Allocation regressions are caught by `TestAllocationRegression` tests; benchmark thresholds (>20% allocs, >20% time, >50% bytes) are aspirational targets for future CI enforcement (see [Regression Thresholds](#regression-thresholds)) | ||
| 2. **Optimization Tracking**: Measure impact of performance improvements | ||
| 3. **Contributor Guidance**: Set expectations for new code | ||
|
|
||
| All baseline values were captured after the completion of optimization work in PRs #1496-1503 and #1529. | ||
|
|
||
| --- | ||
|
|
||
| ## Current Baselines | ||
|
|
||
| ### VRF Operations (`vrf/`) | ||
|
|
||
| | Operation | Allocs | Bytes | Time | Notes | | ||
| |-----------|--------|-------|------|-------| | ||
| | VRF KeyGen | 2 | 64 B | ~75us | Seed processing only | | ||
| | VRF Prove | 11 | 736 B | ~760us | Scalar multiplication | | ||
| | VRF Verify | 11 | 816 B | ~950us | Full verification | | ||
| | VRF VerifyAndHash | 11 | 816 B | ~955us | Verify + hash extraction | | ||
| | VRF ProofToHash | 2 | 224 B | ~38us | Hash extraction only | | ||
| | MkInputVrf | 3 | 464 B | ~1.4us | VRF input creation | | ||
|
|
||
| ### KES Operations (`kes/`) | ||
|
|
||
| | Operation | Allocs | Bytes | Time | Notes | | ||
| |-----------|--------|-------|------|-------| | ||
| | KES KeyGen (depth=6) | 255 | 8800 B | ~5ms | Cardano standard depth | | ||
| | KES Sign (depth=6) | 1 | 448 B | ~180us | Single allocation | | ||
| | KES Update (depth=6) | 3 | 736 B | ~78us | Key evolution | | ||
| | KES Verify (depth=6) | 6 | 192 B | ~243us | Signature verification | | ||
| | KES VerifySignedKES | 12 | 616 B | ~265us | Full verification path | | ||
| | KES NewSumKesFromBytes (depth=6) | 6 | 424 B | ~1.1us | Signature deserialization | | ||
| | KES HashPair | 1 | 32 B | ~843ns | Blake2b hash | | ||
|
|
||
| ### Block Validation (`internal/bench/`) | ||
|
|
||
| | Operation | Era | Allocs | Bytes | Time | Notes | | ||
| |-----------|-----|--------|-------|------|-------| | ||
| | Block Validation | Shelley | 461 | 101 KB | ~2.4ms | Full validation | | ||
| | Block Validation | Allegra | 1092 | 246 KB | ~3.1ms | | | ||
| | Block Validation | Mary | 1136 | 236 KB | ~3.0ms | | | ||
| | Block Validation | Alonzo | 1382 | 365 KB | ~3.5ms | | | ||
| | Block Validation | Babbage | 5709 | 1014 KB | ~7.0ms | Largest blocks | | ||
| | Block Validation | Conway | 2672 | 487 KB | ~4.4ms | | | ||
| | Block Validation (pre-parsed) | All Eras | 20 | 1.5 KB | ~0.9ms | Skip decode | | ||
| | VRF Verification | All Eras | 10 | 608 B | ~0.9ms | Block VRF check | | ||
| | KES Verification | All Eras | 12 | 616 B | ~270us | Block KES check | | ||
| | Body Hash | Shelley | 15 | 10 KB | ~42us | | | ||
| | Body Hash | Babbage | 19 | 83 KB | ~328us | Largest body | | ||
| | Body Hash | Conway | 18 | 39 KB | ~153us | | | ||
|
|
||
| ### Block Decode (`internal/bench/`) | ||
|
|
||
| | Operation | Era | Allocs | Bytes | Throughput | Notes | | ||
| |-----------|-----|--------|-------|------------|-------| | ||
| | CBOR Decode | Byron | 500 | 89 KB | 5.5 MB/s | | | ||
| | CBOR Decode | Shelley | 441 | 100 KB | 8.2 MB/s | | | ||
| | CBOR Decode | Allegra | 1072 | 245 KB | 6.7 MB/s | | | ||
| | CBOR Decode | Mary | 1116 | 235 KB | 5.2 MB/s | | | ||
| | CBOR Decode | Alonzo | 1362 | 363 KB | 6.4 MB/s | | | ||
| | CBOR Decode | Babbage | 5689 | 1014 KB | 3.5 MB/s | | | ||
| | CBOR Decode | Conway | 2652 | 485 KB | 3.5 MB/s | | | ||
| | Parallel Decode | Byron | 500 | 89 KB | 19.6 MB/s | | | ||
| | Parallel Decode | Shelley | 441 | 100 KB | 35.8 MB/s | | | ||
| | Parallel Decode | Babbage | 5690 | 1005 KB | 25.2 MB/s | | | ||
|
|
||
| ### Transaction Validation (`internal/bench/`) | ||
|
|
||
| | Operation | Era | Allocs | Bytes | Time | Notes | | ||
| |-----------|-----|--------|-------|------|-------| | ||
| | Tx Validation | Shelley | 64 | 5.3 KB | ~605us | Simple tx | | ||
| | Tx Validation | Allegra | 32 | 3.7 KB | ~600us | | | ||
| | Tx Validation | Mary | 42 | 4.5 KB | ~553us | | | ||
| | Tx Validation | Alonzo | 44 | 5.3 KB | ~371us | | | ||
| | Tx Validation | Babbage | 310 | 22.1 KB | ~1.4ms | | | ||
| | Tx Validation | Conway | 220 | 18.7 KB | ~1.8ms | | | ||
| | Value Balance | Shelley | 9 | 216 B | ~1.2us | | | ||
| | Value Balance | Alonzo | 21 | 624 B | ~3.0us | | | ||
| | Witness Validation | Shelley | 13 | 624 B | ~3.1us | | | ||
| | Witness Validation | Alonzo | 28 | 6.5 KB | ~33us | | | ||
|
|
||
| ### Consensus / Leader Election (`internal/bench/`) | ||
|
|
||
| | Operation | Allocs | Bytes | Time | Notes | | ||
| |-----------|--------|-------|------|-------| | ||
| | CertifiedNatThreshold | 1221-1224 | 163-168 KB | ~4.1ms | big.Rat arithmetic | | ||
| | VrfLeaderValue | 4 | 528 B | ~1.6us | Blake2b hash | | ||
| | VRFOutputToInt | 1 | 64 B | ~139ns | big.Int conversion | | ||
| | IsSlotLeader | 1240-1244 | 165-169 KB | ~5.3ms | Full leader check | | ||
| | IsVRFOutputBelowThreshold | 5 | 592 B | ~1.8us | Threshold comparison | | ||
| | Full Leader Election Workflow | 1242 | 167 KB | ~5.4ms | Complete flow | | ||
|
|
||
| --- | ||
|
|
||
| ## How to Update Baselines | ||
|
|
||
| ### Run All Benchmarks | ||
|
|
||
| ```bash | ||
| # VRF benchmarks | ||
| go test -bench=. -benchmem ./vrf/... -run=^$ 2>&1 | tee vrf_bench.txt | ||
|
|
||
| # KES benchmarks | ||
| go test -bench=. -benchmem ./kes/... -run=^$ 2>&1 | tee kes_bench.txt | ||
|
|
||
| # Internal benchmarks (block, tx, consensus, CBOR) | ||
| go test -bench=. -benchmem ./internal/bench/... -run=^$ 2>&1 | tee internal_bench.txt | ||
| ``` | ||
|
|
||
| ### Compare Against Previous Run | ||
|
|
||
| ```bash | ||
| # Install benchstat if needed | ||
| go install golang.org/x/perf/cmd/benchstat@latest | ||
|
|
||
| # Compare old vs new | ||
| benchstat old_bench.txt new_bench.txt | ||
| ``` | ||
|
|
||
| ### Generate CPU and Memory Profiles | ||
|
|
||
| ```bash | ||
| # CPU profile | ||
| go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/... | ||
|
|
||
| # Memory profile | ||
| go test -bench=BenchmarkBlockValidation -memprofile=mem.prof ./internal/bench/... | ||
|
|
||
| # Analyze | ||
| go tool pprof -http=localhost:8080 mem.prof | ||
| ``` | ||
|
|
||
| ### Extract Specific Values | ||
|
|
||
| ```bash | ||
| # Get allocation counts for VRF Verify | ||
| go test -bench='BenchmarkVerify/Valid' -benchmem ./vrf/... -run=^$ | grep allocs | ||
|
|
||
| # Get allocation counts for block validation | ||
| go test -bench='BenchmarkBlockValidation/Era_Conway' -benchmem ./internal/bench/... -run=^$ | grep allocs | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Optimization History | ||
|
|
||
| ### Merged PRs (2026-01) | ||
|
|
||
| | PR | Focus Area | Impact | | ||
| |----|------------|--------| | ||
| | #1496 | KES optimizations | Reduced allocs in key operations | | ||
| | #1497 | VRF scalar ops | Improved scalar multiplication | | ||
| | #1498 | Block body prealloc | Reduced body decode allocs | | ||
| | #1499 | Byron merkle buffers | Fixed buffer reuse in merkle tree | | ||
| | #1500 | Fixed nonce buffers | Reduced MkInputVrf allocs | | ||
| | #1501 | Plutus context prealloc | Reduced Plutus context building | | ||
| | #1502 | VRF leader value | Optimized leader value computation | | ||
| | #1503 | big.Rat reuse | Reduced threshold calculation allocs | | ||
| | #1529 | CBOR EncMode/DecMode cache | 46-49% faster encode/decode | | ||
|
|
||
| ### Pre-Optimization Estimates (for reference) | ||
|
|
||
| | Operation | Est. Before | Current | Reduction | | ||
| |-----------|-------------|---------|-----------| | ||
| | VRF Verify | ~15 allocs | 11 allocs | ~27% | | ||
| | KES Verify (depth=6) | ~12 allocs | 6 allocs | ~50% | | ||
| | MkInputVrf | ~5 allocs | 3 allocs | ~40% | | ||
| | Threshold Calc | ~2000 allocs | ~1220 allocs | ~39% | | ||
|
|
||
| --- | ||
|
|
||
| ## Regression Thresholds | ||
|
|
||
| ### Aspirational Regression Thresholds | ||
|
|
||
| These thresholds guide manual review and future CI enforcement. | ||
| Currently, regression detection is handled by `TestAllocationRegression` | ||
| tests in `regression_test.go`, not by the benchmark CI workflow. | ||
|
|
||
| | Metric | Threshold | Rationale | | ||
| |--------|-----------|-----------| | ||
| | Allocation Count | +20% | Catches allocation leaks | | ||
| | Bytes Allocated | +50% | Allows some flexibility for features | | ||
| | Time (ns/op) | +20% | Catches performance regressions | | ||
|
|
||
| ### Critical Paths | ||
|
|
||
| These operations are performance-critical and have stricter monitoring: | ||
|
|
||
| | Operation | Max Allocs | Rationale | | ||
| |-----------|------------|-----------| | ||
| | VRF Verify | 15 | Block validation hot path | | ||
| | KES VerifySignedKES | 15 | Block validation hot path | | ||
| | MkInputVrf | 5 | Called for every slot check | | ||
| | Body Hash | 25 | Called for every block | | ||
|
|
||
| ### How to Request Threshold Increase | ||
|
|
||
| If a PR legitimately increases allocations: | ||
|
|
||
| 1. Document the reason in the PR description | ||
| 2. Update this file with new baseline values | ||
| 3. Request reviewer approval for threshold increase | ||
|
|
||
| --- | ||
|
|
||
| ## Benchmark Environment Notes | ||
|
|
||
| - **CPU Scaling**: Disable CPU frequency scaling for consistent results | ||
| - **Parallel Tests**: Use `-p 1` so `go test` runs one package's test binary at a time, avoiding contention between packages | ||
| - **Warmup**: Run benchmarks twice; use second run for baselines | ||
| - **Count**: Use `-count=5` and benchstat for statistical significance | ||
|
|
||
| ```bash | ||
| # Recommended benchmark command for baselines | ||
| go test -bench=. -benchmem -count=5 -p=1 -run='^$' ./internal/bench/... 2>&1 | tee bench.txt | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Related Documentation | ||
|
|
||
| - [README.md](README.md) - Benchmark package documentation | ||
| - [PROFILING.md](PROFILING.md) - Profiling guide |
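
The "Critical Paths" table in the file above caps allocations per operation. As a rough illustration of how such a cap could be checked against `go test -benchmem` output from the shell, here is a minimal sketch. It is not the project's `TestAllocationRegression` (whose implementation is not shown in this diff). The `check_max_allocs` helper is hypothetical, and the sample line is illustrative: its byte and allocation figures mirror the VRF Verify baseline, while the iteration count and the `-8` suffix are made up.

```bash
# check_max_allocs NAME MAX: read benchmark output on stdin and exit non-zero
# if any benchmark whose name starts with NAME reports more than MAX allocs/op.
# (Hypothetical helper, not part of the repository.)
check_max_allocs() {
  awk -v name="$1" -v max="$2" '
    index($1, name) == 1 {
      for (i = 2; i <= NF; i++)
        if ($i == "allocs/op" && $(i - 1) + 0 > max + 0) {
          printf "%s: %d allocs/op exceeds max %d\n", $1, $(i - 1), max
          bad = 1
        }
    }
    END { exit bad }
  '
}

# Illustrative line shaped like `go test -benchmem` output.
sample='BenchmarkVerify/Valid-8   1260   950000 ns/op   816 B/op   11 allocs/op'

# 11 allocs/op is within the documented cap of 15 for VRF Verify.
if printf '%s\n' "$sample" | check_max_allocs BenchmarkVerify 15; then
  result=ok
else
  result=regressed
fi
echo "VRF Verify: $result"
```

In practice the helper would read a saved run, for example `check_max_allocs BenchmarkVerify 15 < vrf_bench.txt`.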
P2: The pipeline exits with `tee`'s status, so `go test` failures won't fail the workflow. Capture the pipeline status or enable `pipefail` to ensure benchmark failures stop the job.
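
The behavior this comment describes can be reproduced locally by piping a failing command into `tee`. The sketch below uses `false` as a stand-in for a failing `go test` run and assumes `bash` is on the `PATH`. Each pipeline runs in its own `bash -c` so the option does not leak into the calling shell.

```bash
# Without pipefail, the pipeline reports the status of its last command (tee).
status_without=0
bash -c 'set +o pipefail; false | tee /dev/null' || status_without=$?

# With pipefail, a failure anywhere in the pipeline becomes the pipeline status.
status_with=0
bash -c 'set -o pipefail; false | tee /dev/null' || status_with=$?

echo "without pipefail: exit $status_without"
echo "with pipefail:    exit $status_with"
```

The first pipeline reports success even though `false` failed. The second propagates the failure, which is what a CI step needs in order to stop the job.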