test: memory profiling#1531

Open
wolf31o2 wants to merge 1 commit into main from feat/memory-profiling

Conversation

wolf31o2 (Member) commented Feb 10, 2026


Summary by cubic

Adds a memory profiling and benchmark suite across core validation paths, with docs and CI to run, compare, and profile to catch regressions.

  • New Features
    • Benchmarks for blocks, txs, CBOR, consensus (VRF/KES), and Plutus scripts using shared helpers and real fixtures.
    • Allocation regression tests with per-era baselines; fail on >20% alloc/time or >50% bytes.
    • Profiling tests behind the "profile" build tag and scripts/profile.sh for CPU/heap pprof and benchstat diffs; guided by PROFILING.md.
    • Docs (README, BASELINES.md for linux/amd64, Go 1.24+) and a GitHub Actions workflow (push to main + manual) running vrf/kes/consensus/cbor/pipeline/ledger/internal/bench, archiving results.

Written for commit 854b7c2. Summary will update on new commits.

Summary by CodeRabbit

  • Tests

    • Added extensive performance, profiling, and allocation-regression test suites covering block validation, CBOR, transactions, consensus, VRF/KES, native/Plutus scripts, and helpers.
  • Chores

    • Added CI benchmark workflow to run and archive benchmark outputs, a profiling orchestration script, benchmark helpers, and realistic fixture data.
  • Documentation

    • Added comprehensive benchmarking docs: README, profiling guide, baselines, and testdata usage notes.

@wolf31o2 wolf31o2 requested review from a team as code owners February 10, 2026 22:51
coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

Adds a benchmarking and profiling subsystem under internal/bench with extensive Go benchmark suites (blocks, CBOR, transactions, consensus, scripts, VRF/KES), allocation-regression and profile-tagged tests, helpers, fixtures and unit tests, baseline and profiling documentation (BASELINES.md, PROFILING.md, README, testdata README), a CI workflow (.github/workflows/benchmark.yml) to run and upload benchmarks, and a profiling helper script (scripts/profile.sh).

Possibly related issues

Possibly related PRs

  • feat: consensus #1431 — Benchmarks exercise consensus/VRF/KES helpers (MkInputVrf, KES verification, leader-election) that are implemented or modified in that PR.
🚥 Pre-merge checks | ✅ 2 passed
  • Title check — Passed: The title 'test: memory profiling' directly and clearly summarizes the main objective of this PR: adding memory profiling and benchmark capabilities to track allocations across core validation paths.
  • Description check — Passed: Check skipped; CodeRabbit's high-level summary is enabled.


@cubic-dev-ai cubic-dev-ai bot left a comment

9 issues found across 15 files


<file name="internal/bench/block_bench_test.go">

<violation number="1" location="internal/bench/block_bench_test.go:317">
P2: Guard against unsigned underflow when computing the KES period offset; if the fixture’s kes period is ahead of the slot-derived period, skip or handle it explicitly instead of wrapping.</violation>
</file>

<file name="internal/bench/regression_test.go">

<violation number="1" location="internal/bench/regression_test.go:254">
P2: Check the VRF verification result inside the allocation loop; otherwise the regression test can pass even when verification fails.</violation>

<violation number="2" location="internal/bench/regression_test.go:264">
P3: Validate the VerifyAndHash error inside the allocation loop so failures don't get masked.</violation>

<violation number="3" location="internal/bench/regression_test.go:274">
P3: Handle the vrf.Prove error during allocation measurement to ensure the test fails on invalid proofs.</violation>

<violation number="4" location="internal/bench/regression_test.go:299">
P3: Assert the KES verification result so allocation measurements only reflect the success path.</violation>

<violation number="5" location="internal/bench/regression_test.go:380">
P2: Fail the test if CBOR decoding returns an error so regression results stay meaningful.</violation>
</file>

<file name="internal/bench/helpers.go">

<violation number="1" location="internal/bench/helpers.go:15">
P3: Several added lines exceed the 80-character limit enforced by golines; wrap comments and long signatures/lines to avoid lint failures.</violation>
</file>

<file name="internal/bench/tx_bench_test.go">

<violation number="1" location="internal/bench/tx_bench_test.go:63">
P2: Benchmark results are discarded, which can let the compiler optimize away the work and skew benchmark results. Store results in a package-level sink variable as recommended in the bench README to prevent dead-code elimination.</violation>
</file>

<file name="internal/bench/profile_test.go">

<violation number="1" location="internal/bench/profile_test.go:99">
P2: VerifyBlock errors are discarded, so the profiling test can silently skip the rest of the validation path when verification fails (VerifyBlock returns early on VRF/KES/body hash errors). Capture and handle the error to avoid profiling the wrong code path.</violation>
</file>



for _, kd := range kesDataList {
	b.Run("Era_"+kd.name, func(b *testing.B) {
		currentKesPeriod := kd.slot / defaultSlotsPerKesPeriod
		t := currentKesPeriod - kd.kesPeriod

cubic-dev-ai bot commented Feb 10, 2026 (internal/bench/block_bench_test.go:317):

P2: Guard against unsigned underflow when computing the KES period offset; if the fixture’s kes period is ahead of the slot-derived period, skip or handle it explicitly instead of wrapping.


t.Run("Verify", func(t *testing.T) {
	allocs := testing.AllocsPerRun(100, func() {
		_, _ = vrf.Verify(vrfPK, vrfProof, vrfOutput, vrfAlpha)
	})
	limit := 12.0 * multiplier // Baseline: 11 allocs

cubic-dev-ai bot commented Feb 10, 2026 (internal/bench/regression_test.go:254):

P2: Check the VRF verification result inside the allocation loop; otherwise the regression test can pass even when verification fails.


// VerifyBlock includes VRF/KES verification, body hash, etc.
// With our config, transaction validation is skipped.
_, _, _, _, _ = ledger.VerifyBlock(block, eta0Hex, slotsPerKes, config)

cubic-dev-ai bot commented Feb 10, 2026 (internal/bench/profile_test.go:99):

P2: VerifyBlock errors are discarded, so the profiling test can silently skip the rest of the validation path when verification fails (VerifyBlock returns early on VRF/KES/body hash errors). Capture and handle the error to avoid profiling the wrong code path.


t.Run("VerifyAndHash", func(t *testing.T) {
	allocs := testing.AllocsPerRun(100, func() {
		_, _ = vrf.VerifyAndHash(vrfPK, vrfProof, vrfAlpha)
	})
	limit := 12.0 * multiplier // Baseline: 11 allocs

cubic-dev-ai bot commented Feb 10, 2026 (internal/bench/regression_test.go:264):

P3: Validate the VerifyAndHash error inside the allocation loop so failures don't get masked.

@@ -0,0 +1,199 @@
// Copyright 2026 Blink Labs Software
...
// See the License for the specific language governing permissions and
// limitations under the License.

// Package bench provides benchmark utilities and test fixtures for memory profiling.
package bench

cubic-dev-ai bot commented Feb 10, 2026 (internal/bench/helpers.go:15):

P3: Several added lines exceed the 80-character limit enforced by golines; wrap comments and long signatures/lines to avoid lint failures.

coderabbitai bot left a comment

Actionable comments posted: 10

🤖 Fix all issues with AI agents
In @.github/workflows/benchmark.yml:
- Around line 43-58: The base benchmark step currently checks out BASE_SHA and
runs go test against a fixed list of package paths (e.g., ./vrf/...,
./pipeline/..., ./ledger/...) which will fail if some of those paths don't exist
on the base branch; after checking out BASE_SHA, detect which of the listed
package paths actually exist (for example by running git ls-tree or test -d / go
list on each path) and build the go test argument list from only those existing
paths, then run go test -bench... -benchmem ... and tee to benchmark-base.txt;
alternatively, make the go test invocation tolerant of missing packages by
filtering them out or skipping the step failure for missing-path errors before
checking out HEAD_SHA.

In `@internal/bench/BASELINES.md`:
- Around line 186-192: The time regression threshold is inconsistent between
BASELINES.md ("Time (ns/op) | +50%") and README.md ("Time increases by more than
20%"); pick a canonical threshold and make them match: update the "Time (ns/op)"
cell in BASELINES.md and the corresponding sentence in README.md to the chosen
value, and also update the benchmark/CI configuration that enforces the
threshold (the bench threshold variable or workflow step that checks time
regressions) so the docs and CI are consistent. Ensure the updated value is used
everywhere and adjust any explanatory text to match.

In `@internal/bench/block_bench_test.go`:
- Around line 315-318: The subtraction currentKesPeriod - kd.kesPeriod can
underflow (both are uint64); inside the b.Run closure that iterates kesDataList
(where kd, currentKesPeriod and defaultSlotsPerKesPeriod are defined) add a
guard: if currentKesPeriod < kd.kesPeriod then handle that case (e.g. set t to
0, skip the subtest, or convert to a signed type before subtracting) otherwise
perform t := currentKesPeriod - kd.kesPeriod; update any downstream logic that
uses t accordingly to avoid relying on a wrapped large value.

In `@internal/bench/cbor_bench_test.go`:
- Around line 113-119: Replace the unsafe b.Fatalf call inside the goroutines
spawned by b.RunParallel: capture the first decode error from inside the
parallel loop into a shared container (e.g., an error variable set once via
sync/atomic.Value or a buffered chan error) instead of calling b.Fatalf
directly, then after b.RunParallel returns check that container and call
b.Fatalf from the benchmark goroutine if an error was recorded; apply the same
change for both the block decode loop (where NewBlockFromCbor is called) and the
BenchmarkCBORDecodeTxParallel test.

In `@internal/bench/consensus_bench_test.go`:
- Line 1: Update the file header year to match the rest of the PR: change the
copyright year in internal/bench/consensus_bench_test.go from 2025 to 2026 so it
is consistent with other new files.

In `@internal/bench/README.md`:
- Around line 122-130: The README references a missing file
PLAN_MEMORY_PROFILING.md and lists "Target Allocs" (e.g., Threshold Calc, VRF
Verify, KES Verify, Leader Check, MkInputVrf) that conflict with actual
baselines in BASELINES.md; update the table to either remove or correct the
stale reference and clarify whether the numbers are aspirational by (a) adding a
parenthetical "aspirational target" note for each target allocation row or (b)
replacing the target values with the current baseline numbers from BASELINES.md,
and remove or add a link to PLAN_MEMORY_PROFILING.md if that file is intended to
exist so readers are not misled.

In `@internal/bench/regression_test.go`:
- Line 45: Fix the typo in the inline comment for kesSeed: change "lenght" to
"length" but do not modify the kesSeed value itself (the string must remain
unchanged to match the canonical test vector); update the comment next to
kesSeed to read something like "Matches canonical test vector; string length
spelled correctly" or simply correct "lenght" → "length" while keeping kesSeed =
[]byte("test string of 32 byte of lenght") intact.

In `@internal/bench/script_bench_test.go`:
- Around line 258-270: The benchmark name construction is fragile: replace the
expression using string(rune('0'+depth)) inside the b.Run call with a robust
formatted string using fmt.Sprintf (e.g., fmt.Sprintf("%s/Depth%d", scriptType,
depth)); update the import to include fmt if missing and ensure the b.Run
identifier that currently builds the name (where scriptTypes, depths,
createNestedScript and b.Run are used) uses the new formatted name so
multi-digit depths render correctly.
- Around line 48-119: The helper functions createPubkeyScript, createAllScript,
createAnyScript, createNofKScript, createInvalidBeforeScript, and
createInvalidHereafterScript currently ignore errors from cbor.Encode and
ns.UnmarshalCBOR; update each to surface failures by either (A) adding a
testing.TB parameter and calling tb.Fatal/ tb.Fatalf when cbor.Encode or
ns.UnmarshalCBOR returns an error, or (B) if you prefer not to thread testing.TB
through tests, panic on any non-nil error from cbor.Encode or ns.UnmarshalCBOR
so encoding/unmarshal failures fail fast during test setup.

In `@internal/bench/tx_bench_test.go`:
- Around line 36-44: The benchmark is passing a BlockType (from
BlockTypeFromEra) into ledger.NewTransactionFromCbor which expects a TxType, and
the error from BlockTypeFromEra is being ignored; add a conversion so you
compute a txType (either by implementing a TxTypeFromEra helper or a simple
era->TxType map) and pass that txType into ledger.NewTransactionFromCbor instead
of blockType, and check the returned error from the era->type conversion before
running the benchmark (i.e., replace BlockTypeFromEra(era) with a proper TxType
resolution and handle its error).
🧹 Nitpick comments (15)
internal/bench/README.md (1)

21-21: Add language specifiers to fenced code blocks.

Static analysis flagged lines 21, 109, and 134 as fenced code blocks without a language specified. Adding a language (e.g., text or the appropriate language) improves rendering and satisfies markdownlint (MD040).

internal/bench/helpers.go (1)

142-193: Consider consolidating the parallel era switch statements.

BlockTypeFromEra and blockCborFromEra both have identical switch structures over era names. A map-based approach (e.g., map[string]struct{ blockType uint; hex string }) would reduce duplication and make adding new eras a single-point change.

internal/bench/helpers_test.go (2)

31-60: Consider using ledger block type constants instead of hardcoded integers.

The expected values (1-7) are hardcoded magic numbers. Using ledger.BlockTypeByronMain, ledger.BlockTypeShelley, etc. would make the test self-documenting and resilient to constant value changes.

Example for the first few entries
+	"github.com/blinklabs-io/gouroboros/ledger"
 ...
 	tests := []struct {
 		era      string
 		expected uint
 		wantErr  bool
 	}{
-		{"byron", 1, false},
-		{"shelley", 2, false},
+		{"byron", ledger.BlockTypeByronMain, false},
+		{"shelley", ledger.BlockTypeShelley, false},
 		// ... and so on

62-119: Missing test coverage for LoadTxFixture, MustLoadTxFixture, and ledger state variants.

There are no tests for LoadTxFixture/MustLoadTxFixture or for BenchLedgerStateWithUtxos/BenchLedgerStateWithUtxoFunc. These are exported functions that benchmark tests rely on. Consider adding coverage for at least the happy path of tx fixture loading and the ledger state variants.

internal/bench/script_bench_test.go (1)

339-343: All three Plutus version benchmarks use identical script bytes.

testPlutusV1ScriptHex and testPlutusV2ScriptHex are aliased to testPlutusV3ScriptHex, making the V1/V2/V3 sub-benchmarks measure the exact same thing. The comments acknowledge this, but it could be misleading when comparing results across versions. Consider adding a comment in the benchmark output or using distinct (even synthetic) scripts per version if available.

internal/bench/block_bench_test.go (1)

232-327: Significant code duplication with BenchmarkVerifyKesComponents (lines 461–563).

The kesData struct and the entire header type-switch for extracting KES fields (bodyCbor, signature, hotVkey, kesPeriod) are duplicated verbatim between BenchmarkBlockKESVerification and BenchmarkVerifyKesComponents. Consider extracting a helper function like extractKesData(blocks []blockData) []kesData to eliminate the ~80 duplicated lines.

♻️ Sketch of proposed extraction

Define the kesData struct and the extraction helper at file scope:

type kesData struct {
	name      string
	bodyCbor  []byte
	signature []byte
	hotVkey   []byte
	kesPeriod uint64
	slot      uint64
}

func extractKesData(blocks []blockData) []kesData {
	result := make([]kesData, 0, len(blocks))
	for _, bd := range blocks {
		header := bd.block.Header()
		var bodyCbor []byte
		var sig []byte
		var hotVkey []byte
		var kesPeriod uint64
		var err error

		switch h := header.(type) {
		// ... same switch cases ...
		default:
			continue
		}
		result = append(result, kesData{
			name:      bd.name,
			bodyCbor:  bodyCbor,
			signature: sig,
			hotVkey:   hotVkey,
			kesPeriod: kesPeriod,
			slot:      header.SlotNumber(),
		})
	}
	return result
}

Then both benchmarks would simply call extractKesData(postByronBlocks).

internal/bench/cbor_bench_test.go (1)

326-331: Inconsistent Byron filtering: string comparison vs. block type constant.

Line 329 checks tb.name == "Byron" while other benchmarks (e.g., block_bench_test.go:88, cbor_bench_test.go:412) use BlockTypeByronMain/BlockTypeByronEbb constants. The string comparison is fragile and could miss a Byron block with a different name or fail to skip EBB blocks.

♻️ Suggested fix
-		if tb.name == "Byron" {
+		if tb.blockType == ledger.BlockTypeByronMain || tb.blockType == ledger.BlockTypeByronEbb {
 			continue
 		}

This would require changing testBlockData to store blockType (which it already does at line 41).

internal/bench/tx_bench_test.go (1)

24-47: Several benchmarks silently skip eras with no transactions or empty CBOR.

The pattern of continue at lines 29, 33 means no sub-benchmark is registered for those eras. If a fixture is broken or missing, this silently produces incomplete benchmark results. Consider using b.Skip() inside the b.Run closure instead, so the skipped era is visible in output.

internal/bench/profile_test.go (2)

411-423: Native script detection logic doesn't actually find native scripts.

The check tx.Cbor() != nil && len(tx.Cbor()) > 0 is true for any transaction with CBOR data, not only those containing native script witnesses. The scriptsFound flag is set but never meaningfully used to drive the actual profiling subtests below. Consider either removing this dead detection block or actually extracting and evaluating native scripts from witnesses.


427-445: Profiling subtests measure trivial Go operations, not native script evaluation.

EmptyKeyHashLookup profiles a map lookup and SlotComparison profiles integer comparisons — both are basic Go operations unrelated to the actual NativeScript.Evaluate code path. These won't produce useful profiling data for native script performance. If actual script evaluation profiling is intended, consider constructing synthetic NativeScript instances and calling Evaluate.

scripts/profile.sh (2)

105-119: Unquoted $cmd_args relies on word splitting for argument passing.

Using a bash array would be more robust and avoid potential issues if any argument ever contains spaces or glob characters.

♻️ Suggested improvement
-    local cmd_args="-tags=profile -run=$test_name -timeout=10m"
+    local -a cmd_args=(-tags=profile "-run=$test_name" -timeout=10m)

     case "$PROFILE_TYPE" in
         cpu)
             echo "Generating CPU profile: $cpu_prof"
-            go test $cmd_args -cpuprofile="$cpu_prof" ./internal/bench/...
+            go test "${cmd_args[@]}" -cpuprofile="$cpu_prof" ./internal/bench/...
             ;;
         mem)
             echo "Generating memory profile: $mem_prof"
-            go test $cmd_args -memprofile="$mem_prof" ./internal/bench/...
+            go test "${cmd_args[@]}" -memprofile="$mem_prof" ./internal/bench/...
             ;;
         both)
             echo "Generating CPU profile: $cpu_prof"
             echo "Generating memory profile: $mem_prof"
-            go test $cmd_args -cpuprofile="$cpu_prof" -memprofile="$mem_prof" ./internal/bench/...
+            go test "${cmd_args[@]}" -cpuprofile="$cpu_prof" -memprofile="$mem_prof" ./internal/bench/...
             ;;

62-64: Hardcoded line numbers for help text extraction are fragile.

If the header comments are edited (added/removed lines), sed -n '16,44p' will extract the wrong range. Consider using a marker-based approach (e.g., extracting lines between # Usage: and the next blank line) or simply inlining the help text in a heredoc.

internal/bench/regression_test.go (1)

191-246: panic in helper functions will crash the test process without useful diagnostics.

The *Once helpers (e.g., vrfVerifyOnce, kesVerifyOnce, cborBlockDecodeOnce) use panic on failure. Since they're called inside testing.AllocsPerRun, a failure would crash the entire process rather than produce a proper test failure with -v output. This is a pragmatic tradeoff given the AllocsPerRun API, but worth documenting with a brief comment explaining why panic is used instead of t.Fatal.

.github/workflows/benchmark.yml (2)

29-30: Pin benchstat to a specific version for reproducibility.

All other actions in this workflow are pinned by commit hash, but benchstat@latest can silently change behavior or output format between runs, potentially affecting regression detection.

Suggested fix
      - name: Install benchstat
-       run: go install golang.org/x/perf/cmd/benchstat@latest
+       run: go install golang.org/x/perf/cmd/benchstat@v0.0.0-20250210151756-PICK_A_COMMIT

Pick a known-good commit hash from https://pkg.go.dev/golang.org/x/perf.


63-87: Quote shell variables to prevent globbing and word splitting.

Static analysis (SC2086) flags unquoted $GITHUB_STEP_SUMMARY and $GITHUB_OUTPUT. While GHA controls these paths, quoting is a good defensive practice.

Suggested fix
-          echo "## Benchmark Comparison" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo '```' >> $GITHUB_STEP_SUMMARY
-          cat benchmark-comparison.txt >> $GITHUB_STEP_SUMMARY
-          echo '```' >> $GITHUB_STEP_SUMMARY
+          {
+            echo "## Benchmark Comparison"
+            echo ""
+            echo '```'
+            cat benchmark-comparison.txt
+            echo '```'
+          } >> "$GITHUB_STEP_SUMMARY"
           ...
-          echo "regression_found=$REGRESSION_FOUND" >> $GITHUB_OUTPUT
+          echo "regression_found=$REGRESSION_FOUND" >> "$GITHUB_OUTPUT"

Comment on lines 186 to 192

| Metric | Threshold | Rationale |
|--------|-----------|-----------|
| Allocation Count | +50% | Catches allocation leaks |
| Bytes Allocated | +100% | Allows some flexibility for features |
| Time (ns/op) | +50% | Catches performance regressions |

coderabbitai bot — ⚠️ Potential issue | 🟡 Minor

Inconsistent time regression threshold with README.md.

BASELINES.md states the time threshold is +50%, but README.md (line 397) states "Time increases by more than 20%." These should be consistent to avoid confusion among contributors. Please reconcile the threshold across both documents (and ideally with the actual CI workflow configuration).

Comment on lines 113 to 119

b.RunParallel(func(pb *testing.PB) {
	for pb.Next() {
		_, err := ledger.NewBlockFromCbor(tb.blockType, tb.cbor)
		if err != nil {
			b.Fatalf("failed to decode %s block: %v", tb.name, err)
		}
	}

coderabbitai bot — ⚠️ Potential issue | 🟠 Major

b.Fatalf inside b.RunParallel causes undefined behavior.

Go's testing package requires that Fatal/Fatalf/FailNow are only called from the goroutine running the benchmark function. RunParallel spawns multiple goroutines, so calling b.Fatalf here leads to a data race and may panic or produce corrupt results. The same issue exists in BenchmarkCBORDecodeTxParallel (line 154).

🐛 Proposed fix
 		b.RunParallel(func(pb *testing.PB) {
 			for pb.Next() {
 				_, err := ledger.NewBlockFromCbor(tb.blockType, tb.cbor)
 				if err != nil {
-					b.Fatalf("failed to decode %s block: %v", tb.name, err)
+					b.Errorf("failed to decode %s block: %v", tb.name, err)
+					return
 				}
 			}
 		})

Apply the same pattern at line 154 for BenchmarkCBORDecodeTxParallel.

vrfEpochNonce = make([]byte, 32)

// KES test data
kesSeed = []byte("test string of 32 byte of lenght") // Matches canonical test vector

coderabbitai bot — ⚠️ Potential issue | 🟡 Minor

Typo in comment: "lenght" → "length".

"test string of 32 byte of lenght" — the string itself likely needs to stay unchanged to match the canonical test vector, but the comment could clarify this.

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from b334cf3 to c7b7371 Compare February 10, 2026 23:38
@github-actions

Benchmark Results

Benchmark Comparison (click to expand)
goos: linux
goarch: amd64
pkg: github.com/blinklabs-io/gouroboros/cbor
cpu: AMD EPYC 7763 64-Core Processor                
         │ benchmark-base.txt │       benchmark-current.txt        │
         │       sec/op       │    sec/op     vs base              │
Decode-4         460.6n ± ∞ ¹   459.3n ± ∞ ¹       ~ (p=0.111 n=5)
Encode-4         360.7n ± ∞ ¹   363.5n ± ∞ ¹  +0.78% (p=0.008 n=5)
geomean          407.6n         408.6n        +0.25%
¹ need >= 6 samples for confidence interval at level 0.95

         │ benchmark-base.txt │        benchmark-current.txt        │
         │        B/op        │    B/op      vs base                │
Decode-4          760.0 ± ∞ ¹   760.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Encode-4          184.0 ± ∞ ¹   184.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean           374.0         374.0        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

         │ benchmark-base.txt │        benchmark-current.txt        │
         │     allocs/op      │  allocs/op   vs base                │
Decode-4          5.000 ± ∞ ¹   5.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Encode-4          4.000 ± ∞ ¹   4.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean           4.472         4.472        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/blinklabs-io/gouroboros/kes
                                     │ benchmark-base.txt │        benchmark-current.txt        │
                                     │       sec/op       │    sec/op      vs base              │
KeyGen/depth_1-4                             45.12µ ± ∞ ¹    45.11µ ± ∞ ¹       ~ (p=1.000 n=5)
KeyGen/depth_2-4                             91.04µ ± ∞ ¹    91.12µ ± ∞ ¹       ~ (p=0.548 n=5)
KeyGen/depth_3-4                             182.8µ ± ∞ ¹    182.8µ ± ∞ ¹       ~ (p=0.548 n=5)
KeyGen/depth_6-4                             1.469m ± ∞ ¹    1.468m ± ∞ ¹       ~ (p=0.095 n=5)
Sign/depth_1-4                               49.88µ ± ∞ ¹    49.82µ ± ∞ ¹  -0.12% (p=0.008 n=5)
Sign/depth_2-4                               49.88µ ± ∞ ¹    49.92µ ± ∞ ¹  +0.09% (p=0.008 n=5)
Sign/depth_3-4                               49.87µ ± ∞ ¹    49.88µ ± ∞ ¹       ~ (p=1.000 n=5)
Sign/depth_6-4                               50.03µ ± ∞ ¹    50.15µ ± ∞ ¹       ~ (p=0.151 n=5)
SignAtDifferentPeriods/period_0-4            50.07µ ± ∞ ¹    50.02µ ± ∞ ¹       ~ (p=0.516 n=5)
SignAtDifferentPeriods/period_31-4           50.05µ ± ∞ ¹    50.41µ ± ∞ ¹  +0.72% (p=0.032 n=5)
SignAtDifferentPeriods/period_32-4           50.06µ ± ∞ ¹    50.25µ ± ∞ ¹       ~ (p=0.421 n=5)
SignAtDifferentPeriods/period_63-4           50.04µ ± ∞ ¹    50.04µ ± ∞ ¹       ~ (p=0.841 n=5)
Update/depth_1-4                             22.41µ ± ∞ ¹    22.40µ ± ∞ ¹       ~ (p=0.548 n=5)
Update/depth_2-4                             22.50µ ± ∞ ¹    22.45µ ± ∞ ¹       ~ (p=0.151 n=5)
Update/depth_3-4                             22.50µ ± ∞ ¹    22.51µ ± ∞ ¹       ~ (p=1.000 n=5)
Update/depth_6-4                             22.52µ ± ∞ ¹    22.52µ ± ∞ ¹       ~ (p=0.690 n=5)
UpdateMultiplePeriods/updates_1-4            1.499m ± ∞ ¹    1.491m ± ∞ ¹  -0.49% (p=0.008 n=5)
UpdateMultiplePeriods/updates_8-4            1.933m ± ∞ ¹    1.926m ± ∞ ¹       ~ (p=0.095 n=5)
UpdateMultiplePeriods/updates_32-4           4.035m ± ∞ ¹    4.024m ± ∞ ¹  -0.25% (p=0.008 n=5)
UpdateMultiplePeriods/updates_63-4           5.852m ± ∞ ¹    5.837m ± ∞ ¹       ~ (p=0.151 n=5)
UpdateAtBoundary-4                           733.2µ ± ∞ ¹    732.8µ ± ∞ ¹       ~ (p=0.841 n=5)
NewSumKesFromBytes/depth_1-4                 31.47n ± ∞ ¹    31.40n ± ∞ ¹       ~ (p=0.056 n=5)
NewSumKesFromBytes/depth_2-4                 80.98n ± ∞ ¹    80.99n ± ∞ ¹       ~ (p=0.889 n=5)
NewSumKesFromBytes/depth_3-4                 127.8n ± ∞ ¹    128.4n ± ∞ ¹       ~ (p=0.198 n=5)
NewSumKesFromBytes/depth_6-4                 274.3n ± ∞ ¹    274.1n ± ∞ ¹       ~ (p=0.548 n=5)
Verify/depth_1-4                             64.51µ ± ∞ ¹    64.49µ ± ∞ ¹       ~ (p=0.548 n=5)
Verify/depth_2-4                             64.67µ ± ∞ ¹    64.67µ ± ∞ ¹       ~ (p=0.841 n=5)
Verify/depth_3-4                             64.67µ ± ∞ ¹    64.70µ ± ∞ ¹       ~ (p=0.278 n=5)
Verify/depth_6-4                             65.55µ ± ∞ ¹    65.60µ ± ∞ ¹       ~ (p=1.000 n=5)
VerifyAtDifferentPeriods/period_0-4          65.53µ ± ∞ ¹    65.49µ ± ∞ ¹       ~ (p=0.421 n=5)
VerifyAtDifferentPeriods/period_31-4         66.46µ ± ∞ ¹    66.34µ ± ∞ ¹       ~ (p=0.548 n=5)
VerifyAtDifferentPeriods/period_32-4         65.72µ ± ∞ ¹    65.79µ ± ∞ ¹       ~ (p=0.841 n=5)
VerifyAtDifferentPeriods/period_63-4         65.16µ ± ∞ ¹    64.93µ ± ∞ ¹       ~ (p=0.095 n=5)
VerifySignedKES-4                            65.92µ ± ∞ ¹    65.85µ ± ∞ ¹       ~ (p=0.310 n=5)
HashPair-4                                   209.2n ± ∞ ¹    207.6n ± ∞ ¹  -0.76% (p=0.008 n=5)
PublicKey/depth_1-4                         0.4680n ± ∞ ¹   0.4673n ± ∞ ¹       ~ (p=0.548 n=5)
PublicKey/depth_2-4                         0.4678n ± ∞ ¹   0.4668n ± ∞ ¹       ~ (p=0.087 n=5)
PublicKey/depth_3-4                         0.4672n ± ∞ ¹   0.4671n ± ∞ ¹       ~ (p=0.579 n=5)
PublicKey/depth_6-4                         0.4672n ± ∞ ¹   0.4671n ± ∞ ¹       ~ (p=0.460 n=5)
FullWorkflow/depth_1-4                       160.2µ ± ∞ ¹    159.9µ ± ∞ ¹       ~ (p=0.056 n=5)
FullWorkflow/depth_2-4                       206.5µ ± ∞ ¹    206.2µ ± ∞ ¹  -0.16% (p=0.008 n=5)
FullWorkflow/depth_3-4                       298.3µ ± ∞ ¹    297.9µ ± ∞ ¹       ~ (p=0.690 n=5)
FullWorkflow/depth_6-4                       1.584m ± ∞ ¹    1.583m ± ∞ ¹       ~ (p=0.548 n=5)
KeyGenSignUpdateCycle/cycles_8-4             2.146m ± ∞ ¹    2.140m ± ∞ ¹  -0.31% (p=0.008 n=5)
SignatureSize/depth_1-4                     0.3120n ± ∞ ¹   0.3116n ± ∞ ¹       ~ (p=0.286 n=5)
SignatureSize/depth_2-4                     0.3121n ± ∞ ¹   0.3119n ± ∞ ¹       ~ (p=0.802 n=5)
SignatureSize/depth_3-4                     0.3118n ± ∞ ¹   0.3121n ± ∞ ¹       ~ (p=0.198 n=5)
SignatureSize/depth_6-4                     0.3122n ± ∞ ¹   0.3115n ± ∞ ¹  -0.22% (p=0.008 n=5)
SignatureSize/depth_10-4                    0.3118n ± ∞ ¹   0.3121n ± ∞ ¹       ~ (p=0.532 n=5)
SignatureSize/depth_20-4                    0.3113n ± ∞ ¹   0.3117n ± ∞ ¹       ~ (p=0.238 n=5)
MaxPeriod/depth_1-4                         0.3117n ± ∞ ¹   0.3119n ± ∞ ¹       ~ (p=0.937 n=5)
MaxPeriod/depth_2-4                         0.3114n ± ∞ ¹   0.3119n ± ∞ ¹       ~ (p=0.127 n=5)
MaxPeriod/depth_3-4                         0.3119n ± ∞ ¹   0.3119n ± ∞ ¹       ~ (p=0.651 n=5)
MaxPeriod/depth_6-4                         0.3120n ± ∞ ¹   0.3118n ± ∞ ¹       ~ (p=0.460 n=5)
MaxPeriod/depth_10-4                        0.3122n ± ∞ ¹   0.3113n ± ∞ ¹  -0.29% (p=0.024 n=5)
MaxPeriod/depth_20-4                        0.3118n ± ∞ ¹   0.3118n ± ∞ ¹       ~ (p=0.770 n=5)
geomean                                      1.814µ          1.813µ        -0.05%
¹ need >= 6 samples for confidence interval at level 0.95

                                     │ benchmark-base.txt │         benchmark-current.txt         │
                                     │        B/op        │     B/op       vs base                │
KeyGen/depth_1-4                              352.0 ± ∞ ¹     352.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_2-4                              704.0 ± ∞ ¹     704.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_3-4                            1.281Ki ± ∞ ¹   1.281Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_6-4                            8.594Ki ± ∞ ¹   8.594Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_1-4                                128.0 ± ∞ ¹     128.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_2-4                                192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_3-4                                256.0 ± ∞ ¹     256.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_6-4                                448.0 ± ∞ ¹     448.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_0-4             448.0 ± ∞ ¹     448.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_31-4            448.0 ± ∞ ¹     448.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_32-4            448.0 ± ∞ ¹     448.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_63-4            448.0 ± ∞ ¹     448.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_1-4                              224.0 ± ∞ ¹     224.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_2-4                              320.0 ± ∞ ¹     320.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_3-4                              416.0 ± ∞ ¹     416.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_6-4                              736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_1-4           9.312Ki ± ∞ ¹   9.312Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_8-4           15.84Ki ± ∞ ¹   15.84Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_32-4          41.59Ki ± ∞ ¹   41.59Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_63-4          70.00Ki ± ∞ ¹   70.00Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateAtBoundary-4                          4.594Ki ± ∞ ¹   4.594Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_1-4                  24.00 ± ∞ ¹     24.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_2-4                  104.0 ± ∞ ¹     104.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_3-4                  184.0 ± ∞ ¹     184.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_6-4                  424.0 ± ∞ ¹     424.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_1-4                              32.00 ± ∞ ¹     32.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_2-4                              64.00 ± ∞ ¹     64.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_3-4                              96.00 ± ∞ ¹     96.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_6-4                              192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_0-4           192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_31-4          192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_32-4          192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_63-4          192.0 ± ∞ ¹     192.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifySignedKES-4                             616.0 ± ∞ ¹     616.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
HashPair-4                                    32.00 ± ∞ ¹     32.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_1-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_2-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_3-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_6-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_1-4                        536.0 ± ∞ ¹     536.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_2-4                      1.039Ki ± ∞ ¹   1.039Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_3-4                      1.805Ki ± ∞ ¹   1.805Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_6-4                      9.633Ki ± ∞ ¹   9.633Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGenSignUpdateCycle/cycles_8-4            17.75Ki ± ∞ ¹   17.75Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_1-4                       0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_2-4                       0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_3-4                       0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_6-4                       0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_10-4                      0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_20-4                      0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_1-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_2-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_3-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_6-4                           0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_10-4                          0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_20-4                          0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                                 ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                     │ benchmark-base.txt │        benchmark-current.txt        │
                                     │     allocs/op      │  allocs/op   vs base                │
KeyGen/depth_1-4                              7.000 ± ∞ ¹   7.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_2-4                              15.00 ± ∞ ¹   15.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_3-4                              31.00 ± ∞ ¹   31.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/depth_6-4                              255.0 ± ∞ ¹   255.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_1-4                                1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_2-4                                1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_3-4                                1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Sign/depth_6-4                                1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_0-4             1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_31-4            1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_32-4            1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignAtDifferentPeriods/period_63-4            1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_1-4                              3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_2-4                              3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_3-4                              3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Update/depth_6-4                              3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_1-4             258.0 ± ∞ ¹   258.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_8-4             327.0 ± ∞ ¹   327.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_32-4            671.0 ± ∞ ¹   671.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateMultiplePeriods/updates_63-4            960.0 ± ∞ ¹   960.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
UpdateAtBoundary-4                            127.0 ± ∞ ¹   127.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_1-4                  1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_2-4                  2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_3-4                  3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
NewSumKesFromBytes/depth_6-4                  6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_1-4                              1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_2-4                              2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_3-4                              3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/depth_6-4                              6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_0-4           6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_31-4          6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_32-4          6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAtDifferentPeriods/period_63-4          6.000 ± ∞ ¹   6.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifySignedKES-4                             12.00 ± ∞ ¹   12.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
HashPair-4                                    1.000 ± ∞ ¹   1.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_1-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_2-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_3-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
PublicKey/depth_6-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_1-4                        10.00 ± ∞ ¹   10.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_2-4                        20.00 ± ∞ ¹   20.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_3-4                        38.00 ± ∞ ¹   38.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/depth_6-4                        268.0 ± ∞ ¹   268.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGenSignUpdateCycle/cycles_8-4              304.0 ± ∞ ¹   304.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_1-4                       0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_2-4                       0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_3-4                       0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_6-4                       0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_10-4                      0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
SignatureSize/depth_20-4                      0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_1-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_2-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_3-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_6-4                           0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_10-4                          0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MaxPeriod/depth_20-4                          0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                                 ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger
                                     │ benchmark-base.txt │       benchmark-current.txt        │
                                     │       sec/op       │    sec/op     vs base              │
DetermineTransactionType_Conway-4            14.84µ ± ∞ ¹   14.70µ ± ∞ ¹       ~ (p=0.095 n=5)
DetermineTransactionType_Byron-4             2.784µ ± ∞ ¹   2.768µ ± ∞ ¹       ~ (p=0.286 n=5)
TransactionDeserialization_Byron-4           2.749µ ± ∞ ¹   2.755µ ± ∞ ¹       ~ (p=0.730 n=5)
TransactionSerialization_Byron-4             2.180n ± ∞ ¹   2.179n ± ∞ ¹       ~ (p=0.389 n=5)
TransactionDeserialization_Conway-4          14.89µ ± ∞ ¹   14.79µ ± ∞ ¹  -0.65% (p=0.008 n=5)
TransactionSerialization_Conway-4            2.179n ± ∞ ¹   2.179n ± ∞ ¹       ~ (p=0.889 n=5)
TransactionDeserialization_Shelley-4         28.05µ ± ∞ ¹   28.26µ ± ∞ ¹       ~ (p=0.690 n=5)
TransactionSerialization_Shelley-4           2.180n ± ∞ ¹   2.178n ± ∞ ¹       ~ (p=0.294 n=5)
TransactionDeserialization_Allegra-4         20.84µ ± ∞ ¹   20.78µ ± ∞ ¹  -0.31% (p=0.040 n=5)
TransactionSerialization_Allegra-4           2.180n ± ∞ ¹   2.178n ± ∞ ¹       ~ (p=0.238 n=5)
TransactionDeserialization_Mary-4            22.94µ ± ∞ ¹   22.88µ ± ∞ ¹       ~ (p=0.310 n=5)
TransactionSerialization_Mary-4              2.181n ± ∞ ¹   2.180n ± ∞ ¹       ~ (p=0.587 n=5)
TransactionDeserialization_Alonzo-4          58.08µ ± ∞ ¹   58.08µ ± ∞ ¹       ~ (p=0.841 n=5)
TransactionSerialization_Alonzo-4            2.491n ± ∞ ¹   2.492n ± ∞ ¹       ~ (p=0.294 n=5)
TransactionDeserialization_Babbage-4         13.58µ ± ∞ ¹   13.36µ ± ∞ ¹  -1.58% (p=0.008 n=5)
TransactionSerialization_Babbage-4           2.491n ± ∞ ¹   2.492n ± ∞ ¹       ~ (p=0.460 n=5)
geomean                                      304.5n         303.8n        -0.22%
¹ need >= 6 samples for confidence interval at level 0.95

                                     │ benchmark-base.txt │         benchmark-current.txt         │
                                     │        B/op        │     B/op       vs base                │
DetermineTransactionType_Conway-4           19.70Ki ± ∞ ¹   19.70Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
DetermineTransactionType_Byron-4            3.859Ki ± ∞ ¹   3.859Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Byron-4          3.859Ki ± ∞ ¹   3.859Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Byron-4              0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Conway-4         19.70Ki ± ∞ ¹   19.70Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Conway-4             0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Shelley-4        22.50Ki ± ∞ ¹   22.50Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Shelley-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Allegra-4        18.59Ki ± ∞ ¹   18.59Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Allegra-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Mary-4           20.66Ki ± ∞ ¹   20.66Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Mary-4               0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Alonzo-4         44.87Ki ± ∞ ¹   44.87Ki ± ∞ ¹       ~ (p=1.000 n=5)
TransactionSerialization_Alonzo-4             0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Babbage-4        17.23Ki ± ∞ ¹   17.23Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Babbage-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                                 ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                     │ benchmark-base.txt │        benchmark-current.txt        │
                                     │     allocs/op      │  allocs/op   vs base                │
DetermineTransactionType_Conway-4             111.0 ± ∞ ¹   111.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
DetermineTransactionType_Byron-4              28.00 ± ∞ ¹   28.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Byron-4            28.00 ± ∞ ¹   28.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Byron-4              0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Conway-4           111.0 ± ∞ ¹   111.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Conway-4             0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Shelley-4          139.0 ± ∞ ¹   139.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Shelley-4            0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Allegra-4          106.0 ± ∞ ¹   106.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Allegra-4            0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Mary-4             120.0 ± ∞ ¹   120.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Mary-4               0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Alonzo-4           278.0 ± ∞ ¹   278.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Alonzo-4             0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionDeserialization_Babbage-4          100.0 ± ∞ ¹   100.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
TransactionSerialization_Babbage-4            0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                                 ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/allegra
                              │ benchmark-base.txt │       benchmark-current.txt        │
                              │       sec/op       │    sec/op     vs base              │
AllegraBlockDeserialization-4         185.7µ ± ∞ ¹   184.6µ ± ∞ ¹       ~ (p=0.421 n=5)
AllegraBlockSerialization-4           1.556n ± ∞ ¹   1.556n ± ∞ ¹       ~ (p=0.563 n=5)
geomean                               537.5n         536.0n        -0.28%
¹ need >= 6 samples for confidence interval at level 0.95

                              │ benchmark-base.txt │         benchmark-current.txt         │
                              │        B/op        │     B/op       vs base                │
AllegraBlockDeserialization-4        203.4Ki ± ∞ ¹   203.4Ki ± ∞ ¹       ~ (p=1.000 n=5)
AllegraBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                              │ benchmark-base.txt │        benchmark-current.txt         │
                              │     allocs/op      │  allocs/op    vs base                │
AllegraBlockDeserialization-4         1.048k ± ∞ ¹   1.048k ± ∞ ¹       ~ (p=1.000 n=5) ²
AllegraBlockSerialization-4            0.000 ± ∞ ¹    0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                 +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/alonzo
                             │ benchmark-base.txt │       benchmark-current.txt        │
                             │       sec/op       │    sec/op     vs base              │
AlonzoBlockDeserialization-4         261.4µ ± ∞ ¹   256.4µ ± ∞ ¹       ~ (p=0.222 n=5)
AlonzoBlockSerialization-4           1.867n ± ∞ ¹   1.867n ± ∞ ¹       ~ (p=0.722 n=5)
geomean                              698.6n         691.8n        -0.97%
¹ need >= 6 samples for confidence interval at level 0.95

                             │ benchmark-base.txt │         benchmark-current.txt         │
                             │        B/op        │     B/op       vs base                │
AlonzoBlockDeserialization-4        281.7Ki ± ∞ ¹   281.7Ki ± ∞ ¹       ~ (p=1.000 n=5)
AlonzoBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                             │ benchmark-base.txt │        benchmark-current.txt         │
                             │     allocs/op      │  allocs/op    vs base                │
AlonzoBlockDeserialization-4         1.335k ± ∞ ¹   1.335k ± ∞ ¹       ~ (p=1.000 n=5) ²
AlonzoBlockSerialization-4            0.000 ± ∞ ¹    0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         ³                 +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/babbage
                              │ benchmark-base.txt │       benchmark-current.txt        │
                              │       sec/op       │    sec/op     vs base              │
BabbageBlockDeserialization-4         1.108m ± ∞ ¹   1.106m ± ∞ ¹       ~ (p=1.000 n=5)
BabbageBlockSerialization-4           1.867n ± ∞ ¹   1.869n ± ∞ ¹       ~ (p=0.540 n=5)
geomean                               1.439µ         1.437µ        -0.08%
¹ need >= 6 samples for confidence interval at level 0.95

                              │ benchmark-base.txt │         benchmark-current.txt         │
                              │        B/op        │     B/op       vs base                │
BabbageBlockDeserialization-4        837.1Ki ± ∞ ¹   837.1Ki ± ∞ ¹       ~ (p=0.548 n=5)
BabbageBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                              │ benchmark-base.txt │        benchmark-current.txt         │
                              │     allocs/op      │  allocs/op    vs base                │
BabbageBlockDeserialization-4         5.659k ± ∞ ¹   5.659k ± ∞ ¹       ~ (p=1.000 n=5) ²
BabbageBlockSerialization-4            0.000 ± ∞ ¹    0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                 +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/byron
                            │ benchmark-base.txt │       benchmark-current.txt        │
                            │       sec/op       │    sec/op     vs base              │
ByronBlockDeserialization-4         73.96µ ± ∞ ¹   73.68µ ± ∞ ¹       ~ (p=0.222 n=5)
ByronBlockSerialization-4           1.868n ± ∞ ¹   1.871n ± ∞ ¹       ~ (p=0.119 n=5)
geomean                             371.7n         371.3n        -0.11%
¹ need >= 6 samples for confidence interval at level 0.95

                            │ benchmark-base.txt │         benchmark-current.txt         │
                            │        B/op        │     B/op       vs base                │
ByronBlockDeserialization-4        80.62Ki ± ∞ ¹   80.62Ki ± ∞ ¹       ~ (p=1.000 n=5)
ByronBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                        ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                            │ benchmark-base.txt │        benchmark-current.txt        │
                            │     allocs/op      │  allocs/op   vs base                │
ByronBlockDeserialization-4          493.0 ± ∞ ¹   493.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
ByronBlockSerialization-4            0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                        ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/common
                      │ benchmark-base.txt │       benchmark-current.txt        │
                      │       sec/op       │    sec/op     vs base              │
AddressFromBytes-4            65.06n ± ∞ ¹   64.47n ± ∞ ¹       ~ (p=0.222 n=5)
AddressFromString-4           1.923µ ± ∞ ¹   1.922µ ± ∞ ¹       ~ (p=0.579 n=5)
DecodeAuxiliaryData-4         4.838µ ± ∞ ¹   4.776µ ± ∞ ¹  -1.28% (p=0.008 n=5)
geomean                       845.9n         839.6n        -0.75%
¹ need >= 6 samples for confidence interval at level 0.95

                      │ benchmark-base.txt │         benchmark-current.txt         │
                      │        B/op        │     B/op       vs base                │
AddressFromBytes-4             64.00 ± ∞ ¹     64.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
AddressFromString-4            240.0 ± ∞ ¹     240.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeAuxiliaryData-4        6.305Ki ± ∞ ¹   6.305Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                        462.9           462.9        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                      │ benchmark-base.txt │        benchmark-current.txt        │
                      │     allocs/op      │  allocs/op   vs base                │
AddressFromBytes-4             2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
AddressFromString-4            4.000 ± ∞ ¹   4.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeAuxiliaryData-4          55.00 ± ∞ ¹   55.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                        7.606         7.606        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/blinklabs-io/gouroboros/ledger/conway
                             │ benchmark-base.txt │       benchmark-current.txt        │
                             │       sec/op       │    sec/op     vs base              │
ConwayBlockDeserialization-4         476.8µ ± ∞ ¹   473.1µ ± ∞ ¹       ~ (p=0.151 n=5)
ConwayBlockSerialization-4           1.868n ± ∞ ¹   1.867n ± ∞ ¹       ~ (p=0.587 n=5)
geomean                              943.7n         939.8n        -0.42%
¹ need >= 6 samples for confidence interval at level 0.95

                             │ benchmark-base.txt │         benchmark-current.txt         │
                             │        B/op        │     B/op       vs base                │
ConwayBlockDeserialization-4        404.8Ki ± ∞ ¹   404.8Ki ± ∞ ¹       ~ (p=0.238 n=5)
ConwayBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                             │ benchmark-base.txt │        benchmark-current.txt         │
                             │     allocs/op      │  allocs/op    vs base                │
ConwayBlockDeserialization-4         2.625k ± ∞ ¹   2.625k ± ∞ ¹       ~ (p=1.000 n=5) ²
ConwayBlockSerialization-4            0.000 ± ∞ ¹    0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         ³                 +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/mary
                           │ benchmark-base.txt │       benchmark-current.txt        │
                           │       sec/op       │    sec/op     vs base              │
MaryBlockDeserialization-4         204.9µ ± ∞ ¹   204.2µ ± ∞ ¹       ~ (p=0.222 n=5)
MaryBlockSerialization-4           1.868n ± ∞ ¹   1.867n ± ∞ ¹       ~ (p=0.730 n=5)
geomean                            618.6n         617.4n        -0.19%
¹ need >= 6 samples for confidence interval at level 0.95

                           │ benchmark-base.txt │         benchmark-current.txt         │
                           │        B/op        │     B/op       vs base                │
MaryBlockDeserialization-4        195.2Ki ± ∞ ¹   195.2Ki ± ∞ ¹       ~ (p=1.000 n=5)
MaryBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                       ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                           │ benchmark-base.txt │        benchmark-current.txt         │
                           │     allocs/op      │  allocs/op    vs base                │
MaryBlockDeserialization-4         1.092k ± ∞ ¹   1.092k ± ∞ ¹       ~ (p=1.000 n=5) ²
MaryBlockSerialization-4            0.000 ± ∞ ¹    0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                       ³                 +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/ledger/shelley
                              │ benchmark-base.txt │       benchmark-current.txt        │
                              │       sec/op       │    sec/op     vs base              │
ShelleyBlockDeserialization-4         73.78µ ± ∞ ¹   73.90µ ± ∞ ¹       ~ (p=0.310 n=5)
ShelleyBlockSerialization-4           1.868n ± ∞ ¹   1.870n ± ∞ ¹       ~ (p=0.738 n=5)
geomean                               371.2n         371.7n        +0.14%
¹ need >= 6 samples for confidence interval at level 0.95

                              │ benchmark-base.txt │         benchmark-current.txt         │
                              │        B/op        │     B/op       vs base                │
ShelleyBlockDeserialization-4        81.51Ki ± ∞ ¹   81.51Ki ± ∞ ¹       ~ (p=1.000 n=5)
ShelleyBlockSerialization-4            0.000 ± ∞ ¹     0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                              │ benchmark-base.txt │        benchmark-current.txt        │
                              │     allocs/op      │  allocs/op   vs base                │
ShelleyBlockDeserialization-4          419.0 ± ∞ ¹   419.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
ShelleyBlockSerialization-4            0.000 ± ∞ ¹   0.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                          ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/blinklabs-io/gouroboros/pipeline
                           │ benchmark-base.txt │       benchmark-current.txt        │
                           │       sec/op       │    sec/op     vs base              │
DecodeStage/Byron-4                82.59µ ± ∞ ¹   84.07µ ± ∞ ¹       ~ (p=0.056 n=5)
DecodeStage/Shelley-4              83.49µ ± ∞ ¹   84.77µ ± ∞ ¹       ~ (p=0.151 n=5)
DecodeStage/Allegra-4              211.3µ ± ∞ ¹   212.7µ ± ∞ ¹  +0.66% (p=0.024 n=5)
DecodeStage/Mary-4                 226.4µ ± ∞ ¹   228.0µ ± ∞ ¹  +0.71% (p=0.032 n=5)
DecodeStage/Alonzo-4               293.2µ ± ∞ ¹   295.1µ ± ∞ ¹       ~ (p=0.056 n=5)
DecodeStage/Babbage-4              1.190m ± ∞ ¹   1.209m ± ∞ ¹  +1.61% (p=0.032 n=5)
DecodeStage/Conway-4               519.7µ ± ∞ ¹   530.0µ ± ∞ ¹  +1.98% (p=0.008 n=5)
DecodeStage/AllEras-4              394.3µ ± ∞ ¹   404.8µ ± ∞ ¹  +2.65% (p=0.016 n=5)
StageWorkerPool/Workers1-4         452.9µ ± ∞ ¹   455.7µ ± ∞ ¹       ~ (p=0.056 n=5)
StageWorkerPool/Workers2-4         369.9µ ± ∞ ¹   372.0µ ± ∞ ¹       ~ (p=0.548 n=5)
StageWorkerPool/Workers4-4         293.5µ ± ∞ ¹   297.5µ ± ∞ ¹  +1.37% (p=0.016 n=5)
StageWorkerPool/Workers8-4         265.2µ ± ∞ ¹   266.3µ ± ∞ ¹       ~ (p=0.421 n=5)
BlockPipeline/EndToEnd-4           185.2µ ± ∞ ¹   186.9µ ± ∞ ¹       ~ (p=1.000 n=5)
BlockDecode/Byron-4                85.46µ ± ∞ ¹   86.14µ ± ∞ ¹       ~ (p=0.095 n=5)
BlockDecode/Shelley-4              96.13µ ± ∞ ¹   97.71µ ± ∞ ¹       ~ (p=0.222 n=5)
BlockDecode/Allegra-4              238.7µ ± ∞ ¹   243.1µ ± ∞ ¹  +1.83% (p=0.008 n=5)
BlockDecode/Mary-4                 252.8µ ± ∞ ¹   256.0µ ± ∞ ¹  +1.26% (p=0.016 n=5)
BlockDecode/Alonzo-4               334.6µ ± ∞ ¹   335.0µ ± ∞ ¹       ~ (p=0.421 n=5)
BlockDecode/Babbage-4              1.293m ± ∞ ¹   1.301m ± ∞ ¹  +0.62% (p=0.032 n=5)
BlockDecode/Conway-4               570.4µ ± ∞ ¹   571.2µ ± ∞ ¹       ~ (p=0.310 n=5)
geomean                            276.8µ         279.8µ        +1.09%
¹ need >= 6 samples for confidence interval at level 0.95

                           │ benchmark-base.txt │         benchmark-current.txt         │
                           │        B/op        │     B/op       vs base                │
DecodeStage/Byron-4               89.34Ki ± ∞ ¹   89.34Ki ± ∞ ¹       ~ (p=1.000 n=5)
DecodeStage/Shelley-4             91.48Ki ± ∞ ¹   91.48Ki ± ∞ ¹       ~ (p=0.524 n=5)
DecodeStage/Allegra-4             224.4Ki ± ∞ ¹   224.4Ki ± ∞ ¹       ~ (p=0.881 n=5)
DecodeStage/Mary-4                215.0Ki ± ∞ ¹   215.0Ki ± ∞ ¹       ~ (p=1.000 n=5)
DecodeStage/Alonzo-4              321.6Ki ± ∞ ¹   321.6Ki ± ∞ ¹       ~ (p=0.556 n=5)
DecodeStage/Babbage-4             917.7Ki ± ∞ ¹   917.7Ki ± ∞ ¹       ~ (p=0.706 n=5)
DecodeStage/Conway-4              443.4Ki ± ∞ ¹   443.4Ki ± ∞ ¹       ~ (p=0.714 n=5)
DecodeStage/AllEras-4             328.9Ki ± ∞ ¹   329.0Ki ± ∞ ¹       ~ (p=0.690 n=5)
StageWorkerPool/Workers1-4        329.0Ki ± ∞ ¹   329.0Ki ± ∞ ¹       ~ (p=0.786 n=5)
StageWorkerPool/Workers2-4        329.0Ki ± ∞ ¹   329.0Ki ± ∞ ¹       ~ (p=1.000 n=5)
StageWorkerPool/Workers4-4        329.0Ki ± ∞ ¹   328.9Ki ± ∞ ¹  -0.03% (p=0.032 n=5)
StageWorkerPool/Workers8-4        328.9Ki ± ∞ ¹   328.9Ki ± ∞ ¹       ~ (p=0.056 n=5)
BlockPipeline/EndToEnd-4          329.0Ki ± ∞ ¹   328.9Ki ± ∞ ¹       ~ (p=0.373 n=5)
BlockDecode/Byron-4               87.09Ki ± ∞ ¹   87.09Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Shelley-4             97.64Ki ± ∞ ¹   97.64Ki ± ∞ ¹       ~ (p=0.524 n=5)
BlockDecode/Allegra-4             238.8Ki ± ∞ ¹   238.8Ki ± ∞ ¹       ~ (p=0.206 n=5)
BlockDecode/Mary-4                229.4Ki ± ∞ ¹   229.4Ki ± ∞ ¹       ~ (p=0.278 n=5)
BlockDecode/Alonzo-4              351.6Ki ± ∞ ¹   351.6Ki ± ∞ ¹       ~ (p=0.175 n=5)
BlockDecode/Babbage-4             980.4Ki ± ∞ ¹   980.4Ki ± ∞ ¹       ~ (p=0.492 n=5)
BlockDecode/Conway-4              473.7Ki ± ∞ ¹   473.7Ki ± ∞ ¹       ~ (p=1.000 n=5)
geomean                           272.2Ki         272.2Ki        -0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                           │ benchmark-base.txt │        benchmark-current.txt         │
                           │     allocs/op      │  allocs/op    vs base                │
DecodeStage/Byron-4                 502.0 ± ∞ ¹    502.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/Shelley-4               428.0 ± ∞ ¹    428.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/Allegra-4              1.058k ± ∞ ¹   1.058k ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/Mary-4                 1.102k ± ∞ ¹   1.102k ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/Alonzo-4               1.346k ± ∞ ¹   1.346k ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/Babbage-4              5.671k ± ∞ ¹   5.672k ± ∞ ¹       ~ (p=0.206 n=5)
DecodeStage/Conway-4               2.636k ± ∞ ¹   2.636k ± ∞ ¹       ~ (p=1.000 n=5) ²
DecodeStage/AllEras-4              1.820k ± ∞ ¹   1.820k ± ∞ ¹       ~ (p=0.683 n=5)
StageWorkerPool/Workers1-4         1.820k ± ∞ ¹   1.820k ± ∞ ¹       ~ (p=1.000 n=5)
StageWorkerPool/Workers2-4         1.821k ± ∞ ¹   1.821k ± ∞ ¹       ~ (p=1.000 n=5)
StageWorkerPool/Workers4-4         1.821k ± ∞ ¹   1.820k ± ∞ ¹  -0.05% (p=0.048 n=5)
StageWorkerPool/Workers8-4         1.820k ± ∞ ¹   1.820k ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockPipeline/EndToEnd-4           1.820k ± ∞ ¹   1.820k ± ∞ ¹       ~ (p=1.000 n=5)
BlockDecode/Byron-4                 500.0 ± ∞ ¹    500.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Shelley-4               441.0 ± ∞ ¹    441.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Allegra-4              1.072k ± ∞ ¹   1.072k ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Mary-4                 1.116k ± ∞ ¹   1.116k ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Alonzo-4               1.362k ± ∞ ¹   1.362k ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Babbage-4              5.689k ± ∞ ¹   5.689k ± ∞ ¹       ~ (p=1.000 n=5) ²
BlockDecode/Conway-4               2.652k ± ∞ ¹   2.652k ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                            1.412k         1.412k        -0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                           │ benchmark-base.txt │        benchmark-current.txt        │
                           │        B/s         │      B/s       vs base              │
DecodeStage/Byron-4               21.05Mi ± ∞ ¹   20.68Mi ± ∞ ¹       ~ (p=0.063 n=5)
DecodeStage/Shelley-4             36.63Mi ± ∞ ¹   36.08Mi ± ∞ ¹       ~ (p=0.151 n=5)
DecodeStage/Allegra-4             28.69Mi ± ∞ ¹   28.50Mi ± ∞ ¹  -0.66% (p=0.032 n=5)
DecodeStage/Mary-4                21.57Mi ± ∞ ¹   21.42Mi ± ∞ ¹  -0.71% (p=0.032 n=5)
DecodeStage/Alonzo-4              26.86Mi ± ∞ ¹   26.69Mi ± ∞ ¹       ~ (p=0.056 n=5)
DecodeStage/Babbage-4             14.38Mi ± ∞ ¹   14.15Mi ± ∞ ¹  -1.59% (p=0.024 n=5)
DecodeStage/Conway-4              14.10Mi ± ∞ ¹   13.83Mi ± ∞ ¹  -1.96% (p=0.008 n=5)
DecodeStage/AllEras-4             17.41Mi ± ∞ ¹   16.97Mi ± ∞ ¹  -2.57% (p=0.016 n=5)
StageWorkerPool/Workers1-4        15.15Mi ± ∞ ¹   15.07Mi ± ∞ ¹       ~ (p=0.056 n=5)
StageWorkerPool/Workers2-4        18.56Mi ± ∞ ¹   18.45Mi ± ∞ ¹       ~ (p=0.460 n=5)
StageWorkerPool/Workers4-4        23.39Mi ± ∞ ¹   23.08Mi ± ∞ ¹  -1.35% (p=0.016 n=5)
StageWorkerPool/Workers8-4        25.88Mi ± ∞ ¹   25.79Mi ± ∞ ¹       ~ (p=0.421 n=5)
BlockPipeline/EndToEnd-4          37.06Mi ± ∞ ¹   36.74Mi ± ∞ ¹       ~ (p=1.000 n=5)
BlockDecode/Byron-4               20.34Mi ± ∞ ¹   20.18Mi ± ∞ ¹       ~ (p=0.087 n=5)
BlockDecode/Shelley-4             31.81Mi ± ∞ ¹   31.30Mi ± ∞ ¹       ~ (p=0.222 n=5)
BlockDecode/Allegra-4             25.40Mi ± ∞ ¹   24.94Mi ± ∞ ¹  -1.80% (p=0.008 n=5)
BlockDecode/Mary-4                19.32Mi ± ∞ ¹   19.08Mi ± ∞ ¹  -1.23% (p=0.016 n=5)
BlockDecode/Alonzo-4              23.55Mi ± ∞ ¹   23.51Mi ± ∞ ¹       ~ (p=0.413 n=5)
BlockDecode/Babbage-4             13.24Mi ± ∞ ¹   13.16Mi ± ∞ ¹  -0.58% (p=0.024 n=5)
BlockDecode/Conway-4              12.86Mi ± ∞ ¹   12.84Mi ± ∞ ¹       ~ (p=0.238 n=5)
geomean                           21.28Mi         21.05Mi        -1.07%
¹ need >= 6 samples for confidence interval at level 0.95

pkg: github.com/blinklabs-io/gouroboros/vrf
                                       │ benchmark-base.txt │       benchmark-current.txt        │
                                       │       sec/op       │    sec/op     vs base              │
KeyGen/StandardSeed-4                          22.35µ ± ∞ ¹   22.38µ ± ∞ ¹       ~ (p=0.222 n=5)
KeyGen/CardanoSeed-4                           22.37µ ± ∞ ¹   22.50µ ± ∞ ¹       ~ (p=0.095 n=5)
KeyGen/RandomSeed-4                            22.38µ ± ∞ ¹   22.41µ ± ∞ ¹       ~ (p=0.095 n=5)
Prove/Empty-4                                  221.4µ ± ∞ ¹   221.4µ ± ∞ ¹       ~ (p=0.690 n=5)
Prove/32B-4                                    221.2µ ± ∞ ¹   221.8µ ± ∞ ¹  +0.26% (p=0.032 n=5)
Prove/256B-4                                   221.7µ ± ∞ ¹   221.7µ ± ∞ ¹       ~ (p=0.548 n=5)
Prove/1KB-4                                    222.8µ ± ∞ ¹   222.9µ ± ∞ ¹       ~ (p=0.548 n=5)
Verify/Valid-4                                 263.6µ ± ∞ ¹   263.8µ ± ∞ ¹       ~ (p=0.310 n=5)
Verify/Alpha_Empty-4                           263.2µ ± ∞ ¹   263.1µ ± ∞ ¹       ~ (p=0.341 n=5)
Verify/Alpha_32B-4                             263.4µ ± ∞ ¹   263.1µ ± ∞ ¹       ~ (p=0.690 n=5)
Verify/Alpha_256B-4                            263.6µ ± ∞ ¹   263.6µ ± ∞ ¹       ~ (p=0.500 n=5)
Verify/Alpha_1KB-4                             265.2µ ± ∞ ¹   264.7µ ± ∞ ¹  -0.18% (p=0.032 n=5)
VerifyAndHash/Standard-4                       263.3µ ± ∞ ¹   263.4µ ± ∞ ¹       ~ (p=0.222 n=5)
VerifyAndHash/Alpha_Empty-4                    263.2µ ± ∞ ¹   263.2µ ± ∞ ¹       ~ (p=0.579 n=5)
VerifyAndHash/Alpha_32B-4                      263.3µ ± ∞ ¹   263.3µ ± ∞ ¹       ~ (p=0.690 n=5)
VerifyAndHash/Alpha_256B-4                     263.6µ ± ∞ ¹   263.7µ ± ∞ ¹       ~ (p=0.151 n=5)
VerifyAndHash/Alpha_1KB-4                      264.8µ ± ∞ ¹   264.8µ ± ∞ ¹       ~ (p=1.000 n=5)
ProofToHash-4                                  10.22µ ± ∞ ¹   10.23µ ± ∞ ¹       ~ (p=0.548 n=5)
MkInputVrf/LowSlot-4                           297.7n ± ∞ ¹   298.5n ± ∞ ¹       ~ (p=0.651 n=5)
MkInputVrf/HighSlot-4                          298.5n ± ∞ ¹   299.7n ± ∞ ¹       ~ (p=0.389 n=5)
MkInputVrf/RandomEta0-4                        309.2n ± ∞ ¹   310.3n ± ∞ ¹       ~ (p=0.587 n=5)
FullWorkflow/Complete-4                        506.7µ ± ∞ ¹   507.1µ ± ∞ ¹       ~ (p=0.222 n=5)
FullWorkflow/WithRandomKeys-4                  507.0µ ± ∞ ¹   507.3µ ± ∞ ¹       ~ (p=0.310 n=5)
LeaderElectionWorkflow/SingleSlot-4            485.4µ ± ∞ ¹   485.4µ ± ∞ ¹       ~ (p=1.000 n=5)
LeaderElectionWorkflow/MultipleSlots-4         485.2µ ± ∞ ¹   485.5µ ± ∞ ¹       ~ (p=1.000 n=5)
ParallelProve-4                                96.59µ ± ∞ ¹   96.63µ ± ∞ ¹       ~ (p=0.310 n=5)
ParallelVerify-4                               115.8µ ± ∞ ¹   115.8µ ± ∞ ¹       ~ (p=1.000 n=5)
geomean                                        83.77µ         83.84µ        +0.08%
¹ need >= 6 samples for confidence interval at level 0.95

                                       │ benchmark-base.txt │         benchmark-current.txt         │
                                       │        B/op        │     B/op       vs base                │
KeyGen/StandardSeed-4                           64.00 ± ∞ ¹     64.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/CardanoSeed-4                            64.00 ± ∞ ¹     64.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/RandomSeed-4                             64.00 ± ∞ ¹     64.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/Empty-4                                   736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/32B-4                                     736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/256B-4                                    736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/1KB-4                                     736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Valid-4                                  816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_Empty-4                            816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_32B-4                              816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_256B-4                             816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_1KB-4                              816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Standard-4                        816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_Empty-4                     816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_32B-4                       816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_256B-4                      816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_1KB-4                       816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
ProofToHash-4                                   224.0 ± ∞ ¹     224.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/LowSlot-4                            464.0 ± ∞ ¹     464.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/HighSlot-4                           464.0 ± ∞ ¹     464.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/RandomEta0-4                         464.0 ± ∞ ¹     464.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/Complete-4                       1.578Ki ± ∞ ¹   1.578Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/WithRandomKeys-4                 1.578Ki ± ∞ ¹   1.578Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
LeaderElectionWorkflow/SingleSlot-4           1.969Ki ± ∞ ¹   1.969Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
LeaderElectionWorkflow/MultipleSlots-4        1.969Ki ± ∞ ¹   1.969Ki ± ∞ ¹       ~ (p=1.000 n=5) ²
ParallelProve-4                                 736.0 ± ∞ ¹     736.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
ParallelVerify-4                                816.0 ± ∞ ¹     816.0 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         607.6           607.6        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                       │ benchmark-base.txt │        benchmark-current.txt        │
                                       │     allocs/op      │  allocs/op   vs base                │
KeyGen/StandardSeed-4                           2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/CardanoSeed-4                            2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
KeyGen/RandomSeed-4                             2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/Empty-4                                   11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/32B-4                                     11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/256B-4                                    11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Prove/1KB-4                                     11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Valid-4                                  11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_Empty-4                            11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_32B-4                              11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_256B-4                             11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
Verify/Alpha_1KB-4                              11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Standard-4                        11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_Empty-4                     11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_32B-4                       11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_256B-4                      11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
VerifyAndHash/Alpha_1KB-4                       11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
ProofToHash-4                                   2.000 ± ∞ ¹   2.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/LowSlot-4                            3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/HighSlot-4                           3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
MkInputVrf/RandomEta0-4                         3.000 ± ∞ ¹   3.000 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/Complete-4                         24.00 ± ∞ ¹   24.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
FullWorkflow/WithRandomKeys-4                   24.00 ± ∞ ¹   24.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
LeaderElectionWorkflow/SingleSlot-4             25.00 ± ∞ ¹   25.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
LeaderElectionWorkflow/MultipleSlots-4          25.00 ± ∞ ¹   25.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
ParallelProve-4                                 11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
ParallelVerify-4                                11.00 ± ∞ ¹   11.00 ± ∞ ¹       ~ (p=1.000 n=5) ²
geomean                                         8.328         8.328        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal


Generated by benchmark workflow

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from c7b7371 to 710f9cb Compare February 11, 2026 00:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@internal/bench/profile_test.go`:
- Around line 337-368: The sub-test TestProfileConsensus/LeaderValueComputation
generates VRF proofs but discards the proof hash instead of computing the leader
value; fix by either (A) invoking the actual consensus leader-value logic
(import the consensus package and call consensus.VrfLeaderValue or the exported
function that consumes VRF output) passing the VRF proof/hash returned by
vrf.Prove, or (B) if you prefer not to import consensus, compute the leader
value inline exactly as consensus does (e.g., compute a Blake2b-256 of the VRF
output via blake2b.Sum256(hash) and use that result instead of `_ = hash`), and
if you intend only to profile vrf.Prove rename the sub-test from
"LeaderValueComputation" to avoid confusion; update the loop body in
TestProfileConsensus/LeaderValueComputation accordingly.
- Around line 219-243: TestProfileTxValidation currently profiles an empty
validation loop because the local rules slice is empty; replace the empty rules
slice with the era's actual UtxoValidationRules (or a representative subset) and
pass that into common.VerifyTransaction so real validation paths are exercised
(refer to the UtxoValidationRules symbol to locate the rule set). Also switch
from using BenchLedgerState() to MockLedgerState (from
internal/test/ledger/ledger.go) for the ls variable so the test follows the
*_test.go guideline and uses the mock ledger state during validation profiling.
- Around line 417-487: TestProfileNativeScript currently never exercises native
script evaluation: it only checks tx.Cbor and then runs trivial benchmarks
(EmptyKeyHashLookup/SlotComparison). Replace or augment those sub-tests to
construct real NativeScript instances (use the helper constructors in
script_bench_test.go) and call NativeScript.Evaluate inside the hot loops
(instead of raw map lookups/integer compares). Also use the scriptsFound check
or remove it if unused; ensure you pass the same key set and validity window
(keyHashes, slot/validityStart/validityEnd) into the Evaluate calls so the
benchmark measures the actual script evaluation path (refer to
TestProfileNativeScript, EmptyKeyHashLookup, SlotComparison, and
NativeScript.Evaluate).

In `@internal/bench/README.md`:
- Around line 19-27: The README's "Package Structure" and "Available Benchmarks"
sections are incomplete; update internal/bench/README.md to list the new files
and directories (block_bench_test.go, cbor_bench_test.go, tx_bench_test.go,
script_bench_test.go, regression_test.go, helpers_test.go, BASELINES.md,
PROFILING.md, and testdata/) in the package structure, and expand the "Available
Benchmarks" section to document the block, CBOR, transaction, and script
benchmarks (reference the corresponding test files by name) and mention the
regression and profiling tests and baseline/profiling docs; keep descriptions
concise and consistent with the existing consensus benchmark entries.

In `@internal/bench/script_bench_test.go`:
- Around line 470-492: In BenchmarkPlutusScriptHash, the hex.DecodeString calls
for testPlutusV3ScriptHex and testPlutusV1ScriptHex discard errors; update the
setup in function BenchmarkPlutusScriptHash to check the returned errors for
plutusV3Bytes and plutusV1Bytes and call b.Fatalf (or b.Fatalf with the error)
if decoding fails so the benchmark fails fast instead of operating on nil/empty
bytes when creating common.PlutusV1Script and common.PlutusV3Script.
- Around line 560-576: The CBOR encode errors are currently ignored in the
benchmark setup (calls to cbor.Encode when building data for the "Decode/Pubkey"
and "Decode/All_Depth3" sub-benchmarks), which can produce nil data and
invalidate the decode benchmark; after each cbor.Encode(&script) call (used in
the createPubkeyScript and createNestedScript setups), check the returned error
and call b.Fatalf with a descriptive message including the error (e.g., "failed
to encode script: %v") so the benchmark fails fast on encode errors rather than
repeatedly unmarshaling nil.
🧹 Nitpick comments (2)
internal/bench/testdata/README.md (1)

7-11: Add language identifiers to fenced code blocks.

Markdownlint flags three fenced code blocks (lines 7, 38, 50) missing language specifiers. Use ```text for directory structure and fixture format examples.

Example fix for line 7
````diff
-```
+```text
 testdata/
   blocks/           # Block CBOR fixtures (future use)
   transactions/     # Transaction CBOR fixtures (future use)
````
internal/bench/block_bench_test.go (1)

`260-329`: **Duplicated KES data extraction switch block.**

The type-switch extracting `bodyCbor`, `signature`, `hotVkey`, and `kesPeriod` from era-specific headers is copy-pasted verbatim between `BenchmarkBlockKESVerification` (lines 260-329) and `BenchmarkVerifyKesComponents` (lines 508-577). Consider extracting a shared helper like `extractKesData(blocks []blockData) []kesData` to reduce maintenance burden.


Sketch of helper:

```go
func extractKesData(blocks []blockData) []kesData {
	result := make([]kesData, 0, len(blocks))
	for _, bd := range blocks {
		header := bd.block.Header()
		var bodyCbor, sig, hotVkey []byte
		var kesPeriod uint64
		var err error

		switch h := header.(type) {
		case *shelley.ShelleyBlockHeader:
			bodyCbor, err = cbor.Encode(h.Body)
			if err != nil { continue }
			sig = h.Signature
			hotVkey = h.Body.OpCertHotVkey
			kesPeriod = uint64(h.Body.OpCertKesPeriod)
		// ... remaining cases ...
		default:
			continue
		}

		result = append(result, kesData{
			name: bd.name, bodyCbor: bodyCbor,
			signature: sig, hotVkey: hotVkey,
			kesPeriod: kesPeriod, slot: header.SlotNumber(),
		})
	}
	return result
}
```

Also applies to: 508-577

Comment on lines 219 to 243
```go
		// Use nil for protocol params - validation will be limited
		// but we're profiling the validation code paths
		var pp common.ProtocolParameters

		// Empty validation rules slice - we're just profiling the framework
		// overhead
		var rules []common.UtxoValidationRuleFunc

		for i := 0; i < profileIterations; i++ {
			for txIdx, tx := range txs {
				// VerifyTransaction runs all applicable validation rules
				// Errors may occur since we use nil protocol parameters
				err := common.VerifyTransaction(tx, slot, ls, pp, rules)
				if err != nil && i == 0 && txIdx == 0 {
					// Log once to avoid flooding output
					t.Logf(
						"VerifyTransaction error (expected with nil params): %v",
						err,
					)
				}
			}
		}
	})
}
```

⚠️ Potential issue | 🟠 Major

TestProfileTxValidation profiles an empty loop — no validation rules are executed.

The rules slice on line 225 is empty, so common.VerifyTransaction iterates over zero rules and returns nil immediately. This profiles only the function-call overhead of the framework, not any actual transaction validation logic. To produce useful profiles, populate rules with the era's actual UtxoValidationRules or at least a representative subset.

Also, line 196 uses BenchLedgerState() rather than MockLedgerState from internal/test/ledger/ledger.go. As per coding guidelines, **/*_test.go: "Use MockLedgerState from internal/test/ledger/ledger.go for validation tests".

🤖 Prompt for AI Agents
In `@internal/bench/profile_test.go` around lines 219 - 243,
TestProfileTxValidation currently profiles an empty validation loop because the
local rules slice is empty; replace the empty rules slice with the era's actual
UtxoValidationRules (or a representative subset) and pass that into
common.VerifyTransaction so real validation paths are exercised (refer to the
UtxoValidationRules symbol to locate the rule set). Also switch from using
BenchLedgerState() to MockLedgerState (from internal/test/ledger/ledger.go) for
the ls variable so the test follows the *_test.go guideline and uses the mock
ledger state during validation profiling.

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from 710f9cb to c4ebbb8 Compare February 11, 2026 12:41
@wolf31o2
Member Author

@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai bot commented Feb 11, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.


@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 15 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="internal/bench/block_bench_test.go">

<violation number="1" location="internal/bench/block_bench_test.go:127">
P2: The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.</violation>
</file>

<file name="internal/bench/regression_test.go">

<violation number="1" location="internal/bench/regression_test.go:354">
P3: `NewSumKesFromBytes` allocations are measured without first validating the deserialization succeeds, so an invalid signature/depth would still pass while measuring the error path. Add a pre-check (like other subtests) to assert the call succeeds before running `AllocsPerRun`.</violation>
</file>

<file name="internal/bench/tx_bench_test.go">

<violation number="1" location="internal/bench/tx_bench_test.go:46">
P2: Check the error from ledger.NewTransactionFromCbor so the benchmark fails if decoding regresses instead of silently benchmarking a failure path.</violation>

<violation number="2" location="internal/bench/tx_bench_test.go:186">
P2: Handle the error from tx.Utxorpc (e.g., pre-validate before timing or fail fast) to avoid benchmarking a failing path.</violation>
</file>

<file name="scripts/profile.sh">

<violation number="1" location="scripts/profile.sh:49">
P2: Option parsing treats `--help`/`--view` as positional arguments when profile_type is omitted, so the documented options fail unless both positional args are supplied. This makes `./scripts/profile.sh --help` and `./scripts/profile.sh block --view` error out even though defaults exist.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

}
_, _, _, _, err = ledger.VerifyBlock(
block,
defaultEta0Hex,
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 11, 2026

P2: The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 127:

<comment>The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.</comment>

<file context>
@@ -0,0 +1,617 @@
+				}
+				_, _, _, _, err = ledger.VerifyBlock(
+					block,
+					defaultEta0Hex,
+					defaultSlotsPerKesPeriod,
+					cfg,
</file context>

wolf31o2 force-pushed the feat/memory-profiling branch from c4ebbb8 to ccce5cb on February 12, 2026 at 16:13
@wolf31o2
Member Author

@cubic-dev-ai review

@cubic-dev-ai
Contributor

cubic-dev-ai bot commented Feb 12, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 15 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="internal/bench/PROFILING.md">

<violation number="1" location="internal/bench/PROFILING.md:25">
P2: Using `-http=:8080` binds pprof on all interfaces. For safety, docs should recommend binding to localhost (e.g., `-http=localhost:8080`) to avoid exposing profiling data on shared machines.</violation>
</file>

<file name="scripts/profile.sh">

<violation number="1" location="scripts/profile.sh:187">
P3: When `COMPONENT=all`, the final instructions point to non-existent `cpu_all.prof`/`mem_all.prof` files. This misleads users after running `./scripts/profile.sh all`. Adjust the output to use a real component or conditionally describe the example profile.</violation>
</file>

<file name="internal/bench/cbor_bench_test.go">

<violation number="1" location="internal/bench/cbor_bench_test.go:209">
P2: The block encode and round-trip benchmarks call `block.Cbor()`, which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.</violation>

<violation number="2" location="internal/bench/cbor_bench_test.go:231">
P2: BenchmarkCBOREncodeTx is using `transaction.Cbor()`, which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.</violation>
</file>


b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
result := transaction.Cbor()
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 12, 2026

P2: BenchmarkCBOREncodeTx is using transaction.Cbor(), which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/cbor_bench_test.go, line 231:

<comment>BenchmarkCBOREncodeTx is using `transaction.Cbor()`, which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.</comment>

<file context>
@@ -0,0 +1,387 @@
+			b.ReportAllocs()
+			b.ResetTimer()
+			for i := 0; i < b.N; i++ {
+				result := transaction.Cbor()
+				if len(result) == 0 {
+					b.Fatalf("empty CBOR result for %s tx", tx.name)
</file context>

b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
result := block.Cbor()
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 12, 2026

P2: The block encode and round-trip benchmarks call block.Cbor(), which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/cbor_bench_test.go, line 209:

<comment>The block encode and round-trip benchmarks call `block.Cbor()`, which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.</comment>

<file context>
@@ -0,0 +1,387 @@
+			b.ReportAllocs()
+			b.ResetTimer()
+			for i := 0; i < b.N; i++ {
+				result := block.Cbor()
+				if len(result) == 0 {
+					b.Fatalf("empty CBOR result for %s block", tb.name)
</file context>
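The caching behavior described above can be reproduced in isolation. The sketch below is illustrative only: it uses `encoding/json` as a stand-in for the project's cbor package, and the `record` type is hypothetical. It shows why a benchmark that calls the cached accessor measures only a slice lookup, while an explicit encode call does real serialization work:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// record mimics a DecodeStoreCbor-backed type: decoding stores the
// original bytes, and Cbor() just returns them.
type record struct {
	Values []int `json:"values"`
	cached []byte
}

// Cbor returns the cached bytes; no serialization happens here.
func (r *record) Cbor() []byte { return r.cached }

// encode performs real serialization (JSON stands in for CBOR).
func (r *record) encode() ([]byte, error) { return json.Marshal(r.Values) }

func main() {
	r := &record{Values: make([]int, 1000)}
	r.cached, _ = r.encode()

	cached := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if len(r.Cbor()) == 0 { // byte-slice lookup only
				b.Fatal("empty cached bytes")
			}
		}
	})
	encoded := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if out, err := r.encode(); err != nil || len(out) == 0 {
				b.Fatal("encode failed")
			}
		}
	})
	fmt.Println("cached accessor:", cached.AllocsPerOp(), "allocs/op")
	fmt.Println("explicit encode:", encoded.AllocsPerOp(), "allocs/op")
}
```

The cached path reports zero allocations per op regardless of payload size, which is the tell that no encoding is being measured.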


echo "=== Profiling complete ==="
echo ""
echo "To analyze profiles manually:"
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 12, 2026

P3: When COMPONENT=all, the final instructions point to non-existent cpu_all.prof/mem_all.prof files. This misleads users after running ./scripts/profile.sh all. Adjust the output to use a real component or conditionally describe the example profile.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 187:

<comment>When `COMPONENT=all`, the final instructions point to non-existent `cpu_all.prof`/`mem_all.prof` files. This misleads users after running `./scripts/profile.sh all`. Adjust the output to use a real component or conditionally describe the example profile.</comment>

<file context>
@@ -0,0 +1,196 @@
+
+echo "=== Profiling complete ==="
+echo ""
+echo "To analyze profiles manually:"
+echo "  go tool pprof -http=:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+echo "  go tool pprof -http=:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
</file context>

wolf31o2 force-pushed the feat/memory-profiling branch from ccce5cb to 2dc3f8a on February 13, 2026 at 17:08
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@internal/bench/cbor_bench_test.go`:
- Around line 244-280: BenchmarkCBORDecodeRaw currently assumes testBlocks has
at least one element and does testBlocks[0], which will panic if it's empty;
modify BenchmarkCBORDecodeRaw to first check if len(testBlocks) == 0 and call
b.Fatal or b.Fatalf with a clear message (e.g., "no test blocks available for
BenchmarkCBORDecodeRaw") before accessing testBlocks[0] or iterating to find
smallestBlock, then proceed with the existing smallestBlock selection and
benchmark runs.

In `@internal/bench/consensus_bench_test.go`:
- Around line 295-310: BenchmarkConsensusParallelThreshold currently shares a
single *big.Rat (activeSlotCoeff) across goroutines which causes a data race;
move creation of the big.Rat into the b.RunParallel callback so each goroutine
constructs its own instance (e.g., call big.NewRat(1, 20) or
new(big.Rat).Set(...) inside the pb loop) before invoking
consensus.CertifiedNatThreshold to avoid concurrent access to the same *big.Rat.
🧹 Nitpick comments (5)
internal/bench/helpers.go (1)

214-233: blockCborFromEra returns error but internally panics via MustDecodeHex.

The function signature returns ([]byte, error), but testdata.MustDecodeHex panics on invalid hex rather than returning an error. Since the hex data comes from compile-time constants this is practically safe, but it makes the error return misleading—callers believe errors are handled gracefully when they'd actually get a panic. Consider either using a non-panicking hex decode to propagate errors, or simplifying the return type if panics are acceptable here.

Option: propagate errors instead of panicking
 func blockCborFromEra(era string) ([]byte, error) {
+	var hexStr string
 	switch strings.ToLower(era) {
 	case "byron":
-		return testdata.MustDecodeHex(testdata.ByronBlockHex), nil
+		hexStr = testdata.ByronBlockHex
 	case "shelley":
-		return testdata.MustDecodeHex(testdata.ShelleyBlockHex), nil
+		hexStr = testdata.ShelleyBlockHex
 	// ... same for other eras
 	default:
 		return nil, fmt.Errorf("unknown era: %s", era)
 	}
+	b, err := hex.DecodeString(strings.TrimSpace(hexStr))
+	if err != nil {
+		return nil, fmt.Errorf("decode %s hex: %w", era, err)
+	}
+	return b, nil
 }
internal/bench/cbor_bench_test.go (1)

109-115: Minor duplication: mustDecodeHex vs testdata.MustDecodeHex.

This local helper duplicates testdata.MustDecodeHex (which also panics on decode failure). Consider reusing the existing utility to avoid drift.

internal/bench/block_bench_test.go (1)

525-629: Duplicated KES data extraction switch between BenchmarkBlockKESVerification and BenchmarkVerifyKesComponents.

The kesData struct and the ~50-line type switch extracting bodyCbor, signature, hotVkey, and kesPeriod from era-specific headers (lines 531-611) is a near-exact copy of lines 283-363. Consider extracting a shared helper like extractKesData(blocks []blockData) []kesData to reduce duplication and maintenance burden.

Sketch of extracted helper
type kesData struct {
	name      string
	bodyCbor  []byte
	signature []byte
	hotVkey   []byte
	kesPeriod uint64
	slot      uint64
}

func extractKesData(blocks []blockData) []kesData {
	result := make([]kesData, 0, len(blocks))
	for _, bd := range blocks {
		header := bd.block.Header()
		var bodyCbor []byte
		var sig, hotVkey []byte
		var kesPeriod uint64
		var err error

		switch h := header.(type) {
		case *shelley.ShelleyBlockHeader:
			bodyCbor, err = cbor.Encode(h.Body)
			sig = h.Signature
			hotVkey = h.Body.OpCertHotVkey
			kesPeriod = uint64(h.Body.OpCertKesPeriod)
		case *allegra.AllegraBlockHeader:
			bodyCbor, err = cbor.Encode(h.Body)
			sig = h.Signature
			hotVkey = h.Body.OpCertHotVkey
			kesPeriod = uint64(h.Body.OpCertKesPeriod)
		// ... remaining eras ...
		default:
			continue
		}
		if err != nil {
			continue
		}
		result = append(result, kesData{
			name: bd.name, bodyCbor: bodyCbor,
			signature: sig, hotVkey: hotVkey,
			kesPeriod: kesPeriod, slot: header.SlotNumber(),
		})
	}
	return result
}

Then both benchmarks would call extractKesData(postByronBlocks).

internal/bench/profile_test.go (1)

278-308: Stale/misleading comment: consensus package is directly imported, not "dynamically."

Line 279's comment says "Import consensus package dynamically to avoid circular dependencies" but consensus is already imported normally at line 42 and used directly at line 305. This comment should be removed to avoid confusion.

Suggested fix
 func TestProfileConsensus(t *testing.T) {
-	// Import consensus package dynamically to avoid circular dependencies
-	// For now, we test the VRF components directly
-
 	t.Run("LeaderValueComputation", func(t *testing.T) {
internal/bench/regression_test.go (1)

102-167: Consider extracting shared allocation baselines to avoid drift between TestAllocationRegression and TestRegression* tests.

The same limits appear in two places — e.g., VRFVerify limit of 12 is defined here (line 132) and again in TestRegressionVRF/Verify (line 280), LeaderCheck limit of 1350 here (line 137) and in TestRegressionConsensus/IsSlotLeaderFromComponents (line 412). If a baseline is updated in one place but not the other, the two tests will enforce different limits for the same operation.

Consider defining the baselines as package-level constants or a shared map.

wolf31o2 force-pushed the feat/memory-profiling branch from 2dc3f8a to 3feff17 on February 16, 2026 at 15:39
@wolf31o2
Member Author

@cubic-dev-ai review

@cubic-dev-ai
Contributor

cubic-dev-ai bot commented Feb 16, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 15 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/benchmark.yml">

<violation number="1" location=".github/workflows/benchmark.yml:35">
P2: The `|| true` forces the benchmark step to succeed even when `go test` fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.</violation>
</file>

<file name="internal/bench/BASELINES.md">

<violation number="1" location="internal/bench/BASELINES.md:11">
P3: The overview claims CI fails when allocations exceed thresholds by >50%, but the regression thresholds table lists allocation count at +20%. This inconsistency can mislead contributors about the actual CI criteria. Align the overview with the table or make it refer to the detailed thresholds below.</violation>
</file>


./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt || true
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 16, 2026

P2: The || true forces the benchmark step to succeed even when go test fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 35:

<comment>The `|| true` forces the benchmark step to succeed even when `go test` fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.</comment>

<file context>
@@ -0,0 +1,42 @@
+            ./cbor/... \
+            ./pipeline/... \
+            ./ledger/... \
+            2>&1 | tee benchmark.txt || true
+
+      - name: Upload benchmark results
</file context>

wolf31o2 force-pushed the feat/memory-profiling branch from 3feff17 to c626b9c on February 16, 2026 at 22:37
@wolf31o2
Member Author

@cubic-dev-ai review

@cubic-dev-ai
Contributor

cubic-dev-ai bot commented Feb 16, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In @.github/workflows/benchmark.yml:
- Around line 26-35: Update the "Run benchmarks" job so the workflow produces
the three artifacts described in the README and enforces regression thresholds:
write the raw run output to benchmark-current.txt (replace the single
benchmark.txt), fetch or restore benchmark-base.txt (e.g., from artifacts or
main branch), run a small comparison step that generates
benchmark-comparison.txt and fails the job if any metric exceeds the README
thresholds (alloc +20%, bytes +50%, time +20%), and remove the "|| true" on the
benchmark run so failures surface instead of being swallowed; ensure the steps
and artifact names exactly match README references (benchmark-current.txt,
benchmark-base.txt, benchmark-comparison.txt) and name the steps such that "Run
benchmarks" is the producer of benchmark-current.txt.

In `@internal/bench/BASELINES.md`:
- Around line 185-191: The BASELINES.md table lists thresholds for "Allocation
Count", "Bytes Allocated", and "Time (ns/op)" but the CI job that runs
benchmarks (the "benchmark" job / step that runs benchmarks and uploads the
results) never compares results to these thresholds; add an enforcement step to
that job which downloads the benchmark artifact (e.g., the produced
bench-results JSON/CSV), parses the metrics, computes percent changes versus the
stored baseline values from BASELINES.md (or a machine-readable baseline file),
and exits non‑zero if any metric exceeds its threshold (+20% allocation count,
+50% bytes, +20% time); alternatively, if you don’t want enforcement yet, update
BASELINES.md to clearly mark these thresholds as "planned/not enforced" so
readers aren’t misled.

In `@internal/bench/block_bench_test.go`:
- Around line 527-607: The duplicated kesData type and era-specific header
switch in BenchmarkBlockKESVerification and the other benchmark should be
extracted into a single helper: add a function extractKesData(blocks
[]blockData) []kesData that owns the kesData type (or move kesData to package
scope) and implements the type-switch over header.(*shelley.ShelleyBlockHeader,
*allegra.AllegraBlockHeader, *mary.MaryBlockHeader, *alonzo.AlonzoBlockHeader,
*babbage.BabbageBlockHeader, *conway.ConwayBlockHeader) to cbor.Encode(h.Body),
set signature, hotVkey, kesPeriod and slot, skipping on encode errors, and
return the populated []kesData; then replace both duplicated blocks that iterate
postByronBlocks with a call to extractKesData(postByronBlocks).

In `@internal/bench/consensus_bench_test.go`:
- Around line 64-89: The shared big.Rat activeSlotCoeff in
BenchmarkConsensusThreshold can be mutated by functions like
consensus.CertifiedNatThreshold (and its internal lnOneMinus which may call
Denom/Mul); to fix, stop reusing the single activeSlotCoeff: create a new
big.NewRat(1, 20) inside each b.Run sub-benchmark (or just inside the inner
loop) and pass that fresh rat into consensus.CertifiedNatThreshold, mirroring
the safe pattern used by BenchmarkConsensusThresholdWithMode; update references
to activeSlotCoeff in the closure accordingly so each sub-benchmark gets its own
independent Rat.

In `@internal/bench/helpers_test.go`:
- Around line 103-119: TestGetTestBlocks and EraNames use different
capitalization (GetTestBlocks returns "Byron"/"Shelley" while EraNames returns
"byron"/"shelley"), so update any comparison logic to be case-insensitive: when
correlating between GetTestBlocks and EraNames (or in TestGetTestBlocks if you
add such assertions), normalize both sides (e.g., strings.ToLower on block.Name
and era name) or use strings.EqualFold to compare; reference GetTestBlocks,
EraNames, and TestGetTestBlocks to locate the code to change.

In `@internal/bench/profile_test.go`:
- Around line 189-261: TestProfileTxValidation currently uses BenchLedgerState()
contrary to the test-guideline that validation tests must use MockLedgerState;
change the setup to call MockLedgerState() instead of BenchLedgerState() (so the
VerifyTransaction(tx, slot, ls, nil, rules) calls run against the prescribed
mock ledger), and if MockLedgerState is genuinely incompatible with profiling
here, update the PR description to document why BenchLedgerState is required and
reference TestProfileTxValidation, BenchLedgerState, MockLedgerState, and
VerifyTransaction in that justification.

In `@internal/bench/README.md`:
- Around line 438-471: The README's CI artifact and regression threshold
description is out of sync with the workflow; update the "CI Benchmark Results"
section to match .github/workflows/benchmark.yml by replacing the three-file
list (benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) with
the actual single artifact name (benchmark.txt), remove or mark as "not
implemented" the automatic comparison/regression threshold checks
(allocation/bytes/time), and add a short note pointing readers to
.github/workflows/benchmark.yml for the current artifact contents and any future
comparison logic.

In `@internal/bench/regression_test.go`:
- Around line 106-113: getThresholdMultiplier currently silently returns 1.0
when GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set but invalid; update
getThresholdMultiplier to detect ParseFloat errors and non-positive values and
emit a clear error message to stderr (e.g., via fmt.Fprintf(os.Stderr, ... ) or
log.Printf) that includes the env var name and the bad value, then fall back to
1.0; reference the function name getThresholdMultiplier and the env var
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER when adding the logging so someone reading
failures can locate the change.
🧹 Nitpick comments (5)
internal/bench/helpers_test.go (1)

103-119: Note the capitalization mismatch between test block names and era names.

GetTestBlocks() returns blocks with capitalized names ("Byron", "Shelley", etc.) while EraNames() returns lowercase ("byron", "shelley", etc.). This isn't a bug since they come from different sources, but worth being aware of if any code needs to correlate between the two — a case-insensitive comparison or normalization would be needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers_test.go` around lines 103 - 119, TestGetTestBlocks and
EraNames use different capitalization (GetTestBlocks returns "Byron"/"Shelley"
while EraNames returns "byron"/"shelley"), so update any comparison logic to be
case-insensitive: when correlating between GetTestBlocks and EraNames (or in
TestGetTestBlocks if you add such assertions), normalize both sides (e.g.,
strings.ToLower on block.Name and era name) or use strings.EqualFold to compare;
reference GetTestBlocks, EraNames, and TestGetTestBlocks to locate the code to
change.
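The case-insensitive correlation suggested above can be done with strings.EqualFold. This is a minimal sketch; matchEra and the literal era lists are hypothetical, not the package's real API:

```go
package main

import (
	"fmt"
	"strings"
)

// matchEra correlates a display name as returned by GetTestBlocks
// ("Byron") with a lowercase entry from EraNames ("byron") without
// caring about capitalization.
func matchEra(blockName string, eraNames []string) bool {
	for _, era := range eraNames {
		if strings.EqualFold(blockName, era) {
			return true
		}
	}
	return false
}

func main() {
	eras := []string{"byron", "shelley", "conway"}
	// EqualFold matches despite the case difference.
	fmt.Println(matchEra("Shelley", eras)) // → true
}
```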
internal/bench/regression_test.go (1)

106-113: Silent fallback on invalid GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER.

If someone sets GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER=abc, the function silently returns 1.0 with no warning. A t.Logf isn't possible here since there's no *testing.T, but consider logging to stderr or returning an error to avoid confusion when the env var is set but misspelled/malformed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/regression_test.go` around lines 106 - 113,
getThresholdMultiplier currently silently returns 1.0 when
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set but invalid; update
getThresholdMultiplier to detect ParseFloat errors and non-positive values and
emit a clear error message to stderr (e.g., via fmt.Fprintf(os.Stderr, ... ) or
log.Printf) that includes the env var name and the bad value, then fall back to
1.0; reference the function name getThresholdMultiplier and the env var
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER when adding the logging so someone reading
failures can locate the change.
internal/bench/consensus_bench_test.go (1)

64-89: Shared activeSlotCoeff may be mutated by CertifiedNatThreshold.

BenchmarkConsensusThreshold creates one activeSlotCoeff at line 65 and passes it through multiple sub-benchmarks sequentially. While sequential execution avoids a data race, CertifiedNatThresholdWithMode internally calls lnOneMinus(activeSlotCoeff) which may read the Rat's denominator via Denom() — and Denom() can normalize internal state. If the internal implementation ever wraps this in a Mul or similar operation that takes the same pointer as both receiver and argument, the shared value could be silently corrupted between sub-benchmarks.

Consider creating a fresh big.NewRat(1, 20) inside each sub-benchmark closure to be safe, consistent with how BenchmarkConsensusThresholdWithMode (line 94) already does it at the function level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/consensus_bench_test.go` around lines 64 - 89, The shared
big.Rat activeSlotCoeff in BenchmarkConsensusThreshold can be mutated by
functions like consensus.CertifiedNatThreshold (and its internal lnOneMinus
which may call Denom/Mul); to fix, stop reusing the single activeSlotCoeff:
create a new big.NewRat(1, 20) inside each b.Run sub-benchmark (or just inside
the inner loop) and pass that fresh rat into consensus.CertifiedNatThreshold,
mirroring the safe pattern used by BenchmarkConsensusThresholdWithMode; update
references to activeSlotCoeff in the closure accordingly so each sub-benchmark
gets its own independent Rat.
internal/bench/block_bench_test.go (1)

527-607: Extract shared KES data extraction into a helper to eliminate duplication.

The kesData type definition and the era-specific header switch block (lines 527–607) is a near-exact copy of lines 279–358 in BenchmarkBlockKESVerification. Both iterate postByronBlocks, type-switch on the header, call cbor.Encode(h.Body), and populate the same struct fields. A single helper like extractKesData(blocks []blockData) []kesData would remove ~70 lines of duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/block_bench_test.go` around lines 527 - 607, The duplicated
kesData type and era-specific header switch in BenchmarkBlockKESVerification and
the other benchmark should be extracted into a single helper: add a function
extractKesData(blocks []blockData) []kesData that owns the kesData type (or move
kesData to package scope) and implements the type-switch over
header.(*shelley.ShelleyBlockHeader, *allegra.AllegraBlockHeader,
*mary.MaryBlockHeader, *alonzo.AlonzoBlockHeader, *babbage.BabbageBlockHeader,
*conway.ConwayBlockHeader) to cbor.Encode(h.Body), set signature, hotVkey,
kesPeriod and slot, skipping on encode errors, and return the populated
[]kesData; then replace both duplicated blocks that iterate postByronBlocks with
a call to extractKesData(postByronBlocks).
internal/bench/profile_test.go (1)

189-261: BenchLedgerState() used instead of MockLedgerState — guideline deviation.

Lines 192-195 explain the rationale, but the coding guideline for **/*_test.go states: "Use MockLedgerState from internal/test/ledger/ledger.go for validation tests." Since TestProfileTxValidation exercises VerifyTransaction with real era validation rules, it qualifies as a validation test. Consider switching to MockLedgerState or documenting why it's unsuitable for profiling in the PR description.

As per coding guidelines, **/*_test.go: "Use MockLedgerState from internal/test/ledger/ledger.go for validation tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/profile_test.go` around lines 189 - 261,
TestProfileTxValidation currently uses BenchLedgerState() contrary to the
test-guideline that validation tests must use MockLedgerState; change the setup
to call MockLedgerState() instead of BenchLedgerState() (so the
VerifyTransaction(tx, slot, ls, nil, rules) calls run against the prescribed
mock ledger), and if MockLedgerState is genuinely incompatible with profiling
here, update the PR description to document why BenchLedgerState is required and
reference TestProfileTxValidation, BenchLedgerState, MockLedgerState, and
VerifyTransaction in that justification.

Comment on lines 26 to 35
- name: Run benchmarks
run: |
go test -bench=. -benchmem -count=5 -timeout=30m \
./vrf/... \
./kes/... \
./consensus/... \
./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt || true
⚠️ Potential issue | 🟡 Minor

Workflow output doesn't match README documentation.

The README (lines 448–450) documents three artifacts (benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) and describes regression thresholds (allocation +20%, bytes +50%, time +20%). However, this workflow only produces a single benchmark.txt with no comparison or threshold-checking step. Either update the README to match the actual workflow, or implement the missing comparison/regression logic.

Additionally, || true on line 35 silently swallows benchmark failures, so a completely broken benchmark run still uploads an empty or partial artifact and reports success.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml around lines 26 - 35, Update the "Run
benchmarks" job so the workflow produces the three artifacts described in the
README and enforces regression thresholds: write the raw run output to
benchmark-current.txt (replace the single benchmark.txt), fetch or restore
benchmark-base.txt (e.g., from artifacts or main branch), run a small comparison
step that generates benchmark-comparison.txt and fails the job if any metric
exceeds the README thresholds (alloc +20%, bytes +50%, time +20%), and remove
the "|| true" on the benchmark run so failures surface instead of being
swallowed; ensure the steps and artifact names exactly match README references
(benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) and name
the steps such that "Run benchmarks" is the producer of benchmark-current.txt.

@cubic-dev-ai cubic-dev-ai bot left a comment

3 issues found across 15 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/bench/consensus_bench_test.go">

<violation number="1" location="internal/bench/consensus_bench_test.go:81">
P3: Benchmark results are assigned to `_`, which allows the compiler to optimize away pure calls and skew allocations/time. Store results in a package-level sink variable to prevent elimination, per the benchmark README guidance.</violation>
</file>

<file name=".github/workflows/benchmark.yml">

<violation number="1" location=".github/workflows/benchmark.yml:27">
P2: The benchmark step masks failures (`|| true`) and pipes output without `pipefail`, so CI will succeed even if `go test` fails. This prevents the workflow from catching regressions or test failures.</violation>
</file>

<file name="internal/bench/helpers.go">

<violation number="1" location="internal/bench/helpers.go:140">
P3: `MustLoadTxFixture`/`LoadTxFixture` are never used anywhere in code (only defined here). Since `internal/bench` can’t be imported externally, these helpers are dead code unless a benchmark/test starts using them. Consider removing them or adding a benchmark that exercises them.</violation>
</file>


Comment on lines 27 to 35
run: |
go test -bench=. -benchmem -count=5 -timeout=30m \
./vrf/... \
./kes/... \
./consensus/... \
./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt || true
@cubic-dev-ai cubic-dev-ai bot Feb 16, 2026

P2: The benchmark step masks failures (|| true) and pipes output without pipefail, so CI will succeed even if go test fails. This prevents the workflow from catching regressions or test failures.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 27:

<comment>The benchmark step masks failures (`|| true`) and pipes output without `pipefail`, so CI will succeed even if `go test` fails. This prevents the workflow from catching regressions or test failures.</comment>

<file context>
@@ -0,0 +1,42 @@
+          go-version: '1.24'
+
+      - name: Run benchmarks
+        run: |
+          go test -bench=. -benchmem -count=5 -timeout=30m \
+            ./vrf/... \
</file context>
Suggested change
run: |
go test -bench=. -benchmem -count=5 -timeout=30m \
./vrf/... \
./kes/... \
./consensus/... \
./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt || true
run: |
set -o pipefail
go test -bench=. -benchmem -count=5 -timeout=30m \
./vrf/... \
./kes/... \
./consensus/... \
./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt

@wolf31o2 force-pushed the feat/memory-profiling branch from c626b9c to ef74249 on February 18, 2026 at 14:28
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (6)
internal/bench/helpers_test.go (2)

104-121: TestGetTestBlocks hardcodes 7 and will silently drift if testdata adds a new era.

The length assertion guards the count rather than the content; a new era in testdata would break this test. Consider asserting only on membership (which the subsequent map check already does), or deriving the expected length from EraNames().

♻️ Suggested improvement
 func TestGetTestBlocks(t *testing.T) {
     blocks := GetTestBlocks()
-    assert.Len(t, blocks, 7)
+    assert.Len(t, blocks, len(EraNames()))
 
     // Verify all eras are present
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers_test.go` around lines 104 - 121, TestGetTestBlocks
currently asserts a fixed length of 7 which will break when testdata
adds/removes eras; update the test to avoid hardcoding by either removing the
assert.Len(t, blocks, 7) or replacing it with a dynamic check using EraNames()
(e.g., compare len(blocks) to len(EraNames()) or only assert membership as the
test already does). Locate the TestGetTestBlocks function and modify the
assertion around GetTestBlocks() so the test validates membership via the
eraNames map (or uses EraNames() for expected count) instead of a magic number.

25-30: BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc are exported but untested.

Both are used by benchmark setup code and can drift silently if the ouroboros-mock builder API changes. Basic smoke tests (analogous to TestBenchLedgerState) would catch such regressions early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers_test.go` around lines 25 - 30, Add smoke tests for the
exported helpers BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc
similar to TestBenchLedgerState: call each helper, assert the returned ledger
state is non-nil (require.NotNil) and validate basic expectations (e.g.,
NetworkId() equals 1 and that the returned UTXO set or iterator is
present/non-empty where applicable). For BenchLedgerStateWithUtxoFunc, pass a
simple UTXO-creating function and assert the ledger contains the produced UTXOs;
for BenchLedgerStateWithUtxos, provide a small slice of UTXO fixtures and assert
they appear in the ledger. Use the existing require/assert helpers and reference
the functions by name so the tests catch API drift in those builders.
internal/bench/helpers.go (1)

129-150: TxTypeFromEra is provided but there are no TxFixture, LoadTxFixture, or MustLoadTxFixture counterparts.

BlockTypeFromEra has a full fixture-loading API (BlockFixture, LoadBlockFixture, MustLoadBlockFixture). TxTypeFromEra is stranded without the corresponding fixture type and loaders, meaning tx_bench_test.go must roll its own CBOR loading logic. If transaction fixture loading is in scope for this PR, the helper is incomplete; if it's deferred, a TODO comment would clarify intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 129 - 150, TxTypeFromEra currently
exists without the companion fixture types/loaders (TxFixture, LoadTxFixture,
MustLoadTxFixture) that BlockTypeFromEra has, so either implement the same
fixture-loading API or mark intent; add TxFixture type and functions
LoadTxFixture and MustLoadTxFixture mirroring the
BlockFixture/LoadBlockFixture/MustLoadBlockFixture behavior (CBOR loading, error
handling) so tx_bench_test.go can reuse them, or if transaction fixture loading
is out of scope for this PR, add a clear TODO comment above TxTypeFromEra
referencing tx_bench_test.go and noting that
TxFixture/LoadTxFixture/MustLoadTxFixture will be added later.
scripts/profile.sh (2)

62-63: Help extraction relies on hard-coded line numbers.

sed -n '16,43p' "$0" breaks silently if any lines are added or removed above line 16. A dedicated print_help() function is more maintainable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile.sh` around lines 62 - 63, Replace the fragile hard-coded sed
slice (sed -n '16,43p' "$0") with a dedicated print_help() function: create a
print_help() that outputs the script's help text (e.g., via a heredoc or by
printing commented blocks), call print_help() where the sed invocation and exit
0 currently live, and remove the sed-based extraction; this centralizes help
content, avoids dependence on line numbers, and ensures calls to print_help()
replace the old sed -n '16,43p' "$0" usage.

165-174: Silent failure in all mode gives no indication of which components failed.

run_profile "$comp" || true discards errors, and the view_profiles function silently skips missing profile files. If go test fails for any component (e.g., build error, missing build tag), the user receives no failure summary.

♻️ Suggested improvement: track and report failures
+FAILED_COMPONENTS=()
 for comp in "${COMPONENTS[@]}"; do
-    run_profile "$comp" || true
+    run_profile "$comp" || FAILED_COMPONENTS+=("$comp")
 done
+if [[ ${#FAILED_COMPONENTS[@]} -gt 0 ]]; then
+    echo "WARNING: profiling failed for: ${FAILED_COMPONENTS[*]}"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile.sh` around lines 165 - 174, The loop that runs components
currently swallows errors via `run_profile "$comp" || true` and `view_profiles`
can silently skip missing files; change this to capture and report failures:
invoke `run_profile "$comp"` and store its exit status, append failing component
names to a failures array (e.g., FAILED_COMPONENTS+=("$comp")) instead of `||
true`, and after the loop if FAILED_COMPONENTS is non-empty print a concise
failure summary listing those components and exit non-zero (or return non-zero)
so CI/users see the error; also update `view_profiles` usage so when
`VIEW_PROFILES` is true it either warns about missing profile files and opens
the first successful component (e.g., choose the first element from COMPONENTS
that succeeded) or skips opening while clearly reporting which components had no
profiles.
internal/bench/regression_test.go (1)

127-180: Hardcoded allocation baselines are platform/Go-version specific.

The limits in TestAllocationRegression and the per-era caps in TestRegressionCBOR were measured on linux/arm64 with Go 1.24. Heap allocation counts can differ across CPU architectures and Go minor versions (escape-analysis improvements, GC heuristics, etc.). Running these tests on darwin/amd64 or with a newer Go toolchain may produce higher-or-lower counts, causing either spurious CI failures or silently missed regressions.

Consider adding a build constraint (e.g., //go:build linux && arm64) or a testing.Short() / platform check to skip the strict numerical assertions on non-baseline platforms, keeping them as informational via t.Logf only. The GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER env var is a good escape hatch, but requiring manual intervention for every new environment degrades DX.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/regression_test.go` around lines 127 - 180, The allocation
baselines in TestAllocationRegression (and related per-era caps referenced by
TestRegressionCBOR) are platform- and Go-version-specific and should not assert
strictly on non-baseline platforms; modify TestAllocationRegression to detect
the baseline platform (linux && arm64) and/or honor testing.Short() and only
perform the strict numeric assertion on the baseline platform (or when an
explicit env override GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set); for other
platforms, convert the check into a t.Logf informational message (or skip) so
tests don’t fail spuriously—look for the TestAllocationRegression and
TestRegressionCBOR symbols and gate the allocation comparison/ t.Errorf path
behind a platform/build check or testing.Short() guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 28-35: The benchmark workflow is missing the new bench package
path so CI never runs the new benchmarks; update the benchmark command that
lists packages (the block that calls `go test -bench=. -benchmem -count=5
-timeout=30m` and enumerates packages like `./vrf/...`, `./kes/...`, etc.) to
include `./internal/bench/...` among the package arguments so tests such as
`BenchmarkBlockValidation`, `BenchmarkCBORDecodeBlock`,
`BenchmarkNativeScriptEvaluate`, and `BenchmarkTxDecode` are executed in CI.

In `@internal/bench/BASELINES.md`:
- Around line 3-5: The BASELINES.md currently lists "Platform: linux/arm64"
which mismatches CI ("ubuntu-latest" / linux/amd64); update this by either
re-running the benchmark capture on an amd64 CI-equivalent environment and
replacing or adding an "amd64" section with new baselines, or clearly annotate
the existing section as "arm64" and add a separate "amd64" subsection with
corresponding allocation/byte/time baselines; ensure the metadata lines ("Last
Updated", "Go Version", "Platform") are updated for each platform and add a
short note describing which CI job these amd64 baselines map to.

In `@internal/bench/PROFILING.md`:
- Around line 57-83: Update the example commands in PROFILING.md that reference
non-existent packages/benchmarks: replace occurrences of "./pipeline/..." and
"./vrf/..." and the benchmark names "BenchmarkBlockPipeline" and
"BenchmarkParallelVerify" with valid targets (e.g., use "./internal/bench/..."
as the package path and an existing benchmark such as
"BenchmarkCBORDecodeBlockParallel" for the parallel/mutex examples) so the go
test examples run without "no packages loaded" or "no matching benchmarks"
errors.

In `@internal/bench/regression_test.go`:
- Around line 91-101: The init() currently panics if the Conway block isn’t
found; remove that panic and make the lookup lazy: stop failing in init() (leave
conwayBlockCbor nil if not found) and move the search for the Conway CBOR into
the helper cborBlockDecodeOnce so that it performs testdata.GetTestBlocks()
lookup on first use; if the block is still missing, have cborBlockDecodeOnce
return an error (or call testing.Skip when running inside a test) instead of
crashing, referencing conwayBlockCbor, init(), cborBlockDecodeOnce, and
testdata.GetTestBlocks() to locate the changes.

---

Duplicate comments:
In @.github/workflows/benchmark.yml:
- Around line 37-42: The workflow step "Upload benchmark results" currently
uploads a single artifact named benchmark.txt but README expects
benchmark-current.txt, benchmark-base.txt, and benchmark-comparison.txt and a
regression threshold check; update the job that produces benchmarks to output
those three files (create/rename benchmark-current.txt and benchmark-base.txt,
generate benchmark-comparison.txt by diffing/summary), then change the "Upload
benchmark results" step (actions/upload-artifact) to upload all three paths
(e.g., benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt)
using the same name pattern (benchmark-${{ github.sha }}-${{ github.run_number
}}); also add a new step before upload that enforces the regression threshold
(fail the job if the comparison shows regressions beyond the configured
threshold) and surface the failure via a non-zero exit so CI fails when the
threshold is exceeded.

In `@internal/bench/BASELINES.md`:
- Around line 185-191: The documented regression thresholds in BASELINES.md are
not enforced by the benchmark CI workflow; add a comparison step that reads the
uploaded/current benchmark.txt, computes percent deltas for Allocation Count,
Bytes Allocated, and Time (ns/op) versus the stored baseline, and fails the job
when any metric exceeds the thresholds (+20% allocs, +50% bytes, +20% time).
Implement this as a new workflow step (or action) that invokes a small
comparator script (e.g., compare_benchmarks) which accepts the current
benchmark.txt and the baseline, parses the named metrics, applies the
thresholds, and returns non-zero/prints a clear failure message referencing the
metric(s) that exceeded limits so the CI will fail the PR.

In `@internal/bench/profile_test.go`:
- Around line 189-261: The test TestProfileTxValidation uses BenchLedgerState()
but per test guidelines validation tests must use MockLedgerState; replace the
call ls := BenchLedgerState() with ls := MockLedgerState() (from
internal/test/ledger) and update the surrounding comment to reflect that this
uses the shared MockLedgerState for validation profiling (remove the incorrect
rationale about avoiding test framework dependencies); ensure any imports or
helper references in TestProfileTxValidation are adjusted to reference
MockLedgerState and its builder if needed.

In `@internal/bench/README.md`:
- Around line 428-461: Update the CI Benchmark Results README to reflect actual
artifact names and enforcement: replace references to `benchmark-current.txt`,
`benchmark-base.txt`, and `benchmark-comparison.txt` with the single
`benchmark.txt` produced by the workflow, and either remove or mark the listed
regression thresholds (allocation count >20%, bytes >50%, time ns/op >20%) as
not enforced — or add a note pointing to `.github/workflows/benchmark.yml` and
instruct to update that workflow to implement enforcement if you want them
enforced; ensure the README's “Contents” and “Regression Thresholds” sections
match the real artifact name `benchmark.txt` and the current workflow behavior.

---

Nitpick comments:
In `@internal/bench/helpers_test.go`:
- Around line 104-121: TestGetTestBlocks currently asserts a fixed length of 7
which will break when testdata adds/removes eras; update the test to avoid
hardcoding by either removing the assert.Len(t, blocks, 7) or replacing it with
a dynamic check using EraNames() (e.g., compare len(blocks) to len(EraNames())
or only assert membership as the test already does). Locate the
TestGetTestBlocks function and modify the assertion around GetTestBlocks() so
the test validates membership via the eraNames map (or uses EraNames() for
expected count) instead of a magic number.
- Around line 25-30: Add smoke tests for the exported helpers
BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc similar to
TestBenchLedgerState: call each helper, assert the returned ledger state is
non-nil (require.NotNil) and validate basic expectations (e.g., NetworkId()
equals 1 and that the returned UTXO set or iterator is present/non-empty where
applicable). For BenchLedgerStateWithUtxoFunc, pass a simple UTXO-creating
function and assert the ledger contains the produced UTXOs; for
BenchLedgerStateWithUtxos, provide a small slice of UTXO fixtures and assert
they appear in the ledger. Use the existing require/assert helpers and reference
the functions by name so the tests catch API drift in those builders.

In `@internal/bench/helpers.go`:
- Around line 129-150: TxTypeFromEra currently exists without the companion
fixture types/loaders (TxFixture, LoadTxFixture, MustLoadTxFixture) that
BlockTypeFromEra has, so either implement the same fixture-loading API or mark
intent; add TxFixture type and functions LoadTxFixture and MustLoadTxFixture
mirroring the BlockFixture/LoadBlockFixture/MustLoadBlockFixture behavior (CBOR
loading, error handling) so tx_bench_test.go can reuse them, or if transaction
fixture loading is out of scope for this PR, add a clear TODO comment above
TxTypeFromEra referencing tx_bench_test.go and noting that
TxFixture/LoadTxFixture/MustLoadTxFixture will be added later.

In `@internal/bench/regression_test.go`:
- Around line 127-180: The allocation baselines in TestAllocationRegression (and
related per-era caps referenced by TestRegressionCBOR) are platform- and
Go-version-specific and should not assert strictly on non-baseline platforms;
modify TestAllocationRegression to detect the baseline platform (linux && arm64)
and/or honor testing.Short() and only perform the strict numeric assertion on
the baseline platform (or when an explicit env override
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set); for other platforms, convert the
check into a t.Logf informational message (or skip) so tests don’t fail
spuriously—look for the TestAllocationRegression and TestRegressionCBOR symbols
and gate the allocation comparison/ t.Errorf path behind a platform/build check
or testing.Short() guard.

In `@scripts/profile.sh`:
- Around line 62-63: Replace the fragile hard-coded sed slice (sed -n '16,43p'
"$0") with a dedicated print_help() function: create a print_help() that outputs
the script's help text (e.g., via a heredoc or by printing commented blocks),
call print_help() where the sed invocation and exit 0 currently live, and remove
the sed-based extraction; this centralizes help content, avoids dependence on
line numbers, and ensures calls to print_help() replace the old sed -n '16,43p'
"$0" usage.
- Around line 165-174: The loop that runs components currently swallows errors
via `run_profile "$comp" || true` and `view_profiles` can silently skip missing
files; change this to capture and report failures: invoke `run_profile "$comp"`
and store its exit status, append failing component names to a failures array
(e.g., FAILED_COMPONENTS+=("$comp")) instead of `|| true`, and after the loop if
FAILED_COMPONENTS is non-empty print a concise failure summary listing those
components and exit non-zero (or return non-zero) so CI/users see the error;
also update `view_profiles` usage so when `VIEW_PROFILES` is true it either
warns about missing profile files and opens the first successful component
(e.g., choose the first element from COMPONENTS that succeeded) or skips opening
while clearly reporting which components had no profiles.

@wolf31o2 force-pushed the feat/memory-profiling branch from ef74249 to caf22d1 on February 21, 2026 at 23:40
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (8)
internal/bench/helpers.go (1)

180-199: blockCborFromEra can panic despite returning ([]byte, error).

testdata.MustDecodeHex panics on invalid hex rather than returning an error, making the function's error return contract misleading. While static testdata makes a panic unlikely in practice, callers (e.g., LoadBlockFixture) wrap this in error handling that can never be reached for the hex-decode failure case.

♻️ Suggested fix
-func blockCborFromEra(era string) ([]byte, error) {
+func blockCborFromEra(era string) (_ []byte, retErr error) {
+    defer func() {
+        if r := recover(); r != nil {
+            retErr = fmt.Errorf("decode hex for era %s: %v", era, r)
+        }
+    }()
     switch strings.ToLower(era) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 180 - 199, The function
blockCborFromEra uses testdata.MustDecodeHex which panics on bad input,
violating its ([]byte, error) contract; change blockCborFromEra to perform a
non-panicking decode (e.g., call a testdata.DecodeHex that returns (bytes,
error) or use encoding/hex.DecodeString on testdata.ByronBlockHex,
testdata.ShelleyBlockHex, etc.), capture and return any decode error instead of
panicking, and update each case branch to return (decodedBytes, err) so callers
(like LoadBlockFixture) can handle errors normally.
internal/bench/helpers_test.go (1)

32-61: TxTypeFromEra (and the WithUtxos/WithUtxoFunc ledger state helpers) have no test coverage.

TxTypeFromEra in helpers.go is exported and has the same structure as BlockTypeFromEra, but is not tested here. BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc are also untested.

♻️ Suggested additions
func TestTxTypeFromEra(t *testing.T) {
    tests := []struct {
        era     string
        want    uint
        wantErr bool
    }{
        {"byron", ledger.TxTypeByron, false},
        {"shelley", ledger.TxTypeShelley, false},
        {"allegra", ledger.TxTypeAllegra, false},
        {"mary", ledger.TxTypeMary, false},
        {"alonzo", ledger.TxTypeAlonzo, false},
        {"babbage", ledger.TxTypeBabbage, false},
        {"conway", ledger.TxTypeConway, false},
        {"Conway", ledger.TxTypeConway, false},
        {"unknown", 0, true},
    }
    for _, tc := range tests {
        t.Run(tc.era, func(t *testing.T) {
            txType, err := TxTypeFromEra(tc.era)
            if tc.wantErr {
                require.Error(t, err)
            } else {
                require.NoError(t, err)
                assert.Equal(t, tc.want, txType)
            }
        })
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers_test.go` around lines 32 - 61, Add unit tests
mirroring TestBlockTypeFromEra for the exported TxTypeFromEra: create
TestTxTypeFromEra with cases for "byron", "shelley", "allegra", "mary",
"alonzo", "babbage", "conway" (case-insensitive) and an "unknown" case asserting
error; call TxTypeFromEra and assert expected uint or error. Also add small
tests for BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc: invoke
BenchLedgerStateWithUtxos with a sample UTXO slice and assert the returned
ledger state contains those UTXOs, and invoke BenchLedgerStateWithUtxoFunc with
a function that produces known UTXOs and assert the ledger state reflects them
(or that the provided function was called and its results are present).
internal/bench/cbor_bench_test.go (1)

30-38: Consider verifying that the hardcoded hex fixtures decode to valid transactions.

The hex strings are loaded in init() via mustDecodeHex, which only validates hex encoding — not CBOR validity. If a hex string is truncated or corrupted, the error will surface later as a confusing benchmark failure. A quick validation pass in init() (decoding each tx via NewTransactionFromCbor and discarding the result) would catch this early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/cbor_bench_test.go` around lines 30 - 38, The hardcoded hex
fixtures (byronTxCborHex, shelleyTxCborHex, etc.) are only checked for hex
validity by mustDecodeHex in init(); add a validation step in init() that, after
mustDecodeHex returns the bytes for each fixture, calls NewTransactionFromCbor
(or the appropriate CBOR-to-transaction constructor) and handles any error
(panic or log.Fatal) so corrupted/truncated CBOR is detected early; reference
the existing init(), mustDecodeHex, and NewTransactionFromCbor symbols when
adding this validation loop over each fixture.
internal/bench/consensus_bench_test.go (1)

156-193: Shared benchActiveSlotCoeff is safe here but fragile for future changes.

benchActiveSlotCoeff (*big.Rat) is passed to consensus.IsSlotLeader in the hot loop (line 184). This works because sub-benchmarks run sequentially. However, if anyone adds b.SetParallelism or refactors to RunParallel, this will race — as was already discovered in BenchmarkConsensusParallelThreshold. Consider adding a brief comment noting the sequential-only constraint, similar to the comment in the parallel benchmark at line 308.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/consensus_bench_test.go` around lines 156 - 193, Add a short
comment above BenchmarkConsensusIsSlotLeader explaining that the shared
benchActiveSlotCoeff (*big.Rat) is reused across sub-benchmarks and the
benchmark relies on sequential execution (sub-benchmarks run one after another);
note that enabling b.SetParallelism or switching to RunParallel would introduce
races when passing benchActiveSlotCoeff into consensus.IsSlotLeader, and
reference the similar comment style used in BenchmarkConsensusParallelThreshold
to justify keeping the current non-parallel structure.
internal/bench/block_bench_test.go (2)

147-171: Misleading comment: loadTestBlocks loads all blocks including Byron.

Line 154 says "Skip Byron for VRF/KES benchmarks as it uses different validation", but the function actually loads all blocks. Byron filtering happens later in getPostByronBlocks. The comment should describe what this function does, not what callers do.

📝 Suggested comment fix
 	for _, tb := range testBlocks {
-		// Skip Byron for VRF/KES benchmarks as it uses different validation
+		// Parse each test block with bench-focused validation config
 		block, err := ledger.NewBlockFromCbor(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/block_bench_test.go` around lines 147 - 171, Update the
misleading function comment for loadTestBlocks to accurately state its behavior:
it loads all test blocks (including Byron) and panics on load error; do not
mention skipping Byron here—leave any filtering to getPostByronBlocks (or
callers) as appropriate. Locate the comment above the loadTestBlocks function
and replace the current “Skip Byron…” text with a concise description that this
function returns all test blocks from testdata and panics if any block fails to
load, referencing loadTestBlocks and getPostByronBlocks for clarity if needed.

289-361: BenchmarkBlockVRFVerification discards VRF verification result without checking.

Line 357: _, _ = vrf.Verify(...) — the comment (lines 343-346) explains that verification will fail due to dummy eta0. However, the benchmark is measuring successful crypto path cost. If vrf.Verify short-circuits on some validation failure (e.g., early return on proof format check), you may be measuring less work than intended. Consider asserting in the setup (outside the loop) that Verify returns a non-error to confirm the crypto path executes fully, even if valid is false.

💡 Suggested pre-validation check
 	for _, vd := range vrfDataList {
 		b.Run("Era_"+vd.name, func(b *testing.B) {
 			// Pre-compute the VRF input message
 			vrfMsg := vrf.MkInputVrf(int64(vd.slot), eta0)
+			// Verify the crypto path runs without error (result may be false
+			// due to dummy eta0, but the full computation should execute)
+			_, err := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+			if err != nil {
+				b.Skipf("VRF Verify errors for %s (may lack matching nonce): %v", vd.name, err)
+			}
 
 			b.ReportAllocs()
 			b.ResetTimer()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/block_bench_test.go` around lines 289 - 361,
BenchmarkBlockVRFVerification currently ignores the result of vrf.Verify inside
the loop which can mask early short-circuits; add a pre-check per vrfData entry
(using vrf.MkInputVrf with eta0 and the same vd.slot) before b.ResetTimer() to
call vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg) once and assert the call
returns no error (fail the sub-benchmark with b.Fatalf or b.Fatalf-like check)
so you guarantee the full cryptographic verification path is executed during
benchmarking; keep this check outside the timed loop that iterates b.N.
internal/bench/regression_test.go (1)

464-518: TestRegressionCBOR uses testdata.GetTestBlocks() directly instead of the package helper.

The helpers.go file (line 202-204) wraps this as GetTestBlocks(). Other code in this file (e.g., init() at line 92) also calls testdata.GetTestBlocks() directly, so this is at least internally consistent — but using the wrapper would improve indirection if the data source ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/regression_test.go` around lines 464 - 518, TestRegressionCBOR
calls testdata.GetTestBlocks() directly instead of the package wrapper
GetTestBlocks() defined in helpers.go; replace the direct call in
TestRegressionCBOR with the helper GetTestBlocks() (and update any other direct
uses of testdata.GetTestBlocks() in this file such as the init() usage) so the
code uses the indirection provided by the helper. Locate the TestRegressionCBOR
function and change the block source to call GetTestBlocks() from the local
package helper (not testdata.GetTestBlocks()), ensuring imports remain correct
and tests still pass.
internal/bench/tx_bench_test.go (1)

28-37: Silent continue on empty fixtures could mask broken test data.

When len(txs) == 0 or len(txCbor) == 0, the benchmark for that era is silently skipped with no log output. If a fixture regresses to having zero transactions, you won't notice. Consider adding b.Logf("skipping %s: no transactions in fixture", era) before continue.

💡 Example
 		txs := fixture.Block.Transactions()
 		if len(txs) == 0 {
+			b.Logf("skipping %s: no transactions in fixture", era)
 			continue
 		}
 		txCbor := txs[0].Cbor()
 		if len(txCbor) == 0 {
+			b.Logf("skipping %s: empty tx CBOR", era)
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/tx_bench_test.go` around lines 28 - 37, The loop over eras
using PostByronEraNames() silently skips fixtures with no transactions or empty
CBOR; update the checks around MustLoadBlockFixture(...).Block.Transactions()
and txs[0].Cbor() to log the reason before continuing (e.g., call
b.Logf("skipping %s: no transactions in fixture", era) and b.Logf("skipping %s:
empty tx cbor", era)) so missing or empty fixtures are visible during test runs;
place these b.Logf calls immediately before each existing continue to preserve
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bench/profile_test.go`:
- Around line 267-278: The test calls common.VerifyTransaction(tx, slot, ls,
nil, rules) passing a nil ProtocolParameters which can panic if rule ordering
changes; replace the nil with a safe zero-value/stub ProtocolParameters instance
(e.g., create a local zeroProtocolParameters or newProtocolParametersStub that
implements ProtocolParameters with safe defaults) and pass that into
VerifyTransaction (used in the initial err check and inside the hot loop that
iterates profileIterations using tx, slot, ls, rules) so validation rules that
reference protocol params do not dereference nil.

In `@internal/bench/PROFILING.md`:
- Around line 143-147: The example command uses a non-existent benchmark name
`BenchmarkVerifyBlock`; update the `go test -bench=...` example to reference an
actual benchmark such as `BenchmarkBlockValidation` or one of the block
serialization benchmarks (e.g. the ledger block serialization benchmark names)
so the command runs; replace `BenchmarkVerifyBlock` with
`BenchmarkBlockValidation` (or another real benchmark symbol) in the example and
keep the rest of the `-cpuprofile=cpu.prof ./...` invocation the same.

In `@internal/bench/testdata/README.md`:
- Around line 7-11: Update the three fenced code blocks in README.md (the
fixtures listing at the top and the two other code blocks around lines 38 and
50) to include a language specifier by changing the opening backticks to
```text; e.g., replace ``` with ```text for the blocks that show the testdata
directory and the two other examples so markdownlint MD040 is satisfied and
syntax highlighting is explicit.

In `@scripts/profile.sh`:
- Around line 199-201: The echo lines printing pprof commands currently escape
the COMPONENT variable (echo "  go tool pprof -top
$OUTPUT_DIR/cpu_\${COMPONENT}.prof" and mem) so users see a literal
${COMPONENT}; change them to allow shell expansion by removing the backslash
before $COMPONENT so the examples use the actual COMPONENT value at runtime
(keep OUTPUT_DIR expansion intact and ensure this behavior aligns with the
existing all-case handling around the if [[ "$COMPONENT" == "all" ]] block).

---

Duplicate comments:
In `@internal/bench/BASELINES.md`:
- Around line 3-5: The baselines file lists Platform: linux/arm64 but CI uses
ubuntu-latest (linux/amd64), so update BASELINES.md to match CI or add
architecture-specific baselines; specifically edit the "Platform" line in
BASELINES.md to "linux/amd64" (or add separate entries/files for linux/amd64 and
linux/arm64 with clear labels) and regenerate or note which baseline set CI
should use so allocation/timing comparisons align; ensure the "Last Updated" is
refreshed when you make the change.

In `@internal/bench/profile_test.go`:
- Around line 213-217: The test is using BenchLedgerState() where the project's
test-file guideline requires using the canonical MockLedgerState from
internal/test/ledger/ledger.go for validation tests; replace the
BenchLedgerState() call with the MockLedgerState() helper (and any required
setup/teardown it expects) so the test uses the shared mock implementation,
update any surrounding comments to reflect MockLedgerState, and adjust imports
or initialization in the test function to match MockLedgerState's API.

In `@internal/bench/PROFILING.md`:
- Around line 57-83: The examples reference non-existent benchmarks and package
globs (BenchmarkBlockPipeline, BenchmarkParallelVerify, ./pipeline/...,
./vrf/..., trace.out) which will error; update the commands to point to real
packages/benchmarks or use generic patterns: replace ./pipeline/... and
./vrf/... with either ./... or the actual existing package paths in the repo,
and replace the concrete benchmark names with a valid benchmark selector (for
example -bench=. or the actual benchmark function name present) so the go test
commands run successfully; also standardize the trace/profile output filenames
(e.g., cpu.prof, block.prof, mutex.prof, trace.out) if needed and ensure the
examples show viewing the trace with go tool trace trace.out.

In `@internal/bench/README.md`:
- Around line 437-456: Update the README's "Contents" and "Regression
Thresholds" to match the actual behavior defined in the benchmark workflow:
inspect the bench workflow definition (the job/step names and artifact names in
benchmark.yml) and replace the listed artifact filenames
(`benchmark-current.txt`, `benchmark-base.txt`, `benchmark-comparison.txt`) and
numeric thresholds (allocation count 20%, bytes 50%, time 20%) with the exact
artifact names and regression rules enforced by the workflow; ensure the
README's "Accessing Artifacts" instructions point to the correct job/run name
used by the workflow and mirror any differences in how base/current comparisons
are produced so the README and the workflow (benchmark.yml) are consistent.

In `@internal/bench/regression_test.go`:
- Around line 91-101: The current init() panics if the "Conway" block is
missing; instead remove the panic and move the lookup into the lazy loader used
by cborBlockDecodeOnce (or wrap it with a sync.Once inside cborBlockDecodeOnce).
Concretely: stop searching for "Conway" in init(), ensure conwayBlockCbor is
looked up on first use inside the function/method named cborBlockDecodeOnce (or
its helper), protect the lookup with sync.Once so it runs once, and return a
clear error from cborBlockDecodeOnce (or the test helper) when the block cannot
be found rather than panicking the whole package.

---

Nitpick comments:
In `@internal/bench/block_bench_test.go`:
- Around line 147-171: Update the misleading function comment for loadTestBlocks
to accurately state its behavior: it loads all test blocks (including Byron) and
panics on load error; do not mention skipping Byron here—leave any filtering to
getPostByronBlocks (or callers) as appropriate. Locate the comment above the
loadTestBlocks function and replace the current “Skip Byron…” text with a
concise description that this function returns all test blocks from testdata and
panics if any block fails to load, referencing loadTestBlocks and
getPostByronBlocks for clarity if needed.
- Around line 289-361: BenchmarkBlockVRFVerification currently ignores the
result of vrf.Verify inside the loop which can mask early short-circuits; add a
pre-check per vrfData entry (using vrf.MkInputVrf with eta0 and the same
vd.slot) before b.ResetTimer() to call vrf.Verify(vd.vrfKey, vd.proof,
vd.output, vrfMsg) once and assert the call returns no error (fail the
sub-benchmark with b.Fatalf or b.Fatalf-like check) so you guarantee the full
cryptographic verification path is executed during benchmarking; keep this check
outside the timed loop that iterates b.N.

In `@internal/bench/cbor_bench_test.go`:
- Around line 30-38: The hardcoded hex fixtures (byronTxCborHex,
shelleyTxCborHex, etc.) are only checked for hex validity by mustDecodeHex in
init(); add a validation step in init() that, after mustDecodeHex returns the
bytes for each fixture, calls NewTransactionFromCbor (or the appropriate
CBOR-to-transaction constructor) and handles any error (panic or log.Fatal) so
corrupted/truncated CBOR is detected early; reference the existing init(),
mustDecodeHex, and NewTransactionFromCbor symbols when adding this validation
loop over each fixture.

In `@internal/bench/consensus_bench_test.go`:
- Around line 156-193: Add a short comment above BenchmarkConsensusIsSlotLeader
explaining that the shared benchActiveSlotCoeff (*big.Rat) is reused across
sub-benchmarks and the benchmark relies on sequential execution (sub-benchmarks
run one after another); note that enabling b.SetParallelism or switching to
RunParallel would introduce races when passing benchActiveSlotCoeff into
consensus.IsSlotLeader, and reference the similar comment style used in
BenchmarkConsensusParallelThreshold to justify keeping the current non-parallel
structure.

In `@internal/bench/helpers_test.go`:
- Around line 32-61: Add unit tests mirroring TestBlockTypeFromEra for the
exported TxTypeFromEra: create TestTxTypeFromEra with cases for "byron",
"shelley", "allegra", "mary", "alonzo", "babbage", "conway" (case-insensitive)
and an "unknown" case asserting error; call TxTypeFromEra and assert expected
uint or error. Also add small tests for BenchLedgerStateWithUtxos and
BenchLedgerStateWithUtxoFunc: invoke BenchLedgerStateWithUtxos with a sample
UTXO slice and assert the returned ledger state contains those UTXOs, and invoke
BenchLedgerStateWithUtxoFunc with a function that produces known UTXOs and
assert the ledger state reflects them (or that the provided function was called
and its results are present).

In `@internal/bench/helpers.go`:
- Around line 180-199: The function blockCborFromEra uses testdata.MustDecodeHex
which panics on bad input, violating its ([]byte, error) contract; change
blockCborFromEra to perform a non-panicking decode (e.g., call a
testdata.DecodeHex that returns (bytes, error) or use encoding/hex.DecodeString
on testdata.ByronBlockHex, testdata.ShelleyBlockHex, etc.), capture and return
any decode error instead of panicking, and update each case branch to return
(decodedBytes, err) so callers (like LoadBlockFixture) can handle errors
normally.

In `@internal/bench/regression_test.go`:
- Around line 464-518: TestRegressionCBOR calls testdata.GetTestBlocks()
directly instead of the package wrapper GetTestBlocks() defined in helpers.go;
replace the direct call in TestRegressionCBOR with the helper GetTestBlocks()
(and update any other direct uses of testdata.GetTestBlocks() in this file such
as the init() usage) so the code uses the indirection provided by the helper.
Locate the TestRegressionCBOR function and change the block source to call
GetTestBlocks() from the local package helper (not testdata.GetTestBlocks()),
ensuring imports remain correct and tests still pass.

In `@internal/bench/tx_bench_test.go`:
- Around line 28-37: The loop over eras using PostByronEraNames() silently skips
fixtures with no transactions or empty CBOR; update the checks around
MustLoadBlockFixture(...).Block.Transactions() and txs[0].Cbor() to log the
reason before continuing (e.g., call b.Logf("skipping %s: no transactions in
fixture", era) and b.Logf("skipping %s: empty tx cbor", era)) so missing or
empty fixtures are visible during test runs; place these b.Logf calls
immediately before each existing continue to preserve behavior.
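The non-panicking decode suggested for helpers.go above can be sketched as follows. This is a stand-alone illustration, not the PR's code: byronBlockHex and the era switch stand in for testdata.ByronBlockHex and blockCborFromEra.

```go
package main

// Sketch: return (bytes, error) from the era lookup instead of
// panicking via a Must-style decoder, so callers such as
// LoadBlockFixture can handle bad fixtures normally.

import (
	"encoding/hex"
	"fmt"
)

// Stand-in for testdata.ByronBlockHex (illustrative value only).
const byronBlockHex = "83010203"

func blockCborFromEra(era string) ([]byte, error) {
	switch era {
	case "byron":
		// hex.DecodeString reports malformed input as an error
		// rather than panicking.
		return hex.DecodeString(byronBlockHex)
	default:
		return nil, fmt.Errorf("unknown era: %s", era)
	}
}

func main() {
	cbor, err := blockCborFromEra("byron")
	fmt.Println(len(cbor), err)
	_, err = blockCborFromEra("conway")
	fmt.Println(err)
}
```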

@wolf31o2 force-pushed the feat/memory-profiling branch from caf22d1 to a9daa92 on February 22, 2026 at 20:39
@wolf31o2
Member Author

@cubic-dev-ai review


cubic-dev-ai bot commented Feb 23, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.


@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 15 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/bench/consensus_bench_test.go">

<violation number="1" location="internal/bench/consensus_bench_test.go:86">
P3: Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).</violation>
</file>

<file name="scripts/profile.sh">

<violation number="1" location="scripts/profile.sh:200">
P3: When running with `component=all`, the script prints text-based analysis commands for `cpu_all.prof`/`mem_all.prof`, which are never generated. This misleads users after an `all` run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.</violation>
</file>

<file name=".github/workflows/benchmark.yml">

<violation number="1" location=".github/workflows/benchmark.yml:35">
P2: The pipeline exits with `tee`’s status, so `go test` failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt

@cubic-dev-ai cubic-dev-ai bot Feb 23, 2026


P2: The pipeline exits with tee’s status, so go test failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 35:

<comment>The pipeline exits with `tee`’s status, so `go test` failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.</comment>

<file context>
@@ -0,0 +1,42 @@
+            ./cbor/... \
+            ./pipeline/... \
+            ./ledger/... \
+            2>&1 | tee benchmark.txt
+
+      - name: Upload benchmark results
</file context>
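The masking behavior cubic describes is plain shell semantics and easy to reproduce in isolation; nothing here assumes the repository:

```shell
# Minimal demonstration of how `cmd | tee file` masks cmd's failure:
# without pipefail, the pipeline's exit status is tee's (0).
set +o pipefail
false | tee /dev/null
echo "without pipefail: $?"    # prints 0 — failure masked

# With pipefail, the pipeline reports the failing command's status.
set -o pipefail
rc=0
false | tee /dev/null || rc=$?
echo "with pipefail: $rc"      # prints 1 — failure propagates
```

In the workflow, adding `set -o pipefail` before the `go test ... | tee benchmark.txt` invocation is enough for the step to fail when any benchmark fails.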

for _, s := range stakes {
b.Run(s.name, func(b *testing.B) {
activeSlotCoeff := big.NewRat(1, 20) // fresh per sub-benchmark
b.ReportAllocs()

@cubic-dev-ai cubic-dev-ai bot Feb 23, 2026


P3: Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/consensus_bench_test.go, line 86:

<comment>Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).</comment>

<file context>
@@ -0,0 +1,324 @@
+	for _, s := range stakes {
+		b.Run(s.name, func(b *testing.B) {
+			activeSlotCoeff := big.NewRat(1, 20) // fresh per sub-benchmark
+			b.ReportAllocs()
+			for i := 0; i < b.N; i++ {
+				consensusSink = consensus.CertifiedNatThreshold(
</file context>
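The setup/ResetTimer split cubic asks for can be sketched as a self-contained program. stubThreshold is an illustrative stand-in for consensus.CertifiedNatThreshold, not the PR's API; the point is only that per-run setup lands before ResetTimer so the loop alone is measured.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// stubThreshold stands in for consensus.CertifiedNatThreshold:
// a trivial computation on the active slot coefficient.
func stubThreshold(activeSlotCoeff *big.Rat) *big.Rat {
	return new(big.Rat).Mul(activeSlotCoeff, big.NewRat(2, 1))
}

func benchmarkThreshold(b *testing.B) {
	activeSlotCoeff := big.NewRat(1, 20) // per-sub-benchmark setup
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		_ = stubThreshold(activeSlotCoeff)
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside `go test`.
	res := testing.Benchmark(benchmarkThreshold)
	fmt.Println(res.N > 0)
}
```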

fi
echo ""
echo "For text-based analysis:"
echo " go tool pprof -top $OUTPUT_DIR/cpu_${COMPONENT}.prof"

@cubic-dev-ai cubic-dev-ai bot Feb 23, 2026


P3: When running with component=all, the script prints text-based analysis commands for cpu_all.prof/mem_all.prof, which are never generated. This misleads users after an all run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 200:

<comment>When running with `component=all`, the script prints text-based analysis commands for `cpu_all.prof`/`mem_all.prof`, which are never generated. This misleads users after an `all` run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.</comment>

<file context>
@@ -0,0 +1,204 @@
+fi
+echo ""
+echo "For text-based analysis:"
+echo "  go tool pprof -top $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+echo "  go tool pprof -top $OUTPUT_DIR/mem_${COMPONENT}.prof"
+echo ""
</file context>

@wolf31o2 force-pushed the feat/memory-profiling branch from a9daa92 to 4d70dd3 on February 25, 2026 at 22:25

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (6)
.github/workflows/benchmark.yml (2)

29-36: ⚠️ Potential issue | 🟠 Major

./internal/bench/... is still absent — CI never runs any of the benchmarks introduced by this PR.

All new benchmarks (BenchmarkBlockValidation, BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode, etc.) live in ./internal/bench/..., which is missing from the package list. Every benchmark this PR introduces is silently skipped in CI.

🐛 Proposed fix
       run: |
         set -o pipefail
         go test -bench=. -benchmem -count=5 -timeout=30m \
           ./vrf/... \
           ./kes/... \
           ./consensus/... \
           ./cbor/... \
           ./pipeline/... \
           ./ledger/... \
+          ./internal/bench/... \
           2>&1 | tee benchmark.txt
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml around lines 29 - 36, The workflow's go test
package list omits the new benchmark packages so CI never runs the new
benchmarks; update the go test invocation in the benchmark job (the multi-line
command that starts with "go test -bench=. -benchmem -count=5 -timeout=30m") to
include ./internal/bench/... alongside the other packages (so that benchmarks
such as BenchmarkBlockValidation, BenchmarkCBORDecodeBlock,
BenchmarkNativeScriptEvaluate, BenchmarkTxDecode are executed), then keep piping
output to tee as before.

29-36: ⚠️ Potential issue | 🟠 Major

./internal/bench/... is still missing from the CI benchmark run.

All benchmarks introduced in this PR (BenchmarkBlockValidation, BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode, etc.) live in ./internal/bench/..., which is absent from the package list. CI silently skips every benchmark this PR adds.

🐛 Proposed fix
       run: |
         set -o pipefail
         go test -bench=. -benchmem -count=5 -timeout=30m \
           ./vrf/... \
           ./kes/... \
           ./consensus/... \
           ./cbor/... \
           ./pipeline/... \
           ./ledger/... \
+          ./internal/bench/... \
           2>&1 | tee benchmark.txt
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml around lines 29 - 36, The CI benchmark job
is missing the package that contains the new benchmarks; add the missing package
glob ./internal/bench/... to the list of packages passed to the go test command
(alongside ./vrf/..., ./kes/..., etc.) so the new benchmarks (e.g.,
BenchmarkBlockValidation, BenchmarkCBORDecodeBlock,
BenchmarkNativeScriptEvaluate, BenchmarkTxDecode) are executed and recorded in
benchmark.txt.
internal/bench/PROFILING.md (2)

57-82: ⚠️ Potential issue | 🟡 Minor

Block/mutex/trace profile examples still reference non-existent package paths and benchmarks.

Lines 62, 70, and 78 all reference ./pipeline/... (which may not exist) and benchmark names BenchmarkBlockPipeline/BenchmarkParallelVerify that are likely absent. Users following these examples will get "no packages loaded" or "no matching benchmarks" errors.

Consider replacing with real targets — e.g., ./internal/bench/... and BenchmarkCBORDecodeBlockParallel for the parallel/mutex case.

💡 Suggested replacements
-go test -bench=BenchmarkBlockPipeline -blockprofile=block.prof ./pipeline/...
+go test -bench=BenchmarkConsensusParallelThreshold -blockprofile=block.prof ./internal/bench/...
-go test -bench=BenchmarkParallelVerify -mutexprofile=mutex.prof ./vrf/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -mutexprofile=mutex.prof ./internal/bench/...
-go test -bench=BenchmarkBlockPipeline -trace=trace.out ./pipeline/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -trace=trace.out ./internal/bench/...
#!/bin/bash
# Verify whether ./pipeline exists and whether BenchmarkBlockPipeline / BenchmarkParallelVerify exist
echo "=== Checking for pipeline package ==="
fd -t d "pipeline" --max-depth 2

echo ""
echo "=== Checking for BenchmarkBlockPipeline ==="
rg -n "func BenchmarkBlockPipeline" --type=go

echo ""
echo "=== Checking for BenchmarkParallelVerify ==="
rg -n "func BenchmarkParallelVerify" --type=go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/PROFILING.md` around lines 57 - 82, Update the example
commands in PROFILING.md to point at actual packages and benchmarks: replace
occurrences of "./pipeline/..." with "./internal/bench/..." and swap the
non-existent benchmark names (BenchmarkBlockPipeline, BenchmarkParallelVerify)
for real benchmarks such as BenchmarkCBORDecodeBlockParallel (use this for the
parallel/mutex examples) and the corresponding block-profile benchmark (e.g.,
BenchmarkCBORDecodeBlock or another existing block-related benchmark in
internal/bench) so the examples run without "no packages loaded" or "no matching
benchmarks" errors; locate and change the three example blocks that reference
BenchmarkBlockPipeline/BenchmarkParallelVerify and ./pipeline/... accordingly.

57-82: ⚠️ Potential issue | 🟡 Minor

Block/mutex/trace examples still reference non-existent ./pipeline/... and benchmark names.

Lines 62, 70, and 78 all reference ./pipeline/... and BenchmarkBlockPipeline/BenchmarkParallelVerify. These paths/benchmarks likely don't exist, so users following these examples will get "no packages loaded" errors or silent no-match results.

💡 Suggested replacements
-go test -bench=BenchmarkBlockPipeline -blockprofile=block.prof ./pipeline/...
+go test -bench=BenchmarkConsensusParallelThreshold -blockprofile=block.prof ./internal/bench/...
-go test -bench=BenchmarkParallelVerify -mutexprofile=mutex.prof ./vrf/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -mutexprofile=mutex.prof ./internal/bench/...
-go test -bench=BenchmarkBlockPipeline -trace=trace.out ./pipeline/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -trace=trace.out ./internal/bench/...
#!/bin/bash
# Verify pipeline package existence and non-existent benchmark names
echo "=== pipeline package? ==="
fd -t d "^pipeline$" --max-depth 3

echo ""
echo "=== BenchmarkBlockPipeline ==="
rg -n "func BenchmarkBlockPipeline" --type=go

echo ""
echo "=== BenchmarkParallelVerify ==="
rg -n "func BenchmarkParallelVerify" --type=go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/PROFILING.md` around lines 57 - 82, The examples reference
non-existent packages/benchmarks (BenchmarkBlockPipeline,
BenchmarkParallelVerify, ./pipeline/... and ./vrf/...), so update the three
commands to use either real packages/benchmarks or a generic safe form; replace
./pipeline/... and ./vrf/... with ./... (or specific existing package
directories discovered via fd/rg) and swap the explicit
BenchmarkBlockPipeline/BenchmarkParallelVerify names for either an actual
benchmark found in the repo or a generic pattern such as -bench=. to ensure the
commands run without "no packages loaded" or "no matching benchmarks" errors;
update all occurrences in the Block Profile, Mutex Profile and Trace examples
accordingly (look for the symbols BenchmarkBlockPipeline,
BenchmarkParallelVerify, ./pipeline/... and ./vrf/... to locate the lines to
change).
internal/bench/block_bench_test.go (1)

374-381: LGTM — uint64 underflow guard correctly implemented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/block_bench_test.go` around lines 374 - 381, The uint64
underflow guard in the test (the check comparing currentKesPeriod and
kd.kesPeriod that calls b.Skipf and returns) is correct; no code change
required—keep the existing conditional that skips when currentKesPeriod <
kd.kesPeriod (variables currentKesPeriod and kd.kesPeriod) as implemented.
internal/bench/regression_test.go (1)

91-101: ⚠️ Potential issue | 🟡 Minor

init() panic on missing Conway block will crash the entire bench package — lazily load instead.

If testdata.GetTestBlocks() ever omits or renames the "Conway" entry, every test and benchmark in this package fails at startup. Move the lookup into cborBlockDecodeOnce so only the CBOR regression tests are affected.

🛡️ Proposed fix
-	// Initialize block test data from testdata package
-	blocks := testdata.GetTestBlocks()
-	for _, block := range blocks {
-		if block.Name == "Conway" {
-			conwayBlockCbor = block.Cbor
-			break
-		}
-	}
-	if conwayBlockCbor == nil {
-		panic("failed to find Conway block in testdata")
-	}
 func cborBlockDecodeOnce() any {
+	if conwayBlockCbor == nil {
+		for _, block := range testdata.GetTestBlocks() {
+			if block.Name == "Conway" {
+				conwayBlockCbor = block.Cbor
+				break
+			}
+		}
+	}
+	if conwayBlockCbor == nil {
+		panic("Conway block not found in testdata")
+	}
 	block, err := ledger.NewBlockFromCbor(
 		ledger.BlockTypeConway,
 		conwayBlockCbor,
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/regression_test.go` around lines 91 - 101, The init()
currently panics if testdata.GetTestBlocks() lacks the "Conway" block; move that
lookup into the existing cborBlockDecodeOnce so the lookup is done lazily only
when cborBlockDecodeOnce runs. Remove the panic from init(), and inside the
cborBlockDecodeOnce.Do(...) body perform the loop over testdata.GetTestBlocks(),
set conwayBlockCbor when Name == "Conway", and handle a missing block by
returning/setting an error or nil only for the CBOR regression path (no
package-wide panic). Ensure the lookup is protected by the same sync.Once
(cborBlockDecodeOnce) and update any callers to handle a missing conwayBlockCbor
gracefully.
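The lazy-load pattern proposed above can be sketched as a stand-alone program. fakeTestBlocks and getConwayCbor are illustrative stand-ins for testdata.GetTestBlocks and the PR's loader, not its actual API; the point is the sync.Once-guarded lookup that returns an error instead of panicking the package in init().

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type testBlock struct {
	Name string
	Cbor []byte
}

// fakeTestBlocks stands in for testdata.GetTestBlocks.
func fakeTestBlocks() []testBlock {
	return []testBlock{{Name: "Conway", Cbor: []byte{0x84}}}
}

var (
	conwayOnce sync.Once
	conwayCbor []byte
	conwayErr  error
)

// getConwayCbor performs the fixture lookup once, on first use, so a
// missing block fails only the tests that need it rather than the
// whole package at startup.
func getConwayCbor() ([]byte, error) {
	conwayOnce.Do(func() {
		for _, blk := range fakeTestBlocks() {
			if blk.Name == "Conway" {
				conwayCbor = blk.Cbor
				return
			}
		}
		conwayErr = errors.New("Conway block not found in testdata")
	})
	return conwayCbor, conwayErr
}

func main() {
	cbor, err := getConwayCbor()
	fmt.Println(len(cbor), err)
}
```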
🧹 Nitpick comments (8)
scripts/profile.sh (2)

183-207: Post-run instructions unconditionally print both CPU and memory profile paths regardless of PROFILE_TYPE.

When run with cpu or mem only, one of the two echoed paths won't exist on disk, which is confusing.

💡 Suggested fix for the `else` block (lines 194–197); apply same pattern to lines 203–207
 else
     echo "To analyze profiles manually:"
-    echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
-    echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+    if [[ "$PROFILE_TYPE" == "cpu" || "$PROFILE_TYPE" == "both" ]]; then
+        echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+    fi
+    if [[ "$PROFILE_TYPE" == "mem" || "$PROFILE_TYPE" == "both" ]]; then
+        echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+    fi
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/profile.sh` around lines 183 - 207, The post-run help prints both CPU
and memory profile paths even when PROFILE_TYPE is set to "cpu" or "mem"; update
the echo logic to conditionally print only the relevant profile(s) based on the
PROFILE_TYPE variable: when PROFILE_TYPE == "cpu" print only
cpu_${COMPONENT}.prof, when "mem" print only mem_${COMPONENT}.prof, and when
"all" or empty keep printing both. Change both the interactive example block
(uses go tool pprof -http... $OUTPUT_DIR/cpu_${COMPONENT}.prof /
mem_${COMPONENT}.prof) and the text-based analysis block (uses go tool pprof
-top ...) to follow this same conditional branching, referencing the existing
COMPONENT, PROFILE_TYPE and OUTPUT_DIR variables.

internal/bench/profile_test.go (2)

71-199: Consider replacing t.Fatalf with require.NoError for unexpected-error paths.

All t.Fatalf(...) calls in setup paths are semantically equivalent to require.NoError(t, err, "..."). Adopting testify here aligns with the project guideline. As per coding guidelines: **/*_test.go: "Use testify assertions (e.g., assert.NoError(), assert.Equal()) for Go tests".

The deliberate t.Logf calls for expected errors (dummy VRF nonce, missing UTXOs) are fine as-is.

Also applies to: 210-283, 293-318, 328-401, 412-439, 450-477, 487-556

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/profile_test.go` around lines 71 - 199, Replace
unexpected-error t.Fatalf usage in TestProfileBlockValidation setup with
testify's require.NoError to follow project test guidelines: in the pre-decode
loop where ledger.NewBlockFromCbor is called (and any other setup spots
flagged), call require.NoError(t, err, "failed to pre-decode %s block",
block.Name) (or equivalent) and import "github.com/stretchr/testify/require" at
the top of the test file; keep the existing t.Logf calls for expected errors in
VerifyBlock hot loops untouched.

internal/bench/cbor_bench_test.go (1)

109-115: mustDecodeHex duplicates testdata.MustDecodeHex — use the existing helper.

testdata is already imported at line 22 and provides testdata.MustDecodeHex. The local function can be removed and call sites updated.

♻️ Proposed fix
-func mustDecodeHex(s string) []byte {
-	b, err := hex.DecodeString(s)
-	if err != nil {
-		panic("failed to decode hex: " + err.Error())
-	}
-	return b
-}

Replace all mustDecodeHex(...) call sites in init() with testdata.MustDecodeHex(...), and remove the "encoding/hex" import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/cbor_bench_test.go` around lines 109 - 115, Remove the
duplicate mustDecodeHex function and replace its usages in init() with the
existing testdata.MustDecodeHex helper; update all call sites from
mustDecodeHex(...) to testdata.MustDecodeHex(...), delete the now-unused
mustDecodeHex definition, and remove the "encoding/hex" import so the file only
relies on the imported testdata package.
internal/bench/block_bench_test.go (1)

356-358: Consider assigning the VRF result to benchSink to guard against future inlining.

Currently both return values are fully discarded. While Go won't eliminate a complex crypto call today, pinning the result to benchSink is a cheap, forward-safe hedge that makes the intent explicit.

♻️ Proposed change
-			_, _ = vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+			ok, err := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+			benchSink = ok
+			_ = err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/block_bench_test.go` around lines 356 - 358, Assign the VRF
result to the package-level benchSink to prevent the compiler from inlining away
the call: inside the loop that currently calls vrf.Verify(vd.vrfKey, vd.proof,
vd.output, vrfMsg) and discards both return values, capture at least one
returned value (e.g., the output or error) into benchSink (or set benchSink =
someValueDerivedFrom(the returned value)) so the call is observable; update the
loop using the existing vrf.Verify call site and variables vd.vrfKey, vd.proof,
vd.output, vrfMsg to store the result rather than discarding it.
internal/bench/helpers.go (2)

218-220: GetTestBlocks is a zero-value wrapper — consider removing it.

The function is a pure delegation with no additional logic; block_bench_test.go already calls testdata.GetTestBlocks() directly (line 453), making this wrapper inconsistently used and unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 218 - 220, Remove the redundant
wrapper GetTestBlocks in helpers.go and update call sites to use
testdata.GetTestBlocks() directly; specifically delete the GetTestBlocks
function and replace any references to helpers.GetTestBlocks() (e.g., in
block_bench_test.go) with testdata.GetTestBlocks(), and then run tests/build to
ensure no unused imports or compilation errors remain.

124-143: Three parallel era-switch functions share identical case sets — consolidate into a lookup table.

BlockTypeFromEra, TxTypeFromEra, and blockCborFromEra all switch over the same seven era strings. Adding a new era (e.g., Leios) requires updating all three independently.

♻️ Suggested refactor sketch
+type eraInfo struct {
+    BlockType uint
+    TxType    uint
+    CborFn    func() []byte
+}
+
+var eraRegistry = map[string]eraInfo{
+    "byron":   {ledger.BlockTypeByronMain, ledger.TxTypeByron, func() []byte { return testdata.MustDecodeHex(testdata.ByronBlockHex) }},
+    "shelley": {ledger.BlockTypeShelley, ledger.TxTypeShelley, func() []byte { return testdata.MustDecodeHex(testdata.ShelleyBlockHex) }},
+    // ... remaining eras
+}
+
+func BlockTypeFromEra(era string) (uint, error) {
+    info, ok := eraRegistry[strings.ToLower(era)]
+    if !ok {
+        return 0, fmt.Errorf("unknown era: %s", era)
+    }
+    return info.BlockType, nil
+}

Also applies to: 147-166, 196-215

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 124 - 143, Multiple functions
(BlockTypeFromEra, TxTypeFromEra, blockCborFromEra) duplicate the same era
switch; replace the repeated switches with a single shared lookup map keyed by
strings.ToLower(era) (e.g., var eraToBlockType map[string]uint / eraToTxType
map[string]uint / eraToCbor map[string]SomeType or a single composite map)
defined at package scope, then have each function simply lookup the era, return
the mapped value or the same fmt.Errorf("unknown era: %s", era) on miss; update
all three functions to use the shared map(s) so adding a new era requires a
single map change and preserve current error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bench/helpers.go`:
- Line 93: The local variable named `cbor` in LoadBlockFixture shadows the
imported package `cbor`; rename the local variable (e.g., to `cborBytes` or
`blockCbor`) wherever it is assigned from blockCborFromEra and where it is
subsequently used inside LoadBlockFixture to avoid shadowing; update all usages
in that function to the new name and keep the call to blockCborFromEra
unchanged.

In `@internal/bench/profile_test.go`:
- Around line 462-475: The comment incorrectly states that NewBlockFromCbor
validates the body hash; update the comment to accurately describe what's being
measured: decoding via ledger.NewBlockFromCbor followed by accessing
decoded.BlockBodyHash() (i.e., compute/access the body hash but not validate
it), and note that full body-hash validation is performed by ledger.VerifyBlock
if you want to profile that path; keep the loop and calls to NewBlockFromCbor,
decoded.BlockBodyHash(), and profileIterations unchanged.
- Around line 462-475: Update the misleading comment: replace the sentence
claiming "NewBlockFromCbor with default config validates body hash" with a
concise note that NewBlockFromCbor only decodes CBOR and does not validate the
body hash, and clarify that body hash validation is performed by
ledger.VerifyBlock; mention that the loop over profileIterations is profiling
decode + accessing BlockBodyHash() rather than verifying the hash so readers
understand what is measured.

---

Duplicate comments:
In @.github/workflows/benchmark.yml:
- Around line 29-36: The workflow's go test package list omits the new benchmark
packages so CI never runs the new benchmarks; update the go test invocation in
the benchmark job (the multi-line command that starts with "go test -bench=.
-benchmem -count=5 -timeout=30m") to include ./internal/bench/... alongside the
other packages (so that benchmarks such as BenchmarkBlockValidation,
BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode are
executed), then keep piping output to tee as before.
- Around line 29-36: The CI benchmark job is missing the package that contains
the new benchmarks; add the missing package glob ./internal/bench/... to the
list of packages passed to the go test command (alongside ./vrf/..., ./kes/...,
etc.) so the new benchmarks (e.g., BenchmarkBlockValidation,
BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode) are
executed and recorded in benchmark.txt.

In `@internal/bench/block_bench_test.go`:
- Around line 374-381: The uint64 underflow guard in the test (the check
comparing currentKesPeriod and kd.kesPeriod that calls b.Skipf and returns) is
correct; no code change required—keep the existing conditional that skips when
currentKesPeriod < kd.kesPeriod (variables currentKesPeriod and kd.kesPeriod) as
implemented.

In `@internal/bench/PROFILING.md`:
- Around line 57-82: Update the example commands in PROFILING.md to point at
actual packages and benchmarks: replace occurrences of "./pipeline/..." with
"./internal/bench/..." and swap the non-existent benchmark names
(BenchmarkBlockPipeline, BenchmarkParallelVerify) for real benchmarks such as
BenchmarkCBORDecodeBlockParallel (use this for the parallel/mutex examples) and
the corresponding block-profile benchmark (e.g., BenchmarkCBORDecodeBlock or
another existing block-related benchmark in internal/bench) so the examples run
without "no packages loaded" or "no matching benchmarks" errors; locate and
change the three example blocks that reference
BenchmarkBlockPipeline/BenchmarkParallelVerify and ./pipeline/... accordingly.
- Around line 57-82: The examples reference non-existent packages/benchmarks
(BenchmarkBlockPipeline, BenchmarkParallelVerify and ./pipeline/..././vrf...),
so update the three commands to use either real packages/benchmarks or a generic
safe form; replace ./pipeline/... and ./vrf/... with ./... (or specific existing
package directories discovered via fd/rg) and swap the explicit
BenchmarkBlockPipeline/BenchmarkParallelVerify names for either an actual
benchmark found in the repo or a generic pattern such as -bench=. to ensure the
commands run without "no packages loaded" or "no matching benchmarks" errors;
update all occurrences in the Block Profile, Mutex Profile and Trace examples
accordingly (look for the symbols BenchmarkBlockPipeline,
BenchmarkParallelVerify, ./pipeline/... and ./vrf... to locate the lines to
change).



ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between caf22d1 and 4d70dd3.

📒 Files selected for processing (15)
  • .github/workflows/benchmark.yml
  • internal/bench/BASELINES.md
  • internal/bench/PROFILING.md
  • internal/bench/README.md
  • internal/bench/block_bench_test.go
  • internal/bench/cbor_bench_test.go
  • internal/bench/consensus_bench_test.go
  • internal/bench/helpers.go
  • internal/bench/helpers_test.go
  • internal/bench/profile_test.go
  • internal/bench/regression_test.go
  • internal/bench/script_bench_test.go
  • internal/bench/testdata/README.md
  • internal/bench/tx_bench_test.go
  • scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/bench/testdata/README.md
  • internal/bench/consensus_bench_test.go

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch 2 times, most recently from 3c78cf9 to adf1591 on February 26, 2026 23:12
@wolf31o2 (Member Author)

@cubic-dev-ai review

@cubic-dev-ai (Contributor)

cubic-dev-ai bot commented Feb 26, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai bot left a comment

8 issues found across 15 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/bench/PROFILING.md">

<violation number="1" location="internal/bench/PROFILING.md:19">
P2: Benchmark profiling commands should disable normal tests (`-run=^$`) so profiles capture benchmark workload only.</violation>
</file>

<file name=".github/workflows/benchmark.yml">

<violation number="1" location=".github/workflows/benchmark.yml:29">
P2: This benchmark job also runs normal tests because `-run` is not set. Restrict test execution (`-run=^$`) so only benchmark code is measured.</violation>
</file>

<file name="internal/bench/script_bench_test.go">

<violation number="1" location="internal/bench/script_bench_test.go:546">
P2: The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.</violation>
</file>

<file name="internal/bench/BASELINES.md">

<violation number="1" location="internal/bench/BASELINES.md:185">
P2: This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.</violation>
</file>

<file name="scripts/profile.sh">

<violation number="1" location="scripts/profile.sh:168">
P2: Errors are swallowed when profiling `all` components, so the script can report success even when profiling commands fail.</violation>
</file>

<file name="internal/bench/block_bench_test.go">

<violation number="1" location="internal/bench/block_bench_test.go:67">
P2: Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.</violation>

<violation number="2" location="internal/bench/block_bench_test.go:213">
P2: KES verification result is ignored, so failed verification (`false, nil`) is treated as success in the validation benchmark.</violation>

<violation number="3" location="internal/bench/block_bench_test.go:266">
P2: Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


```bash
# Generate CPU profile
go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/...
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026
P2: Benchmark profiling commands should disable normal tests (-run=^$) so profiles capture benchmark workload only.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/PROFILING.md, line 19:

<comment>Benchmark profiling commands should disable normal tests (`-run=^$`) so profiles capture benchmark workload only.</comment>

<file context>
@@ -0,0 +1,399 @@
+
+```bash
+# Generate CPU profile
+go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/...
+
+# Generate memory profile
</file context>

```yaml
- name: Run benchmarks
  run: |
    set -o pipefail
    go test -bench=. -benchmem -count=5 -timeout=30m \
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: This benchmark job also runs normal tests because -run is not set. Restrict test execution (-run=^$) so only benchmark code is measured.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 29:

<comment>This benchmark job also runs normal tests because `-run` is not set. Restrict test execution (`-run=^$`) so only benchmark code is measured.</comment>

<file context>
@@ -0,0 +1,44 @@
+      - name: Run benchmarks
+        run: |
+          set -o pipefail
+          go test -bench=. -benchmem -count=5 -timeout=30m \
+            ./vrf/... \
+            ./kes/... \
</file context>

```go
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	_, err := cbor.Encode(&script)
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/script_bench_test.go, line 546:

<comment>The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.</comment>

<file context>
@@ -0,0 +1,598 @@
+		b.ReportAllocs()
+		b.ResetTimer()
+		for i := 0; i < b.N; i++ {
+			_, err := cbor.Encode(&script)
+			if err != nil {
+				b.Fatalf("failed to encode: %v", err)
</file context>


```markdown
### CI Failure Criteria

The benchmark CI workflow fails a PR if any of these thresholds are exceeded:
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/BASELINES.md, line 185:

<comment>This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.</comment>

<file context>
@@ -0,0 +1,231 @@
+
+### CI Failure Criteria
+
+The benchmark CI workflow fails a PR if any of these thresholds are exceeded:
+
+| Metric | Threshold | Rationale |
</file context>

```bash
if [[ "$COMPONENT" == "all" ]]; then
    COMPONENTS=(block cbor vrf consensus bodyhash script)
    for comp in "${COMPONENTS[@]}"; do
        run_profile "$comp" || true
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: Errors are swallowed when profiling all components, so the script can report success even when profiling commands fail.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 168:

<comment>Errors are swallowed when profiling `all` components, so the script can report success even when profiling commands fail.</comment>

<file context>
@@ -0,0 +1,231 @@
+if [[ "$COMPONENT" == "all" ]]; then
+    COMPONENTS=(block cbor vrf consensus bodyhash script)
+    for comp in "${COMPONENTS[@]}"; do
+        run_profile "$comp" || true
+    done
+
</file context>

```go
case *shelley.ShelleyBlockHeader:
	bodyCbor, err = cbor.Encode(h.Body)
	if err != nil {
		continue
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 67:

<comment>Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.</comment>

<file context>
@@ -0,0 +1,578 @@
+		case *shelley.ShelleyBlockHeader:
+			bodyCbor, err = cbor.Encode(h.Body)
+			if err != nil {
+				continue
+			}
+			sig = h.Signature
</file context>

```go
b.ResetTimer()
for i := 0; i < b.N; i++ {
	// KES verification
	_, err := ledger.VerifyKes(
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 266:

<comment>Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.</comment>

<file context>
@@ -0,0 +1,578 @@
+			b.ResetTimer()
+			for i := 0; i < b.N; i++ {
+				// KES verification
+				_, err := ledger.VerifyKes(
+					bd.block.Header(),
+					defaultSlotsPerKesPeriod,
</file context>

```go
}

// KES verification
_, err = ledger.VerifyKes(
```
@cubic-dev-ai cubic-dev-ai bot Feb 26, 2026

P2: KES verification result is ignored, so failed verification (false, nil) is treated as success in the validation benchmark.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 213:

<comment>KES verification result is ignored, so failed verification (`false, nil`) is treated as success in the validation benchmark.</comment>

<file context>
@@ -0,0 +1,578 @@
+				}
+
+				// KES verification
+				_, err = ledger.VerifyKes(
+					block.Header(),
+					defaultSlotsPerKesPeriod,
</file context>

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from adf1591 to e663c62 on February 27, 2026 23:37
@wolf31o2 (Member Author)

@cubic-dev-ai review

@cubic-dev-ai (Contributor)

cubic-dev-ai bot commented Feb 27, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

@wolf31o2 (Member Author)

/benchmark

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
internal/bench/BASELINES.md (1)

11-11: ⚠️ Potential issue | 🟡 Minor

Regression-enforcement statement is inconsistent within the same document.

Line 11 says CI currently fails on threshold breaches, but Lines 185-187 say thresholds are aspirational and not CI-enforced yet. Please align the overview wording to current behavior to avoid confusion.

🛠️ Suggested doc fix
-1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](`#regression-thresholds`))
+1. **Regression Detection**: Thresholds guide manual review today and are intended for future CI enforcement (see [Regression Thresholds](`#regression-thresholds`))

Also applies to: 185-187

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/BASELINES.md` at line 11, Update the inconsistent enforcement
language so the "Regression Detection" overview and the later "Regression
Thresholds" section agree: either change the sentence under the "Regression
Detection" heading that currently reads "CI fails if allocation count increases
by >20%..." to state these thresholds are aspirational/not CI-enforced yet, or
conversely change the wording in the paragraphs around lines 185-187 to say the
thresholds are enforced by CI. Edit the Markdown under the "Regression
Detection" heading and the "Regression Thresholds" section so both describe the
same current behavior and use consistent phrasing (refer to the "Regression
Detection" heading and the "Regression Thresholds" anchor when locating the
text).
🧹 Nitpick comments (3)
internal/bench/script_bench_test.go (1)

49-154: Consider consolidating repeated script-wrap logic into one helper.

createPubkeyScript, createAllScript, createAnyScript, createNofKScript, and both timelock helpers all repeat the same encode→unmarshal→panic flow. A shared helper would reduce drift risk.

♻️ Refactor sketch
+func mustWrapNativeScript(v any, label string) common.NativeScript {
+	data, err := cbor.Encode(v)
+	if err != nil {
+		panic("failed to encode " + label + " script: " + err.Error())
+	}
+	var ns common.NativeScript
+	if err := ns.UnmarshalCBOR(data); err != nil {
+		panic("failed to unmarshal " + label + " script: " + err.Error())
+	}
+	return ns
+}
+
 func createPubkeyScript(keyHash common.Blake2b224) common.NativeScript {
 	script := common.NativeScriptPubkey{
 		Type: 0,
 		Hash: keyHash[:],
 	}
-	data, err := cbor.Encode(&script)
-	if err != nil {
-		panic("failed to encode pubkey script: " + err.Error())
-	}
-	var ns common.NativeScript
-	if err := ns.UnmarshalCBOR(data); err != nil {
-		panic("failed to unmarshal pubkey script: " + err.Error())
-	}
-	return ns
+	return mustWrapNativeScript(&script, "pubkey")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/script_bench_test.go` around lines 49 - 154, Multiple factory
funcs (createPubkeyScript, createAllScript, createAnyScript, createNofKScript,
createInvalidBeforeScript, createInvalidHereafterScript) duplicate the same
encode→unmarshal→panic pattern; introduce a single helper (e.g.,
wrapNativeScript(script interface{}) common.NativeScript) that takes the
concrete CBOR struct, calls cbor.Encode, checks error, unmarshals into a
common.NativeScript via ns.UnmarshalCBOR, and returns ns (preserving the same
panic-on-error behavior and clear messages), then replace each factory to build
the specific struct and call wrapNativeScript instead to remove repetition and
centralize error handling.
internal/bench/tx_bench_test.go (1)

27-281: Era/fixture setup boilerplate is heavily repeated across benchmarks.

The same PostByronEraNames() + MustLoadBlockFixture() + first-tx extraction pattern is duplicated many times. A shared helper would reduce maintenance overhead and keep per-benchmark logic focused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/tx_bench_test.go` around lines 27 - 281, The benchmarks
duplicate era/fixture setup (PostByronEraNames, MustLoadBlockFixture,
fixture.Block.Transactions, selecting txs[0], txCbor) across many Benchmark*
functions; factor this into a small helper (e.g., iterateFixtureTxs or
forEachEraFixtureFirstTx) that iterates PostByronEraNames(), calls
MustLoadBlockFixture(era, "default"), skips when txs empty, and invokes a
callback with era and the first tx (or txCbor when needed); then update each
BenchmarkTxDecode, BenchmarkTxHash, BenchmarkTxInputs, BenchmarkTxOutputs,
BenchmarkTxFee, BenchmarkTxMetadata, BenchmarkTxCbor, BenchmarkTxUtxorpc,
BenchmarkTxProducedUtxos, BenchmarkTxConsumedUtxos, and
BenchmarkTxBlockIteration to call that helper and run per-era b.Run bodies,
keeping benchSink and pre-validation logic inside the callback.
internal/bench/helpers.go (1)

160-184: Consider deriving era lists from the eraLookup map.

EraNames() and PostByronEraNames() return hardcoded slices while eraLookup is the source of truth for era information. If a new era is added to eraLookup, these functions would need manual updates. For benchmark code this is acceptable, but consider adding a comment noting the intentional ordering or deriving from the map if order doesn't matter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 160 - 184, Replace the hardcoded era
slices in EraNames() and PostByronEraNames() by deriving the lists from the
canonical eraLookup map (or if ordering is important, add a clarifying comment);
specifically, iterate over eraLookup to build EraNames() (and filter out "byron"
for PostByronEraNames()), or add a comment above EraNames() and
PostByronEraNames() stating that the ordering is intentional and must be updated
alongside eraLookup when new eras are added so the source-of-truth (eraLookup)
remains clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bench/block_bench_test.go`:
- Around line 372-373: The benchmark currently discards the error returned by
vrf.Verify; change the call to capture both return values (valid, err :=
vrf.Verify(...)) and handle err instead of ignoring it: if err != nil then fail
the benchmark/run (e.g., b.Fatalf or t.Fatalf) or log the error and stop,
otherwise assign benchSink = valid. Update the invocation that uses vrf.Verify
with vd.vrfKey, vd.proof, vd.output, vrfMsg and ensure benchSink remains used to
avoid compiler optimizations.
- Around line 566-573: The benchmark currently ignores the results of
ledger.VerifyKesComponents causing failures to be suppressed; in
BenchmarkVerifyKesComponents capture the two return values from
ledger.VerifyKesComponents (the verification bool and error), and if the error
is non-nil or the verification bool is false call the benchmark's failure helper
(e.g., b.Fatalf/b.Fatal/b.Fatalf with a descriptive message including the error
and relevant inputs like kd.bodyCbor, kd.signature, kd.hotVkey, kd.kesPeriod,
kd.slot) so failing verification paths are surfaced during the benchmark run.

---

Duplicate comments:
In `@internal/bench/BASELINES.md`:
- Line 11: Update the inconsistent enforcement language so the "Regression
Detection" overview and the later "Regression Thresholds" section agree: either
change the sentence under the "Regression Detection" heading that currently
reads "CI fails if allocation count increases by >20%..." to state these
thresholds are aspirational/not CI-enforced yet, or conversely change the
wording in the paragraphs around lines 185-187 to say the thresholds are
enforced by CI. Edit the Markdown under the "Regression Detection" heading and
the "Regression Thresholds" section so both describe the same current behavior
and use consistent phrasing (refer to the "Regression Detection" heading and the
"Regression Thresholds" anchor when locating the text).

---

Nitpick comments:
In `@internal/bench/helpers.go`:
- Around line 160-184: Replace the hardcoded era slices in EraNames() and
PostByronEraNames() by deriving the lists from the canonical eraLookup map (or
if ordering is important, add a clarifying comment); specifically, iterate over
eraLookup to build EraNames() (and filter out "byron" for PostByronEraNames()),
or add a comment above EraNames() and PostByronEraNames() stating that the
ordering is intentional and must be updated alongside eraLookup when new eras
are added so the source-of-truth (eraLookup) remains clear.

In `@internal/bench/script_bench_test.go`:
- Around line 49-154: Multiple factory funcs (createPubkeyScript,
createAllScript, createAnyScript, createNofKScript, createInvalidBeforeScript,
createInvalidHereafterScript) duplicate the same encode→unmarshal→panic pattern;
introduce a single helper (e.g., wrapNativeScript(script interface{})
common.NativeScript) that takes the concrete CBOR struct, calls cbor.Encode,
checks error, unmarshals into a common.NativeScript via ns.UnmarshalCBOR, and
returns ns (preserving the same panic-on-error behavior and clear messages),
then replace each factory to build the specific struct and call wrapNativeScript
instead to remove repetition and centralize error handling.

In `@internal/bench/tx_bench_test.go`:
- Around line 27-281: The benchmarks duplicate era/fixture setup
(PostByronEraNames, MustLoadBlockFixture, fixture.Block.Transactions, selecting
txs[0], txCbor) across many Benchmark* functions; factor this into a small
helper (e.g., iterateFixtureTxs or forEachEraFixtureFirstTx) that iterates
PostByronEraNames(), calls MustLoadBlockFixture(era, "default"), skips when txs
empty, and invokes a callback with era and the first tx (or txCbor when needed);
then update each BenchmarkTxDecode, BenchmarkTxHash, BenchmarkTxInputs,
BenchmarkTxOutputs, BenchmarkTxFee, BenchmarkTxMetadata, BenchmarkTxCbor,
BenchmarkTxUtxorpc, BenchmarkTxProducedUtxos, BenchmarkTxConsumedUtxos, and
BenchmarkTxBlockIteration to call that helper and run per-era b.Run bodies,
keeping benchSink and pre-validation logic inside the callback.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d70dd3 and e663c62.

📒 Files selected for processing (15)
  • .github/workflows/benchmark.yml
  • internal/bench/BASELINES.md
  • internal/bench/PROFILING.md
  • internal/bench/README.md
  • internal/bench/block_bench_test.go
  • internal/bench/cbor_bench_test.go
  • internal/bench/consensus_bench_test.go
  • internal/bench/helpers.go
  • internal/bench/helpers_test.go
  • internal/bench/profile_test.go
  • internal/bench/regression_test.go
  • internal/bench/script_bench_test.go
  • internal/bench/testdata/README.md
  • internal/bench/tx_bench_test.go
  • scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/bench/profile_test.go
  • internal/bench/testdata/README.md
  • .github/workflows/benchmark.yml


@cubic-dev-ai cubic-dev-ai bot left a comment


7 issues found across 15 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/bench/profile_test.go">

<violation number="1" location="internal/bench/profile_test.go:466">
P2: `TestProfileBodyHash` profiles a hash-field accessor instead of actual body-hash validation, so the generated profile is not measuring the intended code path.</violation>
</file>

<file name="internal/bench/consensus_bench_test.go">

<violation number="1" location="internal/bench/consensus_bench_test.go:316">
P2: The parallel threshold benchmark includes `atomic.Value.Store` in the hot loop, which materially distorts the measured performance of `CertifiedNatThreshold`.</violation>
</file>

<file name="internal/bench/block_bench_test.go">

<violation number="1" location="internal/bench/block_bench_test.go:65">
P2: Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.</violation>

<violation number="2" location="internal/bench/block_bench_test.go:372">
P2: Handle the error from `vrf.Verify` so the benchmark fails if verification errors occur and avoids measuring an unintended error path.</violation>

<violation number="3" location="internal/bench/block_bench_test.go:566">
P2: Check the `VerifyKesComponents` result and error to ensure the benchmark measures the success path and fails on verification errors.</violation>
</file>

<file name="internal/bench/BASELINES.md">

<violation number="1" location="internal/bench/BASELINES.md:11">
P2: The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.</violation>
</file>

<file name="internal/bench/helpers.go">

<violation number="1" location="internal/bench/helpers.go:162">
P3: Era names are hardcoded in multiple places (`eraLookup`, `EraNames`, `PostByronEraNames`), which increases maintenance risk and can cause future inconsistencies.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

b.RunParallel(func(pb *testing.PB) {
	// Each goroutine gets its own Rat to avoid data races.
	activeSlotCoeff := big.NewRat(1, 20)
	for pb.Next() {

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P2: The parallel threshold benchmark includes atomic.Value.Store in the hot loop, which materially distorts the measured performance of CertifiedNatThreshold.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/consensus_bench_test.go, line 316:

<comment>The parallel threshold benchmark includes `atomic.Value.Store` in the hot loop, which materially distorts the measured performance of `CertifiedNatThreshold`.</comment>

<file context>
@@ -0,0 +1,326 @@
+	b.RunParallel(func(pb *testing.PB) {
+		// Each goroutine gets its own Rat to avoid data races.
+		activeSlotCoeff := big.NewRat(1, 20)
+		for pb.Next() {
+			consensusParallelSink.Store(
+				consensus.CertifiedNatThreshold(
</file context>

b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	_, _ = ledger.VerifyKesComponents(

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P2: Check the VerifyKesComponents result and error to ensure the benchmark measures the success path and fails on verification errors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 566:

<comment>Check the `VerifyKesComponents` result and error to ensure the benchmark measures the success path and fails on verification errors.</comment>

<file context>
@@ -0,0 +1,599 @@
+			b.ReportAllocs()
+			b.ResetTimer()
+			for i := 0; i < b.N; i++ {
+				_, _ = ledger.VerifyKesComponents(
+					kd.bodyCbor,
+					kd.signature,
</file context>

b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	valid, _ := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P2: Handle the error from vrf.Verify so the benchmark fails if verification errors occur and avoids measuring an unintended error path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 372:

<comment>Handle the error from `vrf.Verify` so the benchmark fails if verification errors occur and avoids measuring an unintended error path.</comment>

<file context>
@@ -0,0 +1,599 @@
+			b.ReportAllocs()
+			b.ResetTimer()
+			for i := 0; i < b.N; i++ {
+				valid, _ := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+				benchSink = valid
+			}
</file context>


This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:

1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](#regression-thresholds))

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P2: The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/BASELINES.md, line 11:

<comment>The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.</comment>

<file context>
@@ -0,0 +1,233 @@
+
+This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:
+
+1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](#regression-thresholds))
+2. **Optimization Tracking**: Measure impact of performance improvements
+3. **Contributor Guidance**: Set expectations for new code
</file context>

var kesPeriod uint64
var err error

switch h := header.(type) {

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P2: Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 65:

<comment>Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.</comment>

<file context>
@@ -0,0 +1,599 @@
+		var kesPeriod uint64
+		var err error
+
+		switch h := header.(type) {
+		case *shelley.ShelleyBlockHeader:
+			bodyCbor, err = cbor.Encode(h.Body)
</file context>


// EraNames returns the list of supported era names for benchmarking.
func EraNames() []string {
	return []string{

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026


P3: Era names are hardcoded in multiple places (eraLookup, EraNames, PostByronEraNames), which increases maintenance risk and can cause future inconsistencies.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/helpers.go, line 162:

<comment>Era names are hardcoded in multiple places (`eraLookup`, `EraNames`, `PostByronEraNames`), which increases maintenance risk and can cause future inconsistencies.</comment>

<file context>
@@ -0,0 +1,194 @@
+
+// EraNames returns the list of supported era names for benchmarking.
+func EraNames() []string {
+	return []string{
+		"byron",
+		"shelley",
</file context>

@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from e663c62 to a29c785 Compare February 27, 2026 23:55

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/bench/helpers.go (1)

160-186: Single-source supported era names to prevent drift.

EraNames() and PostByronEraNames() duplicate literals already represented in
eraLookup. A shared ordered slice would reduce maintenance mistakes when new
eras are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 160 - 186, Replace the duplicated
literal lists with a single ordered slice (e.g., supportedEraNames) that is the
authoritative ordering used by EraNames and PostByronEraNames; have EraNames
return that slice directly and have PostByronEraNames derive its result by
slicing or filtering that single slice to exclude "byron" (e.g.,
supportedEraNames[1:] or filter out "byron") and ensure the single slice stays
consistent with eraLookup so adding a new era only requires updating one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/bench/block_bench_test.go`:
- Around line 207-245: The benchmark currently double-validates the block body
hash because benchConfig() returns a cfg used by ledger.NewBlockFromCbor (which
does validation when SkipBodyHashValidation is false) and the test then calls
common.ValidateBlockBodyHash again; fix by setting cfg.SkipBodyHashValidation =
true on the cfg returned from benchConfig() before calling
ledger.NewBlockFromCbor in BenchmarkBlockValidation so NewBlockFromCbor skips
internal body-hash validation and only the explicit common.ValidateBlockBodyHash
call is measured.
- Around line 394-415: The benchmark currently discards the bool returned by
kes.VerifySignedKES which can hide failing verification; add a pre-check using
kes.VerifySignedKES(kd.hotVkey, t, kd.bodyCbor, kd.signature) before starting
the timed loop and call b.Fatalf or b.Skip if it returns false, then inside the
loop assign the result to a package-level sink (e.g., benchSink) instead of
discarding it so the compiler cannot optimize the call away; locate the call to
kes.VerifySignedKES in the b.Run("Era_"+kd.name) closure and update it
accordingly.

In `@internal/bench/helpers.go`:
- Around line 190-196: Replace the panic-prone call to testdata.MustDecodeHex in
blockCborFromEra with a non-panicking decode and propagate any decode error via
the function's error return; locate blockCborFromEra and swap MustDecodeHex for
the decode function that returns (bytes, error) (or add one in the testdata
package), check the error and return fmt.Errorf(...) or the decode error
wrapped, so callers like LoadBlockFixture receive an error instead of a panic.

---

Nitpick comments:
In `@internal/bench/helpers.go`:
- Around line 160-186: Replace the duplicated literal lists with a single
ordered slice (e.g., supportedEraNames) that is the authoritative ordering used
by EraNames and PostByronEraNames; have EraNames return that slice directly and
have PostByronEraNames derive its result by slicing or filtering that single
slice to exclude "byron" (e.g., supportedEraNames[1:] or filter out "byron") and
ensure the single slice stays consistent with eraLookup so adding a new era only
requires updating one place.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e663c62 and a29c785.

📒 Files selected for processing (15)
  • .github/workflows/benchmark.yml
  • internal/bench/BASELINES.md
  • internal/bench/PROFILING.md
  • internal/bench/README.md
  • internal/bench/block_bench_test.go
  • internal/bench/cbor_bench_test.go
  • internal/bench/consensus_bench_test.go
  • internal/bench/helpers.go
  • internal/bench/helpers_test.go
  • internal/bench/profile_test.go
  • internal/bench/regression_test.go
  • internal/bench/script_bench_test.go
  • internal/bench/testdata/README.md
  • internal/bench/tx_bench_test.go
  • scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/benchmark.yml
  • internal/bench/testdata/README.md

Comment on lines +190 to +196
func blockCborFromEra(era string) ([]byte, error) {
	info, ok := eraLookup[strings.ToLower(era)]
	if !ok {
		return nil, fmt.Errorf("unknown era: %s", era)
	}
	return testdata.MustDecodeHex(info.blockHex), nil
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, locate and read the testdata.MustDecodeHex implementation
find . -name "*.go" -path "*/testdata/*" | head -20

Repository: blinklabs-io/gouroboros

Length of output: 96


🏁 Script executed:

#!/bin/bash
# Search for MustDecodeHex definition and usage
rg -n "func MustDecodeHex" --type go
rg -n "MustDecodeHex" --type go | head -30

Repository: blinklabs-io/gouroboros

Length of output: 1961


🏁 Script executed:

#!/bin/bash
# Examine the blockCborFromEra function and surrounding context
cat -n internal/bench/helpers.go | sed -n '130,200p'

Repository: blinklabs-io/gouroboros

Length of output: 2598


🏁 Script executed:

#!/bin/bash
# Search for callers of blockCborFromEra to understand error handling expectations
rg -n "blockCborFromEra" --type go

Repository: blinklabs-io/gouroboros

Length of output: 309


🏁 Script executed:

#!/bin/bash
# Check the imports section of internal/bench/helpers.go
cat -n internal/bench/helpers.go | head -30

Repository: blinklabs-io/gouroboros

Length of output: 1328


🏁 Script executed:

#!/bin/bash
# Check if encoding/hex is already imported
rg -n "import|encoding/hex" internal/bench/helpers.go | head -20

Repository: blinklabs-io/gouroboros

Length of output: 78


🏁 Script executed:

#!/bin/bash
# Verify the exact context around blockCborFromEra caller
cat -n internal/bench/helpers.go | sed -n '85,100p'

Repository: blinklabs-io/gouroboros

Length of output: 665


🏁 Script executed:

#!/bin/bash
# Get the full MustDecodeHex implementation to confirm panic behavior
cat -n internal/testdata/blocks.go | sed -n '103,110p'

Repository: blinklabs-io/gouroboros

Length of output: 323


Replace the MustDecodeHex panic with proper error handling in this error-returning function.

Line 195 uses testdata.MustDecodeHex, which panics on invalid hex, bypassing the error return contract of blockCborFromEra. The caller LoadBlockFixture (line 93) expects error handling via the returned error value, not a panic.

Proposed fix
 import (
+	"encoding/hex"
 	"fmt"
 	"math/big"
 	"strings"
 func blockCborFromEra(era string) ([]byte, error) {
 	info, ok := eraLookup[strings.ToLower(era)]
 	if !ok {
 		return nil, fmt.Errorf("unknown era: %s", era)
 	}
-	return testdata.MustDecodeHex(info.blockHex), nil
+	data, err := hex.DecodeString(info.blockHex)
+	if err != nil {
+		return nil, fmt.Errorf("decode %s fixture hex: %w", era, err)
+	}
+	return data, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/bench/helpers.go` around lines 190 - 196, Replace the panic-prone
call to testdata.MustDecodeHex in blockCborFromEra with a non-panicking decode
and propagate any decode error via the function's error return; locate
blockCborFromEra and swap MustDecodeHex for the decode function that returns
(bytes, error) (or add one in the testdata package), check the error and return
fmt.Errorf(...) or the decode error wrapped, so callers like LoadBlockFixture
receive an error instead of a panic.

Signed-off-by: Chris Gianelloni <wolf31o2@blinklabs.io>
@wolf31o2 wolf31o2 force-pushed the feat/memory-profiling branch from a29c785 to 854b7c2 Compare February 28, 2026 00:13
@wolf31o2

@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai bot commented Feb 28, 2026

@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 15 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/bench/README.md">

<violation number="1" location="internal/bench/README.md:437">
P3: The documented CI benchmark package list is incomplete and omits `internal/bench`, which the workflow actually benchmarks.</violation>
</file>



Each run produces a single file, `benchmark.txt`, containing the output of
`go test -bench=. -benchmem -count=5` across the core packages (vrf, kes,
consensus, cbor, pipeline, ledger).

@cubic-dev-ai cubic-dev-ai bot Feb 28, 2026


P3: The documented CI benchmark package list is incomplete and omits internal/bench, which the workflow actually benchmarks.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/README.md, line 437:

<comment>The documented CI benchmark package list is incomplete and omits `internal/bench`, which the workflow actually benchmarks.</comment>

<file context>
@@ -0,0 +1,474 @@
+
+Each run produces a single file, `benchmark.txt`, containing the output of
+`go test -bench=. -benchmem -count=5` across the core packages (vrf, kes,
+consensus, cbor, pipeline, ledger).
+
+1. **Location**: GitHub Actions > Workflows > Benchmarks > Run > Artifacts
</file context>
