Conversation
📝 Walkthrough

Adds a benchmarking and profiling subsystem under internal/bench with extensive Go benchmark suites (blocks, CBOR, transactions, consensus, scripts, VRF/KES), allocation-regression and profile-tagged tests, helpers, fixtures and unit tests, baseline and profiling documentation (BASELINES.md, PROFILING.md, README, testdata README), a CI workflow (.github/workflows/benchmark.yml) to run and upload benchmarks, and a profiling helper script (scripts/profile.sh).
🚥 Pre-merge checks: ✅ Passed checks (2 passed)
9 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/bench/block_bench_test.go">
<violation number="1" location="internal/bench/block_bench_test.go:317">
P2: Guard against unsigned underflow when computing the KES period offset; if the fixture’s kes period is ahead of the slot-derived period, skip or handle it explicitly instead of wrapping.</violation>
</file>
<file name="internal/bench/regression_test.go">
<violation number="1" location="internal/bench/regression_test.go:254">
P2: Check the VRF verification result inside the allocation loop; otherwise the regression test can pass even when verification fails.</violation>
<violation number="2" location="internal/bench/regression_test.go:264">
P3: Validate the VerifyAndHash error inside the allocation loop so failures don't get masked.</violation>
<violation number="3" location="internal/bench/regression_test.go:274">
P3: Handle the vrf.Prove error during allocation measurement to ensure the test fails on invalid proofs.</violation>
<violation number="4" location="internal/bench/regression_test.go:299">
P3: Assert the KES verification result so allocation measurements only reflect the success path.</violation>
<violation number="5" location="internal/bench/regression_test.go:380">
P2: Fail the test if CBOR decoding returns an error so regression results stay meaningful.</violation>
</file>
<file name="internal/bench/helpers.go">
<violation number="1" location="internal/bench/helpers.go:15">
P3: Several added lines exceed the 80-character limit enforced by golines; wrap comments and long signatures/lines to avoid lint failures.</violation>
</file>
<file name="internal/bench/tx_bench_test.go">
<violation number="1" location="internal/bench/tx_bench_test.go:63">
P2: Benchmark results are discarded, which can let the compiler optimize away the work and skew benchmark results. Store results in a package-level sink variable as recommended in the bench README to prevent dead-code elimination.</violation>
</file>
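The sink pattern referenced in the violation above can be sketched in isolation. Here `sink` and `decode` are hypothetical stand-ins for the benchmark's result variable and `ledger.NewTransactionFromCbor`; the comments show how the pattern would look inside a real benchmark loop:

```go
package main

import "fmt"

// Package-level sink keeps benchmark results reachable so the compiler
// cannot dead-code-eliminate the work being measured.
var sink any

// decode is a hypothetical stand-in for ledger.NewTransactionFromCbor.
func decode(data []byte) (int, error) {
	return len(data), nil
}

func main() {
	// Inside a benchmark this would be:
	//   for i := 0; i < b.N; i++ {
	//       tx, err := ledger.NewTransactionFromCbor(txType, txCbor)
	//       if err != nil { b.Fatal(err) }
	//       sink = tx
	//   }
	n, err := decode([]byte{0x82, 0x00})
	if err != nil {
		panic(err)
	}
	sink = n
	fmt.Println(sink) // 2
}
```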
<file name="internal/bench/profile_test.go">
<violation number="1" location="internal/bench/profile_test.go:99">
P2: VerifyBlock errors are discarded, so the profiling test can silently skip the rest of the validation path when verification fails (VerifyBlock returns early on VRF/KES/body hash errors). Capture and handle the error to avoid profiling the wrong code path.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```go
for _, kd := range kesDataList {
    b.Run("Era_"+kd.name, func(b *testing.B) {
        currentKesPeriod := kd.slot / defaultSlotsPerKesPeriod
```
P2: Guard against unsigned underflow when computing the KES period offset; if the fixture’s kes period is ahead of the slot-derived period, skip or handle it explicitly instead of wrapping.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 317:
<comment>Guard against unsigned underflow when computing the KES period offset; if the fixture’s kes period is ahead of the slot-derived period, skip or handle it explicitly instead of wrapping.</comment>
<file context>
@@ -0,0 +1,585 @@
+
+ for _, kd := range kesDataList {
+ b.Run("Era_"+kd.name, func(b *testing.B) {
+ currentKesPeriod := kd.slot / defaultSlotsPerKesPeriod
+ t := currentKesPeriod - kd.kesPeriod
+
</file context>
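The guard can be sketched in isolation; `kesPeriodOffset` and the sample values below are hypothetical, but the wrap-around behavior of uint64 subtraction is the real hazard the reviewer is pointing at:

```go
package main

import "fmt"

// kesPeriodOffset is a hypothetical helper showing the suggested guard:
// it returns the offset between the slot-derived KES period and the
// certificate's KES period, refusing to wrap around on underflow.
func kesPeriodOffset(slot, slotsPerKesPeriod, certKesPeriod uint64) (uint64, bool) {
	currentKesPeriod := slot / slotsPerKesPeriod
	if currentKesPeriod < certKesPeriod {
		// The fixture's KES period is ahead of the slot-derived period;
		// a benchmark would b.Skip here instead of using a wrapped value.
		return 0, false
	}
	return currentKesPeriod - certKesPeriod, true
}

func main() {
	fmt.Println(kesPeriodOffset(129600, 129600, 0)) // 1 true
	fmt.Println(kesPeriodOffset(100, 129600, 5))    // 0 false
}
```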
internal/bench/regression_test.go
Outdated
```go
t.Run("Verify", func(t *testing.T) {
    allocs := testing.AllocsPerRun(100, func() {
        _, _ = vrf.Verify(vrfPK, vrfProof, vrfOutput, vrfAlpha)
```
P2: Check the VRF verification result inside the allocation loop; otherwise the regression test can pass even when verification fails.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/regression_test.go, line 254:
<comment>Check the VRF verification result inside the allocation loop; otherwise the regression test can pass even when verification fails.</comment>
<file context>
@@ -0,0 +1,392 @@
+
+ t.Run("Verify", func(t *testing.T) {
+ allocs := testing.AllocsPerRun(100, func() {
+ _, _ = vrf.Verify(vrfPK, vrfProof, vrfOutput, vrfAlpha)
+ })
+ limit := 12.0 * multiplier // Baseline: 11 allocs
</file context>
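A minimal sketch of the suggested fix: record the verification outcome inside the `AllocsPerRun` closure and assert it afterwards. `verify` is a hypothetical stand-in for `vrf.Verify`; in a real test the panics would be `t.Fatalf` calls:

```go
package main

import (
	"fmt"
	"testing"
)

// verify is a hypothetical stand-in for vrf.Verify.
func verify(proof []byte) (bool, error) {
	return len(proof) > 0, nil
}

func main() {
	var lastOK bool
	var lastErr error
	allocs := testing.AllocsPerRun(100, func() {
		// Capture the result instead of discarding it with `_, _ =`.
		lastOK, lastErr = verify([]byte{0x01})
	})
	// Assert after measurement so a failed verification fails the test
	// instead of being silently counted as a "successful" run.
	if lastErr != nil || !lastOK {
		panic("verification failed inside allocation loop")
	}
	fmt.Println(allocs >= 0) // true
}
```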
internal/bench/profile_test.go
Outdated
```go
// VerifyBlock includes VRF/KES verification, body hash, etc.
// With our config, transaction validation is skipped.
_, _, _, _, _ = ledger.VerifyBlock(block, eta0Hex, slotsPerKes, config)
```
P2: VerifyBlock errors are discarded, so the profiling test can silently skip the rest of the validation path when verification fails (VerifyBlock returns early on VRF/KES/body hash errors). Capture and handle the error to avoid profiling the wrong code path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/profile_test.go, line 99:
<comment>VerifyBlock errors are discarded, so the profiling test can silently skip the rest of the validation path when verification fails (VerifyBlock returns early on VRF/KES/body hash errors). Capture and handle the error to avoid profiling the wrong code path.</comment>
<file context>
@@ -0,0 +1,446 @@
+
+ // VerifyBlock includes VRF/KES verification, body hash, etc.
+ // With our config, transaction validation is skipped.
+ _, _, _, _, _ = ledger.VerifyBlock(block, eta0Hex, slotsPerKes, config)
+ }
+ }
</file context>
internal/bench/regression_test.go
Outdated
```go
t.Run("VerifyAndHash", func(t *testing.T) {
    allocs := testing.AllocsPerRun(100, func() {
        _, _ = vrf.VerifyAndHash(vrfPK, vrfProof, vrfAlpha)
```
P3: Validate the VerifyAndHash error inside the allocation loop so failures don't get masked.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/regression_test.go, line 264:
<comment>Validate the VerifyAndHash error inside the allocation loop so failures don't get masked.</comment>
<file context>
@@ -0,0 +1,392 @@
+
+ t.Run("VerifyAndHash", func(t *testing.T) {
+ allocs := testing.AllocsPerRun(100, func() {
+ _, _ = vrf.VerifyAndHash(vrfPK, vrfProof, vrfAlpha)
+ })
+ limit := 12.0 * multiplier // Baseline: 11 allocs
</file context>
```
@@ -0,0 +1,199 @@
// Copyright 2026 Blink Labs Software
```
P3: Several added lines exceed the 80-character limit enforced by golines; wrap comments and long signatures/lines to avoid lint failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/helpers.go, line 15:
<comment>Several added lines exceed the 80-character limit enforced by golines; wrap comments and long signatures/lines to avoid lint failures.</comment>
<file context>
@@ -0,0 +1,199 @@
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package bench provides benchmark utilities and test fixtures for memory profiling.
+package bench
+
</file context>
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In @.github/workflows/benchmark.yml:
- Around line 43-58: The base benchmark step currently checks out BASE_SHA and
runs go test against a fixed list of package paths (e.g., ./vrf/...,
./pipeline/..., ./ledger/...) which will fail if some of those paths don't exist
on the base branch; after checking out BASE_SHA, detect which of the listed
package paths actually exist (for example by running git ls-tree or test -d / go
list on each path) and build the go test argument list from only those existing
paths, then run go test -bench... -benchmem ... and tee to benchmark-base.txt;
alternatively, make the go test invocation tolerant of missing packages by
filtering them out or skipping the step failure for missing-path errors before
checking out HEAD_SHA.
In `@internal/bench/BASELINES.md`:
- Around line 186-192: The time regression threshold is inconsistent between
BASELINES.md ("Time (ns/op) | +50%") and README.md ("Time increases by more than
20%"); pick a canonical threshold and make them match: update the "Time (ns/op)"
cell in BASELINES.md and the corresponding sentence in README.md to the chosen
value, and also update the benchmark/CI configuration that enforces the
threshold (the bench threshold variable or workflow step that checks time
regressions) so the docs and CI are consistent. Ensure the updated value is used
everywhere and adjust any explanatory text to match.
In `@internal/bench/block_bench_test.go`:
- Around line 315-318: The subtraction currentKesPeriod - kd.kesPeriod can
underflow (both are uint64); inside the b.Run closure that iterates kesDataList
(where kd, currentKesPeriod and defaultSlotsPerKesPeriod are defined) add a
guard: if currentKesPeriod < kd.kesPeriod then handle that case (e.g. set t to
0, skip the subtest, or convert to a signed type before subtracting) otherwise
perform t := currentKesPeriod - kd.kesPeriod; update any downstream logic that
uses t accordingly to avoid relying on a wrapped large value.
In `@internal/bench/cbor_bench_test.go`:
- Around line 113-119: Replace the unsafe b.Fatalf call inside the goroutines
spawned by b.RunParallel: capture the first decode error from inside the
parallel loop into a shared container (e.g., an error variable set once via
sync/atomic.Value or a buffered chan error) instead of calling b.Fatalf
directly, then after b.RunParallel returns check that container and call
b.Fatalf from the benchmark goroutine if an error was recorded; apply the same
change for both the block decode loop (where NewBlockFromCbor is called) and the
BenchmarkCBORDecodeTxParallel test.
In `@internal/bench/consensus_bench_test.go`:
- Line 1: Update the file header year to match the rest of the PR: change the
copyright year in internal/bench/consensus_bench_test.go from 2025 to 2026 so it
is consistent with other new files.
In `@internal/bench/README.md`:
- Around line 122-130: The README references a missing file
PLAN_MEMORY_PROFILING.md and lists "Target Allocs" (e.g., Threshold Calc, VRF
Verify, KES Verify, Leader Check, MkInputVrf) that conflict with actual
baselines in BASELINES.md; update the table to either remove or correct the
stale reference and clarify whether the numbers are aspirational by (a) adding a
parenthetical "aspirational target" note for each target allocation row or (b)
replacing the target values with the current baseline numbers from BASELINES.md,
and remove or add a link to PLAN_MEMORY_PROFILING.md if that file is intended to
exist so readers are not misled.
In `@internal/bench/regression_test.go`:
- Line 45: Fix the typo in the inline comment for kesSeed: change "lenght" to
"length" but do not modify the kesSeed value itself (the string must remain
unchanged to match the canonical test vector); update the comment next to
kesSeed to read something like "Matches canonical test vector; string length
spelled correctly" or simply correct "lenght" → "length" while keeping kesSeed =
[]byte("test string of 32 byte of lenght") intact.
In `@internal/bench/script_bench_test.go`:
- Around line 258-270: The benchmark name construction is fragile: replace the
expression using string(rune('0'+depth)) inside the b.Run call with a robust
formatted string using fmt.Sprintf (e.g., fmt.Sprintf("%s/Depth%d", scriptType,
depth)); update the import to include fmt if missing and ensure the b.Run
identifier that currently builds the name (where scriptTypes, depths,
createNestedScript and b.Run are used) uses the new formatted name so
multi-digit depths render correctly.
- Around line 48-119: The helper functions createPubkeyScript, createAllScript,
createAnyScript, createNofKScript, createInvalidBeforeScript, and
createInvalidHereafterScript currently ignore errors from cbor.Encode and
ns.UnmarshalCBOR; update each to surface failures by either (A) adding a
testing.TB parameter and calling tb.Fatal/ tb.Fatalf when cbor.Encode or
ns.UnmarshalCBOR returns an error, or (B) if you prefer not to thread testing.TB
through tests, panic on any non-nil error from cbor.Encode or ns.UnmarshalCBOR
so encoding/unmarshal failures fail fast during test setup.
In `@internal/bench/tx_bench_test.go`:
- Around line 36-44: The benchmark is passing a BlockType (from
BlockTypeFromEra) into ledger.NewTransactionFromCbor which expects a TxType, and
the error from BlockTypeFromEra is being ignored; add a conversion so you
compute a txType (either by implementing a TxTypeFromEra helper or a simple
era->TxType map) and pass that txType into ledger.NewTransactionFromCbor instead
of blockType, and check the returned error from the era->type conversion before
running the benchmark (i.e., replace BlockTypeFromEra(era) with a proper TxType
resolution and handle its error).
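One way to implement the suggested era-to-TxType resolution is a small lookup table with an explicit error path. The era names and numeric values below are illustrative placeholders, not the ledger package's real constants; real code would map to the package's `TxType` values:

```go
package main

import "fmt"

// txTypeByEra is a hypothetical era-name -> transaction-type table;
// real code would reference the ledger package's TxType constants.
var txTypeByEra = map[string]uint{
	"shelley": 1,
	"allegra": 2,
	"mary":    3,
	"alonzo":  4,
	"babbage": 5,
	"conway":  6,
}

// txTypeFromEra resolves an era name to a TxType, returning an error
// for unknown eras instead of silently benchmarking the wrong type.
func txTypeFromEra(era string) (uint, error) {
	t, ok := txTypeByEra[era]
	if !ok {
		return 0, fmt.Errorf("no transaction type for era %q", era)
	}
	return t, nil
}

func main() {
	if _, err := txTypeFromEra("byron"); err != nil {
		fmt.Println("error:", err)
	}
	t, _ := txTypeFromEra("conway")
	fmt.Println(t) // 6
}
```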
🧹 Nitpick comments (15)
internal/bench/README.md (1)
21-21: Add language specifiers to fenced code blocks. Static analysis flagged lines 21, 109, and 134 as fenced code blocks without a language specified. Adding a language (e.g., `text` or the appropriate one) improves rendering and satisfies markdownlint (MD040).

internal/bench/helpers.go (1)

142-193: Consider consolidating the parallel era switch statements. `BlockTypeFromEra` and `blockCborFromEra` both have identical switch structures over era names. A map-based approach (e.g., `map[string]struct{ blockType uint; hex string }`) would reduce duplication and make adding new eras a single-point change.

internal/bench/helpers_test.go (2)
BlockTypeFromEraandblockCborFromEraboth have identical switch structures over era names. A map-based approach (e.g.,map[string]struct{ blockType uint; hex string }) would reduce duplication and make adding new eras a single-point change.internal/bench/helpers_test.go (2)
31-60: Consider using ledger block type constants instead of hardcoded integers.The expected values (1-7) are hardcoded magic numbers. Using
ledger.BlockTypeByronMain,ledger.BlockTypeShelley, etc. would make the test self-documenting and resilient to constant value changes.Example for the first few entries
+ "github.com/blinklabs-io/gouroboros/ledger" ... tests := []struct { era string expected uint wantErr bool }{ - {"byron", 1, false}, - {"shelley", 2, false}, + {"byron", ledger.BlockTypeByronMain, false}, + {"shelley", ledger.BlockTypeShelley, false}, // ... and so on
62-119: Missing test coverage forLoadTxFixture,MustLoadTxFixture, and ledger state variants.There are no tests for
LoadTxFixture/MustLoadTxFixtureor forBenchLedgerStateWithUtxos/BenchLedgerStateWithUtxoFunc. These are exported functions that benchmark tests rely on. Consider adding coverage for at least the happy path of tx fixture loading and the ledger state variants.internal/bench/script_bench_test.go (1)
339-343: All three Plutus version benchmarks use identical script bytes.
testPlutusV1ScriptHexandtestPlutusV2ScriptHexare aliased totestPlutusV3ScriptHex, making the V1/V2/V3 sub-benchmarks measure the exact same thing. The comments acknowledge this, but it could be misleading when comparing results across versions. Consider adding a comment in the benchmark output or using distinct (even synthetic) scripts per version if available.internal/bench/block_bench_test.go (1)
232-327: Significant code duplication withBenchmarkVerifyKesComponents(lines 461–563).The
kesDatastruct and the entire header type-switch for extracting KES fields (bodyCbor, signature, hotVkey, kesPeriod) are duplicated verbatim betweenBenchmarkBlockKESVerificationandBenchmarkVerifyKesComponents. Consider extracting a helper function likeextractKesData(blocks []blockData) []kesDatato eliminate the ~80 duplicated lines.♻️ Sketch of proposed extraction
Define the
kesDatastruct and the extraction helper at file scope:type kesData struct { name string bodyCbor []byte signature []byte hotVkey []byte kesPeriod uint64 slot uint64 } func extractKesData(blocks []blockData) []kesData { result := make([]kesData, 0, len(blocks)) for _, bd := range blocks { header := bd.block.Header() var bodyCbor []byte var sig []byte var hotVkey []byte var kesPeriod uint64 var err error switch h := header.(type) { // ... same switch cases ... default: continue } result = append(result, kesData{ name: bd.name, bodyCbor: bodyCbor, signature: sig, hotVkey: hotVkey, kesPeriod: kesPeriod, slot: header.SlotNumber(), }) } return result }Then both benchmarks would simply call
extractKesData(postByronBlocks).internal/bench/cbor_bench_test.go (1)
326-331: Inconsistent Byron filtering: string comparison vs. block type constant. Line 329 checks `tb.name == "Byron"` while other benchmarks (e.g., block_bench_test.go:88, cbor_bench_test.go:412) use `BlockTypeByronMain`/`BlockTypeByronEbb` constants. The string comparison is fragile and could miss a Byron block with a different name or fail to skip EBB blocks.

♻️ Suggested fix:

```diff
-    if tb.name == "Byron" {
+    if tb.blockType == ledger.BlockTypeByronMain || tb.blockType == ledger.BlockTypeByronEbb {
         continue
     }
```

This would require changing `testBlockData` to store blockType (which it already does at line 41).

internal/bench/tx_bench_test.go (1)
24-47: Several benchmarks silently skip eras with no transactions or empty CBOR. The pattern of `continue` at lines 29, 33 means no sub-benchmark is registered for those eras. If a fixture is broken or missing, this silently produces incomplete benchmark results. Consider using `b.Skip()` inside the `b.Run` closure instead, so the skipped era is visible in output.

internal/bench/profile_test.go (2)
411-423: Native script detection logic doesn't actually find native scripts. The check `tx.Cbor() != nil && len(tx.Cbor()) > 0` is true for any transaction with CBOR data, not only those containing native script witnesses. The `scriptsFound` flag is set but never meaningfully used to drive the actual profiling subtests below. Consider either removing this dead detection block or actually extracting and evaluating native scripts from witnesses.

427-445: Profiling subtests measure trivial Go operations, not native script evaluation. `EmptyKeyHashLookup` profiles a map lookup and `SlotComparison` profiles integer comparisons; both are basic Go operations unrelated to the actual `NativeScript.Evaluate` code path. These won't produce useful profiling data for native script performance. If actual script evaluation profiling is intended, consider constructing synthetic `NativeScript` instances and calling `Evaluate`.

scripts/profile.sh (2)
105-119: Unquoted `$cmd_args` relies on word splitting for argument passing. Using a bash array would be more robust and avoid potential issues if any argument ever contains spaces or glob characters.

♻️ Suggested improvement:

```diff
-    local cmd_args="-tags=profile -run=$test_name -timeout=10m"
+    local -a cmd_args=(-tags=profile "-run=$test_name" -timeout=10m)
     case "$PROFILE_TYPE" in
     cpu)
         echo "Generating CPU profile: $cpu_prof"
-        go test $cmd_args -cpuprofile="$cpu_prof" ./internal/bench/...
+        go test "${cmd_args[@]}" -cpuprofile="$cpu_prof" ./internal/bench/...
         ;;
     mem)
         echo "Generating memory profile: $mem_prof"
-        go test $cmd_args -memprofile="$mem_prof" ./internal/bench/...
+        go test "${cmd_args[@]}" -memprofile="$mem_prof" ./internal/bench/...
         ;;
     both)
         echo "Generating CPU profile: $cpu_prof"
         echo "Generating memory profile: $mem_prof"
-        go test $cmd_args -cpuprofile="$cpu_prof" -memprofile="$mem_prof" ./internal/bench/...
+        go test "${cmd_args[@]}" -cpuprofile="$cpu_prof" -memprofile="$mem_prof" ./internal/bench/...
         ;;
```
62-64: Hardcoded line numbers for help text extraction are fragile. If the header comments are edited (added/removed lines), `sed -n '16,44p'` will extract the wrong range. Consider using a marker-based approach (e.g., extracting lines between `# Usage:` and the next blank line) or simply inlining the help text in a heredoc.

internal/bench/regression_test.go (1)
191-246: `panic` in helper functions will crash the test process without useful diagnostics. The `*Once` helpers (e.g., `vrfVerifyOnce`, `kesVerifyOnce`, `cborBlockDecodeOnce`) use `panic` on failure. Since they're called inside `testing.AllocsPerRun`, a failure would crash the entire process rather than produce a proper test failure with `-v` output. This is a pragmatic tradeoff given the `AllocsPerRun` API, but worth documenting with a brief comment explaining why `panic` is used instead of `t.Fatal`.

.github/workflows/benchmark.yml (2)
29-30: Pin `benchstat` to a specific version for reproducibility. All other actions in this workflow are pinned by commit hash, but `benchstat@latest` can silently change behavior or output format between runs, potentially affecting regression detection.

Suggested fix:

```diff
 - name: Install benchstat
-  run: go install golang.org/x/perf/cmd/benchstat@latest
+  run: go install golang.org/x/perf/cmd/benchstat@v0.0.0-20250210151756-PICK_A_COMMIT
```

Pick a known-good commit hash from https://pkg.go.dev/golang.org/x/perf.
63-87: Quote shell variables to prevent globbing and word splitting. Static analysis (SC2086) flags unquoted `$GITHUB_STEP_SUMMARY` and `$GITHUB_OUTPUT`. While GHA controls these paths, quoting is a good defensive practice.

Suggested fix:

````diff
-      echo "## Benchmark Comparison" >> $GITHUB_STEP_SUMMARY
-      echo "" >> $GITHUB_STEP_SUMMARY
-      echo '```' >> $GITHUB_STEP_SUMMARY
-      cat benchmark-comparison.txt >> $GITHUB_STEP_SUMMARY
-      echo '```' >> $GITHUB_STEP_SUMMARY
+      {
+        echo "## Benchmark Comparison"
+        echo ""
+        echo '```'
+        cat benchmark-comparison.txt
+        echo '```'
+      } >> "$GITHUB_STEP_SUMMARY"
       ...
-      echo "regression_found=$REGRESSION_FOUND" >> $GITHUB_OUTPUT
+      echo "regression_found=$REGRESSION_FOUND" >> "$GITHUB_OUTPUT"
````
| Metric | Threshold | Rationale |
|--------|-----------|-----------|
| Allocation Count | +50% | Catches allocation leaks |
| Bytes Allocated | +100% | Allows some flexibility for features |
| Time (ns/op) | +50% | Catches performance regressions |
Inconsistent time regression threshold with README.md.
BASELINES.md states the time threshold is +50%, but README.md (line 397) states "Time increases by more than 20%." These should be consistent to avoid confusion among contributors. Please reconcile the threshold across both documents (and ideally with the actual CI workflow configuration).
🤖 Prompt for AI Agents
In `@internal/bench/BASELINES.md` around lines 186 - 192, The time regression
threshold is inconsistent between BASELINES.md ("Time (ns/op) | +50%") and
README.md ("Time increases by more than 20%"); pick a canonical threshold and
make them match: update the "Time (ns/op)" cell in BASELINES.md and the
corresponding sentence in README.md to the chosen value, and also update the
benchmark/CI configuration that enforces the threshold (the bench threshold
variable or workflow step that checks time regressions) so the docs and CI are
consistent. Ensure the updated value is used everywhere and adjust any
explanatory text to match.
```go
b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
        _, err := ledger.NewBlockFromCbor(tb.blockType, tb.cbor)
        if err != nil {
            b.Fatalf("failed to decode %s block: %v", tb.name, err)
        }
    }
```
b.Fatalf inside b.RunParallel causes undefined behavior.
Go's testing package requires that Fatal/Fatalf/FailNow are only called from the goroutine running the benchmark function. RunParallel spawns multiple goroutines, so calling b.Fatalf here leads to a data race and may panic or produce corrupt results. The same issue exists in BenchmarkCBORDecodeTxParallel (line 154).
🐛 Proposed fix:

```diff
 b.RunParallel(func(pb *testing.PB) {
     for pb.Next() {
         _, err := ledger.NewBlockFromCbor(tb.blockType, tb.cbor)
         if err != nil {
-            b.Fatalf("failed to decode %s block: %v", tb.name, err)
+            b.Errorf("failed to decode %s block: %v", tb.name, err)
+            return
         }
     }
 })
```

Apply the same pattern at line 154 for BenchmarkCBORDecodeTxParallel.
🤖 Prompt for AI Agents
In `@internal/bench/cbor_bench_test.go` around lines 113 - 119, Replace the unsafe
b.Fatalf call inside the goroutines spawned by b.RunParallel: capture the first
decode error from inside the parallel loop into a shared container (e.g., an
error variable set once via sync/atomic.Value or a buffered chan error) instead
of calling b.Fatalf directly, then after b.RunParallel returns check that
container and call b.Fatalf from the benchmark goroutine if an error was
recorded; apply the same change for both the block decode loop (where
NewBlockFromCbor is called) and the BenchmarkCBORDecodeTxParallel test.
internal/bench/regression_test.go
Outdated
```go
vrfEpochNonce = make([]byte, 32)

// KES test data
kesSeed = []byte("test string of 32 byte of lenght") // Matches canonical test vector
```
Typo in comment: "lenght" → "length".
"test string of 32 byte of lenght" — the string itself likely needs to stay unchanged to match the canonical test vector, but the comment could clarify this.
🤖 Prompt for AI Agents
In `@internal/bench/regression_test.go` at line 45, Fix the typo in the inline
comment for kesSeed: change "lenght" to "length" but do not modify the kesSeed
value itself (the string must remain unchanged to match the canonical test
vector); update the comment next to kesSeed to read something like "Matches
canonical test vector; string length spelled correctly" or simply correct
"lenght" → "length" while keeping kesSeed = []byte("test string of 32 byte of
lenght") intact.
Force-pushed from b334cf3 to c7b7371.

Benchmark Results: Benchmark Comparison (generated by benchmark workflow).

Force-pushed from c7b7371 to 710f9cb.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@internal/bench/profile_test.go`:
- Around line 337-368: The sub-test TestProfileConsensus/LeaderValueComputation
generates VRF proofs but discards the proof hash instead of computing the leader
value; fix by either (A) invoking the actual consensus leader-value logic
(import the consensus package and call consensus.VrfLeaderValue or the exported
function that consumes VRF output) passing the VRF proof/hash returned by
vrf.Prove, or (B) if you prefer not to import consensus, compute the leader
value inline exactly as consensus does (e.g., compute a Blake2b-256 of the VRF
output via blake2b.Sum256(hash) and use that result instead of `_ = hash`), and
if you intend only to profile vrf.Prove rename the sub-test from
"LeaderValueComputation" to avoid confusion; update the loop body in
TestProfileConsensus/LeaderValueComputation accordingly.
- Around line 219-243: TestProfileTxValidation currently profiles an empty
validation loop because the local rules slice is empty; replace the empty rules
slice with the era's actual UtxoValidationRules (or a representative subset) and
pass that into common.VerifyTransaction so real validation paths are exercised
(refer to the UtxoValidationRules symbol to locate the rule set). Also switch
from using BenchLedgerState() to MockLedgerState (from
internal/test/ledger/ledger.go) for the ls variable so the test follows the
*_test.go guideline and uses the mock ledger state during validation profiling.
- Around line 417-487: TestProfileNativeScript currently never exercises native
script evaluation: it only checks tx.Cbor and then runs trivial benchmarks
(EmptyKeyHashLookup/SlotComparison). Replace or augment those sub-tests to
construct real NativeScript instances (use the helper constructors in
script_bench_test.go) and call NativeScript.Evaluate inside the hot loops
(instead of raw map lookups/integer compares). Also use the scriptsFound check
or remove it if unused; ensure you pass the same key set and validity window
(keyHashes, slot/validityStart/validityEnd) into the Evaluate calls so the
benchmark measures the actual script evaluation path (refer to
TestProfileNativeScript, EmptyKeyHashLookup, SlotComparison, and
NativeScript.Evaluate).
In `@internal/bench/README.md`:
- Around line 19-27: The README's "Package Structure" and "Available Benchmarks"
sections are incomplete; update internal/bench/README.md to list the new files
and directories (block_bench_test.go, cbor_bench_test.go, tx_bench_test.go,
script_bench_test.go, regression_test.go, helpers_test.go, BASELINES.md,
PROFILING.md, and testdata/) in the package structure, and expand the "Available
Benchmarks" section to document the block, CBOR, transaction, and script
benchmarks (reference the corresponding test files by name) and mention the
regression and profiling tests and baseline/profiling docs; keep descriptions
concise and consistent with the existing consensus benchmark entries.
In `@internal/bench/script_bench_test.go`:
- Around line 470-492: In BenchmarkPlutusScriptHash, the hex.DecodeString calls
for testPlutusV3ScriptHex and testPlutusV1ScriptHex discard errors; update the
setup in function BenchmarkPlutusScriptHash to check the returned errors for
plutusV3Bytes and plutusV1Bytes and call b.Fatalf (or b.Fatalf with the error)
if decoding fails so the benchmark fails fast instead of operating on nil/empty
bytes when creating common.PlutusV1Script and common.PlutusV3Script.
- Around line 560-576: The CBOR encode errors are currently ignored in the
benchmark setup (calls to cbor.Encode when building data for the "Decode/Pubkey"
and "Decode/All_Depth3" sub-benchmarks), which can produce nil data and
invalidate the decode benchmark; after each cbor.Encode(&script) call (used in
the createPubkeyScript and createNestedScript setups), check the returned error
and call b.Fatalf with a descriptive message including the error (e.g., "failed
to encode script: %v") so the benchmark fails fast on encode errors rather than
repeatedly unmarshaling nil.
🧹 Nitpick comments (2)
internal/bench/testdata/README.md (1)
7-11: Add language identifiers to fenced code blocks. Markdownlint flags three fenced code blocks (lines 7, 38, 50) missing language specifiers. Use `text` for directory structure and fixture format examples.

Example fix for line 7:

````diff
-```
+```text
 testdata/
   blocks/        # Block CBOR fixtures (future use)
   transactions/  # Transaction CBOR fixtures (future use)
````

internal/bench/block_bench_test.go (1)

260-329: Duplicated KES data extraction switch block. The type-switch extracting `bodyCbor`, `signature`, `hotVkey`, and `kesPeriod` from era-specific headers is copy-pasted verbatim between `BenchmarkBlockKESVerification` (lines 260-329) and `BenchmarkVerifyKesComponents` (lines 508-577). Consider extracting a shared helper like `extractKesData(blocks []blockData) []kesData` to reduce maintenance burden.

Sketch of helper:

```go
func extractKesData(blocks []blockData) []kesData {
    result := make([]kesData, 0, len(blocks))
    for _, bd := range blocks {
        header := bd.block.Header()
        var bodyCbor, sig, hotVkey []byte
        var kesPeriod uint64
        var err error
        switch h := header.(type) {
        case *shelley.ShelleyBlockHeader:
            bodyCbor, err = cbor.Encode(h.Body)
            if err != nil {
                continue
            }
            sig = h.Signature
            hotVkey = h.Body.OpCertHotVkey
            kesPeriod = uint64(h.Body.OpCertKesPeriod)
        // ... remaining cases ...
        default:
            continue
        }
        result = append(result, kesData{
            name:      bd.name,
            bodyCbor:  bodyCbor,
            signature: sig,
            hotVkey:   hotVkey,
            kesPeriod: kesPeriod,
            slot:      header.SlotNumber(),
        })
    }
    return result
}
```

Then both benchmarks would simply call `extractKesData(postByronBlocks)`.

Also applies to: 508-577
internal/bench/profile_test.go
Outdated
// Use nil for protocol params - validation will be limited
// but we're profiling the validation code paths
var pp common.ProtocolParameters

// Empty validation rules slice - we're just profiling the framework
// overhead
var rules []common.UtxoValidationRuleFunc

for i := 0; i < profileIterations; i++ {
	for txIdx, tx := range txs {
		// VerifyTransaction runs all applicable validation rules
		// Errors may occur since we use nil protocol parameters
		err := common.VerifyTransaction(tx, slot, ls, pp, rules)
		if err != nil && i == 0 && txIdx == 0 {
			// Log once to avoid flooding output
			t.Logf(
				"VerifyTransaction error (expected with nil params): %v",
				err,
			)
		}
	}
}
})
}
}
TestProfileTxValidation profiles an empty loop — no validation rules are executed.
The rules slice on line 225 is empty, so common.VerifyTransaction iterates over zero rules and returns nil immediately. This profiles only the function-call overhead of the framework, not any actual transaction validation logic. To produce useful profiles, populate rules with the era's actual UtxoValidationRules or at least a representative subset.
Also, line 196 uses BenchLedgerState() rather than MockLedgerState from internal/test/ledger/ledger.go. As per coding guidelines, **/*_test.go: "Use MockLedgerState from internal/test/ledger/ledger.go for validation tests".
🤖 Prompt for AI Agents
In `@internal/bench/profile_test.go` around lines 219 - 243,
TestProfileTxValidation currently profiles an empty validation loop because the
local rules slice is empty; replace the empty rules slice with the era's actual
UtxoValidationRules (or a representative subset) and pass that into
common.VerifyTransaction so real validation paths are exercised (refer to the
UtxoValidationRules symbol to locate the rule set). Also switch from using
BenchLedgerState() to MockLedgerState (from internal/test/ledger/ledger.go) for
the ls variable so the test follows the *_test.go guideline and uses the mock
ledger state during validation profiling.
Force-pushed from 710f9cb to c4ebbb8 (Compare)
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
5 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/bench/block_bench_test.go">
<violation number="1" location="internal/bench/block_bench_test.go:127">
P2: The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.</violation>
</file>
<file name="internal/bench/regression_test.go">
<violation number="1" location="internal/bench/regression_test.go:354">
P3: `NewSumKesFromBytes` allocations are measured without first validating the deserialization succeeds, so an invalid signature/depth would still pass while measuring the error path. Add a pre-check (like other subtests) to assert the call succeeds before running `AllocsPerRun`.</violation>
</file>
<file name="internal/bench/tx_bench_test.go">
<violation number="1" location="internal/bench/tx_bench_test.go:46">
P2: Check the error from ledger.NewTransactionFromCbor so the benchmark fails if decoding regresses instead of silently benchmarking a failure path.</violation>
<violation number="2" location="internal/bench/tx_bench_test.go:186">
P2: Handle the error from tx.Utxorpc (e.g., pre-validate before timing or fail fast) to avoid benchmarking a failing path.</violation>
</file>
<file name="scripts/profile.sh">
<violation number="1" location="scripts/profile.sh:49">
P2: Option parsing treats `--help`/`--view` as positional arguments when profile_type is omitted, so the documented options fail unless both positional args are supplied. This makes `./scripts/profile.sh --help` and `./scripts/profile.sh block --view` error out even though defaults exist.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
internal/bench/block_bench_test.go
Outdated
}
_, _, _, _, err = ledger.VerifyBlock(
	block,
	defaultEta0Hex,
P2: The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 127:
<comment>The pre-parsed validation benchmark also uses the fixed all-zero eta0, which causes VerifyBlock to return on VRF failure and skip KES/body-hash checks for blocks whose epoch nonce differs. Use a per-block epoch nonce (or fixtures that include it) so this benchmark measures full validation instead of the VRF failure path.</comment>
<file context>
@@ -0,0 +1,617 @@
+ }
+ _, _, _, _, err = ledger.VerifyBlock(
+ block,
+ defaultEta0Hex,
+ defaultSlotsPerKesPeriod,
+ cfg,
</file context>
Force-pushed from c4ebbb8 to ccce5cb (Compare)
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
4 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/bench/PROFILING.md">
<violation number="1" location="internal/bench/PROFILING.md:25">
P2: Using `-http=:8080` binds pprof on all interfaces. For safety, docs should recommend binding to localhost (e.g., `-http=localhost:8080`) to avoid exposing profiling data on shared machines.</violation>
</file>
<file name="scripts/profile.sh">
<violation number="1" location="scripts/profile.sh:187">
P3: When `COMPONENT=all`, the final instructions point to non-existent `cpu_all.prof`/`mem_all.prof` files. This misleads users after running `./scripts/profile.sh all`. Adjust the output to use a real component or conditionally describe the example profile.</violation>
</file>
<file name="internal/bench/cbor_bench_test.go">
<violation number="1" location="internal/bench/cbor_bench_test.go:209">
P2: The block encode and round-trip benchmarks call `block.Cbor()`, which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.</violation>
<violation number="2" location="internal/bench/cbor_bench_test.go:231">
P2: BenchmarkCBOREncodeTx is using `transaction.Cbor()`, which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
internal/bench/cbor_bench_test.go
Outdated
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	result := transaction.Cbor()
P2: BenchmarkCBOREncodeTx is using transaction.Cbor(), which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/cbor_bench_test.go, line 231:
<comment>BenchmarkCBOREncodeTx is using `transaction.Cbor()`, which returns the cached original bytes for DecodeStoreCbor-backed transactions. Since NewTransactionFromCbor sets the stored bytes, this benchmark skips encoding work. Use an explicit encoding call to measure serialization cost.</comment>
<file context>
@@ -0,0 +1,387 @@
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ result := transaction.Cbor()
+ if len(result) == 0 {
+ b.Fatalf("empty CBOR result for %s tx", tx.name)
</file context>
internal/bench/cbor_bench_test.go
Outdated
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	result := block.Cbor()
P2: The block encode and round-trip benchmarks call block.Cbor(), which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/cbor_bench_test.go, line 209:
<comment>The block encode and round-trip benchmarks call `block.Cbor()`, which returns the original decoded CBOR bytes for DecodeStoreCbor-backed types. Since NewBlockFromCbor sets the stored CBOR, these benchmarks skip re-encoding and only measure a byte slice lookup. Use an explicit encode path (e.g., cbor.Encode) in both BenchmarkCBOREncodeBlock and BenchmarkCBORRoundTrip to benchmark serialization work.</comment>
<file context>
@@ -0,0 +1,387 @@
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ result := block.Cbor()
+ if len(result) == 0 {
+ b.Fatalf("empty CBOR result for %s block", tb.name)
</file context>
scripts/profile.sh
Outdated
echo "=== Profiling complete ==="
echo ""
echo "To analyze profiles manually:"
P3: When COMPONENT=all, the final instructions point to non-existent cpu_all.prof/mem_all.prof files. This misleads users after running ./scripts/profile.sh all. Adjust the output to use a real component or conditionally describe the example profile.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 187:
<comment>When `COMPONENT=all`, the final instructions point to non-existent `cpu_all.prof`/`mem_all.prof` files. This misleads users after running `./scripts/profile.sh all`. Adjust the output to use a real component or conditionally describe the example profile.</comment>
<file context>
@@ -0,0 +1,196 @@
+
+echo "=== Profiling complete ==="
+echo ""
+echo "To analyze profiles manually:"
+echo " go tool pprof -http=:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+echo " go tool pprof -http=:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
</file context>
Force-pushed from ccce5cb to 2dc3f8a (Compare)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@internal/bench/cbor_bench_test.go`:
- Around line 244-280: BenchmarkCBORDecodeRaw currently assumes testBlocks has
at least one element and does testBlocks[0], which will panic if it's empty;
modify BenchmarkCBORDecodeRaw to first check if len(testBlocks) == 0 and call
b.Fatal or b.Fatalf with a clear message (e.g., "no test blocks available for
BenchmarkCBORDecodeRaw") before accessing testBlocks[0] or iterating to find
smallestBlock, then proceed with the existing smallestBlock selection and
benchmark runs.
In `@internal/bench/consensus_bench_test.go`:
- Around line 295-310: BenchmarkConsensusParallelThreshold currently shares a
single *big.Rat (activeSlotCoeff) across goroutines which causes a data race;
move creation of the big.Rat into the b.RunParallel callback so each goroutine
constructs its own instance (e.g., call big.NewRat(1, 20) or
new(big.Rat).Set(...) inside the pb loop) before invoking
consensus.CertifiedNatThreshold to avoid concurrent access to the same *big.Rat.
🧹 Nitpick comments (5)
internal/bench/helpers.go (1)
`214-233`: **`blockCborFromEra` returns `error` but internally panics via `MustDecodeHex`.** The function signature returns `([]byte, error)`, but `testdata.MustDecodeHex` panics on invalid hex rather than returning an error. Since the hex data comes from compile-time constants this is practically safe, but it makes the error return misleading: callers believe errors are handled gracefully when they'd actually get a panic. Consider either using a non-panicking hex decode to propagate errors, or simplifying the return type if panics are acceptable here.

Option: propagate errors instead of panicking

```diff
 func blockCborFromEra(era string) ([]byte, error) {
+	var hexStr string
 	switch strings.ToLower(era) {
 	case "byron":
-		return testdata.MustDecodeHex(testdata.ByronBlockHex), nil
+		hexStr = testdata.ByronBlockHex
 	case "shelley":
-		return testdata.MustDecodeHex(testdata.ShelleyBlockHex), nil
+		hexStr = testdata.ShelleyBlockHex
 	// ... same for other eras
 	default:
 		return nil, fmt.Errorf("unknown era: %s", era)
 	}
+	b, err := hex.DecodeString(strings.TrimSpace(hexStr))
+	if err != nil {
+		return nil, fmt.Errorf("decode %s hex: %w", era, err)
+	}
+	return b, nil
 }
```

internal/bench/cbor_bench_test.go (1)
`109-115`: **Minor duplication: `mustDecodeHex` vs `testdata.MustDecodeHex`.** This local helper duplicates `testdata.MustDecodeHex` (which also panics on decode failure). Consider reusing the existing utility to avoid drift.

internal/bench/block_bench_test.go (1)

`525-629`: **Duplicated KES data extraction switch between `BenchmarkBlockKESVerification` and `BenchmarkVerifyKesComponents`.** The `kesData` struct and the ~50-line type switch extracting `bodyCbor`, `signature`, `hotVkey`, and `kesPeriod` from era-specific headers (lines 531-611) is a near-exact copy of lines 283-363. Consider extracting a shared helper like `extractKesData(blocks []blockData) []kesData` to reduce duplication and maintenance burden.

Sketch of extracted helper:

```go
type kesData struct {
	name      string
	bodyCbor  []byte
	signature []byte
	hotVkey   []byte
	kesPeriod uint64
	slot      uint64
}

func extractKesData(blocks []blockData) []kesData {
	result := make([]kesData, 0, len(blocks))
	for _, bd := range blocks {
		header := bd.block.Header()
		var bodyCbor []byte
		var sig, hotVkey []byte
		var kesPeriod uint64
		var err error
		switch h := header.(type) {
		case *shelley.ShelleyBlockHeader:
			bodyCbor, err = cbor.Encode(h.Body)
			sig = h.Signature
			hotVkey = h.Body.OpCertHotVkey
			kesPeriod = uint64(h.Body.OpCertKesPeriod)
		case *allegra.AllegraBlockHeader:
			bodyCbor, err = cbor.Encode(h.Body)
			sig = h.Signature
			hotVkey = h.Body.OpCertHotVkey
			kesPeriod = uint64(h.Body.OpCertKesPeriod)
		// ... remaining eras ...
		default:
			continue
		}
		if err != nil {
			continue
		}
		result = append(result, kesData{
			name:      bd.name,
			bodyCbor:  bodyCbor,
			signature: sig,
			hotVkey:   hotVkey,
			kesPeriod: kesPeriod,
			slot:      header.SlotNumber(),
		})
	}
	return result
}
```

Then both benchmarks would call `extractKesData(postByronBlocks)`.

internal/bench/profile_test.go (1)
`278-308`: **Stale/misleading comment: consensus package is directly imported, not "dynamically."** Line 279's comment says "Import consensus package dynamically to avoid circular dependencies" but `consensus` is already imported normally at line 42 and used directly at line 305. This comment should be removed to avoid confusion.

Suggested fix:

```diff
 func TestProfileConsensus(t *testing.T) {
-	// Import consensus package dynamically to avoid circular dependencies
-	// For now, we test the VRF components directly
 	t.Run("LeaderValueComputation", func(t *testing.T) {
```

internal/bench/regression_test.go (1)
`102-167`: **Consider extracting shared allocation baselines to avoid drift between `TestAllocationRegression` and `TestRegression*` tests.** The same limits appear in two places — e.g., a VRFVerify limit of 12 is defined here (line 132) and again in `TestRegressionVRF/Verify` (line 280), and a LeaderCheck limit of 1350 here (line 137) and in `TestRegressionConsensus/IsSlotLeaderFromComponents` (line 412). If a baseline is updated in one place but not the other, the two tests will enforce different limits for the same operation.

Consider defining the baselines as package-level constants or a shared map.
Force-pushed from 2dc3f8a to 3feff17 (Compare)
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
There was a problem hiding this comment.
2 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".github/workflows/benchmark.yml">
<violation number="1" location=".github/workflows/benchmark.yml:35">
P2: The `|| true` forces the benchmark step to succeed even when `go test` fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.</violation>
</file>
<file name="internal/bench/BASELINES.md">
<violation number="1" location="internal/bench/BASELINES.md:11">
P3: The overview claims CI fails when allocations exceed thresholds by >50%, but the regression thresholds table lists allocation count at +20%. This inconsistency can mislead contributors about the actual CI criteria. Align the overview with the table or make it refer to the detailed thresholds below.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
.github/workflows/benchmark.yml
Outdated
./cbor/... \
./pipeline/... \
./ledger/... \
2>&1 | tee benchmark.txt || true
P2: The || true forces the benchmark step to succeed even when go test fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 35:
<comment>The `|| true` forces the benchmark step to succeed even when `go test` fails, so CI will never fail on benchmark/test errors. Remove it so failures are surfaced.</comment>
<file context>
@@ -0,0 +1,42 @@
+ ./cbor/... \
+ ./pipeline/... \
+ ./ledger/... \
+ 2>&1 | tee benchmark.txt || true
+
+ - name: Upload benchmark results
</file context>
Force-pushed from 3feff17 to c626b9c (Compare)
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/benchmark.yml:
- Around line 26-35: Update the "Run benchmarks" job so the workflow produces
the three artifacts described in the README and enforces regression thresholds:
write the raw run output to benchmark-current.txt (replace the single
benchmark.txt), fetch or restore benchmark-base.txt (e.g., from artifacts or
main branch), run a small comparison step that generates
benchmark-comparison.txt and fails the job if any metric exceeds the README
thresholds (alloc +20%, bytes +50%, time +20%), and remove the "|| true" on the
benchmark run so failures surface instead of being swallowed; ensure the steps
and artifact names exactly match README references (benchmark-current.txt,
benchmark-base.txt, benchmark-comparison.txt) and name the steps such that "Run
benchmarks" is the producer of benchmark-current.txt.
In `@internal/bench/BASELINES.md`:
- Around line 185-191: The BASELINES.md table lists thresholds for "Allocation
Count", "Bytes Allocated", and "Time (ns/op)" but the CI job that runs
benchmarks (the "benchmark" job / step that runs benchmarks and uploads the
results) never compares results to these thresholds; add an enforcement step to
that job which downloads the benchmark artifact (e.g., the produced
bench-results JSON/CSV), parses the metrics, computes percent changes versus the
stored baseline values from BASELINES.md (or a machine-readable baseline file),
and exits non‑zero if any metric exceeds its threshold (+20% allocation count,
+50% bytes, +20% time); alternatively, if you don’t want enforcement yet, update
BASELINES.md to clearly mark these thresholds as "planned/not enforced" so
readers aren’t misled.
In `@internal/bench/block_bench_test.go`:
- Around line 527-607: The duplicated kesData type and era-specific header
switch in BenchmarkBlockKESVerification and the other benchmark should be
extracted into a single helper: add a function extractKesData(blocks
[]blockData) []kesData that owns the kesData type (or move kesData to package
scope) and implements the type-switch over header.(*shelley.ShelleyBlockHeader,
*allegra.AllegraBlockHeader, *mary.MaryBlockHeader, *alonzo.AlonzoBlockHeader,
*babbage.BabbageBlockHeader, *conway.ConwayBlockHeader) to cbor.Encode(h.Body),
set signature, hotVkey, kesPeriod and slot, skipping on encode errors, and
return the populated []kesData; then replace both duplicated blocks that iterate
postByronBlocks with a call to extractKesData(postByronBlocks).
In `@internal/bench/consensus_bench_test.go`:
- Around line 64-89: The shared big.Rat activeSlotCoeff in
BenchmarkConsensusThreshold can be mutated by functions like
consensus.CertifiedNatThreshold (and its internal lnOneMinus which may call
Denom/Mul); to fix, stop reusing the single activeSlotCoeff: create a new
big.NewRat(1, 20) inside each b.Run sub-benchmark (or just inside the inner
loop) and pass that fresh rat into consensus.CertifiedNatThreshold, mirroring
the safe pattern used by BenchmarkConsensusThresholdWithMode; update references
to activeSlotCoeff in the closure accordingly so each sub-benchmark gets its own
independent Rat.
In `@internal/bench/helpers_test.go`:
- Around line 103-119: TestGetTestBlocks and EraNames use different
capitalization (GetTestBlocks returns "Byron"/"Shelley" while EraNames returns
"byron"/"shelley"), so update any comparison logic to be case-insensitive: when
correlating between GetTestBlocks and EraNames (or in TestGetTestBlocks if you
add such assertions), normalize both sides (e.g., strings.ToLower on block.Name
and era name) or use strings.EqualFold to compare; reference GetTestBlocks,
EraNames, and TestGetTestBlocks to locate the code to change.
In `@internal/bench/profile_test.go`:
- Around line 189-261: TestProfileTxValidation currently uses BenchLedgerState()
contrary to the test-guideline that validation tests must use MockLedgerState;
change the setup to call MockLedgerState() instead of BenchLedgerState() (so the
VerifyTransaction(tx, slot, ls, nil, rules) calls run against the prescribed
mock ledger), and if MockLedgerState is genuinely incompatible with profiling
here, update the PR description to document why BenchLedgerState is required and
reference TestProfileTxValidation, BenchLedgerState, MockLedgerState, and
VerifyTransaction in that justification.
In `@internal/bench/README.md`:
- Around line 438-471: The README's CI artifact and regression threshold
description is out of sync with the workflow; update the "CI Benchmark Results"
section to match .github/workflows/benchmark.yml by replacing the three-file
list (benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) with
the actual single artifact name (benchmark.txt), remove or mark as "not
implemented" the automatic comparison/regression threshold checks
(allocation/bytes/time), and add a short note pointing readers to
.github/workflows/benchmark.yml for the current artifact contents and any future
comparison logic.
In `@internal/bench/regression_test.go`:
- Around line 106-113: getThresholdMultiplier currently silently returns 1.0
when GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set but invalid; update
getThresholdMultiplier to detect ParseFloat errors and non-positive values and
emit a clear error message to stderr (e.g., via fmt.Fprintf(os.Stderr, ... ) or
log.Printf) that includes the env var name and the bad value, then fall back to
1.0; reference the function name getThresholdMultiplier and the env var
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER when adding the logging so someone reading
failures can locate the change.
🧹 Nitpick comments (5)
internal/bench/helpers_test.go (1)
`103-119`: **Note the capitalization mismatch between test block names and era names.** `GetTestBlocks()` returns blocks with capitalized names (`"Byron"`, `"Shelley"`, etc.) while `EraNames()` returns lowercase (`"byron"`, `"shelley"`, etc.). This isn't a bug since they come from different sources, but worth being aware of if any code needs to correlate between the two — a case-insensitive comparison or normalization would be needed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers_test.go` around lines 103 - 119, TestGetTestBlocks and EraNames use different capitalization (GetTestBlocks returns "Byron"/"Shelley" while EraNames returns "byron"/"shelley"), so update any comparison logic to be case-insensitive: when correlating between GetTestBlocks and EraNames (or in TestGetTestBlocks if you add such assertions), normalize both sides (e.g., strings.ToLower on block.Name and era name) or use strings.EqualFold to compare; reference GetTestBlocks, EraNames, and TestGetTestBlocks to locate the code to change.

internal/bench/regression_test.go (1)
`106-113`: **Silent fallback on invalid `GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER`.** If someone sets `GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER=abc`, the function silently returns `1.0` with no warning. A `t.Logf` isn't possible here since there's no `*testing.T`, but consider logging to stderr or returning an error to avoid confusion when the env var is set but misspelled/malformed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/bench/regression_test.go` around lines 106 - 113, getThresholdMultiplier currently silently returns 1.0 when GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set but invalid; update getThresholdMultiplier to detect ParseFloat errors and non-positive values and emit a clear error message to stderr (e.g., via fmt.Fprintf(os.Stderr, ...) or log.Printf) that includes the env var name and the bad value, then fall back to 1.0; reference the function name getThresholdMultiplier and the env var GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER when adding the logging so someone reading failures can locate the change.

internal/bench/consensus_bench_test.go (1)
`64-89`: **Shared `activeSlotCoeff` may be mutated by `CertifiedNatThreshold`.** `BenchmarkConsensusThreshold` creates one `activeSlotCoeff` at line 65 and passes it through multiple sub-benchmarks sequentially. While sequential execution avoids a data race, `CertifiedNatThresholdWithMode` internally calls `lnOneMinus(activeSlotCoeff)`, which may read the `Rat`'s denominator via `Denom()` — and `Denom()` can normalize internal state. If the internal implementation ever wraps this in a `Mul` or similar operation that takes the same pointer as both receiver and argument, the shared value could be silently corrupted between sub-benchmarks.

Consider creating a fresh `big.NewRat(1, 20)` inside each sub-benchmark closure to be safe, consistent with how `BenchmarkConsensusThresholdWithMode` (line 94) already does it at the function level.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/bench/consensus_bench_test.go` around lines 64 - 89, the shared big.Rat activeSlotCoeff in BenchmarkConsensusThreshold can be mutated by functions like consensus.CertifiedNatThreshold (and its internal lnOneMinus which may call Denom/Mul); to fix, stop reusing the single activeSlotCoeff: create a new big.NewRat(1, 20) inside each b.Run sub-benchmark (or just inside the inner loop) and pass that fresh rat into consensus.CertifiedNatThreshold, mirroring the safe pattern used by BenchmarkConsensusThresholdWithMode; update references to activeSlotCoeff in the closure accordingly so each sub-benchmark gets its own independent Rat.

internal/bench/block_bench_test.go (1)
`525-629`: **Extract shared KES data extraction into a helper to eliminate duplication.** The `kesData` type definition and the era-specific header switch block (lines 527–607) is a near-exact copy of lines 279–358 in `BenchmarkBlockKESVerification`. Both iterate `postByronBlocks`, type-switch on the header, call `cbor.Encode(h.Body)`, and populate the same struct fields. A single helper like `extractKesData(blocks []blockData) []kesData` would remove ~70 lines of duplication.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/bench/block_bench_test.go` around lines 527 - 607, the duplicated kesData type and era-specific header switch in BenchmarkBlockKESVerification and the other benchmark should be extracted into a single helper: add a function extractKesData(blocks []blockData) []kesData that owns the kesData type (or move kesData to package scope) and implements the type-switch over header.(*shelley.ShelleyBlockHeader, *allegra.AllegraBlockHeader, *mary.MaryBlockHeader, *alonzo.AlonzoBlockHeader, *babbage.BabbageBlockHeader, *conway.ConwayBlockHeader) to cbor.Encode(h.Body), set signature, hotVkey, kesPeriod and slot, skipping on encode errors, and return the populated []kesData; then replace both duplicated blocks that iterate postByronBlocks with a call to extractKesData(postByronBlocks).

internal/bench/profile_test.go (1)
189-261: `BenchLedgerState()` used instead of `MockLedgerState`, a guideline deviation.

Lines 192-195 explain the rationale, but the coding guideline for `**/*_test.go` states: "Use `MockLedgerState` from `internal/test/ledger/ledger.go` for validation tests." Since `TestProfileTxValidation` exercises `VerifyTransaction` with real era validation rules, it qualifies as a validation test. Consider switching to `MockLedgerState` or documenting why it's unsuitable for profiling in the PR description.

As per coding guidelines, `**/*_test.go`: "Use `MockLedgerState` from `internal/test/ledger/ledger.go` for validation tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/profile_test.go` around lines 189 - 261, TestProfileTxValidation currently uses BenchLedgerState() but per test guidelines validation tests must use MockLedgerState; replace the call ls := BenchLedgerState() with ls := MockLedgerState() (from internal/test/ledger) and update the surrounding comment to reflect that this uses the shared MockLedgerState for validation profiling (remove the incorrect rationale about avoiding test framework dependencies); ensure any imports or helper references in TestProfileTxValidation are adjusted to reference MockLedgerState and its builder if needed.
.github/workflows/benchmark.yml
```yaml
      - name: Run benchmarks
        run: |
          go test -bench=. -benchmem -count=5 -timeout=30m \
            ./vrf/... \
            ./kes/... \
            ./consensus/... \
            ./cbor/... \
            ./pipeline/... \
            ./ledger/... \
            2>&1 | tee benchmark.txt || true
```
Workflow output doesn't match README documentation.
The README (lines 448–450) documents three artifacts (benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) and describes regression thresholds (allocation +20%, bytes +50%, time +20%). However, this workflow only produces a single benchmark.txt with no comparison or threshold-checking step. Either update the README to match the actual workflow, or implement the missing comparison/regression logic.
Additionally, || true on line 35 silently swallows benchmark failures, so a completely broken benchmark run still uploads an empty or partial artifact and reports success.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/benchmark.yml around lines 26 - 35, Update the "Run
benchmarks" job so the workflow produces the three artifacts described in the
README and enforces regression thresholds: write the raw run output to
benchmark-current.txt (replace the single benchmark.txt), fetch or restore
benchmark-base.txt (e.g., from artifacts or main branch), run a small comparison
step that generates benchmark-comparison.txt and fails the job if any metric
exceeds the README thresholds (alloc +20%, bytes +50%, time +20%), and remove
the "|| true" on the benchmark run so failures surface instead of being
swallowed; ensure the steps and artifact names exactly match README references
(benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt) and name
the steps such that "Run benchmarks" is the producer of benchmark-current.txt.
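A sketch of what the missing steps could look like, assuming `benchstat` (from `golang.org/x/perf`) as the comparison tool; the step names and artifact paths follow the README's wording, and an actual pass/fail threshold gate would still need a small parser script, since `benchstat` only reports deltas:

```yaml
# Sketch only: paths and the benchstat invocation are assumptions,
# not part of this PR's workflow.
- name: Run benchmarks
  run: |
    set -o pipefail
    go test -bench=. -benchmem -count=5 -timeout=30m \
      ./internal/bench/... 2>&1 | tee benchmark-current.txt

- name: Compare against baseline
  run: |
    # benchmark-base.txt must be restored from the base branch's run
    go run golang.org/x/perf/cmd/benchstat@latest \
      benchmark-base.txt benchmark-current.txt | tee benchmark-comparison.txt
```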
3 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/bench/consensus_bench_test.go">
<violation number="1" location="internal/bench/consensus_bench_test.go:81">
P3: Benchmark results are assigned to `_`, which allows the compiler to optimize away pure calls and skew allocations/time. Store results in a package-level sink variable to prevent elimination, per the benchmark README guidance.</violation>
</file>
<file name=".github/workflows/benchmark.yml">
<violation number="1" location=".github/workflows/benchmark.yml:27">
P2: The benchmark step masks failures (`|| true`) and pipes output without `pipefail`, so CI will succeed even if `go test` fails. This prevents the workflow from catching regressions or test failures.</violation>
</file>
<file name="internal/bench/helpers.go">
<violation number="1" location="internal/bench/helpers.go:140">
P3: `MustLoadTxFixture`/`LoadTxFixture` are never used anywhere in code (only defined here). Since `internal/bench` can’t be imported externally, these helpers are dead code unless a benchmark/test starts using them. Consider removing them or adding a benchmark that exercises them.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
.github/workflows/benchmark.yml
```yaml
        run: |
          go test -bench=. -benchmem -count=5 -timeout=30m \
            ./vrf/... \
            ./kes/... \
            ./consensus/... \
            ./cbor/... \
            ./pipeline/... \
            ./ledger/... \
            2>&1 | tee benchmark.txt || true
```
P2: The benchmark step masks failures (|| true) and pipes output without pipefail, so CI will succeed even if go test fails. This prevents the workflow from catching regressions or test failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 27:
<comment>The benchmark step masks failures (`|| true`) and pipes output without `pipefail`, so CI will succeed even if `go test` fails. This prevents the workflow from catching regressions or test failures.</comment>
<file context>
@@ -0,0 +1,42 @@
+ go-version: '1.24'
+
+ - name: Run benchmarks
+ run: |
+ go test -bench=. -benchmem -count=5 -timeout=30m \
+ ./vrf/... \
</file context>
Suggested change:

```diff
       run: |
+        set -o pipefail
         go test -bench=. -benchmem -count=5 -timeout=30m \
           ./vrf/... \
           ./kes/... \
           ./consensus/... \
           ./cbor/... \
           ./pipeline/... \
           ./ledger/... \
-          2>&1 | tee benchmark.txt || true
+          2>&1 | tee benchmark.txt
```
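The difference the suggestion makes can be demonstrated standalone; `false` stands in for a failing `go test`:

```shell
#!/usr/bin/env bash
# A pipeline's exit status defaults to the LAST command's, so a failing
# command piped into `tee` still reports success. `set -o pipefail`
# propagates the failure instead.
status_without() ( false | tee /dev/null >/dev/null; echo $? )
status_with()    ( set -o pipefail; false | tee /dev/null >/dev/null; echo $? )

echo "without pipefail: $(status_without)"   # → without pipefail: 0
echo "with pipefail:    $(status_with)"      # → with pipefail:    1
```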
Force-pushed: c626b9c to ef74249
Actionable comments posted: 4
🧹 Nitpick comments (6)
internal/bench/helpers_test.go (2)
104-121: `TestGetTestBlocks` hardcodes `7` and will silently drift if `testdata` adds a new era.

The length assertion guards the count rather than the content; a new era in `testdata` would break this test. Consider asserting only on membership (which the subsequent map check already does), or deriving the expected length from `EraNames()`.

♻️ Suggested improvement

```diff
 func TestGetTestBlocks(t *testing.T) {
 	blocks := GetTestBlocks()
-	assert.Len(t, blocks, 7)
+	assert.Len(t, blocks, len(EraNames()))
 	// Verify all eras are present
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers_test.go` around lines 104 - 121, TestGetTestBlocks currently asserts a fixed length of 7 which will break when testdata adds/removes eras; update the test to avoid hardcoding by either removing the assert.Len(t, blocks, 7) or replacing it with a dynamic check using EraNames() (e.g., compare len(blocks) to len(EraNames()) or only assert membership as the test already does). Locate the TestGetTestBlocks function and modify the assertion around GetTestBlocks() so the test validates membership via the eraNames map (or uses EraNames() for expected count) instead of a magic number.
25-30: `BenchLedgerStateWithUtxos` and `BenchLedgerStateWithUtxoFunc` are exported but untested.

Both are used by benchmark setup code and can drift silently if the `ouroboros-mock` builder API changes. Basic smoke tests (analogous to `TestBenchLedgerState`) would catch such regressions early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers_test.go` around lines 25 - 30, Add smoke tests for the exported helpers BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc similar to TestBenchLedgerState: call each helper, assert the returned ledger state is non-nil (require.NotNil) and validate basic expectations (e.g., NetworkId() equals 1 and that the returned UTXO set or iterator is present/non-empty where applicable). For BenchLedgerStateWithUtxoFunc, pass a simple UTXO-creating function and assert the ledger contains the produced UTXOs; for BenchLedgerStateWithUtxos, provide a small slice of UTXO fixtures and assert they appear in the ledger. Use the existing require/assert helpers and reference the functions by name so the tests catch API drift in those builders.

internal/bench/helpers.go (1)
129-150: `TxTypeFromEra` is provided but there are no `TxFixture`, `LoadTxFixture`, or `MustLoadTxFixture` counterparts.

`BlockTypeFromEra` has a full fixture-loading API (`BlockFixture`, `LoadBlockFixture`, `MustLoadBlockFixture`). `TxTypeFromEra` is stranded without the corresponding fixture type and loaders, meaning `tx_bench_test.go` must roll its own CBOR loading logic. If transaction fixture loading is in scope for this PR, the helper is incomplete; if it's deferred, a `TODO` comment would clarify intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 129 - 150, TxTypeFromEra currently exists without the companion fixture types/loaders (TxFixture, LoadTxFixture, MustLoadTxFixture) that BlockTypeFromEra has, so either implement the same fixture-loading API or mark intent; add TxFixture type and functions LoadTxFixture and MustLoadTxFixture mirroring the BlockFixture/LoadBlockFixture/MustLoadBlockFixture behavior (CBOR loading, error handling) so tx_bench_test.go can reuse them, or if transaction fixture loading is out of scope for this PR, add a clear TODO comment above TxTypeFromEra referencing tx_bench_test.go and noting that TxFixture/LoadTxFixture/MustLoadTxFixture will be added later.

scripts/profile.sh (2)
62-63: Help extraction relies on hard-coded line numbers.

`sed -n '16,43p' "$0"` breaks silently if any lines are added or removed above line 16. A dedicated `print_help()` function is more maintainable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/profile.sh` around lines 62 - 63, Replace the fragile hard-coded sed slice (sed -n '16,43p' "$0") with a dedicated print_help() function: create a print_help() that outputs the script's help text (e.g., via a heredoc or by printing commented blocks), call print_help() where the sed invocation and exit 0 currently live, and remove the sed-based extraction; this centralizes help content, avoids dependence on line numbers, and ensures calls to print_help() replace the old sed -n '16,43p' "$0" usage.
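A minimal sketch of the heredoc-based replacement; the help text shown is illustrative, not the script's actual usage:

```shell
#!/usr/bin/env bash
# Help text lives in one function instead of being sliced out of the
# script's own source with sed, so it no longer tracks line numbers.
print_help() {
  cat <<'EOF'
Usage: profile.sh [options] [component ...]
  --view    open profiles in the browser after the run
  --help    show this help
EOF
}

# Call sites invoke the function where `sed -n '16,43p' "$0"` used to run:
if [[ "${1:-}" == "--help" ]]; then
  print_help
  exit 0
fi
```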
165-174: Silent failure in `all` mode gives no indication of which components failed.

`run_profile "$comp" || true` discards errors, and the `view_profiles` function silently skips missing profile files. If `go test` fails for any component (e.g., build error, missing build tag), the user receives no failure summary.

♻️ Suggested improvement: track and report failures

```diff
+FAILED_COMPONENTS=()
 for comp in "${COMPONENTS[@]}"; do
-  run_profile "$comp" || true
+  run_profile "$comp" || FAILED_COMPONENTS+=("$comp")
 done
+if [[ ${#FAILED_COMPONENTS[@]} -gt 0 ]]; then
+  echo "WARNING: profiling failed for: ${FAILED_COMPONENTS[*]}"
+fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/profile.sh` around lines 165 - 174, The loop that runs components currently swallows errors via `run_profile "$comp" || true` and `view_profiles` can silently skip missing files; change this to capture and report failures: invoke `run_profile "$comp"` and store its exit status, append failing component names to a failures array (e.g., FAILED_COMPONENTS+=("$comp")) instead of `|| true`, and after the loop if FAILED_COMPONENTS is non-empty print a concise failure summary listing those components and exit non-zero (or return non-zero) so CI/users see the error; also update `view_profiles` usage so when `VIEW_PROFILES` is true it either warns about missing profile files and opens the first successful component (e.g., choose the first element from COMPONENTS that succeeded) or skips opening while clearly reporting which components had no profiles.

internal/bench/regression_test.go (1)
127-180: Hardcoded allocation baselines are platform/Go-version specific.

The limits in `TestAllocationRegression` and the per-era caps in `TestRegressionCBOR` were measured on linux/arm64 with Go 1.24. Heap allocation counts can differ across CPU architectures and Go minor versions (escape-analysis improvements, GC heuristics, etc.). Running these tests on darwin/amd64 or with a newer Go toolchain may produce higher or lower counts, causing either spurious CI failures or silently missed regressions.

Consider adding a build constraint (e.g., `//go:build linux && arm64`) or a `testing.Short()`/platform check to skip the strict numerical assertions on non-baseline platforms, keeping them as informational via `t.Logf` only. The `GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER` env var is a good escape hatch, but requiring manual intervention for every new environment degrades DX.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/regression_test.go` around lines 127 - 180, The allocation baselines in TestAllocationRegression (and related per-era caps referenced by TestRegressionCBOR) are platform- and Go-version-specific and should not assert strictly on non-baseline platforms; modify TestAllocationRegression to detect the baseline platform (linux && arm64) and/or honor testing.Short() and only perform the strict numeric assertion on the baseline platform (or when an explicit env override GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set); for other platforms, convert the check into a t.Logf informational message (or skip) so tests don't fail spuriously; look for the TestAllocationRegression and TestRegressionCBOR symbols and gate the allocation comparison/t.Errorf path behind a platform/build check or testing.Short() guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 28-35: The benchmark workflow is missing the new bench package
path so CI never runs the new benchmarks; update the benchmark command that
lists packages (the block that calls `go test -bench=. -benchmem -count=5
-timeout=30m` and enumerates packages like `./vrf/...`, `./kes/...`, etc.) to
include `./internal/bench/...` among the package arguments so tests such as
`BenchmarkBlockValidation`, `BenchmarkCBORDecodeBlock`,
`BenchmarkNativeScriptEvaluate`, and `BenchmarkTxDecode` are executed in CI.
In `@internal/bench/BASELINES.md`:
- Around line 3-5: The BASELINES.md currently lists "Platform: linux/arm64"
which mismatches CI ("ubuntu-latest" / linux/amd64); update this by either
re-running the benchmark capture on an amd64 CI-equivalent environment and
replacing or adding an "amd64" section with new baselines, or clearly annotate
the existing section as "arm64" and add a separate "amd64" subsection with
corresponding allocation/byte/time baselines; ensure the metadata lines ("Last
Updated", "Go Version", "Platform") are updated for each platform and add a
short note describing which CI job these amd64 baselines map to.
In `@internal/bench/PROFILING.md`:
- Around line 57-83: Update the example commands in PROFILING.md that reference
non-existent packages/benchmarks: replace occurrences of "./pipeline/..." and
"./vrf/..." and the benchmark names "BenchmarkBlockPipeline" and
"BenchmarkParallelVerify" with valid targets (e.g., use "./internal/bench/..."
as the package path and an existing benchmark such as
"BenchmarkCBORDecodeBlockParallel" for the parallel/mutex examples) so the go
test examples run without "no packages loaded" or "no matching benchmarks"
errors.
In `@internal/bench/regression_test.go`:
- Around line 91-101: The init() currently panics if the Conway block isn’t
found; remove that panic and make the lookup lazy: stop failing in init() (leave
conwayBlockCbor nil if not found) and move the search for the Conway CBOR into
the helper cborBlockDecodeOnce so that it performs testdata.GetTestBlocks()
lookup on first use; if the block is still missing, have cborBlockDecodeOnce
return an error (or call testing.Skip when running inside a test) instead of
crashing, referencing conwayBlockCbor, init(), cborBlockDecodeOnce, and
testdata.GetTestBlocks() to locate the changes.
---
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Around line 37-42: The workflow step "Upload benchmark results" currently
uploads a single artifact named benchmark.txt but README expects
benchmark-current.txt, benchmark-base.txt, and benchmark-comparison.txt and a
regression threshold check; update the job that produces benchmarks to output
those three files (create/rename benchmark-current.txt and benchmark-base.txt,
generate benchmark-comparison.txt by diffing/summary), then change the "Upload
benchmark results" step (actions/upload-artifact) to upload all three paths
(e.g., benchmark-current.txt, benchmark-base.txt, benchmark-comparison.txt)
using the same name pattern (benchmark-${{ github.sha }}-${{ github.run_number
}}); also add a new step before upload that enforces the regression threshold
(fail the job if the comparison shows regressions beyond the configured
threshold) and surface the failure via a non-zero exit so CI fails when the
threshold is exceeded.
In `@internal/bench/BASELINES.md`:
- Around line 185-191: The documented regression thresholds in BASELINES.md are
not enforced by the benchmark CI workflow; add a comparison step that reads the
uploaded/current benchmark.txt, computes percent deltas for Allocation Count,
Bytes Allocated, and Time (ns/op) versus the stored baseline, and fails the job
when any metric exceeds the thresholds (+20% allocs, +50% bytes, +20% time).
Implement this as a new workflow step (or action) that invokes a small
comparator script (e.g., compare_benchmarks) which accepts the current
benchmark.txt and the baseline, parses the named metrics, applies the
thresholds, and returns non-zero/prints a clear failure message referencing the
metric(s) that exceeded limits so the CI will fail the PR.
In `@internal/bench/profile_test.go`:
- Around line 189-261: The test TestProfileTxValidation uses BenchLedgerState()
but per test guidelines validation tests must use MockLedgerState; replace the
call ls := BenchLedgerState() with ls := MockLedgerState() (from
internal/test/ledger) and update the surrounding comment to reflect that this
uses the shared MockLedgerState for validation profiling (remove the incorrect
rationale about avoiding test framework dependencies); ensure any imports or
helper references in TestProfileTxValidation are adjusted to reference
MockLedgerState and its builder if needed.
In `@internal/bench/README.md`:
- Around line 428-461: Update the CI Benchmark Results README to reflect actual
artifact names and enforcement: replace references to `benchmark-current.txt`,
`benchmark-base.txt`, and `benchmark-comparison.txt` with the single
`benchmark.txt` produced by the workflow, and either remove or mark the listed
regression thresholds (allocation count >20%, bytes >50%, time ns/op >20%) as
not enforced — or add a note pointing to `.github/workflows/benchmark.yml` and
instruct to update that workflow to implement enforcement if you want them
enforced; ensure the README's “Contents” and “Regression Thresholds” sections
match the real artifact name `benchmark.txt` and the current workflow behavior.
---
Nitpick comments:
In `@internal/bench/helpers_test.go`:
- Around line 104-121: TestGetTestBlocks currently asserts a fixed length of 7
which will break when testdata adds/removes eras; update the test to avoid
hardcoding by either removing the assert.Len(t, blocks, 7) or replacing it with
a dynamic check using EraNames() (e.g., compare len(blocks) to len(EraNames())
or only assert membership as the test already does). Locate the
TestGetTestBlocks function and modify the assertion around GetTestBlocks() so
the test validates membership via the eraNames map (or uses EraNames() for
expected count) instead of a magic number.
- Around line 25-30: Add smoke tests for the exported helpers
BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc similar to
TestBenchLedgerState: call each helper, assert the returned ledger state is
non-nil (require.NotNil) and validate basic expectations (e.g., NetworkId()
equals 1 and that the returned UTXO set or iterator is present/non-empty where
applicable). For BenchLedgerStateWithUtxoFunc, pass a simple UTXO-creating
function and assert the ledger contains the produced UTXOs; for
BenchLedgerStateWithUtxos, provide a small slice of UTXO fixtures and assert
they appear in the ledger. Use the existing require/assert helpers and reference
the functions by name so the tests catch API drift in those builders.
In `@internal/bench/helpers.go`:
- Around line 129-150: TxTypeFromEra currently exists without the companion
fixture types/loaders (TxFixture, LoadTxFixture, MustLoadTxFixture) that
BlockTypeFromEra has, so either implement the same fixture-loading API or mark
intent; add TxFixture type and functions LoadTxFixture and MustLoadTxFixture
mirroring the BlockFixture/LoadBlockFixture/MustLoadBlockFixture behavior (CBOR
loading, error handling) so tx_bench_test.go can reuse them, or if transaction
fixture loading is out of scope for this PR, add a clear TODO comment above
TxTypeFromEra referencing tx_bench_test.go and noting that
TxFixture/LoadTxFixture/MustLoadTxFixture will be added later.
In `@internal/bench/regression_test.go`:
- Around line 127-180: The allocation baselines in TestAllocationRegression (and
related per-era caps referenced by TestRegressionCBOR) are platform- and
Go-version-specific and should not assert strictly on non-baseline platforms;
modify TestAllocationRegression to detect the baseline platform (linux && arm64)
and/or honor testing.Short() and only perform the strict numeric assertion on
the baseline platform (or when an explicit env override
GOUROBOROS_ALLOC_THRESHOLD_MULTIPLIER is set); for other platforms, convert the
check into a t.Logf informational message (or skip) so tests don’t fail
spuriously—look for the TestAllocationRegression and TestRegressionCBOR symbols
and gate the allocation comparison/ t.Errorf path behind a platform/build check
or testing.Short() guard.
In `@scripts/profile.sh`:
- Around line 62-63: Replace the fragile hard-coded sed slice (sed -n '16,43p'
"$0") with a dedicated print_help() function: create a print_help() that outputs
the script's help text (e.g., via a heredoc or by printing commented blocks),
call print_help() where the sed invocation and exit 0 currently live, and remove
the sed-based extraction; this centralizes help content, avoids dependence on
line numbers, and ensures calls to print_help() replace the old sed -n '16,43p'
"$0" usage.
- Around line 165-174: The loop that runs components currently swallows errors
via `run_profile "$comp" || true` and `view_profiles` can silently skip missing
files; change this to capture and report failures: invoke `run_profile "$comp"`
and store its exit status, append failing component names to a failures array
(e.g., FAILED_COMPONENTS+=("$comp")) instead of `|| true`, and after the loop if
FAILED_COMPONENTS is non-empty print a concise failure summary listing those
components and exit non-zero (or return non-zero) so CI/users see the error;
also update `view_profiles` usage so when `VIEW_PROFILES` is true it either
warns about missing profile files and opens the first successful component
(e.g., choose the first element from COMPONENTS that succeeded) or skips opening
while clearly reporting which components had no profiles.
Force-pushed: ef74249 to caf22d1
Actionable comments posted: 4
🧹 Nitpick comments (8)
internal/bench/helpers.go (1)
180-199: `blockCborFromEra` can panic despite returning `([]byte, error)`.

`testdata.MustDecodeHex` panics on invalid hex rather than returning an error, making the function's error return contract misleading. While static testdata makes a panic unlikely in practice, callers (e.g., `LoadBlockFixture`) wrap this in error handling that can never be reached for the hex-decode failure case.

♻️ Suggested fix

```diff
-func blockCborFromEra(era string) ([]byte, error) {
+func blockCborFromEra(era string) (_ []byte, retErr error) {
+	defer func() {
+		if r := recover(); r != nil {
+			retErr = fmt.Errorf("decode hex for era %s: %v", era, r)
+		}
+	}()
 	switch strings.ToLower(era) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 180 - 199, The function blockCborFromEra uses testdata.MustDecodeHex which panics on bad input, violating its ([]byte, error) contract; change blockCborFromEra to perform a non-panicking decode (e.g., call a testdata.DecodeHex that returns (bytes, error) or use encoding/hex.DecodeString on testdata.ByronBlockHex, testdata.ShelleyBlockHex, etc.), capture and return any decode error instead of panicking, and update each case branch to return (decodedBytes, err) so callers (like LoadBlockFixture) can handle errors normally.

internal/bench/helpers_test.go (1)
32-61: `TxTypeFromEra` (and the `WithUtxos`/`WithUtxoFunc` ledger state helpers) have no test coverage.

`TxTypeFromEra` in `helpers.go` is exported and has the same structure as `BlockTypeFromEra`, but is not tested here. `BenchLedgerStateWithUtxos` and `BenchLedgerStateWithUtxoFunc` are also untested.

♻️ Suggested additions

```go
func TestTxTypeFromEra(t *testing.T) {
	tests := []struct {
		era     string
		want    uint
		wantErr bool
	}{
		{"byron", ledger.TxTypeByron, false},
		{"shelley", ledger.TxTypeShelley, false},
		{"allegra", ledger.TxTypeAllegra, false},
		{"mary", ledger.TxTypeMary, false},
		{"alonzo", ledger.TxTypeAlonzo, false},
		{"babbage", ledger.TxTypeBabbage, false},
		{"conway", ledger.TxTypeConway, false},
		{"Conway", ledger.TxTypeConway, false},
		{"unknown", 0, true},
	}
	for _, tc := range tests {
		t.Run(tc.era, func(t *testing.T) {
			txType, err := TxTypeFromEra(tc.era)
			if tc.wantErr {
				require.Error(t, err)
			} else {
				require.NoError(t, err)
				assert.Equal(t, tc.want, txType)
			}
		})
	}
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers_test.go` around lines 32 - 61, Add unit tests mirroring TestBlockTypeFromEra for the exported TxTypeFromEra: create TestTxTypeFromEra with cases for "byron", "shelley", "allegra", "mary", "alonzo", "babbage", "conway" (case-insensitive) and an "unknown" case asserting error; call TxTypeFromEra and assert expected uint or error. Also add small tests for BenchLedgerStateWithUtxos and BenchLedgerStateWithUtxoFunc: invoke BenchLedgerStateWithUtxos with a sample UTXO slice and assert the returned ledger state contains those UTXOs, and invoke BenchLedgerStateWithUtxoFunc with a function that produces known UTXOs and assert the ledger state reflects them (or that the provided function was called and its results are present).

internal/bench/cbor_bench_test.go (1)
30-38: Consider verifying that the hardcoded hex fixtures decode to valid transactions.

The hex strings are loaded in `init()` via `mustDecodeHex`, which only validates hex encoding, not CBOR validity. If a hex string is truncated or corrupted, the error will surface later as a confusing benchmark failure. A quick validation pass in `init()` (decoding each tx via `NewTransactionFromCbor` and discarding the result) would catch this early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/cbor_bench_test.go` around lines 30 - 38, The hardcoded hex fixtures (byronTxCborHex, shelleyTxCborHex, etc.) are only checked for hex validity by mustDecodeHex in init(); add a validation step in init() that, after mustDecodeHex returns the bytes for each fixture, calls NewTransactionFromCbor (or the appropriate CBOR-to-transaction constructor) and handles any error (panic or log.Fatal) so corrupted/truncated CBOR is detected early; reference the existing init(), mustDecodeHex, and NewTransactionFromCbor symbols when adding this validation loop over each fixture.

internal/bench/consensus_bench_test.go (1)
156-193: Shared `benchActiveSlotCoeff` is safe here but fragile for future changes.

`benchActiveSlotCoeff` (`*big.Rat`) is passed to `consensus.IsSlotLeader` in the hot loop (line 184). This works because sub-benchmarks run sequentially. However, if anyone adds `b.SetParallelism` or refactors to `RunParallel`, this will race, as was already discovered in `BenchmarkConsensusParallelThreshold`. Consider adding a brief comment noting the sequential-only constraint, similar to the comment in the parallel benchmark at line 308.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/consensus_bench_test.go` around lines 156 - 193, Add a short comment above BenchmarkConsensusIsSlotLeader explaining that the shared benchActiveSlotCoeff (*big.Rat) is reused across sub-benchmarks and the benchmark relies on sequential execution (sub-benchmarks run one after another); note that enabling b.SetParallelism or switching to RunParallel would introduce races when passing benchActiveSlotCoeff into consensus.IsSlotLeader, and reference the similar comment style used in BenchmarkConsensusParallelThreshold to justify keeping the current non-parallel structure.

internal/bench/block_bench_test.go (2)
147-171: Misleading comment: `loadTestBlocks` loads all blocks including Byron.

Line 154 says "Skip Byron for VRF/KES benchmarks as it uses different validation", but the function actually loads all blocks. Byron filtering happens later in `getPostByronBlocks`. The comment should describe what this function does, not what callers do.

📝 Suggested comment fix

```diff
 for _, tb := range testBlocks {
-	// Skip Byron for VRF/KES benchmarks as it uses different validation
+	// Parse each test block with bench-focused validation config
 	block, err := ledger.NewBlockFromCbor(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/block_bench_test.go` around lines 147 - 171, Update the misleading function comment for loadTestBlocks to accurately state its behavior: it loads all test blocks (including Byron) and panics on load error; do not mention skipping Byron here; leave any filtering to getPostByronBlocks (or callers) as appropriate. Locate the comment above the loadTestBlocks function and replace the current "Skip Byron…" text with a concise description that this function returns all test blocks from testdata and panics if any block fails to load, referencing loadTestBlocks and getPostByronBlocks for clarity if needed.
289-361: `BenchmarkBlockVRFVerification` discards VRF verification result without checking.

Line 357: `_, _ = vrf.Verify(...)` — the comment (lines 343-346) explains that verification will fail due to dummy eta0. However, the benchmark is measuring successful crypto path cost. If `vrf.Verify` short-circuits on some validation failure (e.g., early return on proof format check), you may be measuring less work than intended. Consider asserting in the setup (outside the loop) that `Verify` returns a non-error to confirm the crypto path executes fully, even if `valid` is false.

💡 Suggested pre-validation check

```diff
 	for _, vd := range vrfDataList {
 		b.Run("Era_"+vd.name, func(b *testing.B) {
 			// Pre-compute the VRF input message
 			vrfMsg := vrf.MkInputVrf(int64(vd.slot), eta0)
+			// Verify the crypto path runs without error (result may be false
+			// due to dummy eta0, but the full computation should execute)
+			_, err := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+			if err != nil {
+				b.Skipf("VRF Verify errors for %s (may lack matching nonce): %v", vd.name, err)
+			}
 			b.ReportAllocs()
 			b.ResetTimer()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/block_bench_test.go` around lines 289 - 361, BenchmarkBlockVRFVerification currently ignores the result of vrf.Verify inside the loop which can mask early short-circuits; add a pre-check per vrfData entry (using vrf.MkInputVrf with eta0 and the same vd.slot) before b.ResetTimer() to call vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg) once and assert the call returns no error (fail the sub-benchmark with b.Fatalf or b.Fatalf-like check) so you guarantee the full cryptographic verification path is executed during benchmarking; keep this check outside the timed loop that iterates b.N.

internal/bench/regression_test.go (1)
464-518: `TestRegressionCBOR` uses `testdata.GetTestBlocks()` directly instead of the package helper.

The `helpers.go` file (lines 202-204) wraps this as `GetTestBlocks()`. Other code in this file (e.g., `init()` at line 92) also calls `testdata.GetTestBlocks()` directly, so this is at least internally consistent — but using the wrapper would improve indirection if the data source ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/regression_test.go` around lines 464 - 518, TestRegressionCBOR calls testdata.GetTestBlocks() directly instead of the package wrapper GetTestBlocks() defined in helpers.go; replace the direct call in TestRegressionCBOR with the helper GetTestBlocks() (and update any other direct uses of testdata.GetTestBlocks() in this file such as the init() usage) so the code uses the indirection provided by the helper. Locate the TestRegressionCBOR function and change the block source to call GetTestBlocks() from the local package helper (not testdata.GetTestBlocks()), ensuring imports remain correct and tests still pass.

internal/bench/tx_bench_test.go (1)
28-37: Silent `continue` on empty fixtures could mask broken test data.

When `len(txs) == 0` or `len(txCbor) == 0`, the benchmark for that era is silently skipped with no log output. If a fixture regresses to having zero transactions, you won't notice. Consider adding `b.Logf("skipping %s: no transactions in fixture", era)` before `continue`.

💡 Example

```diff
 		txs := fixture.Block.Transactions()
 		if len(txs) == 0 {
+			b.Logf("skipping %s: no transactions in fixture", era)
 			continue
 		}
 		txCbor := txs[0].Cbor()
 		if len(txCbor) == 0 {
+			b.Logf("skipping %s: empty tx CBOR", era)
 			continue
 		}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/tx_bench_test.go` around lines 28 - 37, The loop over eras using PostByronEraNames() silently skips fixtures with no transactions or empty CBOR; update the checks around MustLoadBlockFixture(...).Block.Transactions() and txs[0].Cbor() to log the reason before continuing (e.g., call b.Logf("skipping %s: no transactions in fixture", era) and b.Logf("skipping %s: empty tx cbor", era)) so missing or empty fixtures are visible during test runs; place these b.Logf calls immediately before each existing continue to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/bench/profile_test.go`:
- Around line 267-278: The test calls common.VerifyTransaction(tx, slot, ls,
nil, rules) passing a nil ProtocolParameters which can panic if rule ordering
changes; replace the nil with a safe zero-value/stub ProtocolParameters instance
(e.g., create a local zeroProtocolParameters or newProtocolParametersStub that
implements ProtocolParameters with safe defaults) and pass that into
VerifyTransaction (used in the initial err check and inside the hot loop that
iterates profileIterations using tx, slot, ls, rules) so validation rules that
reference protocol params do not dereference nil.
In `@internal/bench/PROFILING.md`:
- Around line 143-147: The example command uses a non-existent benchmark name
`BenchmarkVerifyBlock`; update the `go test -bench=...` example to reference an
actual benchmark such as `BenchmarkBlockValidation` or one of the block
serialization benchmarks (e.g. the ledger block serialization benchmark names)
so the command runs; replace `BenchmarkVerifyBlock` with
`BenchmarkBlockValidation` (or another real benchmark symbol) in the example and
keep the rest of the `-cpuprofile=cpu.prof ./...` invocation the same.
In `@internal/bench/testdata/README.md`:
- Around line 7-11: Update the three fenced code blocks in README.md (the
fixtures listing at the top and the two other code blocks around lines 38 and
50) to include a language specifier by changing the opening backticks to
```text; e.g., replace ``` with ```text for the blocks that show the testdata
directory and the two other examples so markdownlint MD040 is satisfied and
syntax highlighting is explicit.
In `@scripts/profile.sh`:
- Around line 199-201: The echo lines printing pprof commands currently escape
the COMPONENT variable (echo " go tool pprof -top
$OUTPUT_DIR/cpu_\${COMPONENT}.prof" and mem) so users see a literal
${COMPONENT}; change them to allow shell expansion by removing the backslash
before $COMPONENT so the examples use the actual COMPONENT value at runtime
(keep OUTPUT_DIR expansion intact and ensure this behavior aligns with the
existing all-case handling around the if [[ "$COMPONENT" == "all" ]] block).
---
Duplicate comments:
In `@internal/bench/BASELINES.md`:
- Around line 3-5: The baselines file lists Platform: linux/arm64 but CI uses
ubuntu-latest (linux/amd64), so update BASELINES.md to match CI or add
architecture-specific baselines; specifically edit the "Platform" line in
BASELINES.md to "linux/amd64" (or add separate entries/files for linux/amd64 and
linux/arm64 with clear labels) and regenerate or note which baseline set CI
should use so allocation/timing comparisons align; ensure the "Last Updated" is
refreshed when you make the change.
In `@internal/bench/profile_test.go`:
- Around line 213-217: The test is using BenchLedgerState() where the project's
test-file guideline requires using the canonical MockLedgerState from
internal/test/ledger/ledger.go for validation tests; replace the
BenchLedgerState() call with the MockLedgerState() helper (and any required
setup/teardown it expects) so the test uses the shared mock implementation,
update any surrounding comments to reflect MockLedgerState, and adjust imports
or initialization in the test function to match MockLedgerState's API.
In `@internal/bench/PROFILING.md`:
- Around line 57-83: The examples reference non-existent benchmarks and package
globs (BenchmarkBlockPipeline, BenchmarkParallelVerify, ./pipeline/...,
./vrf/..., trace.out) which will error; update the commands to point to real
packages/benchmarks or use generic patterns: replace ./pipeline/... and
./vrf/... with either ./... or the actual existing package paths in the repo,
and replace the concrete benchmark names with a valid benchmark selector (for
example -bench=. or the actual benchmark function name present) so the go test
commands run successfully; also standardize the trace/profile output filenames
(e.g., cpu.prof, block.prof, mutex.prof, trace.out) if needed and ensure the
example show viewing the trace with go tool trace trace.out.
In `@internal/bench/README.md`:
- Around line 437-456: Update the README's "Contents" and "Regression
Thresholds" to match the actual behavior defined in the benchmark workflow:
inspect the bench workflow definition (the job/step names and artifact names in
benchmark.yml) and replace the listed artifact filenames
(`benchmark-current.txt`, `benchmark-base.txt`, `benchmark-comparison.txt`) and
numeric thresholds (allocation count 20%, bytes 50%, time 20%) with the exact
artifact names and regression rules enforced by the workflow; ensure the
README's "Accessing Artifacts" instructions point to the correct job/run name
used by the workflow and mirror any differences in how base/current comparisons
are produced so the README and the workflow (benchmark.yml) are consistent.
In `@internal/bench/regression_test.go`:
- Around line 91-101: The current init() panics if the "Conway" block is
missing; instead remove the panic and move the lookup into the lazy loader used
by cborBlockDecodeOnce (or wrap it with a sync.Once inside cborBlockDecodeOnce).
Concretely: stop searching for "Conway" in init(), ensure conwayBlockCbor is
looked up on first use inside the function/method named cborBlockDecodeOnce (or
its helper), protect the lookup with sync.Once so it runs once, and return a
clear error from cborBlockDecodeOnce (or the test helper) when the block cannot
be found rather than panicking the whole package.
---
Nitpick comments:
In `@internal/bench/block_bench_test.go`:
- Around line 147-171: Update the misleading function comment for loadTestBlocks
to accurately state its behavior: it loads all test blocks (including Byron) and
panics on load error; do not mention skipping Byron here—leave any filtering to
getPostByronBlocks (or callers) as appropriate. Locate the comment above the
loadTestBlocks function and replace the current “Skip Byron…” text with a
concise description that this function returns all test blocks from testdata and
panics if any block fails to load, referencing loadTestBlocks and
getPostByronBlocks for clarity if needed.
- Around line 289-361: BenchmarkBlockVRFVerification currently ignores the
result of vrf.Verify inside the loop which can mask early short-circuits; add a
pre-check per vrfData entry (using vrf.MkInputVrf with eta0 and the same
vd.slot) before b.ResetTimer() to call vrf.Verify(vd.vrfKey, vd.proof,
vd.output, vrfMsg) once and assert the call returns no error (fail the
sub-benchmark with b.Fatalf or b.Fatalf-like check) so you guarantee the full
cryptographic verification path is executed during benchmarking; keep this check
outside the timed loop that iterates b.N.
In `@internal/bench/cbor_bench_test.go`:
- Around line 30-38: The hardcoded hex fixtures (byronTxCborHex,
shelleyTxCborHex, etc.) are only checked for hex validity by mustDecodeHex in
init(); add a validation step in init() that, after mustDecodeHex returns the
bytes for each fixture, calls NewTransactionFromCbor (or the appropriate
CBOR-to-transaction constructor) and handles any error (panic or log.Fatal) so
corrupted/truncated CBOR is detected early; reference the existing init(),
mustDecodeHex, and NewTransactionFromCbor symbols when adding this validation
loop over each fixture.
In `@internal/bench/consensus_bench_test.go`:
- Around line 156-193: Add a short comment above BenchmarkConsensusIsSlotLeader
explaining that the shared benchActiveSlotCoeff (*big.Rat) is reused across
sub-benchmarks and the benchmark relies on sequential execution (sub-benchmarks
run one after another); note that enabling b.SetParallelism or switching to
RunParallel would introduce races when passing benchActiveSlotCoeff into
consensus.IsSlotLeader, and reference the similar comment style used in
BenchmarkConsensusParallelThreshold to justify keeping the current non-parallel
structure.
In `@internal/bench/helpers_test.go`:
- Around line 32-61: Add unit tests mirroring TestBlockTypeFromEra for the
exported TxTypeFromEra: create TestTxTypeFromEra with cases for "byron",
"shelley", "allegra", "mary", "alonzo", "babbage", "conway" (case-insensitive)
and an "unknown" case asserting error; call TxTypeFromEra and assert expected
uint or error. Also add small tests for BenchLedgerStateWithUtxos and
BenchLedgerStateWithUtxoFunc: invoke BenchLedgerStateWithUtxos with a sample
UTXO slice and assert the returned ledger state contains those UTXOs, and invoke
BenchLedgerStateWithUtxoFunc with a function that produces known UTXOs and
assert the ledger state reflects them (or that the provided function was called
and its results are present).
In `@internal/bench/helpers.go`:
- Around line 180-199: The function blockCborFromEra uses testdata.MustDecodeHex
which panics on bad input, violating its ([]byte, error) contract; change
blockCborFromEra to perform a non-panicking decode (e.g., call a
testdata.DecodeHex that returns (bytes, error) or use encoding/hex.DecodeString
on testdata.ByronBlockHex, testdata.ShelleyBlockHex, etc.), capture and return
any decode error instead of panicking, and update each case branch to return
(decodedBytes, err) so callers (like LoadBlockFixture) can handle errors
normally.
In `@internal/bench/regression_test.go`:
- Around line 464-518: TestRegressionCBOR calls testdata.GetTestBlocks()
directly instead of the package wrapper GetTestBlocks() defined in helpers.go;
replace the direct call in TestRegressionCBOR with the helper GetTestBlocks()
(and update any other direct uses of testdata.GetTestBlocks() in this file such
as the init() usage) so the code uses the indirection provided by the helper.
Locate the TestRegressionCBOR function and change the block source to call
GetTestBlocks() from the local package helper (not testdata.GetTestBlocks()),
ensuring imports remain correct and tests still pass.
In `@internal/bench/tx_bench_test.go`:
- Around line 28-37: The loop over eras using PostByronEraNames() silently skips
fixtures with no transactions or empty CBOR; update the checks around
MustLoadBlockFixture(...).Block.Transactions() and txs[0].Cbor() to log the
reason before continuing (e.g., call b.Logf("skipping %s: no transactions in
fixture", era) and b.Logf("skipping %s: empty tx cbor", era)) so missing or
empty fixtures are visible during test runs; place these b.Logf calls
immediately before each existing continue to preserve behavior.
Force-pushed from caf22d1 to a9daa92
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
3 issues found across 15 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/bench/consensus_bench_test.go">
<violation number="1" location="internal/bench/consensus_bench_test.go:86">
P3: Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).</violation>
</file>
<file name="scripts/profile.sh">
<violation number="1" location="scripts/profile.sh:200">
P3: When running with `component=all`, the script prints text-based analysis commands for `cpu_all.prof`/`mem_all.prof`, which are never generated. This misleads users after an `all` run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.</violation>
</file>
<file name=".github/workflows/benchmark.yml">
<violation number="1" location=".github/workflows/benchmark.yml:35">
P2: The pipeline exits with `tee`’s status, so `go test` failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```text
          ./cbor/... \
          ./pipeline/... \
          ./ledger/... \
          2>&1 | tee benchmark.txt
```
P2: The pipeline exits with `tee`'s status, so `go test` failures won't fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 35:
<comment>The pipeline exits with `tee`’s status, so `go test` failures won’t fail the workflow. Capture the pipeline status or enable pipefail to ensure benchmark failures stop the job.</comment>
<file context>
@@ -0,0 +1,42 @@
+ ./cbor/... \
+ ./pipeline/... \
+ ./ledger/... \
+ 2>&1 | tee benchmark.txt
+
+ - name: Upload benchmark results
</file context>
```go
	for _, s := range stakes {
		b.Run(s.name, func(b *testing.B) {
			activeSlotCoeff := big.NewRat(1, 20) // fresh per sub-benchmark
			b.ReportAllocs()
```
P3: Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/consensus_bench_test.go, line 86:
<comment>Reset the benchmark timer after per-run setup so results measure only the operation under test (per internal/bench/README best practices).</comment>
<file context>
@@ -0,0 +1,324 @@
+ for _, s := range stakes {
+ b.Run(s.name, func(b *testing.B) {
+ activeSlotCoeff := big.NewRat(1, 20) // fresh per sub-benchmark
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ consensusSink = consensus.CertifiedNatThreshold(
</file context>
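The timer discipline this comment asks for can be sketched as follows; `benchmarkThresholdSketch` and the `big.Rat` multiplication are illustrative stand-ins for the real consensus call:

```go
package main

import (
	"math/big"
	"testing"
)

// benchmarkThresholdSketch shows per-run setup happening before
// b.ResetTimer, so the loop measures only the operation under test.
func benchmarkThresholdSketch(b *testing.B) {
	activeSlotCoeff := big.NewRat(1, 20) // fresh per sub-benchmark (setup)
	b.ReportAllocs()
	b.ResetTimer() // exclude setup cost from the timed region
	for i := 0; i < b.N; i++ {
		_ = new(big.Rat).Mul(activeSlotCoeff, activeSlotCoeff)
	}
}
```

For cheap setup like a single `big.NewRat` the effect on results is small, which is why this was filed as a P3, but the pattern keeps measurements honest if setup ever grows.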
scripts/profile.sh
Outdated
```bash
fi
echo ""
echo "For text-based analysis:"
echo "  go tool pprof -top $OUTPUT_DIR/cpu_${COMPONENT}.prof"
```
P3: When running with `component=all`, the script prints text-based analysis commands for `cpu_all.prof`/`mem_all.prof`, which are never generated. This misleads users after an `all` run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 200:
<comment>When running with `component=all`, the script prints text-based analysis commands for `cpu_all.prof`/`mem_all.prof`, which are never generated. This misleads users after an `all` run. Consider substituting a concrete component (e.g., block) or clarifying the placeholder in these lines.</comment>
<file context>
@@ -0,0 +1,204 @@
+fi
+echo ""
+echo "For text-based analysis:"
+echo " go tool pprof -top $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+echo " go tool pprof -top $OUTPUT_DIR/mem_${COMPONENT}.prof"
+echo ""
</file context>
Force-pushed from a9daa92 to 4d70dd3
Actionable comments posted: 3
♻️ Duplicate comments (6)
.github/workflows/benchmark.yml (2)
29-36: ⚠️ Potential issue | 🟠 Major

`./internal/bench/...` is still absent — CI never runs any of the benchmarks introduced by this PR.

All new benchmarks (`BenchmarkBlockValidation`, `BenchmarkCBORDecodeBlock`, `BenchmarkNativeScriptEvaluate`, `BenchmarkTxDecode`, etc.) live in `./internal/bench/...`, which is missing from the package list. Every benchmark this PR introduces is silently skipped in CI.

🐛 Proposed fix

```diff
       run: |
         set -o pipefail
         go test -bench=. -benchmem -count=5 -timeout=30m \
           ./vrf/... \
           ./kes/... \
           ./consensus/... \
           ./cbor/... \
           ./pipeline/... \
           ./ledger/... \
+          ./internal/bench/... \
           2>&1 | tee benchmark.txt
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml around lines 29 - 36, The workflow's go test package list omits the new benchmark packages so CI never runs the new benchmarks; update the go test invocation in the benchmark job (the multi-line command that starts with "go test -bench=. -benchmem -count=5 -timeout=30m") to include ./internal/bench/... alongside the other packages (so that benchmarks such as BenchmarkBlockValidation, BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode are executed), then keep piping output to tee as before.
29-36: ⚠️ Potential issue | 🟠 Major

`./internal/bench/...` is still missing from the CI benchmark run.

All benchmarks introduced in this PR (`BenchmarkBlockValidation`, `BenchmarkCBORDecodeBlock`, `BenchmarkNativeScriptEvaluate`, `BenchmarkTxDecode`, etc.) live in `./internal/bench/...`, which is absent from the package list. CI silently skips every benchmark this PR adds.

🐛 Proposed fix

```diff
       run: |
         set -o pipefail
         go test -bench=. -benchmem -count=5 -timeout=30m \
           ./vrf/... \
           ./kes/... \
           ./consensus/... \
           ./cbor/... \
           ./pipeline/... \
           ./ledger/... \
+          ./internal/bench/... \
           2>&1 | tee benchmark.txt
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml around lines 29 - 36, The CI benchmark job is missing the package that contains the new benchmarks; add the missing package glob ./internal/bench/... to the list of packages passed to the go test command (alongside ./vrf/..., ./kes/..., etc.) so the new benchmarks (e.g., BenchmarkBlockValidation, BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode) are executed and recorded in benchmark.txt.

internal/bench/PROFILING.md (2)
57-82: ⚠️ Potential issue | 🟡 Minor

Block/mutex/trace profile examples still reference non-existent package paths and benchmarks.

Lines 62, 70, and 78 all reference `./pipeline/...` (which may not exist) and benchmark names `BenchmarkBlockPipeline`/`BenchmarkParallelVerify` that are likely absent. Users following these examples will get `no packages loaded` or `no matching benchmarks` errors.

Consider replacing with real targets — e.g., `./internal/bench/...` and `BenchmarkCBORDecodeBlockParallel` for the parallel/mutex case.

💡 Suggested replacements

```diff
-go test -bench=BenchmarkBlockPipeline -blockprofile=block.prof ./pipeline/...
+go test -bench=BenchmarkConsensusParallelThreshold -blockprofile=block.prof ./internal/bench/...
```

```diff
-go test -bench=BenchmarkParallelVerify -mutexprofile=mutex.prof ./vrf/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -mutexprofile=mutex.prof ./internal/bench/...
```

```diff
-go test -bench=BenchmarkBlockPipeline -trace=trace.out ./pipeline/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -trace=trace.out ./internal/bench/...
```

```bash
#!/bin/bash
# Verify whether ./pipeline exists and whether BenchmarkBlockPipeline / BenchmarkParallelVerify exist
echo "=== Checking for pipeline package ==="
fd -t d "pipeline" --max-depth 2
echo ""
echo "=== Checking for BenchmarkBlockPipeline ==="
rg -rn "func BenchmarkBlockPipeline" --type=go
echo ""
echo "=== Checking for BenchmarkParallelVerify ==="
rg -rn "func BenchmarkParallelVerify" --type=go
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/PROFILING.md` around lines 57 - 82, Update the example commands in PROFILING.md to point at actual packages and benchmarks: replace occurrences of "./pipeline/..." with "./internal/bench/..." and swap the non-existent benchmark names (BenchmarkBlockPipeline, BenchmarkParallelVerify) for real benchmarks such as BenchmarkCBORDecodeBlockParallel (use this for the parallel/mutex examples) and the corresponding block-profile benchmark (e.g., BenchmarkCBORDecodeBlock or another existing block-related benchmark in internal/bench) so the examples run without "no packages loaded" or "no matching benchmarks" errors; locate and change the three example blocks that reference BenchmarkBlockPipeline/BenchmarkParallelVerify and ./pipeline/... accordingly.
57-82: ⚠️ Potential issue | 🟡 Minor

Block/mutex/trace examples still reference non-existent `./pipeline/...` and benchmark names.

Lines 62, 70, and 78 all reference `./pipeline/...` and `BenchmarkBlockPipeline`/`BenchmarkParallelVerify`. These paths/benchmarks likely don't exist, so users following these examples will get `no packages loaded` or silent no-match errors.

💡 Suggested replacements

```diff
-go test -bench=BenchmarkBlockPipeline -blockprofile=block.prof ./pipeline/...
+go test -bench=BenchmarkConsensusParallelThreshold -blockprofile=block.prof ./internal/bench/...
```

```diff
-go test -bench=BenchmarkParallelVerify -mutexprofile=mutex.prof ./vrf/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -mutexprofile=mutex.prof ./internal/bench/...
```

```diff
-go test -bench=BenchmarkBlockPipeline -trace=trace.out ./pipeline/...
+go test -bench=BenchmarkCBORDecodeBlockParallel -trace=trace.out ./internal/bench/...
```

```bash
#!/bin/bash
# Verify pipeline package existence and non-existent benchmark names
echo "=== pipeline package? ==="
fd -t d "^pipeline$" --max-depth 3
echo ""
echo "=== BenchmarkBlockPipeline ==="
rg -rn "func BenchmarkBlockPipeline" --type=go
echo ""
echo "=== BenchmarkParallelVerify ==="
rg -rn "func BenchmarkParallelVerify" --type=go
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/PROFILING.md` around lines 57 - 82, The examples reference non-existent packages/benchmarks (BenchmarkBlockPipeline, BenchmarkParallelVerify and ./pipeline/..././vrf...), so update the three commands to use either real packages/benchmarks or a generic safe form; replace ./pipeline/... and ./vrf/... with ./... (or specific existing package directories discovered via fd/rg) and swap the explicit BenchmarkBlockPipeline/BenchmarkParallelVerify names for either an actual benchmark found in the repo or a generic pattern such as -bench=. to ensure the commands run without "no packages loaded" or "no matching benchmarks" errors; update all occurrences in the Block Profile, Mutex Profile and Trace examples accordingly (look for the symbols BenchmarkBlockPipeline, BenchmarkParallelVerify, ./pipeline/... and ./vrf... to locate the lines to change).

internal/bench/block_bench_test.go (1)
374-381: LGTM — uint64 underflow guard correctly implemented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/block_bench_test.go` around lines 374 - 381, The uint64 underflow guard in the test (the check comparing currentKesPeriod and kd.kesPeriod that calls b.Skipf and returns) is correct; no code change required—keep the existing conditional that skips when currentKesPeriod < kd.kesPeriod (variables currentKesPeriod and kd.kesPeriod) as implemented.

internal/bench/regression_test.go (1)
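The approved guard can be sketched as a small helper; the names `currentKesPeriod` and `fixtureKesPeriod` are illustrative, and the real benchmark calls `b.Skipf` instead of returning a flag:

```go
package main

// kesPeriodOffset mirrors the guard the reviewer approved: it computes the
// offset between the slot-derived KES period and the fixture's period, and
// reports ok=false instead of letting uint64 subtraction wrap around.
func kesPeriodOffset(currentKesPeriod, fixtureKesPeriod uint64) (offset uint64, ok bool) {
	if currentKesPeriod < fixtureKesPeriod {
		return 0, false // subtracting would underflow; caller should skip
	}
	return currentKesPeriod - fixtureKesPeriod, true
}
```

Without the comparison, `2 - 3` on `uint64` values would wrap to a huge period offset rather than failing loudly.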
91-101: ⚠️ Potential issue | 🟡 Minor

`init()` panic on missing Conway block will crash the entire `bench` package — lazily load instead.

If `testdata.GetTestBlocks()` ever omits or renames the "Conway" entry, every test and benchmark in this package fails at startup. Move the lookup into `cborBlockDecodeOnce` so only the CBOR regression tests are affected.

🛡️ Proposed fix

```diff
-	// Initialize block test data from testdata package
-	blocks := testdata.GetTestBlocks()
-	for _, block := range blocks {
-		if block.Name == "Conway" {
-			conwayBlockCbor = block.Cbor
-			break
-		}
-	}
-	if conwayBlockCbor == nil {
-		panic("failed to find Conway block in testdata")
-	}
```

```diff
 func cborBlockDecodeOnce() any {
+	if conwayBlockCbor == nil {
+		for _, block := range testdata.GetTestBlocks() {
+			if block.Name == "Conway" {
+				conwayBlockCbor = block.Cbor
+				break
+			}
+		}
+	}
+	if conwayBlockCbor == nil {
+		panic("Conway block not found in testdata")
+	}
 	block, err := ledger.NewBlockFromCbor(
 		ledger.BlockTypeConway,
 		conwayBlockCbor,
 	)
```
Verify each finding against the current code and only fix it if needed. In `@internal/bench/regression_test.go` around lines 91 - 101, The init() currently panics if testdata.GetTestBlocks() lacks the "Conway" block; move that lookup into the existing cborBlockDecodeOnce so the lookup is done lazily only when cborBlockDecodeOnce runs. Remove the panic from init(), and inside the cborBlockDecodeOnce.Do(...) body perform the loop over testdata.GetTestBlocks(), set conwayBlockCbor when Name == "Conway", and handle a missing block by returning/setting an error or nil only for the CBOR regression path (no package-wide panic). Ensure the lookup is protected by the same sync.Once (cborBlockDecodeOnce) and update any callers to handle a missing conwayBlockCbor gracefully.
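A `sync.Once`-protected, error-returning version of this lookup can be sketched as follows; `testBlock`, `lookupConwayCbor`, and the variable names are illustrative stand-ins for the repository's actual testdata API:

```go
package main

import (
	"errors"
	"sync"
)

// testBlock is an illustrative stand-in for the testdata block type.
type testBlock struct {
	Name string
	Cbor []byte
}

var (
	conwayOnce sync.Once
	conwayCbor []byte
	conwayErr  error
)

// lookupConwayCbor resolves the Conway fixture lazily on first use, returning
// an error instead of panicking the whole package at init time. sync.Once
// makes the scan run exactly once, even with concurrent callers.
func lookupConwayCbor(blocks []testBlock) ([]byte, error) {
	conwayOnce.Do(func() {
		for _, b := range blocks {
			if b.Name == "Conway" {
				conwayCbor = b.Cbor
				return
			}
		}
		conwayErr = errors.New("Conway block not found in testdata")
	})
	return conwayCbor, conwayErr
}
```

With this shape, only the CBOR regression tests observe the error; unrelated benchmarks in the package still start normally.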
🧹 Nitpick comments (8)
scripts/profile.sh (2)
183-207: Post-run instructions unconditionally print both CPU and memory profile paths regardless of `PROFILE_TYPE`.

When run with `cpu` or `mem` only, one of the two echoed paths won't exist on disk, which is confusing.

💡 Suggested fix for the `else` block (lines 194–197); apply the same pattern to lines 203–207

```diff
 else
 	echo "To analyze profiles manually:"
-	echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
-	echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+	if [[ "$PROFILE_TYPE" == "cpu" || "$PROFILE_TYPE" == "both" ]]; then
+		echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+	fi
+	if [[ "$PROFILE_TYPE" == "mem" || "$PROFILE_TYPE" == "both" ]]; then
+		echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+	fi
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/profile.sh` around lines 183 - 207, The post-run help prints both CPU and memory profile paths even when PROFILE_TYPE is set to "cpu" or "mem"; update the echo logic to conditionally print only the relevant profile(s) based on the PROFILE_TYPE variable: when PROFILE_TYPE == "cpu" print only cpu_${COMPONENT}.prof, when "mem" print only mem_${COMPONENT}.prof, and when "all" or empty keep printing both. Change both the interactive example block (uses go tool pprof -http... $OUTPUT_DIR/cpu_${COMPONENT}.prof / mem_${COMPONENT}.prof) and the text-based analysis block (uses go tool pprof -top ...) to follow this same conditional branching, referencing the existing COMPONENT, PROFILE_TYPE and OUTPUT_DIR variables.
183-207: Post-run instructions unconditionally show both CPU and memory paths regardless of `PROFILE_TYPE`.

When the user runs with `-cpu` only or `-mem` only, the closing instructions always echo both `cpu_*.prof` and `mem_*.prof` paths — one of which won't exist. This is a minor UX issue.

💡 Suggested fix

```diff
 else
 	echo "To analyze profiles manually:"
-	echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
-	echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+	if [[ "$PROFILE_TYPE" == "cpu" || "$PROFILE_TYPE" == "both" ]]; then
+		echo "  go tool pprof -http=localhost:8080 $OUTPUT_DIR/cpu_${COMPONENT}.prof"
+	fi
+	if [[ "$PROFILE_TYPE" == "mem" || "$PROFILE_TYPE" == "both" ]]; then
+		echo "  go tool pprof -http=localhost:8081 $OUTPUT_DIR/mem_${COMPONENT}.prof"
+	fi
 fi
```

Apply the same pattern to the text-based analysis block on lines 203–207.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/profile.sh` around lines 183 - 207, The post-run help always prints both CPU and memory profile paths even when PROFILE_TYPE is set to "cpu" or "mem"; update the echo blocks that print manual and text-based analysis instructions to respect PROFILE_TYPE: when PROFILE_TYPE == "cpu" only print cpu_${COMPONENT}.prof (using OUTPUT_DIR), when PROFILE_TYPE == "mem" only print mem_${COMPONENT}.prof, and when PROFILE_TYPE == "all" keep the existing dual-path outputs (handle both the "all" branch and the else branch logic for the text-based analysis block as well as the earlier manual-analysis block).internal/bench/profile_test.go (2)
71-199: Consider replacing `t.Fatalf` with `require.NoError` for unexpected-error paths.

All `t.Fatalf(...)` calls in setup paths are semantically equivalent to `require.NoError(t, err, "...")`. Adopting `testify` here aligns with the project guideline. As per coding guidelines: `**/*_test.go`: "Use `testify` assertions (e.g., `assert.NoError()`, `assert.Equal()`) for Go tests".

The deliberate `t.Logf` calls for expected errors (dummy VRF nonce, missing UTXOs) are fine as-is.

Also applies to: 210-283, 293-318, 328-401, 412-439, 450-477, 487-556
Verify each finding against the current code and only fix it if needed. In `@internal/bench/profile_test.go` around lines 71 - 199, Replace unexpected-error t.Fatalf usage in TestProfileBlockValidation setup with testify's require.NoError to follow project test guidelines: in the pre-decode loop where ledger.NewBlockFromCbor is called (and any other setup spots flagged), call require.NoError(t, err, "failed to pre-decode %s block", block.Name) (or equivalent) and import "github.com/stretchr/testify/require" at the top of the test file; keep the existing t.Logf calls for expected errors in VerifyBlock hot loops untouched.
71-199: Consider using `testify` assertions for unexpected-error fatal paths.
The profiling tests use `t.Fatalf(...)` for unexpected errors (setup failures). Per the coding guidelines, Go test files should use `testify` assertions. The expected-error paths that use `t.Logf` are fine as-is, but setup failures could use `require.NoError(t, err, "...")` for consistency.
This is a low-priority change since the semantics are equivalent, but it aligns the file with the project's testing conventions. Based on coding guidelines: `**/*_test.go`: "Use `testify` assertions (e.g., `assert.NoError()`, `assert.Equal()`) for Go tests".

Also applies to: 210-283, 293-318, 328-401, 412-439, 450-477, 487-556
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/profile_test.go` around lines 71 - 199, The test uses t.Fatalf for setup failures (e.g., in TestProfileBlockValidation when pre-decoding blocks via ledger.NewBlockFromCbor) which should be replaced with testify's require.NoError to follow project testing conventions; update the pre-decode error check to use require.NoError(t, err, "failed to pre-decode %s block", block.Name) and similarly replace other t.Fatalf occurrences in this file (search for t.Fatalf and setup failure checks around NewBlockFromCbor, any test setup sections, and the various TestProfile* blocks) with require.NoError assertions, ensuring you import github.com/stretchr/testify/require at the top of the test file.

internal/bench/cbor_bench_test.go (1)
109-115: `mustDecodeHex` duplicates `testdata.MustDecodeHex` — use the existing helper.
`testdata` is already imported at line 22 and provides `testdata.MustDecodeHex`. The local function can be removed and call sites updated.

♻️ Proposed fix

```diff
-func mustDecodeHex(s string) []byte {
-	b, err := hex.DecodeString(s)
-	if err != nil {
-		panic("failed to decode hex: " + err.Error())
-	}
-	return b
-}
```

Replace all `mustDecodeHex(...)` call sites in `init()` with `testdata.MustDecodeHex(...)`, and remove the `"encoding/hex"` import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/cbor_bench_test.go` around lines 109 - 115, Remove the duplicate mustDecodeHex function and replace its usages in init() with the existing testdata.MustDecodeHex helper; update all call sites from mustDecodeHex(...) to testdata.MustDecodeHex(...), delete the now-unused mustDecodeHex definition, and remove the "encoding/hex" import so the file only relies on the imported testdata package.

internal/bench/block_bench_test.go (1)
356-358: Consider assigning the VRF result to `benchSink` to guard against future inlining.
Currently both return values are fully discarded. While Go won't eliminate a complex crypto call today, pinning the result to `benchSink` is a cheap, forward-safe hedge that makes the intent explicit.

♻️ Proposed change

```diff
-	_, _ = vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+	ok, err := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+	benchSink = ok
+	_ = err
```

🤖 Prompt for AI Agents
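For readers unfamiliar with the sink idiom this comment relies on, here is a minimal standalone sketch; `verify` is a hypothetical stand-in for `vrf.Verify` (the real call is not invoked here), and `benchSink` mirrors the package-level variable the benchmarks use:

```go
package main

import "fmt"

// benchSink mirrors the package-level sink used in the benchmarks: storing a
// result in a global means the compiler cannot prove the result is unused,
// so it cannot eliminate the work under measurement.
var benchSink bool

// verify is a hypothetical stand-in for vrf.Verify: same (bool, error) shape,
// trivial body. It XORs the message bytes and reports a nonzero result.
func verify(msg []byte) (bool, error) {
	sum := byte(0)
	for _, b := range msg {
		sum ^= b
	}
	return sum != 0, nil
}

func main() {
	for i := 0; i < 1000; i++ {
		ok, err := verify([]byte{byte(i), 0x01})
		if err != nil {
			panic(err)
		}
		benchSink = ok // pin the result so the loop body stays observable
	}
	fmt.Println("sink:", benchSink)
}
```

The same shape applies inside a `testing.B` loop: assign one return value to the sink instead of discarding both with `_, _ =`.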
Verify each finding against the current code and only fix it if needed. In `@internal/bench/block_bench_test.go` around lines 356 - 358, Assign the VRF result to the package-level benchSink to prevent the compiler from inlining away the call: inside the loop that currently calls vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg) and discards both return values, capture at least one returned value (e.g., the output or error) into benchSink (or set benchSink = someValueDerivedFrom(the returned value)) so the call is observable; update the loop using the existing vrf.Verify call site and variables vd.vrfKey, vd.proof, vd.output, vrfMsg to store the result rather than discarding it.

internal/bench/helpers.go (2)
218-220: `GetTestBlocks` is a zero-value wrapper — consider removing it.
The function is a pure delegation with no additional logic; `block_bench_test.go` already calls `testdata.GetTestBlocks()` directly (line 453), making this wrapper inconsistently used and unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 218 - 220, Remove the redundant wrapper GetTestBlocks in helpers.go and update call sites to use testdata.GetTestBlocks() directly; specifically delete the GetTestBlocks function and replace any references to helpers.GetTestBlocks() (e.g., in block_bench_test.go) with testdata.GetTestBlocks(), and then run tests/build to ensure no unused imports or compilation errors remain.
124-143: Three parallel era-switch functions share identical case sets — consolidate into a lookup table.
`BlockTypeFromEra`, `TxTypeFromEra`, and `blockCborFromEra` all switch over the same seven era strings. Adding a new era (e.g., Leios) requires updating all three independently.

♻️ Suggested refactor sketch

```diff
+type eraInfo struct {
+	BlockType uint
+	TxType    uint
+	CborFn    func() []byte
+}
+
+var eraRegistry = map[string]eraInfo{
+	"byron":   {ledger.BlockTypeByronMain, ledger.TxTypeByron, func() []byte { return testdata.MustDecodeHex(testdata.ByronBlockHex) }},
+	"shelley": {ledger.BlockTypeShelley, ledger.TxTypeShelley, func() []byte { return testdata.MustDecodeHex(testdata.ShelleyBlockHex) }},
+	// ... remaining eras
+}
+
+func BlockTypeFromEra(era string) (uint, error) {
+	info, ok := eraRegistry[strings.ToLower(era)]
+	if !ok {
+		return 0, fmt.Errorf("unknown era: %s", era)
+	}
+	return info.BlockType, nil
+}
```

Also applies to: 147-166, 196-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 124 - 143, Multiple functions (BlockTypeFromEra, TxTypeFromEra, blockCborFromEra) duplicate the same era switch; replace the repeated switches with a single shared lookup map keyed by strings.ToLower(era) (e.g., var eraToBlockType map[string]uint / eraToTxType map[string]uint / eraToCbor map[string]SomeType or a single composite map) defined at package scope, then have each function simply lookup the era, return the mapped value or the same fmt.Errorf("unknown era: %s", era) on miss; update all three functions to use the shared map(s) so adding a new era requires a single map change and preserve current error behavior.
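A runnable sketch of the table-driven shape this prompt describes; the struct mirrors the review's `eraInfo`, but the numeric values below are placeholders, not gouroboros' real `BlockType`/`TxType` constants:

```go
package main

import (
	"fmt"
	"strings"
)

// eraInfo groups the per-era values that the three switch statements each
// map independently today. Values here are illustrative placeholders.
type eraInfo struct {
	BlockType uint
	TxType    uint
}

var eraRegistry = map[string]eraInfo{
	"byron":   {BlockType: 1, TxType: 0},
	"shelley": {BlockType: 2, TxType: 1},
	"conway":  {BlockType: 7, TxType: 6},
}

// BlockTypeFromEra needs only the shared map; TxTypeFromEra and
// blockCborFromEra would follow the same lookup-or-error pattern.
func BlockTypeFromEra(era string) (uint, error) {
	info, ok := eraRegistry[strings.ToLower(era)]
	if !ok {
		return 0, fmt.Errorf("unknown era: %s", era)
	}
	return info.BlockType, nil
}

func main() {
	bt, err := BlockTypeFromEra("Conway")
	fmt.Println(bt, err) // 7 <nil>
	_, err = BlockTypeFromEra("leios")
	fmt.Println(err) // unknown era: leios
}
```

With this shape, adding a new era is a single map entry, and all three accessor functions pick it up automatically.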
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/bench/helpers.go`:
- Line 93: The local variable named `cbor` in LoadBlockFixture shadows the
imported package `cbor`; rename the local variable (e.g., to `cborBytes` or
`blockCbor`) wherever it is assigned from blockCborFromEra and where it is
subsequently used inside LoadBlockFixture to avoid shadowing; update all usages
in that function to the new name and keep the call to blockCborFromEra
unchanged.
In `@internal/bench/profile_test.go`:
- Around line 462-475: The comment incorrectly states that NewBlockFromCbor
validates the body hash; update the comment to accurately describe what's being
measured: decoding via ledger.NewBlockFromCbor followed by accessing
decoded.BlockBodyHash() (i.e., compute/access the body hash but not validate
it), and note that full body-hash validation is performed by ledger.VerifyBlock
if you want to profile that path; keep the loop and calls to NewBlockFromCbor,
decoded.BlockBodyHash(), and profileIterations unchanged.
- Around line 462-475: Update the misleading comment: replace the sentence
claiming "NewBlockFromCbor with default config validates body hash" with a
concise note that NewBlockFromCbor only decodes CBOR and does not validate the
body hash, and clarify that body hash validation is performed by
ledger.VerifyBlock; mention that the loop over profileIterations is profiling
decode + accessing BlockBodyHash() rather than verifying the hash so readers
understand what is measured.
---
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Around line 29-36: The workflow's go test package list omits the new benchmark
packages so CI never runs the new benchmarks; update the go test invocation in
the benchmark job (the multi-line command that starts with "go test -bench=.
-benchmem -count=5 -timeout=30m") to include ./internal/bench/... alongside the
other packages (so that benchmarks such as BenchmarkBlockValidation,
BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode are
executed), then keep piping output to tee as before.
- Around line 29-36: The CI benchmark job is missing the package that contains
the new benchmarks; add the missing package glob ./internal/bench/... to the
list of packages passed to the go test command (alongside ./vrf/..., ./kes/...,
etc.) so the new benchmarks (e.g., BenchmarkBlockValidation,
BenchmarkCBORDecodeBlock, BenchmarkNativeScriptEvaluate, BenchmarkTxDecode) are
executed and recorded in benchmark.txt.
In `@internal/bench/block_bench_test.go`:
- Around line 374-381: The uint64 underflow guard in the test (the check
comparing currentKesPeriod and kd.kesPeriod that calls b.Skipf and returns) is
correct; no code change required—keep the existing conditional that skips when
currentKesPeriod < kd.kesPeriod (variables currentKesPeriod and kd.kesPeriod) as
implemented.
In `@internal/bench/PROFILING.md`:
- Around line 57-82: Update the example commands in PROFILING.md to point at
actual packages and benchmarks: replace occurrences of "./pipeline/..." with
"./internal/bench/..." and swap the non-existent benchmark names
(BenchmarkBlockPipeline, BenchmarkParallelVerify) for real benchmarks such as
BenchmarkCBORDecodeBlockParallel (use this for the parallel/mutex examples) and
the corresponding block-profile benchmark (e.g., BenchmarkCBORDecodeBlock or
another existing block-related benchmark in internal/bench) so the examples run
without "no packages loaded" or "no matching benchmarks" errors; locate and
change the three example blocks that reference
BenchmarkBlockPipeline/BenchmarkParallelVerify and ./pipeline/... accordingly.
- Around line 57-82: The examples reference non-existent packages/benchmarks
(BenchmarkBlockPipeline, BenchmarkParallelVerify and ./pipeline/..., ./vrf/...),
so update the three commands to use either real packages/benchmarks or a generic
safe form; replace ./pipeline/... and ./vrf/... with ./... (or specific existing
package directories discovered via fd/rg) and swap the explicit
BenchmarkBlockPipeline/BenchmarkParallelVerify names for either an actual
benchmark found in the repo or a generic pattern such as -bench=. to ensure the
commands run without "no packages loaded" or "no matching benchmarks" errors;
update all occurrences in the Block Profile, Mutex Profile and Trace examples
accordingly (look for the symbols BenchmarkBlockPipeline,
BenchmarkParallelVerify, ./pipeline/... and ./vrf... to locate the lines to
change).
In `@internal/bench/regression_test.go`:
- Around line 91-101: The init() currently panics if testdata.GetTestBlocks()
lacks the "Conway" block; move that lookup into the existing cborBlockDecodeOnce
so the lookup is done lazily only when cborBlockDecodeOnce runs. Remove the
panic from init(), and inside the cborBlockDecodeOnce.Do(...) body perform the
loop over testdata.GetTestBlocks(), set conwayBlockCbor when Name == "Conway",
and handle a missing block by returning/setting an error or nil only for the
CBOR regression path (no package-wide panic). Ensure the lookup is protected by
the same sync.Once (cborBlockDecodeOnce) and update any callers to handle a
missing conwayBlockCbor gracefully.
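A minimal sketch of the lazy-lookup shape this prompt describes; `getTestBlocks` is a hypothetical stand-in for `testdata.GetTestBlocks()`, and the variable names echo the review's `cborBlockDecodeOnce`/`conwayBlockCbor`:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	cborBlockDecodeOnce sync.Once
	conwayBlockCbor     []byte
	conwayBlockErr      error
)

type testBlock struct {
	Name string
	Cbor []byte
}

// getTestBlocks stands in for testdata.GetTestBlocks (hypothetical shape).
func getTestBlocks() []testBlock {
	return []testBlock{{Name: "Conway", Cbor: []byte{0x84}}}
}

// conwayCbor performs the lookup lazily: instead of panicking in init(),
// the first caller triggers the search and later callers reuse the result.
// A missing block surfaces as an error only on this code path.
func conwayCbor() ([]byte, error) {
	cborBlockDecodeOnce.Do(func() {
		for _, b := range getTestBlocks() {
			if b.Name == "Conway" {
				conwayBlockCbor = b.Cbor
				return
			}
		}
		conwayBlockErr = errors.New("conway test block not found")
	})
	return conwayBlockCbor, conwayBlockErr
}

func main() {
	data, err := conwayCbor()
	fmt.Println(len(data), err) // 1 <nil>
}
```

Callers handle the error (or skip the benchmark) rather than the package failing to load at all.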
---
Nitpick comments:
In `@internal/bench/block_bench_test.go`:
- Around line 356-358: Assign the VRF result to the package-level benchSink to
prevent the compiler from inlining away the call: inside the loop that currently
calls vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg) and discards both
return values, capture at least one returned value (e.g., the output or error)
into benchSink (or set benchSink = someValueDerivedFrom(the returned value)) so
the call is observable; update the loop using the existing vrf.Verify call site
and variables vd.vrfKey, vd.proof, vd.output, vrfMsg to store the result rather
than discarding it.
In `@internal/bench/cbor_bench_test.go`:
- Around line 109-115: Remove the duplicate mustDecodeHex function and replace
its usages in init() with the existing testdata.MustDecodeHex helper; update all
call sites from mustDecodeHex(...) to testdata.MustDecodeHex(...), delete the
now-unused mustDecodeHex definition, and remove the "encoding/hex" import so the
file only relies on the imported testdata package.
In `@internal/bench/helpers.go`:
- Around line 218-220: Remove the redundant wrapper GetTestBlocks in helpers.go
and update call sites to use testdata.GetTestBlocks() directly; specifically
delete the GetTestBlocks function and replace any references to
helpers.GetTestBlocks() (e.g., in block_bench_test.go) with
testdata.GetTestBlocks(), and then run tests/build to ensure no unused imports
or compilation errors remain.
- Around line 124-143: Multiple functions (BlockTypeFromEra, TxTypeFromEra,
blockCborFromEra) duplicate the same era switch; replace the repeated switches
with a single shared lookup map keyed by strings.ToLower(era) (e.g., var
eraToBlockType map[string]uint / eraToTxType map[string]uint / eraToCbor
map[string]SomeType or a single composite map) defined at package scope, then
have each function simply lookup the era, return the mapped value or the same
fmt.Errorf("unknown era: %s", era) on miss; update all three functions to use
the shared map(s) so adding a new era requires a single map change and preserve
current error behavior.
In `@internal/bench/profile_test.go`:
- Around line 71-199: Replace unexpected-error t.Fatalf usage in
TestProfileBlockValidation setup with testify's require.NoError to follow
project test guidelines: in the pre-decode loop where ledger.NewBlockFromCbor is
called (and any other setup spots flagged), call require.NoError(t, err, "failed
to pre-decode %s block", block.Name) (or equivalent) and import
"github.com/stretchr/testify/require" at the top of the test file; keep the
existing t.Logf calls for expected errors in VerifyBlock hot loops untouched.
- Around line 71-199: The test uses t.Fatalf for setup failures (e.g., in
TestProfileBlockValidation when pre-decoding blocks via ledger.NewBlockFromCbor)
which should be replaced with testify's require.NoError to follow project
testing conventions; update the pre-decode error check to use require.NoError(t,
err, "failed to pre-decode %s block", block.Name) and similarly replace other
t.Fatalf occurrences in this file (search for t.Fatalf and setup failure checks
around NewBlockFromCbor, any test setup sections, and the various TestProfile*
blocks) with require.NoError assertions, ensuring you import
github.com/stretchr/testify/require at the top of the test file.
In `@scripts/profile.sh`:
- Around line 183-207: The post-run help prints both CPU and memory profile
paths even when PROFILE_TYPE is set to "cpu" or "mem"; update the echo logic to
conditionally print only the relevant profile(s) based on the PROFILE_TYPE
variable: when PROFILE_TYPE == "cpu" print only cpu_${COMPONENT}.prof, when
"mem" print only mem_${COMPONENT}.prof, and when "all" or empty keep printing
both. Change both the interactive example block (uses go tool pprof -http...
$OUTPUT_DIR/cpu_${COMPONENT}.prof / mem_${COMPONENT}.prof) and the text-based
analysis block (uses go tool pprof -top ...) to follow this same conditional
branching, referencing the existing COMPONENT, PROFILE_TYPE and OUTPUT_DIR
variables.
- Around line 183-207: The post-run help always prints both CPU and memory
profile paths even when PROFILE_TYPE is set to "cpu" or "mem"; update the echo
blocks that print manual and text-based analysis instructions to respect
PROFILE_TYPE: when PROFILE_TYPE == "cpu" only print cpu_${COMPONENT}.prof (using
OUTPUT_DIR), when PROFILE_TYPE == "mem" only print mem_${COMPONENT}.prof, and
when PROFILE_TYPE == "all" keep the existing dual-path outputs (handle both the
"all" branch and the else branch logic for the text-based analysis block as well
as the earlier manual-analysis block).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/benchmark.yml
- internal/bench/BASELINES.md
- internal/bench/PROFILING.md
- internal/bench/README.md
- internal/bench/block_bench_test.go
- internal/bench/cbor_bench_test.go
- internal/bench/consensus_bench_test.go
- internal/bench/helpers.go
- internal/bench/helpers_test.go
- internal/bench/profile_test.go
- internal/bench/regression_test.go
- internal/bench/script_bench_test.go
- internal/bench/testdata/README.md
- internal/bench/tx_bench_test.go
- scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/bench/testdata/README.md
- internal/bench/consensus_bench_test.go
3c78cf9 to adf1591
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
8 issues found across 15 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/bench/PROFILING.md">
<violation number="1" location="internal/bench/PROFILING.md:19">
P2: Benchmark profiling commands should disable normal tests (`-run=^$`) so profiles capture benchmark workload only.</violation>
</file>
<file name=".github/workflows/benchmark.yml">
<violation number="1" location=".github/workflows/benchmark.yml:29">
P2: This benchmark job also runs normal tests because `-run` is not set. Restrict test execution (`-run=^$`) so only benchmark code is measured.</violation>
</file>
<file name="internal/bench/script_bench_test.go">
<violation number="1" location="internal/bench/script_bench_test.go:546">
P2: The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.</violation>
</file>
<file name="internal/bench/BASELINES.md">
<violation number="1" location="internal/bench/BASELINES.md:185">
P2: This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.</violation>
</file>
<file name="scripts/profile.sh">
<violation number="1" location="scripts/profile.sh:168">
P2: Errors are swallowed when profiling `all` components, so the script can report success even when profiling commands fail.</violation>
</file>
<file name="internal/bench/block_bench_test.go">
<violation number="1" location="internal/bench/block_bench_test.go:67">
P2: Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.</violation>
<violation number="2" location="internal/bench/block_bench_test.go:213">
P2: KES verification result is ignored, so failed verification (`false, nil`) is treated as success in the validation benchmark.</violation>
<violation number="3" location="internal/bench/block_bench_test.go:266">
P2: Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
internal/bench/PROFILING.md
Outdated
```bash
# Generate CPU profile
go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/...
```
P2: Benchmark profiling commands should disable normal tests (`-run=^$`) so profiles capture benchmark workload only.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/PROFILING.md, line 19:
<comment>Benchmark profiling commands should disable normal tests (`-run=^$`) so profiles capture benchmark workload only.</comment>
<file context>
@@ -0,0 +1,399 @@
+
+```bash
+# Generate CPU profile
+go test -bench=BenchmarkBlockValidation -cpuprofile=cpu.prof ./internal/bench/...
+
+# Generate memory profile
</file context>
.github/workflows/benchmark.yml
Outdated
```yaml
- name: Run benchmarks
  run: |
    set -o pipefail
    go test -bench=. -benchmem -count=5 -timeout=30m \
```
P2: This benchmark job also runs normal tests because `-run` is not set. Restrict test execution (`-run=^$`) so only benchmark code is measured.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/benchmark.yml, line 29:
<comment>This benchmark job also runs normal tests because `-run` is not set. Restrict test execution (`-run=^$`) so only benchmark code is measured.</comment>
<file context>
@@ -0,0 +1,44 @@
+ - name: Run benchmarks
+ run: |
+ set -o pipefail
+ go test -bench=. -benchmem -count=5 -timeout=30m \
+ ./vrf/... \
+ ./kes/... \
</file context>
```go
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
	_, err := cbor.Encode(&script)
```
P2: The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/script_bench_test.go, line 546:
<comment>The native-script "Encode" benchmark is measuring cached-byte return, not real CBOR serialization, because the script already carries stored CBOR bytes.</comment>
<file context>
@@ -0,0 +1,598 @@
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, err := cbor.Encode(&script)
+ if err != nil {
+ b.Fatalf("failed to encode: %v", err)
</file context>
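To illustrate the cached-byte pitfall this finding describes, a hedged sketch: the type and method below are illustrative, not the gouroboros API, but they show why encoding a previously decoded value can measure a cache hit rather than serialization work.

```go
package main

import "fmt"

// cachedScript mimics a type that stores its original CBOR on decode and
// returns it from MarshalCBOR, so "encoding" it benchmarks a slice return,
// not a real encoder pass. Names and shapes are illustrative only.
type cachedScript struct {
	storedCbor []byte
}

func (s *cachedScript) MarshalCBOR() ([]byte, error) {
	if s.storedCbor != nil {
		return s.storedCbor, nil // cached-byte fast path: no encoder runs
	}
	return slowEncode(s), nil // the path a real Encode benchmark should hit
}

// slowEncode stands in for the actual CBOR serializer.
func slowEncode(s *cachedScript) []byte { return []byte{0x82, 0x00} }

func main() {
	s := &cachedScript{storedCbor: []byte{0x82, 0x01}}
	out, _ := s.MarshalCBOR()
	fmt.Printf("% x\n", out) // 82 01: the stored bytes, not a fresh encode
	s.storedCbor = nil       // clearing the cache forces the real encoder
	out, _ = s.MarshalCBOR()
	fmt.Printf("% x\n", out) // 82 00
}
```

A fix along these lines would construct the script freshly (or clear its stored CBOR) in the benchmark setup so each iteration exercises real serialization.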
internal/bench/BASELINES.md
Outdated
```markdown
### CI Failure Criteria

The benchmark CI workflow fails a PR if any of these thresholds are exceeded:
```
P2: This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/BASELINES.md, line 185:
<comment>This line documents CI behavior that conflicts with current project docs: benchmark CI is described as failing PRs on threshold regressions, but the benchmark README says no automatic CI threshold checking exists. Align this section to describe these as aspirational/manual thresholds until enforcement is implemented.</comment>
<file context>
@@ -0,0 +1,231 @@
+
+### CI Failure Criteria
+
+The benchmark CI workflow fails a PR if any of these thresholds are exceeded:
+
+| Metric | Threshold | Rationale |
</file context>
scripts/profile.sh
Outdated
```bash
if [[ "$COMPONENT" == "all" ]]; then
  COMPONENTS=(block cbor vrf consensus bodyhash script)
  for comp in "${COMPONENTS[@]}"; do
    run_profile "$comp" || true
```
P2: Errors are swallowed when profiling `all` components, so the script can report success even when profiling commands fail.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/profile.sh, line 168:
<comment>Errors are swallowed when profiling `all` components, so the script can report success even when profiling commands fail.</comment>
<file context>
@@ -0,0 +1,231 @@
+if [[ "$COMPONENT" == "all" ]]; then
+ COMPONENTS=(block cbor vrf consensus bodyhash script)
+ for comp in "${COMPONENTS[@]}"; do
+ run_profile "$comp" || true
+ done
+
</file context>
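In the script itself the fix would be a failure counter or flag instead of `run_profile "$comp" || true`; the Go analog of the same pattern, with hypothetical names, looks like this:

```go
package main

import (
	"errors"
	"fmt"
)

// runProfile stands in for the script's run_profile; it fails for one
// component so the aggregation path below is exercised.
func runProfile(component string) error {
	if component == "vrf" {
		return fmt.Errorf("profiling %s failed", component)
	}
	return nil
}

// profileAll runs every component but remembers failures and reports them
// at the end, instead of swallowing them the way `|| true` does.
func profileAll(components []string) error {
	var errs []error
	for _, c := range components {
		if err := runProfile(c); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when no component failed
}

func main() {
	err := profileAll([]string{"block", "cbor", "vrf"})
	fmt.Println(err) // profiling vrf failed
}
```

Every component still gets a chance to run, but the overall exit status reflects any failure.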
```go
case *shelley.ShelleyBlockHeader:
	bodyCbor, err = cbor.Encode(h.Body)
	if err != nil {
		continue
```
P2: Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 67:
<comment>Setup-time encode/type failures are silently skipped, which can hide missing-era benchmark coverage instead of failing fast.</comment>
<file context>
@@ -0,0 +1,578 @@
+ case *shelley.ShelleyBlockHeader:
+ bodyCbor, err = cbor.Encode(h.Body)
+ if err != nil {
+ continue
+ }
+ sig = h.Signature
</file context>
internal/bench/block_bench_test.go
Outdated
```go
b.ResetTimer()
for i := 0; i < b.N; i++ {
	// KES verification
	_, err := ledger.VerifyKes(
```
P2: Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 266:
<comment>Pre-parsed validation benchmark also ignores the boolean KES verification result, allowing silent verification failures.</comment>
<file context>
@@ -0,0 +1,578 @@
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ // KES verification
+ _, err := ledger.VerifyKes(
+ bd.block.Header(),
+ defaultSlotsPerKesPeriod,
</file context>
internal/bench/block_bench_test.go
Outdated
```go
}

// KES verification
_, err = ledger.VerifyKes(
```
P2: KES verification result is ignored, so failed verification (`false, nil`) is treated as success in the validation benchmark.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 213:
<comment>KES verification result is ignored, so failed verification (`false, nil`) is treated as success in the validation benchmark.</comment>
<file context>
@@ -0,0 +1,578 @@
+ }
+
+ // KES verification
+ _, err = ledger.VerifyKes(
+ block.Header(),
+ defaultSlotsPerKesPeriod,
</file context>
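A minimal sketch of the corrected pattern; `verifyKes` is a hypothetical stand-in that shares `ledger.VerifyKes`'s `(bool, error)` result shape (the real call takes a header, slots per KES period, and genesis parameters):

```go
package main

import "fmt"

// verifyKes is a stand-in with a (bool, error) result: a nil error does not
// by itself mean the signature verified, only that no failure occurred
// while checking it.
func verifyKes(sig byte) (bool, error) {
	return sig == 0x01, nil
}

func main() {
	for i := 0; i < 3; i++ {
		ok, err := verifyKes(0x01)
		if err != nil {
			panic(fmt.Sprintf("KES verification error: %v", err))
		}
		if !ok {
			// Previously a (false, nil) result was silently treated as
			// success; failing here surfaces it in the benchmark.
			panic("KES verification returned false")
		}
	}
	fmt.Println("verified")
}
```

Inside a `testing.B` loop the panics become `b.Fatalf`, but the shape is the same: check the boolean as well as the error.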
adf1591 to e663c62
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.

/benchmark
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/bench/BASELINES.md (1)
11-11: ⚠️ Potential issue | 🟡 Minor
Regression-enforcement statement is inconsistent within the same document.
Line 11 says CI currently fails on threshold breaches, but Lines 185-187 say thresholds are aspirational and not CI-enforced yet. Please align the overview wording to current behavior to avoid confusion.
🛠️ Suggested doc fix
```diff
-1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](#regression-thresholds))
+1. **Regression Detection**: Thresholds guide manual review today and are intended for future CI enforcement (see [Regression Thresholds](#regression-thresholds))
```

Also applies to: 185-187
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/BASELINES.md` at line 11, Update the inconsistent enforcement language so the "Regression Detection" overview and the later "Regression Thresholds" section agree: either change the sentence under the "Regression Detection" heading that currently reads "CI fails if allocation count increases by >20%..." to state these thresholds are aspirational/not CI-enforced yet, or conversely change the wording in the paragraphs around lines 185-187 to say the thresholds are enforced by CI. Edit the Markdown under the "Regression Detection" heading and the "Regression Thresholds" section so both describe the same current behavior and use consistent phrasing (refer to the "Regression Detection" heading and the "Regression Thresholds" anchor when locating the text).
🧹 Nitpick comments (3)
internal/bench/script_bench_test.go (1)
49-154: Consider consolidating repeated script-wrap logic into one helper.
`createPubkeyScript`, `createAllScript`, `createAnyScript`, `createNofKScript`, and both timelock helpers all repeat the same encode→unmarshal→panic flow. A shared helper would reduce drift risk.

♻️ Refactor sketch

```diff
+func mustWrapNativeScript(v any, label string) common.NativeScript {
+	data, err := cbor.Encode(v)
+	if err != nil {
+		panic("failed to encode " + label + " script: " + err.Error())
+	}
+	var ns common.NativeScript
+	if err := ns.UnmarshalCBOR(data); err != nil {
+		panic("failed to unmarshal " + label + " script: " + err.Error())
+	}
+	return ns
+}
+
 func createPubkeyScript(keyHash common.Blake2b224) common.NativeScript {
 	script := common.NativeScriptPubkey{
 		Type: 0,
 		Hash: keyHash[:],
 	}
-	data, err := cbor.Encode(&script)
-	if err != nil {
-		panic("failed to encode pubkey script: " + err.Error())
-	}
-	var ns common.NativeScript
-	if err := ns.UnmarshalCBOR(data); err != nil {
-		panic("failed to unmarshal pubkey script: " + err.Error())
-	}
-	return ns
+	return mustWrapNativeScript(&script, "pubkey")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/script_bench_test.go` around lines 49 - 154, Multiple factory funcs (createPubkeyScript, createAllScript, createAnyScript, createNofKScript, createInvalidBeforeScript, createInvalidHereafterScript) duplicate the same encode→unmarshal→panic pattern; introduce a single helper (e.g., wrapNativeScript(script interface{}) common.NativeScript) that takes the concrete CBOR struct, calls cbor.Encode, checks error, unmarshals into a common.NativeScript via ns.UnmarshalCBOR, and returns ns (preserving the same panic-on-error behavior and clear messages), then replace each factory to build the specific struct and call wrapNativeScript instead to remove repetition and centralize error handling.

internal/bench/tx_bench_test.go (1)
27-281: Era/fixture setup boilerplate is heavily repeated across benchmarks.
The same `PostByronEraNames()` + `MustLoadBlockFixture()` + first-tx extraction pattern is duplicated many times. A shared helper would reduce maintenance overhead and keep per-benchmark logic focused.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/tx_bench_test.go` around lines 27 - 281, The benchmarks duplicate era/fixture setup (PostByronEraNames, MustLoadBlockFixture, fixture.Block.Transactions, selecting txs[0], txCbor) across many Benchmark* functions; factor this into a small helper (e.g., iterateFixtureTxs or forEachEraFixtureFirstTx) that iterates PostByronEraNames(), calls MustLoadBlockFixture(era, "default"), skips when txs empty, and invokes a callback with era and the first tx (or txCbor when needed); then update each BenchmarkTxDecode, BenchmarkTxHash, BenchmarkTxInputs, BenchmarkTxOutputs, BenchmarkTxFee, BenchmarkTxMetadata, BenchmarkTxCbor, BenchmarkTxUtxorpc, BenchmarkTxProducedUtxos, BenchmarkTxConsumedUtxos, and BenchmarkTxBlockIteration to call that helper and run per-era b.Run bodies, keeping benchSink and pre-validation logic inside the callback.

internal/bench/helpers.go (1)
160-184: Consider deriving era lists from theeraLookupmap.
EraNames()andPostByronEraNames()return hardcoded slices whileeraLookupis the source of truth for era information. If a new era is added toeraLookup, these functions would need manual updates. For benchmark code this is acceptable, but consider adding a comment noting the intentional ordering or deriving from the map if order doesn't matter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 160 - 184, Replace the hardcoded era slices in EraNames() and PostByronEraNames() by deriving the lists from the canonical eraLookup map (or if ordering is important, add a clarifying comment); specifically, iterate over eraLookup to build EraNames() (and filter out "byron" for PostByronEraNames()), or add a comment above EraNames() and PostByronEraNames() stating that the ordering is intentional and must be updated alongside eraLookup when new eras are added so the source-of-truth (eraLookup) remains clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/bench/block_bench_test.go`:
- Around line 372-373: The benchmark currently discards the error returned by
vrf.Verify; change the call to capture both return values (valid, err :=
vrf.Verify(...)) and handle err instead of ignoring it: if err != nil then fail
the benchmark/run (e.g., b.Fatalf or t.Fatalf) or log the error and stop,
otherwise assign benchSink = valid. Update the invocation that uses vrf.Verify
with vd.vrfKey, vd.proof, vd.output, vrfMsg and ensure benchSink remains used to
avoid compiler optimizations.
- Around line 566-573: The benchmark currently ignores the results of
ledger.VerifyKesComponents causing failures to be suppressed; in
BenchmarkVerifyKesComponents capture the two return values from
ledger.VerifyKesComponents (the verification bool and error), and if the error
is non-nil or the verification bool is false call the benchmark's failure helper
(e.g., b.Fatalf/b.Fatal/b.Fatalf with a descriptive message including the error
and relevant inputs like kd.bodyCbor, kd.signature, kd.hotVkey, kd.kesPeriod,
kd.slot) so failing verification paths are surfaced during the benchmark run.
---
Duplicate comments:
In `@internal/bench/BASELINES.md`:
- Line 11: Update the inconsistent enforcement language so the "Regression
Detection" overview and the later "Regression Thresholds" section agree: either
change the sentence under the "Regression Detection" heading that currently
reads "CI fails if allocation count increases by >20%..." to state these
thresholds are aspirational/not CI-enforced yet, or conversely change the
wording in the paragraphs around lines 185-187 to say the thresholds are
enforced by CI. Edit the Markdown under the "Regression Detection" heading and
the "Regression Thresholds" section so both describe the same current behavior
and use consistent phrasing (refer to the "Regression Detection" heading and the
"Regression Thresholds" anchor when locating the text).
---
Nitpick comments:
In `@internal/bench/helpers.go`:
- Around line 160-184: Replace the hardcoded era slices in EraNames() and
PostByronEraNames() by deriving the lists from the canonical eraLookup map (or
if ordering is important, add a clarifying comment); specifically, iterate over
eraLookup to build EraNames() (and filter out "byron" for PostByronEraNames()),
or add a comment above EraNames() and PostByronEraNames() stating that the
ordering is intentional and must be updated alongside eraLookup when new eras
are added so the source-of-truth (eraLookup) remains clear.
In `@internal/bench/script_bench_test.go`:
- Around line 49-154: Multiple factory funcs (createPubkeyScript,
createAllScript, createAnyScript, createNofKScript, createInvalidBeforeScript,
createInvalidHereafterScript) duplicate the same encode→unmarshal→panic pattern;
introduce a single helper (e.g., wrapNativeScript(script interface{})
common.NativeScript) that takes the concrete CBOR struct, calls cbor.Encode,
checks error, unmarshals into a common.NativeScript via ns.UnmarshalCBOR, and
returns ns (preserving the same panic-on-error behavior and clear messages),
then replace each factory to build the specific struct and call wrapNativeScript
instead to remove repetition and centralize error handling.
In `@internal/bench/tx_bench_test.go`:
- Around line 27-281: The benchmarks duplicate era/fixture setup
(PostByronEraNames, MustLoadBlockFixture, fixture.Block.Transactions, selecting
txs[0], txCbor) across many Benchmark* functions; factor this into a small
helper (e.g., iterateFixtureTxs or forEachEraFixtureFirstTx) that iterates
PostByronEraNames(), calls MustLoadBlockFixture(era, "default"), skips when txs
empty, and invokes a callback with era and the first tx (or txCbor when needed);
then update each BenchmarkTxDecode, BenchmarkTxHash, BenchmarkTxInputs,
BenchmarkTxOutputs, BenchmarkTxFee, BenchmarkTxMetadata, BenchmarkTxCbor,
BenchmarkTxUtxorpc, BenchmarkTxProducedUtxos, BenchmarkTxConsumedUtxos, and
BenchmarkTxBlockIteration to call that helper and run per-era b.Run bodies,
keeping benchSink and pre-validation logic inside the callback.
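The tx_bench_test.go helper described above could be sketched as follows. The fixture map, `Tx`, and `Fixture` types here are hypothetical stand-ins; the real helper would call `PostByronEraNames()` and `MustLoadBlockFixture(era, "default")` from the bench helpers.

```go
package main

import "fmt"

// Tx and Fixture are stand-ins for the real ledger transaction and
// block-fixture types used in tx_bench_test.go.
type Tx struct{ Hash string }
type Fixture struct{ Txs []Tx }

// fixtures stands in for MustLoadBlockFixture; eras stands in for
// PostByronEraNames().
var fixtures = map[string]Fixture{
	"shelley": {Txs: []Tx{{Hash: "aa"}}},
	"allegra": {}, // empty: the helper skips eras without transactions
	"babbage": {Txs: []Tx{{Hash: "bb"}}},
}

var eras = []string{"shelley", "allegra", "babbage"}

// forEachEraFirstTx factors out the repeated per-benchmark setup:
// iterate eras, load the fixture, skip empty ones, and hand the first
// transaction to the callback.
func forEachEraFirstTx(fn func(era string, tx Tx)) {
	for _, era := range eras {
		f := fixtures[era]
		if len(f.Txs) == 0 {
			continue
		}
		fn(era, f.Txs[0])
	}
}

func main() {
	forEachEraFirstTx(func(era string, tx Tx) {
		fmt.Println(era, tx.Hash)
	})
}
```

Each Benchmark* function would then reduce to a call like `forEachEraFirstTx(func(era string, tx Tx) { b.Run("Era_"+era, ...) })`, keeping only the measured body inline.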
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/benchmark.yml
- internal/bench/BASELINES.md
- internal/bench/PROFILING.md
- internal/bench/README.md
- internal/bench/block_bench_test.go
- internal/bench/cbor_bench_test.go
- internal/bench/consensus_bench_test.go
- internal/bench/helpers.go
- internal/bench/helpers_test.go
- internal/bench/profile_test.go
- internal/bench/regression_test.go
- internal/bench/script_bench_test.go
- internal/bench/testdata/README.md
- internal/bench/tx_bench_test.go
- scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/bench/profile_test.go
- internal/bench/testdata/README.md
- .github/workflows/benchmark.yml
7 issues found across 15 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/bench/profile_test.go">
<violation number="1" location="internal/bench/profile_test.go:466">
P2: `TestProfileBodyHash` profiles a hash-field accessor instead of actual body-hash validation, so the generated profile is not measuring the intended code path.</violation>
</file>
<file name="internal/bench/consensus_bench_test.go">
<violation number="1" location="internal/bench/consensus_bench_test.go:316">
P2: The parallel threshold benchmark includes `atomic.Value.Store` in the hot loop, which materially distorts the measured performance of `CertifiedNatThreshold`.</violation>
</file>
<file name="internal/bench/block_bench_test.go">
<violation number="1" location="internal/bench/block_bench_test.go:65">
P2: Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.</violation>
<violation number="2" location="internal/bench/block_bench_test.go:372">
P2: Handle the error from `vrf.Verify` so the benchmark fails if verification errors occur and avoids measuring an unintended error path.</violation>
<violation number="3" location="internal/bench/block_bench_test.go:566">
P2: Check the `VerifyKesComponents` result and error to ensure the benchmark measures the success path and fails on verification errors.</violation>
</file>
<file name="internal/bench/BASELINES.md">
<violation number="1" location="internal/bench/BASELINES.md:11">
P2: The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.</violation>
</file>
<file name="internal/bench/helpers.go">
<violation number="1" location="internal/bench/helpers.go:162">
P3: Era names are hardcoded in multiple places (`eraLookup`, `EraNames`, `PostByronEraNames`), which increases maintenance risk and can cause future inconsistencies.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
    b.RunParallel(func(pb *testing.PB) {
        // Each goroutine gets its own Rat to avoid data races.
        activeSlotCoeff := big.NewRat(1, 20)
        for pb.Next() {
P2: The parallel threshold benchmark includes atomic.Value.Store in the hot loop, which materially distorts the measured performance of CertifiedNatThreshold.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/consensus_bench_test.go, line 316:
<comment>The parallel threshold benchmark includes `atomic.Value.Store` in the hot loop, which materially distorts the measured performance of `CertifiedNatThreshold`.</comment>
<file context>
@@ -0,0 +1,326 @@
+ b.RunParallel(func(pb *testing.PB) {
+ // Each goroutine gets its own Rat to avoid data races.
+ activeSlotCoeff := big.NewRat(1, 20)
+ for pb.Next() {
+ consensusParallelSink.Store(
+ consensus.CertifiedNatThreshold(
</file context>
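One way to keep the atomic publish out of the measured loop is to accumulate into a goroutine-local sink and store once per goroutine. The `threshold` function below is a hypothetical stand-in for `consensus.CertifiedNatThreshold`; the point is only the one-Store-per-goroutine shape, run here via `testing.Benchmark` so the sketch is self-contained.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// threshold is a cheap stand-in for consensus.CertifiedNatThreshold.
func threshold(x uint64) uint64 { return x * 31 % 97 }

var parallelSink atomic.Value

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			// Keep the hot loop free of atomic operations:
			// accumulate into a goroutine-local variable...
			var local uint64
			for pb.Next() {
				local = threshold(local + 1)
			}
			// ...and publish once per goroutine, so the Store
			// cost is amortized over all iterations instead of
			// paid on every one.
			parallelSink.Store(local)
		})
	})
	fmt.Println(res.N > 0)
}
```

The local variable still keeps the compiler from eliding the call, which is the reason the sink exists in the first place.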
internal/bench/block_bench_test.go (outdated)

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, _ = ledger.VerifyKesComponents(
P2: Check the VerifyKesComponents result and error to ensure the benchmark measures the success path and fails on verification errors.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 566:
<comment>Check the `VerifyKesComponents` result and error to ensure the benchmark measures the success path and fails on verification errors.</comment>
<file context>
@@ -0,0 +1,599 @@
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = ledger.VerifyKesComponents(
+ kd.bodyCbor,
+ kd.signature,
</file context>
internal/bench/block_bench_test.go (outdated)

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        valid, _ := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
P2: Handle the error from vrf.Verify so the benchmark fails if verification errors occur and avoids measuring an unintended error path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 372:
<comment>Handle the error from `vrf.Verify` so the benchmark fails if verification errors occur and avoids measuring an unintended error path.</comment>
<file context>
@@ -0,0 +1,599 @@
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ valid, _ := vrf.Verify(vd.vrfKey, vd.proof, vd.output, vrfMsg)
+ benchSink = valid
+ }
</file context>
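The corrected loop shape for the vrf.Verify benchmark can be sketched as below. `verify` here is a stub standing in for `vrf.Verify` (which takes the key, proof, output, and message from the fixture); the benchmark is driven through `testing.Benchmark` so the sketch runs standalone.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// verify is a stand-in for vrf.Verify.
func verify(msg []byte) (bool, error) {
	if len(msg) == 0 {
		return false, errors.New("empty message")
	}
	return true, nil
}

var benchSink bool

func main() {
	msg := []byte("vrf input")
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			valid, err := verify(msg)
			if err != nil {
				// Fail fast instead of silently timing an
				// error path.
				b.Fatalf("verify: %v", err)
			}
			benchSink = valid
		}
	})
	fmt.Println(res.N > 0, benchSink)
}
```

Assigning to the package-level `benchSink` preserves the existing dead-code-elimination guard while the error check surfaces any failing verification during the run.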
internal/bench/BASELINES.md (outdated)

    This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:

    1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](#regression-thresholds))
P2: The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/BASELINES.md, line 11:
<comment>The baseline doc claims CI enforces regression thresholds, but later sections (and benchmark README) state CI does not currently enforce them. This inconsistency can cause incorrect expectations about benchmark gate behavior.</comment>
<file context>
@@ -0,0 +1,233 @@
+
+This document tracks memory allocation baselines for key validation paths in gouroboros. These baselines serve three purposes:
+
+1. **Regression Detection**: CI fails if allocation count increases by >20%, time increases by >20%, or bytes allocated increases by >50% (see [Regression Thresholds](#regression-thresholds))
+2. **Optimization Tracking**: Measure impact of performance improvements
+3. **Contributor Guidance**: Set expectations for new code
</file context>
    var kesPeriod uint64
    var err error

    switch h := header.(type) {
P2: Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/block_bench_test.go, line 65:
<comment>Era-specific benchmark switches are missing Leios support and silently skip that era, causing incomplete benchmark coverage.</comment>
<file context>
@@ -0,0 +1,599 @@
+ var kesPeriod uint64
+ var err error
+
+ switch h := header.(type) {
+ case *shelley.ShelleyBlockHeader:
+ bodyCbor, err = cbor.Encode(h.Body)
</file context>
internal/bench/helpers.go (outdated)

    // EraNames returns the list of supported era names for benchmarking.
    func EraNames() []string {
        return []string{
P3: Era names are hardcoded in multiple places (eraLookup, EraNames, PostByronEraNames), which increases maintenance risk and can cause future inconsistencies.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/helpers.go, line 162:
<comment>Era names are hardcoded in multiple places (`eraLookup`, `EraNames`, `PostByronEraNames`), which increases maintenance risk and can cause future inconsistencies.</comment>
<file context>
@@ -0,0 +1,194 @@
+
+// EraNames returns the list of supported era names for benchmarking.
+func EraNames() []string {
+ return []string{
+ "byron",
+ "shelley",
</file context>
Force-pushed from e663c62 to a29c785.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/bench/helpers.go (1)
160-186: Single-source supported era names to prevent drift.
`EraNames()` and `PostByronEraNames()` duplicate literals already represented in
`eraLookup`. A shared ordered slice would reduce maintenance mistakes when new
eras are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/bench/helpers.go` around lines 160 - 186, Replace the duplicated literal lists with a single ordered slice (e.g., supportedEraNames) that is the authoritative ordering used by EraNames and PostByronEraNames; have EraNames return that slice directly and have PostByronEraNames derive its result by slicing or filtering that single slice to exclude "byron" (e.g., supportedEraNames[1:] or filter out "byron") and ensure the single slice stays consistent with eraLookup so adding a new era only requires updating one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/bench/block_bench_test.go`:
- Around line 207-245: The benchmark currently double-validates the block body
hash because benchConfig() returns a cfg used by ledger.NewBlockFromCbor (which
does validation when SkipBodyHashValidation is false) and the test then calls
common.ValidateBlockBodyHash again; fix by setting cfg.SkipBodyHashValidation =
true on the cfg returned from benchConfig() before calling
ledger.NewBlockFromCbor in BenchmarkBlockValidation so NewBlockFromCbor skips
internal body-hash validation and only the explicit common.ValidateBlockBodyHash
call is measured.
- Around line 394-415: The benchmark currently discards the bool returned by
kes.VerifySignedKES which can hide failing verification; add a pre-check using
kes.VerifySignedKES(kd.hotVkey, t, kd.bodyCbor, kd.signature) before starting
the timed loop and call b.Fatalf or b.Skip if it returns false, then inside the
loop assign the result to a package-level sink (e.g., benchSink) instead of
discarding it so the compiler cannot optimize the call away; locate the call to
kes.VerifySignedKES in the b.Run("Era_"+kd.name) closure and update it
accordingly.
In `@internal/bench/helpers.go`:
- Around line 190-196: Replace the panic-prone call to testdata.MustDecodeHex in
blockCborFromEra with a non-panicking decode and propagate any decode error via
the function's error return; locate blockCborFromEra and swap MustDecodeHex for
the decode function that returns (bytes, error) (or add one in the testdata
package), check the error and return fmt.Errorf(...) or the decode error
wrapped, so callers like LoadBlockFixture receive an error instead of a panic.
---
Nitpick comments:
In `@internal/bench/helpers.go`:
- Around line 160-186: Replace the duplicated literal lists with a single
ordered slice (e.g., supportedEraNames) that is the authoritative ordering used
by EraNames and PostByronEraNames; have EraNames return that slice directly and
have PostByronEraNames derive its result by slicing or filtering that single
slice to exclude "byron" (e.g., supportedEraNames[1:] or filter out "byron") and
ensure the single slice stays consistent with eraLookup so adding a new era only
requires updating one place.
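The single-ordered-slice fix described above could look roughly like this. The era names listed here are illustrative; the authoritative set lives in `eraLookup` in helpers.go.

```go
package main

import "fmt"

// supportedEraNames is the single authoritative ordering. EraNames and
// PostByronEraNames both derive from it, so adding an era means
// updating one place (kept consistent with eraLookup).
var supportedEraNames = []string{
	"byron", "shelley", "allegra", "mary", "alonzo", "babbage", "conway",
}

func EraNames() []string { return supportedEraNames }

// PostByronEraNames filters rather than slicing from index 1, so it
// stays correct even if "byron" ever moves position in the list.
func PostByronEraNames() []string {
	out := make([]string, 0, len(supportedEraNames))
	for _, e := range supportedEraNames {
		if e != "byron" {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	fmt.Println(len(EraNames()), len(PostByronEraNames()))
}
```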
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- .github/workflows/benchmark.yml
- internal/bench/BASELINES.md
- internal/bench/PROFILING.md
- internal/bench/README.md
- internal/bench/block_bench_test.go
- internal/bench/cbor_bench_test.go
- internal/bench/consensus_bench_test.go
- internal/bench/helpers.go
- internal/bench/helpers_test.go
- internal/bench/profile_test.go
- internal/bench/regression_test.go
- internal/bench/script_bench_test.go
- internal/bench/testdata/README.md
- internal/bench/tx_bench_test.go
- scripts/profile.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/benchmark.yml
- internal/bench/testdata/README.md
    func blockCborFromEra(era string) ([]byte, error) {
        info, ok := eraLookup[strings.ToLower(era)]
        if !ok {
            return nil, fmt.Errorf("unknown era: %s", era)
        }
        return testdata.MustDecodeHex(info.blockHex), nil
    }
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, locate and read the testdata.MustDecodeHex implementation
find . -name "*.go" -path "*/testdata/*" | head -20

Repository: blinklabs-io/gouroboros
Length of output: 96
🏁 Script executed:
#!/bin/bash
# Search for MustDecodeHex definition and usage
rg -n "func MustDecodeHex" --type go
rg -n "MustDecodeHex" --type go | head -30

Repository: blinklabs-io/gouroboros
Length of output: 1961
🏁 Script executed:
#!/bin/bash
# Examine the blockCborFromEra function and surrounding context
cat -n internal/bench/helpers.go | sed -n '130,200p'

Repository: blinklabs-io/gouroboros
Length of output: 2598
🏁 Script executed:
#!/bin/bash
# Search for callers of blockCborFromEra to understand error handling expectations
rg -n "blockCborFromEra" --type go

Repository: blinklabs-io/gouroboros
Length of output: 309
🏁 Script executed:
#!/bin/bash
# Check the imports section of internal/bench/helpers.go
cat -n internal/bench/helpers.go | head -30

Repository: blinklabs-io/gouroboros
Length of output: 1328
🏁 Script executed:
#!/bin/bash
# Check if encoding/hex is already imported
rg -n "import|encoding/hex" internal/bench/helpers.go | head -20

Repository: blinklabs-io/gouroboros
Length of output: 78
🏁 Script executed:
#!/bin/bash
# Verify the exact context around blockCborFromEra caller
cat -n internal/bench/helpers.go | sed -n '85,100p'

Repository: blinklabs-io/gouroboros
Length of output: 665
🏁 Script executed:
#!/bin/bash
# Get the full MustDecodeHex implementation to confirm panic behavior
cat -n internal/testdata/blocks.go | sed -n '103,110p'

Repository: blinklabs-io/gouroboros
Length of output: 323
Replace MustDecodeHex panic with proper error handling in error-returning function.
Line 195 uses testdata.MustDecodeHex, which panics on invalid hex, bypassing the error return contract of blockCborFromEra. The caller LoadBlockFixture (line 93) expects error handling via the returned error value, not a panic.
Proposed fix
     import (
    +	"encoding/hex"
     	"fmt"
     	"math/big"
     	"strings"

     func blockCborFromEra(era string) ([]byte, error) {
     	info, ok := eraLookup[strings.ToLower(era)]
     	if !ok {
     		return nil, fmt.Errorf("unknown era: %s", era)
     	}
    -	return testdata.MustDecodeHex(info.blockHex), nil
    +	data, err := hex.DecodeString(info.blockHex)
    +	if err != nil {
    +		return nil, fmt.Errorf("decode %s fixture hex: %w", era, err)
    +	}
    +	return data, nil
     }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    func blockCborFromEra(era string) ([]byte, error) {
        info, ok := eraLookup[strings.ToLower(era)]
        if !ok {
            return nil, fmt.Errorf("unknown era: %s", era)
        }
        data, err := hex.DecodeString(info.blockHex)
        if err != nil {
            return nil, fmt.Errorf("decode %s fixture hex: %w", era, err)
        }
        return data, nil
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/bench/helpers.go` around lines 190 - 196, Replace the panic-prone
call to testdata.MustDecodeHex in blockCborFromEra with a non-panicking decode
and propagate any decode error via the function's error return; locate
blockCborFromEra and swap MustDecodeHex for the decode function that returns
(bytes, error) (or add one in the testdata package), check the error and return
fmt.Errorf(...) or the decode error wrapped, so callers like LoadBlockFixture
receive an error instead of a panic.
Signed-off-by: Chris Gianelloni <wolf31o2@blinklabs.io>
Force-pushed from a29c785 to 854b7c2.
@cubic-dev-ai review

@wolf31o2 I have started the AI code review. It will take a few minutes to complete.
1 issue found across 15 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/bench/README.md">
<violation number="1" location="internal/bench/README.md:437">
P3: The documented CI benchmark package list is incomplete and omits `internal/bench`, which the workflow actually benchmarks.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Each run produces a single file, `benchmark.txt`, containing the output of
`go test -bench=. -benchmem -count=5` across the core packages (vrf, kes,
consensus, cbor, pipeline, ledger).
P3: The documented CI benchmark package list is incomplete and omits internal/bench, which the workflow actually benchmarks.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/bench/README.md, line 437:
<comment>The documented CI benchmark package list is incomplete and omits `internal/bench`, which the workflow actually benchmarks.</comment>
<file context>
@@ -0,0 +1,474 @@
+
+Each run produces a single file, `benchmark.txt`, containing the output of
+`go test -bench=. -benchmem -count=5` across the core packages (vrf, kes,
+consensus, cbor, pipeline, ledger).
+
+1. **Location**: GitHub Actions > Workflows > Benchmarks > Run > Artifacts
</file context>
Summary by cubic
Adds a memory profiling and benchmark suite across core validation paths, with docs and CI to run, compare, and profile to catch regressions.
Written for commit 854b7c2. Summary will update on new commits.