Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 9, 2026

🚅 Deployed to the rivet-pr-3796 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 😴 Sleeping (View Logs) Web Jan 9, 2026 at 12:08 am
frontend-cloud-prod 😴 Sleeping (View Logs) Web Jan 9, 2026 at 12:07 am
frontend-cloud-staging 😴 Sleeping (View Logs) Web Jan 9, 2026 at 12:07 am
website ❌ Build Failed (View Logs) Web Jan 9, 2026 at 12:01 am

Copy link
Contributor Author

MasterPtato commented Jan 9, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 9, 2026

PR Review: fix(gas): dont include error iters in metric

Summary

This PR fixes a metric recording issue where loop iteration durations were being recorded even when the iteration callback failed with an error. The fix ensures that the LOOP_ITERATION_DURATION metric only captures successful iteration timings.

Changes Analysis

What Changed:

  1. iteration_dt changed from Duration to Option<Duration> (line 120)
  2. Duration is now only calculated when cb_res.is_ok() (line 203)
  3. Metric recording is now conditional on iteration_dt.is_some() (lines 189-194)

✅ Positives

  1. Correct Metric Semantics: The change ensures that only successful iteration durations are recorded, which is the correct behavior for performance metrics. Error paths may have different timing characteristics and would skew the metric distribution.

  2. Clean Implementation: Using Option<Duration> with then() is idiomatic Rust and makes the conditional nature explicit.

  3. Preserves Performance: The optimization structure remains intact - the metric is still recorded in the parallelized join block.

🔍 Code Quality Observations

Good:

  • Variable rename from loop_res to cb_res (line 158, 215) improves clarity
  • The logic is sound and easy to follow
  • No unnecessary changes to surrounding code

Minor Observations:

  1. Initialization Pattern: The iteration_dt is initialized as None and only set after the first iteration completes (line 213). This means:

    • First iteration: iteration_dt = None
    • Second iteration: Records duration from first iteration
    • This is correct but subtle - the metric recording happens one iteration delayed
  2. Replay Path: When iterations are replays (line 180 condition), the metric is correctly skipped because iteration_dt will be None for error iterations. This is the intended behavior.

🐛 Potential Issues

None identified - The logic appears sound.

🔒 Security Considerations

No security concerns - this is purely a metrics recording change.

⚡ Performance Considerations

  • No performance impact - the change only adds a conditional check around metric recording
  • The parallelization structure remains intact
  • The then() call is cheap and happens inline

🧪 Test Coverage

Recommendation: Consider adding a test case that:

  1. Runs a loop where some iterations fail
  2. Verifies that the LOOP_ITERATION_DURATION metric only records successful iterations
  3. Checks that error iterations don't pollute the metric

Example test scenario:

// Loop that fails on iteration 2, succeeds on 1 and 3
// Verify metric only has 2 recordings (for iterations 1 and 3)

However, this may require exposing metric values in tests, which might not be straightforward.

📝 Suggestions

  1. Comment Clarity: The comment on line 190 says "Only record iteration duration if its not a replay" but the conditional now also excludes errors. Consider updating to:

    // Only record iteration duration if it's not a replay and was successful
  2. Documentation: Consider adding a doc comment or inline comment explaining that iteration_dt is None for:

    • The first iteration (before any callback has run)
    • Any iteration where the callback returned an error

✅ Verdict

LGTM with minor suggestions. The fix is correct and improves metric accuracy. The suggested comment update would improve code clarity but is not blocking.

Score: 9/10 - Solid fix with good implementation, minor documentation improvement opportunity.

@claude
Copy link

claude bot commented Jan 9, 2026

Pull Request Review

Summary

This PR fixes a bug where the LOOP_ITERATION_DURATION metric was incorrectly including failed iterations. The fix changes the metric to only record duration for successful iterations by making the duration optional and only setting it when the callback succeeds.

Changes

  • Changed iteration_dt from Duration to Option<Duration> at line 120
  • Modified the callback execution to only record duration on success using .then() at line 203
  • Updated metric recording to check if duration exists before observing at lines 189-194

Code Quality ✅

Strengths:

  1. Correct approach: Using Option<Duration> and cb_res.is_ok().then(|| ...) is idiomatic Rust for conditional value creation
  2. Minimal change: The fix is surgical and doesn't introduce unnecessary complexity
  3. Preserves parallelization: The change maintains the existing parallel execution pattern without adding latency
  4. Consistent pattern: The conditional metric recording pattern is already used elsewhere (line 189 checks for replay)

Variable naming consistency:

  • Line 158: Variable renamed from loop_res to cb_res - good clarification since it's the callback result, not a "loop result"

Potential Issues 🔍

1. Initial iteration always skips metric (minor)
On line 120, iteration_dt is initialized to None, which means the first iteration will never have a duration recorded even if it succeeds. This is because:

  • Line 213 sets iteration_dt = cb_dt (the duration from current iteration)
  • Line 189-194 records the metric using the previous iteration's duration (before committing the branch event)

This appears intentional (to avoid recording replayed iterations), but it means iteration 0's duration is never recorded even on first run. Consider if this is the desired behavior.

2. Error iterations still affect timing (edge case)
If an iteration errors, cb_dt becomes None, but the next successful iteration will have no prior duration to record. This means:

  • Iteration N errors → iteration_dt = None
  • Iteration N+1 succeeds → records its duration
  • But iteration N's timing impact (failed work) isn't captured

This is likely acceptable since we don't want to skew metrics with error cases, but worth documenting.

Performance ✅

No performance concerns. The change:

  • Uses zero-cost abstractions (Option, .then())
  • Maintains the existing parallelization strategy
  • Adds negligible overhead (one is_ok() check)

Security ✅

No security concerns identified.

Test Coverage ⚠️

Recommendation: Add a test case that verifies error iterations don't record metrics. The existing loop_test.rs only tests successful iterations.

Suggested test scenario:

// Test that error iterations don't pollute LOOP_ITERATION_DURATION
// 1. Run loop with iterations that error periodically
// 2. Verify only successful iterations contribute to metric

Metrics Semantics 📊

The metric gasoline_loop_iteration_duration now measures:

  • Before: Time elapsed for all iterations (including errors)
  • After: Time elapsed only for successful iterations

This is a breaking change in metric semantics. Consider:

  1. Documenting this in the metric description (currently "Total duration of a single loop iteration")
  2. Alerting teams that monitor this metric about the behavior change
  3. Possibly renaming to gasoline_loop_iteration_duration_success for clarity (optional)

Recommendation: Approve with minor suggestions

The fix correctly addresses the issue. Consider:

  1. Adding a test for error case metric behavior
  2. Updating metric documentation to clarify it only measures successful iterations
  3. Verifying the first-iteration behavior is intentional (or adjust if needed)

Great work on the surgical fix! The use of Option and conditional timing is the right approach.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 9, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3796

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3796

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3796

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3796

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3796

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3796

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3796

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3796

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3796

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3796

commit: a6a1a86

@MasterPtato MasterPtato closed this Jan 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants