fix(gas): dont include error iters in metric #3796

MasterPtato · 2026-01-09T00:00:23Z

No description provided.

railway-app · 2026-01-09T00:00:28Z

🚅 Deployed to the rivet-pr-3796 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
frontend-inspector	😴 Sleeping (View Logs)	Web	Jan 9, 2026 at 12:08 am
frontend-cloud-prod	😴 Sleeping (View Logs)	Web	Jan 9, 2026 at 12:07 am
frontend-cloud-staging	😴 Sleeping (View Logs)	Web	Jan 9, 2026 at 12:07 am
website	❌ Build Failed (View Logs)	Web	Jan 9, 2026 at 12:01 am

MasterPtato · 2026-01-09T00:00:40Z

fix(gas): dont include error iters in metric #3796 👈 (View in Graphite)
main

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-01-09T00:02:07Z

PR Review: fix(gas): dont include error iters in metric

Summary

This PR fixes a metric recording issue where loop iteration durations were being recorded even when the iteration callback failed with an error. The fix ensures that the LOOP_ITERATION_DURATION metric only captures successful iteration timings.

Changes Analysis

What Changed:

iteration_dt changed from Duration to Option<Duration> (line 120)
Duration is now only calculated when cb_res.is_ok() (line 203)
Metric recording is now conditional on iteration_dt.is_some() (lines 189-194)

✅ Positives

Correct Metric Semantics: The change ensures that only successful iteration durations are recorded, which is the correct behavior for performance metrics. Error paths may have different timing characteristics and would skew the metric distribution.
Clean Implementation: Using Option<Duration> with then() is idiomatic Rust and makes the conditional nature explicit.
Preserves Performance: The optimization structure remains intact - the metric is still recorded in the parallelized join block.

🔍 Code Quality Observations

Good:

Variable rename from loop_res to cb_res (line 158, 215) improves clarity
The logic is sound and easy to follow
No unnecessary changes to surrounding code

Minor Observations:

Initialization Pattern: The iteration_dt is initialized as None and only set after the first iteration completes (line 213). This means:
- First iteration: iteration_dt = None
- Second iteration: Records duration from first iteration
- This is correct but subtle - the metric recording happens one iteration delayed
Replay Path: When iterations are replays (line 180 condition), the metric is correctly skipped because iteration_dt will be None for error iterations. This is the intended behavior.

🐛 Potential Issues

None identified - The logic appears sound.

🔒 Security Considerations

No security concerns - this is purely a metrics recording change.

⚡ Performance Considerations

No performance impact - the change only adds a conditional check around metric recording
The parallelization structure remains intact
The then() call is cheap and happens inline

🧪 Test Coverage

Recommendation: Consider adding a test case that:

Runs a loop where some iterations fail
Verifies that the LOOP_ITERATION_DURATION metric only records successful iterations
Checks that error iterations don't pollute the metric

Example test scenario:

// Loop that fails on iteration 2, succeeds on 1 and 3
// Verify metric only has 2 recordings (for iterations 1 and 3)

However, this may require exposing metric values in tests, which might not be straightforward.

📝 Suggestions

Comment Clarity: The comment on line 190 says "Only record iteration duration if its not a replay" but the conditional now also excludes errors. Consider updating to:
```
// Only record iteration duration if it's not a replay and was successful
```
Documentation: Consider adding a doc comment or inline comment explaining that iteration_dt is None for:
- The first iteration (before any callback has run)
- Any iteration where the callback returned an error

✅ Verdict

LGTM with minor suggestions. The fix is correct and improves metric accuracy. The suggested comment update would improve code clarity but is not blocking.

Score: 9/10 - Solid fix with good implementation, minor documentation improvement opportunity.

claude · 2026-01-09T00:02:17Z

Pull Request Review

Summary

This PR fixes a bug where the LOOP_ITERATION_DURATION metric was incorrectly including failed iterations. The fix changes the metric to only record duration for successful iterations by making the duration optional and only setting it when the callback succeeds.

Changes

Changed iteration_dt from Duration to Option<Duration> at line 120
Modified the callback execution to only record duration on success using .then() at line 203
Updated metric recording to check if duration exists before observing at lines 189-194

Code Quality ✅

Strengths:

Correct approach: Using Option<Duration> and cb_res.is_ok().then(|| ...) is idiomatic Rust for conditional value creation
Minimal change: The fix is surgical and doesn't introduce unnecessary complexity
Preserves parallelization: The change maintains the existing parallel execution pattern without adding latency
Consistent pattern: The conditional metric recording pattern is already used elsewhere (line 189 checks for replay)

Variable naming consistency:

Line 158: Variable renamed from loop_res to cb_res - good clarification since it's the callback result, not a "loop result"

Potential Issues 🔍

1. Initial iteration always skips metric (minor)
On line 120, iteration_dt is initialized to None, which means the first iteration will never have a duration recorded even if it succeeds. This is because:

Line 213 sets iteration_dt = cb_dt (the duration from current iteration)
Line 189-194 records the metric using the previous iteration's duration (before committing the branch event)

This appears intentional (to avoid recording replayed iterations), but it means iteration 0's duration is never recorded even on first run. Consider if this is the desired behavior.

2. Error iterations still affect timing (edge case)
If an iteration errors, cb_dt becomes None, but the next successful iteration will have no prior duration to record. This means:

Iteration N errors → iteration_dt = None
Iteration N+1 succeeds → records its duration
But iteration N's timing impact (failed work) isn't captured

This is likely acceptable since we don't want to skew metrics with error cases, but worth documenting.

Performance ✅

No performance concerns. The change:

Uses zero-cost abstractions (Option, .then())
Maintains the existing parallelization strategy
Adds negligible overhead (one is_ok() check)

Security ✅

No security concerns identified.

Test Coverage ⚠️

Recommendation: Add a test case that verifies error iterations don't record metrics. The existing loop_test.rs only tests successful iterations.

Suggested test scenario:

// Test that error iterations don't pollute LOOP_ITERATION_DURATION
// 1. Run loop with iterations that error periodically
// 2. Verify only successful iterations contribute to metric

Metrics Semantics 📊

The metric gasoline_loop_iteration_duration now measures:

Before: Time elapsed for all iterations (including errors)
After: Time elapsed only for successful iterations

This is a breaking change in metric semantics. Consider:

Documenting this in the metric description (currently "Total duration of a single loop iteration")
Alerting teams that monitor this metric about the behavior change
Possibly renaming to gasoline_loop_iteration_duration_success for clarity (optional)

Recommendation: Approve with minor suggestions ✅

The fix correctly addresses the issue. Consider:

Adding a test for error case metric behavior
Updating metric documentation to clarify it only measures successful iterations
Verifying the first-iteration behavior is intentional (or adjust if needed)

Great work on the surgical fix! The use of Option and conditional timing is the right approach.

pkg-pr-new · 2026-01-09T00:02:47Z

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3796

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3796

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3796

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3796

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3796

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3796

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3796

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3796

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3796

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3796

commit: a6a1a86

MasterPtato requested a review from NathanFlurry January 9, 2026 00:00

fix(gas): dont include error iters in metric

a6a1a86

MasterPtato force-pushed the 01-08-fix_gas_dont_include_error_iters_in_metric branch from ce9397d to a6a1a86 Compare January 9, 2026 00:00

MasterPtato mentioned this pull request Jan 9, 2026

fix: remove pending actors metric, fix actor error tracker, engine runner error print #3797

Open

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3796 January 9, 2026 00:00 Destroyed

MasterPtato closed this Jan 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(gas): dont include error iters in metric #3796

fix(gas): dont include error iters in metric #3796

Uh oh!

MasterPtato commented Jan 9, 2026

Uh oh!

railway-app bot commented Jan 9, 2026 •

edited

Loading

Uh oh!

MasterPtato commented Jan 9, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 9, 2026

Uh oh!

claude bot commented Jan 9, 2026

Uh oh!

pkg-pr-new bot commented Jan 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix(gas): dont include error iters in metric #3796

fix(gas): dont include error iters in metric #3796

Uh oh!

Conversation

MasterPtato commented Jan 9, 2026

Uh oh!

railway-app bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MasterPtato commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Jan 9, 2026

PR Review: fix(gas): dont include error iters in metric

Summary

Changes Analysis

✅ Positives

🔍 Code Quality Observations

🐛 Potential Issues

🔒 Security Considerations

⚡ Performance Considerations

🧪 Test Coverage

📝 Suggestions

✅ Verdict

Uh oh!

claude bot commented Jan 9, 2026

Pull Request Review

Summary

Changes

Code Quality ✅

Potential Issues 🔍

Performance ✅

Security ✅

Test Coverage ⚠️

Metrics Semantics 📊

Recommendation: Approve with minor suggestions ✅

Uh oh!

pkg-pr-new bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

railway-app bot commented Jan 9, 2026 •

edited

Loading

MasterPtato commented Jan 9, 2026 •

edited

Loading

pkg-pr-new bot commented Jan 9, 2026 •

edited

Loading