
@MasterPtato

No description provided.


MasterPtato commented Jan 12, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from f7e33e7 to b613527 on January 12, 2026 19:11
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from 9297e74 to 442651f on January 12, 2026 19:11
@claude

claude bot commented Jan 12, 2026

PR Review: Fix failure reason setting regardless of crash policy

Overview

This PR refactors how failure reasons are set when Rivet Actors stop, so that a failure reason is always recorded regardless of the crash policy. This improves observability: crash and failure information is now preserved for every crash policy, not only for actors configured to sleep on crash.


✅ Positive Changes

1. Improved Failure Reason Tracking (mod.rs:879-919)

The refactoring moves the point where the failure reason is set earlier in the flow, so all crash types are handled uniformly:

  • Crashes with an error code: FailureReason::Crashed is now set immediately (lines 889-901)
  • Lost actors: runner-related failure reasons are set early (lines 907-915), before the crash policy is evaluated
  • Comment accuracy: the comment on line 908 correctly explains that failure reasons apply to all crash policies

This is a significant improvement. Previously, failure reasons for normal crashes were set only in the CrashPolicy::Sleep branch, so crashes under CrashPolicy::Restart or any other policy did not record a failure reason.
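To make the new ordering concrete, here is a minimal sketch of the "record first, then consult the policy" flow. The types (StopCode, CrashPolicy, FailureReason, Stop, NextStep, ActorState) and the handle_stop function are hypothetical stand-ins modeled on names in this review, not the actual Rivet definitions in mod.rs.

```rust
// Hypothetical types modeled on names from the review; the real Rivet
// definitions (and their variants) differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopCode {
    Ok,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CrashPolicy {
    Restart,
    Sleep,
}

#[derive(Debug, Clone, PartialEq)]
pub enum FailureReason {
    Crashed { message: Option<String> },
    RunnerLost,
}

/// How the actor stopped: it either reported a stop code or its runner vanished.
pub enum Stop {
    Exited { code: StopCode, message: Option<String> },
    Lost,
}

#[derive(Debug, PartialEq)]
pub enum NextStep {
    Done,
    Reschedule,
    Sleep,
}

#[derive(Debug, Default)]
pub struct ActorState {
    pub failure_reason: Option<FailureReason>,
}

pub fn handle_stop(state: &mut ActorState, stop: Stop, policy: CrashPolicy) -> NextStep {
    // Record the failure reason first, before the crash policy is consulted,
    // so every policy observes the same state.
    let crashed = match stop {
        Stop::Exited { code: StopCode::Ok, .. } => false,
        Stop::Exited { code: StopCode::Error, message } => {
            state.failure_reason = Some(FailureReason::Crashed { message });
            true
        }
        Stop::Lost => {
            state.failure_reason = Some(FailureReason::RunnerLost);
            true
        }
    };

    if !crashed {
        return NextStep::Done;
    }

    // Only now does the policy decide what happens next.
    match policy {
        CrashPolicy::Restart => NextStep::Reschedule,
        CrashPolicy::Sleep => NextStep::Sleep,
    }
}
```

The point of the structure is that the policy match never touches failure_reason, so adding a new policy cannot reintroduce the original bug.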

2. Removed Redundant Code (mod.rs:1073)

The former activity call in the CrashPolicy::Sleep branch is now marked with ctx.removed. That activity is redundant because failure reasons are now set upfront for all crash types.

3. Fixed Database Query Bug (debug.rs:663-694)

Good catch changing tx.read() to tx.read_opt() and handling the None case. This prevents crashes when workflows are deleted mid-query during debug operations.
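The difference between a strict and an optional read can be sketched as follows. Tx, its read/read_opt methods, the "workflow/{id}/name" key layout, and workflow_names are all hypothetical, backed by a HashMap purely for illustration; they are not the real transaction API used in debug.rs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a database transaction; not the real `tx` API.
pub struct Tx {
    data: HashMap<String, String>,
}

impl Tx {
    pub fn new(data: HashMap<String, String>) -> Self {
        Tx { data }
    }

    /// Strict read: a missing key is an error.
    pub fn read(&self, key: &str) -> Result<String, String> {
        self.data
            .get(key)
            .cloned()
            .ok_or_else(|| format!("key not found: {key}"))
    }

    /// Optional read: a missing key is `Ok(None)`, not an error.
    pub fn read_opt(&self, key: &str) -> Result<Option<String>, String> {
        Ok(self.data.get(key).cloned())
    }
}

/// Collects workflow names, skipping workflows deleted between being
/// listed and being read.
pub fn workflow_names(tx: &Tx, ids: &[&str]) -> Result<Vec<String>, String> {
    let mut names = Vec::new();
    for id in ids {
        // With `read`, one deleted workflow would fail the whole query.
        match tx.read_opt(&format!("workflow/{id}/name"))? {
            Some(name) => names.push(name),
            None => continue, // Deleted mid-query: skip it.
        }
    }
    Ok(names)
}
```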

4. Removed Premature State Clearing (runtime.rs:1130)

Removing state.failure_reason = None from set_started is correct. The failure reason should only be cleared when an actor successfully allocates, not when it becomes connectable. This preserves failure history during the actor lifecycle.
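A minimal sketch of the intended lifecycle, assuming a simplified hypothetical ActorState with boolean allocated/connectable flags (the real state struct in runtime.rs has more fields and different types). Only set_allocated clears the reason; set_started leaves it alone.

```rust
/// Hypothetical, simplified actor state; only the fields relevant here.
#[derive(Debug, Default)]
pub struct ActorState {
    /// Set when the actor stops abnormally; cleared only on a new allocation.
    pub failure_reason: Option<String>,
    pub allocated: bool,
    pub connectable: bool,
}

impl ActorState {
    /// Records a failure and marks the actor as no longer running.
    pub fn set_failed(&mut self, reason: &str) {
        self.failure_reason = Some(reason.to_string());
        self.allocated = false;
        self.connectable = false;
    }

    /// A fresh allocation starts a new attempt, so the old reason is dropped.
    pub fn set_allocated(&mut self) {
        self.allocated = true;
        self.failure_reason = None;
    }

    /// Becoming connectable no longer touches `failure_reason`.
    pub fn set_started(&mut self) {
        self.connectable = true;
    }
}
```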


🔍 Potential Issues & Questions

1. Match Expression Restructuring (mod.rs:879-919)

The match expression is now split into three arms instead of two. This is functionally correct, but Rust only flags new StopCode variants at compile time if no arm uses a catch-all pattern (such as _ or a bare binding). If one does, a future variant would silently fall into that arm. A nested match that names every StopCode variant would keep exhaustiveness checking effective.
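The contrast can be sketched with hypothetical StopCode and Stop types (not the real Rivet enums). Both functions behave identically today; the difference only appears when a variant is added. Naming every variant in a flat match would work too; nesting is just one way to keep the StopCode cases together.

```rust
/// Hypothetical stop codes; imagine a future `Timeout` variant being added.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopCode {
    Ok,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stop {
    Exited(StopCode),
    Lost,
}

/// Flat match with a catch-all: a new `StopCode` variant would silently
/// land in the `_` arm and be treated as a crash.
pub fn is_crash_flat(stop: Stop) -> bool {
    match stop {
        Stop::Exited(StopCode::Ok) => false,
        Stop::Lost => true,
        _ => true,
    }
}

/// Nested match naming every variant: adding a `StopCode` variant makes
/// this function fail to compile until the new case is handled.
pub fn is_crash_nested(stop: Stop) -> bool {
    match stop {
        Stop::Exited(code) => match code {
            StopCode::Ok => false,
            StopCode::Error => true,
        },
        Stop::Lost => true,
    }
}
```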

2. Test Coverage

I could not find explicit unit tests for the failure reason setting logic. Consider adding tests for:

  • Actors crashing with CrashPolicy::Restart should have failure_reason set
  • Actors crashing with CrashPolicy::Sleep should have failure_reason set
  • Lost actors should have failure_reason set based on the loss reason
  • Normal exits should NOT set a failure reason
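The four cases above could be expressed as a standard #[cfg(test)] module. In this sketch, failure_reason_after is a hypothetical pure function standing in for the workflow logic; real tests would need to drive the actual actor workflow, whose test harness is not shown in this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CrashPolicy {
    Restart,
    Sleep,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stop {
    Normal,
    Crashed,
    Lost,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FailureReason {
    Crashed,
    RunnerLost,
}

/// Hypothetical stand-in for the logic under test. The policy is accepted
/// but deliberately ignored: the reason must not depend on it.
pub fn failure_reason_after(stop: Stop, _policy: CrashPolicy) -> Option<FailureReason> {
    match stop {
        Stop::Normal => None,
        Stop::Crashed => Some(FailureReason::Crashed),
        Stop::Lost => Some(FailureReason::RunnerLost),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn restart_policy_records_crash() {
        let reason = failure_reason_after(Stop::Crashed, CrashPolicy::Restart);
        assert_eq!(reason, Some(FailureReason::Crashed));
    }

    #[test]
    fn sleep_policy_records_crash() {
        let reason = failure_reason_after(Stop::Crashed, CrashPolicy::Sleep);
        assert_eq!(reason, Some(FailureReason::Crashed));
    }

    #[test]
    fn lost_actor_records_loss_reason() {
        for policy in [CrashPolicy::Restart, CrashPolicy::Sleep] {
            assert_eq!(failure_reason_after(Stop::Lost, policy), Some(FailureReason::RunnerLost));
        }
    }

    #[test]
    fn normal_exit_records_nothing() {
        for policy in [CrashPolicy::Restart, CrashPolicy::Sleep] {
            assert_eq!(failure_reason_after(Stop::Normal, policy), None);
        }
    }
}
```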

3. State Management Consistency

The code now has two places where failure_reason is cleared:

  • Line 137: in set_allocated, when the actor is allocated
  • Line 478: in reschedule_actor, after a successful allocation

Question: Are both necessary? Line 478 may be redundant if every successful allocation in reschedule_actor goes through set_allocated.
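The redundancy argument can be illustrated with a hypothetical model: State, set_allocated, reschedule_actor, and the allocation_succeeded flag are simplified stand-ins, not the real functions. If the success path always calls set_allocated, the second clear has nothing left to do.

```rust
/// Hypothetical, simplified actor state.
#[derive(Debug, Default)]
pub struct State {
    pub failure_reason: Option<String>,
    pub allocated: bool,
}

/// Hypothetical `set_allocated`: the first place the reason is cleared.
pub fn set_allocated(state: &mut State) {
    state.allocated = true;
    state.failure_reason = None;
}

/// Hypothetical `reschedule_actor`; `allocation_succeeded` stands in for
/// the outcome of the real allocation attempt.
pub fn reschedule_actor(state: &mut State, allocation_succeeded: bool) {
    if allocation_succeeded {
        set_allocated(state);
        // Second clear: a no-op here, because set_allocated already ran.
        // It only matters if some success path skips set_allocated.
        state.failure_reason = None;
    }
}
```

Whether the real second clear can be dropped depends on whether any path in reschedule_actor marks the actor allocated without going through set_allocated.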


🔒 Security & Performance

Security: ✅ No security concerns identified
Performance: ✅ No performance impact


📋 Code Quality

Adherence to CLAUDE.md:

  • ✅ Logging follows structured logging patterns
  • ✅ Comments are complete sentences and clear
  • ✅ Error handling follows the repository patterns
  • ⚠️ No tests added for the behavioral change

📝 Suggestions

  1. Add integration tests demonstrating that failure reasons are properly set for all crash policies
  2. Document the state lifecycle of failure_reason in the struct docstring to clarify when it is set vs. cleared
  3. Check whether both clears of failure_reason mentioned above are necessary
  4. Consider exhaustive matching for StopCode variants if the enum might expand

Summary

Verdict: ✅ Approve with minor suggestions

This is a solid fix that improves failure tracking across all crash policies. The core logic is sound, and the refactoring makes the code more maintainable. The main suggestion is to add test coverage to prevent regression and document the state lifecycle more clearly.

The bug fix in debug.rs is a good catch that prevents potential crashes. Overall, this PR improves both correctness and observability of the Rivet Actor system.


Reviewed by Claude Sonnet 4.5

@pkg-pr-new

pkg-pr-new bot commented Jan 12, 2026


@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3835

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3835

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3835

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3835

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3835

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3835

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3835

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3835

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3835

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3835

commit: 980dd0a

@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from b613527 to b4daf93 on January 13, 2026 00:27
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch 2 times, most recently from 7915d6f to 258ae43 on January 13, 2026 00:44
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch 2 times, most recently from 0b9ca2f to 1a08758 on January 13, 2026 01:01
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch 2 times, most recently from 42cfc6d to 506742d on January 13, 2026 01:29
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from 1a08758 to 5b1c724 on January 13, 2026 01:29
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from 506742d to ca4cbfc on January 14, 2026 02:05
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from 5b1c724 to 8152462 on January 14, 2026 02:05
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from ca4cbfc to 8102256 on January 14, 2026 03:06
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from 8152462 to 82ed36d on January 14, 2026 03:06
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from 82ed36d to f5a4f68 on January 14, 2026 19:47
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch 2 times, most recently from 4b907cb to dc985c0 on January 14, 2026 20:01
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from f5a4f68 to 9e4ae6f on January 14, 2026 20:01
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from dc985c0 to f3f7d72 on January 14, 2026 22:52
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch 2 times, most recently from 595d7b8 to 250667d on January 14, 2026 23:02
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from f3f7d72 to ae1c40c on January 14, 2026 23:02
@MasterPtato MasterPtato force-pushed the 01-12-fix_set_failure_reason_regardless_of_crash_policy branch from 250667d to 980dd0a on January 14, 2026 23:07
@MasterPtato MasterPtato force-pushed the 01-09-feat_add_metrics_to_pb_guard branch from ae1c40c to 3da23f7 on January 14, 2026 23:07