fix(workflow-engine): replay completed void steps on restart by NathanFlurry · Pull Request #4186 · rivet-dev/rivet

NathanFlurry · 2026-02-12T10:49:58Z

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

railway-app · 2026-02-12T10:50:14Z

🚅 Deployed to the rivet-pr-4186 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
website	😴 Sleeping (View Logs)	Web	Feb 13, 2026 at 9:49 pm
frontend-inspector	❌ Build Failed (View Logs)	Web	Feb 12, 2026 at 10:51 am
frontend-cloud	❌ Build Failed (View Logs)	Web	Feb 12, 2026 at 10:51 am
mcp-hub	✅ Success (View Logs)	Web	Feb 12, 2026 at 10:51 am
ladle	❌ Build Failed (View Logs)	Web	Feb 12, 2026 at 10:50 am

NathanFlurry · 2026-02-12T10:50:15Z

fix(workflow-engine): replay completed void steps on restart #4186 👈 (View in Graphite)
main

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-02-12T10:51:51Z

Code Review

Summary

This PR fixes a critical bug in the workflow engine where steps that return undefined (void steps) were being incorrectly re-executed on workflow restart. The fix properly checks the step's metadata status instead of relying solely on the presence of output data.

Positive Aspects ✅

Root Cause Analysis: The fix correctly identifies that JSON serialization omits undefined values, making stepData.output !== undefined an unreliable completion check for void steps.
Clear Documentation: The added comment at context.ts:385-388 provides excellent context explaining why this check is necessary and the technical rationale.
Proper Metadata Usage: Using metadata.status === "completed" is the correct approach, as metadata is the source of truth for step execution state.
Test Coverage: The new test case thoroughly validates the fix by:
- Creating a void step
- Running it twice
- Verifying the step only executes once (callCount remains 1)
- Testing with both workflow modes
Consistent Logging: The debug log at context.ts:396-400 follows the existing logging pattern and will aid debugging.
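
The root cause noted above can be reproduced in isolation. The following is a minimal sketch, assuming step entries are persisted through a plain JSON round trip; the `StepEntry` shape and the `roundTrip` helper are hypothetical and not taken from the PR.

```typescript
// Hypothetical shape of a persisted step entry; the real engine's types may differ.
interface StepEntry {
  output?: unknown;
}

// Simulate persisting an entry and loading it back after a restart.
function roundTrip(entry: StepEntry): StepEntry {
  return JSON.parse(JSON.stringify(entry)) as StepEntry;
}

const voidStep = roundTrip({ output: undefined }); // step ran and returned undefined
const neverRan: StepEntry = {}; // step has no recorded output at all

// JSON.stringify drops object properties whose value is undefined.
console.log(JSON.stringify({ output: undefined })); // prints {}

// After the round trip, both entries look the same to an output-based check.
console.log(voidStep.output !== undefined); // prints false
console.log(neverRan.output !== undefined); // prints false
console.log("output" in voidStep); // prints false
```

Because the property is gone after deserialization, only a separate status record can tell a completed void step apart from one that never produced output.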

Potential Concerns 🔍

1. Performance Consideration (Minor)

Location: context.ts:389-393

The fix loads metadata for every replayed step whose stepData.output is undefined. Steps with defined outputs still return through the fast path without a metadata lookup, so the extra read applies only to the remaining steps.

Current flow:

// Line 378: Fast path for steps with defined output
if (stepData.output !== undefined) {
    return stepData.output as T;
}

// Line 389: Loads metadata for ALL remaining steps
const metadata = await loadMetadata(...);

// Line 395: Checks completed status for void steps
if (metadata.status === "completed") {
    return stepData.output as T;
}

Consideration: await loadMetadata() runs for every step that doesn't have a defined output, which could include:

Failed steps (needing retry checks)
Void steps (the bug being fixed)

Question: Is there a concern about the performance impact of loading metadata for void steps? The loadMetadata function does cache results in storage.entryMetadata, so subsequent calls for the same step should be fast. However, on first access after restart, this adds a driver read operation for each void step.
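
The caching behavior described in this question can be illustrated with a small sketch. It assumes a memoizing loader backed by a map; `MetadataCache`, `Driver`, `StepMetadata`, and `driverReads` are hypothetical names for illustration, and the real `loadMetadata` and `storage.entryMetadata` may work differently.

```typescript
// Hypothetical types; the names echo the review but are not the engine's actual API.
type StepStatus = "pending" | "running" | "completed" | "failed";

interface StepMetadata {
  status: StepStatus;
  attempts: number;
}

interface Driver {
  get(stepId: string): Promise<StepMetadata | undefined>;
}

class MetadataCache {
  private readonly entryMetadata = new Map<string, StepMetadata>();
  private readonly driver: Driver;
  driverReads = 0;

  constructor(driver: Driver) {
    this.driver = driver;
  }

  // Return cached metadata when present; otherwise read once from the driver.
  async loadMetadata(stepId: string): Promise<StepMetadata> {
    const cached = this.entryMetadata.get(stepId);
    if (cached !== undefined) return cached;

    this.driverReads++;
    const stored = await this.driver.get(stepId);
    // Assumption: missing metadata is created with a non-completed status.
    const metadata: StepMetadata = stored ?? { status: "pending", attempts: 0 };
    this.entryMetadata.set(stepId, metadata);
    return metadata;
  }
}

async function demo(): Promise<number> {
  const stored: StepMetadata = { status: "completed", attempts: 1 };
  const driver: Driver = {
    get: async (stepId: string) => (stepId === "void-step" ? stored : undefined),
  };
  const cache = new MetadataCache(driver);
  await cache.loadMetadata("void-step"); // first access: one driver read
  await cache.loadMetadata("void-step"); // second access: served from the map
  return cache.driverReads;
}

demo().then((reads) => console.log(reads)); // prints 1
```

Under these assumptions the cost per void step is one driver read after a restart, and repeated lookups within the same run add no further reads.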

Potential optimization (if needed):

// Fast path: output exists AND not explicitly undefined in serialization
if (stepData.output !== undefined) {
    return stepData.output as T;
}

// Load metadata once for both void step check and retry logic
const metadata = await loadMetadata(...);

// Check if completed (handles void steps)
if (metadata.status === "completed") {
    return stepData.output as T;
}

// Continue with retry logic...

This is already what the PR does, so the implementation is optimal given the constraint. ✅

2. Edge Case: What if metadata is missing?

Location: context.ts:389-402

The loadMetadata function creates new metadata if none exists (based on the partial code I saw). If metadata creation returns a non-"completed" status for a step that actually completed, the void step would re-execute.

Question: Are there any scenarios where:

A step completes successfully
Its history entry is persisted with output: undefined
But its metadata is missing or has status !== "completed"?

This seems unlikely in normal operation, but could occur if:

There's a partial write failure (history saved but metadata not)
Metadata is manually deleted
There's a bug in metadata persistence

Recommendation: Consider if this scenario needs defensive handling or is considered an invalid/corrupted state that should fail loudly.
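
One way to make this state explicit is sketched below. It assumes the caller can tell missing metadata apart from freshly created metadata, which may not hold for the real `loadMetadata`; `decideReplay` and all of the types are hypothetical.

```typescript
// Hypothetical types for illustration only.
type EntryStatus = "pending" | "running" | "completed" | "failed";

interface HistoryEntry {
  output?: unknown;
}

interface EntryMetadata {
  status: EntryStatus;
}

type ReplayDecision =
  | { kind: "replay"; output: unknown }
  | { kind: "execute" }
  | { kind: "corrupt"; reason: string };

function decideReplay(
  entry: HistoryEntry | undefined,
  metadata: EntryMetadata | undefined,
): ReplayDecision {
  // No history entry: the step has never run.
  if (entry === undefined) return { kind: "execute" };

  // Fast path: a defined output is enough to replay.
  if (entry.output !== undefined) return { kind: "replay", output: entry.output };

  // History exists but metadata does not: surface it instead of silently re-running.
  if (metadata === undefined) {
    return { kind: "corrupt", reason: "history entry exists but metadata is missing" };
  }

  // Completed void step: replay with an undefined output.
  if (metadata.status === "completed") return { kind: "replay", output: undefined };

  // Anything else falls through to normal execution or retry handling.
  return { kind: "execute" };
}

console.log(decideReplay({}, { status: "completed" }).kind); // prints replay
console.log(decideReplay({}, undefined).kind); // prints corrupt
console.log(decideReplay(undefined, undefined).kind); // prints execute
```

Whether the "corrupt" branch should throw, log, or fall back to re-execution is a policy choice the review leaves open.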

3. Test Coverage Gap (Minor)

The new test validates restart behavior, but doesn't explicitly test:

A void step that fails and retries
A void step that exhausts retries
Mixed workflows with both void and non-void steps

Recommendation: Consider adding tests for:

it("should retry failed void steps", async () => {
  let attempts = 0;
  const workflow = async (ctx: WorkflowContextInterface) => {
    await ctx.step("failing-void", async () => {
      attempts++;
      if (attempts < 2) throw new Error("fail");
      // return void on success
    });
  };
  // ... test that it retries and eventually succeeds
});
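
The assertions such a test might make can be shown against a toy in-memory runner. This is a sketch only: `ToyEngine` is a hypothetical stand-in that tracks status per step name and is not the project's workflow engine or test harness.

```typescript
// Toy stand-in for the workflow engine; not the project's real API.
type ToyStatus = "pending" | "completed" | "failed";

interface ToyRecord {
  status: ToyStatus;
  attempts: number;
}

class ToyEngine {
  private readonly records = new Map<string, ToyRecord>();

  // Run a void step unless an earlier run already completed it.
  step(name: string, fn: () => void): void {
    const record: ToyRecord = this.records.get(name) ?? { status: "pending", attempts: 0 };
    if (record.status === "completed") return; // replay: skip execution

    record.attempts++;
    try {
      fn();
      record.status = "completed";
    } catch {
      record.status = "failed";
    }
    this.records.set(name, record);
  }

  attemptsFor(name: string): number {
    return this.records.get(name)?.attempts ?? 0;
  }
}

const engine = new ToyEngine();
let calls = 0;
const failingVoid = (): void => {
  calls++;
  if (calls < 2) throw new Error("fail");
  // Returns void on success.
};

engine.step("failing-void", failingVoid); // first attempt fails
engine.step("failing-void", failingVoid); // retry succeeds
engine.step("failing-void", failingVoid); // simulated restart: step is skipped

console.log(calls); // prints 2
console.log(engine.attemptsFor("failing-void")); // prints 2
```

The key check is that the third call does not increment `calls`, which mirrors the callCount assertion described for the new test.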

Code Style ✅

Comment style follows CLAUDE.md guidelines (complete sentences)
Logging follows structured logging patterns
Test naming is clear and descriptive

Security & Correctness ✅

No security concerns identified
The fix is logically sound and addresses the root cause
No risk of breaking existing functionality (non-void steps continue to use fast path)

Recommendation

Approve with minor suggestions

The fix is correct and well-implemented. The performance impact is minimal due to metadata caching. Consider:

Adding test coverage for void step retry scenarios (low priority)
Documenting expected behavior when metadata is missing (low priority)

The PR is ready to merge as-is. The bug fix is important and the implementation is sound.

Review generated by Claude Code - Learn more

fix(workflow-engine): replay completed void steps on restart

26c3e27

railway-app bot had a problem deploying to rivet-frontend / rivet-pr-4186 February 12, 2026 10:50 Failure

NathanFlurry wants to merge 1 commit into main from 02-12-fix-workflow-void-step-replay
