Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 9, 2026

🚅 Deployed to the rivet-pr-3800 environment in rivet-frontend

Service Status Web Updated (UTC)
website 😴 Sleeping (View Logs) Web Jan 9, 2026 at 2:39 am
frontend-cloud-staging ❌ Build Failed (View Logs) Web Jan 9, 2026 at 2:30 am
frontend-cloud-prod ❌ Build Failed (View Logs) Web Jan 9, 2026 at 2:29 am
frontend-inspector ❌ Build Failed (View Logs) Web Jan 9, 2026 at 2:29 am

🚅 Deployed to the rivet-pr-3800 environment in rivet-frontend-website-preview

Service Status Web Updated (UTC)
frontend-cloud ◻️ Removed (View Logs) Jan 9, 2026 at 5:01 pm

Copy link
Member Author

NathanFlurry commented Jan 9, 2026


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry changed the base branch from 01-08-chore_examples_update_examples_to_standardized_format to graphite-base/3800 January 9, 2026 02:28
@NathanFlurry NathanFlurry force-pushed the 01-08-fix_rivetkit_fix_runner_reconnection_bug branch from 223a3c5 to 1c0e2ca Compare January 9, 2026 02:28
@NathanFlurry NathanFlurry changed the base branch from graphite-base/3800 to main January 9, 2026 02:28
@claude
Copy link

claude bot commented Jan 9, 2026

Code Review: Runner Reconnection Bug Fix

This PR addresses a critical bug in the runner reconnection logic by fixing duplicate WebSocket connections and race conditions during reconnection. The changes are well-targeted and improve the robustness of the WebSocket lifecycle management.


Summary of Changes

  1. Visibility modifier change: Changed __pegboardWebSocket from using double underscores to private field syntax (#pegboardWebSocket)
  2. Duplicate WebSocket prevention: Added assertion to close existing WebSocket before creating a new one (lines 719-729)
  3. Reconnect timeout cleanup: Added cleanup of previous reconnect timeout before scheduling a new one (lines 1797-1802)

Positive Aspects ✅

  1. Proper encapsulation: Converting from __pegboardWebSocket to #pegboardWebSocket uses proper JavaScript private field syntax, which is more robust than convention-based privacy with double underscores.

  2. Race condition mitigation: The duplicate WebSocket check (lines 719-729) prevents multiple concurrent WebSocket connections, which could cause message routing issues and resource leaks.

  3. Reconnect cleanup: Clearing the previous reconnect timeout (lines 1797-1802) prevents multiple reconnection attempts from being scheduled simultaneously.

  4. Good logging: The error logging when duplicate WebSockets are found helps with debugging and observability.


Concerns and Recommendations ⚠️

1. Type Guard Inconsistency (Medium Priority)

Location: Lines 1687-1695

The __webSocketReady() type guard still references the old property name in its type predicate:

__webSocketReady(): this is this & {
    __pegboardWebSocket: NonNullable<Runner["#pegboardWebSocket"]>;
} {
    return (
        \!\!this.#pegboardWebSocket &&
        this.#pegboardWebSocket.readyState === 1
    );
}

Issue: The type guard returns a type that claims __pegboardWebSocket exists, but the actual property is #pegboardWebSocket. This creates a type system inconsistency.

Recommendation: This appears to be an intentional design where the public-facing type guard exposes the private field with a different name. However, this is unusual. Consider either:

  • Removing the type guard entirely and just using if (this.__webSocketReady()) checks without relying on type narrowing
  • Or documenting why this pattern is used

2. Potential Race Condition in Duplicate Check (Low-Medium Priority)

Location: Lines 719-729

if (
    this.#pegboardWebSocket &&
    (this.#pegboardWebSocket.readyState === WS.CONNECTING ||
        this.#pegboardWebSocket.readyState === WS.OPEN)
) {
    this.log?.error(
        "found duplicate pegboardWebSocket, closing previous",
    );
    this.#pegboardWebSocket.close(1000, "duplicate_websocket");
}

Issue: The close() call is asynchronous - it doesn't immediately set readyState to CLOSED. There's a small window where:

  1. First call to #openPegboardWebSocket() starts
  2. It calls .close() on existing WebSocket
  3. Second call to #openPegboardWebSocket() starts before the first one's WebSocket is fully closed
  4. Both create new WebSocket instances

Recommendation: Consider adding await logic or a guard flag to prevent concurrent calls to #openPegboardWebSocket():

#isConnecting: boolean = false;

async #openPegboardWebSocket() {
    if (this.#isConnecting) {
        this.log?.warn("connection already in progress, skipping");
        return;
    }
    
    this.#isConnecting = true;
    try {
        // ... existing code
    } finally {
        this.#isConnecting = false;
    }
}

3. Missing Test Coverage (Medium Priority)

Observation: No test files were found for the runner module.

Recommendation: The reconnection logic is complex and critical. Consider adding integration tests that verify:

  • Duplicate WebSocket prevention works correctly
  • Reconnect timeout cleanup prevents double scheduling
  • WebSocket lifecycle transitions are handled properly
  • Edge cases like rapid connect/disconnect cycles

4. Error vs Info Logging (Low Priority)

Location: Line 725

this.log?.error(
    "found duplicate pegboardWebSocket, closing previous",
);

Issue: This is logged as an error, but after this fix, it might be expected behavior during reconnection scenarios.

Recommendation: Consider downgrading to warn or info level, or add context about whether this is expected vs. unexpected:

this.log?.warn(
    "found duplicate pegboardWebSocket during reconnection, closing previous",
);

5. Timeout Cleanup Pattern (Low Priority)

Location: Lines 1797-1802

Observation: The reconnect timeout cleanup is good, but similar patterns exist elsewhere (runner lost timeout, ack interval, etc.)

Recommendation: The codebase already handles this consistently in the shutdown() method. This change aligns with that pattern, which is good.


Security Considerations 🔒

No security concerns identified. The changes improve robustness without introducing vulnerabilities.


Performance Considerations ⚡

Positive impact:

  • Prevents resource leaks from duplicate WebSocket connections
  • Prevents unnecessary reconnection attempts from stacking up

No negative impact detected


Verdict

Overall Assessment: ✅ Approve with suggestions

This is a solid bug fix that addresses real issues in the reconnection logic. The main concern is the type guard inconsistency, which should be reviewed for correctness. The potential race condition in duplicate checking is worth considering for a future improvement, but the current fix is already much better than before.

Recommended Next Steps:

  1. Review the type guard return type (high priority)
  2. Consider adding a connection guard flag to prevent concurrent connection attempts (medium priority)
  3. Add test coverage for reconnection scenarios (medium priority)

Review generated by Claude Code following CLAUDE.md conventions

@claude
Copy link

claude bot commented Jan 9, 2026

Pull Request Review: fix(rivetkit): fix runner reconnection bug

Summary

This PR fixes a WebSocket reconnection bug by:

  1. Making the #pegboardWebSocket field properly private (using # prefix)
  2. Adding duplicate WebSocket detection and cleanup logic
  3. Adding duplicate reconnect timeout cleanup

Code Quality ✅

Positive Changes:

  • Proper encapsulation: Converting __pegboardWebSocket to #pegboardWebSocket follows JavaScript private field conventions and improves encapsulation
  • Defensive programming: The duplicate WebSocket check (lines 720-729) prevents a race condition where reconnection logic could create multiple active WebSocket connections
  • Consistent cleanup: The reconnect timeout cleanup (lines 1797-1802) prevents duplicate timeouts from accumulating

Potential Issues ⚠️

  1. Type Guard Issue (engine/sdks/typescript/runner/src/mod.ts:1689)

    The type guard __webSocketReady() has an incorrect return type annotation:

    __webSocketReady(): this is this & {
        __pegboardWebSocket: NonNullable<Runner["#pegboardWebSocket"]>;
    } {

    Problem: The property name in the type annotation is still __pegboardWebSocket but the actual field is now #pegboardWebSocket. This creates a mismatch between the type system and runtime behavior.

    Impact: This likely causes TypeScript errors or requires type assertions elsewhere. The type guard should either:

    • Use a different public property name in the type guard
    • Or remove this type guard and use simple boolean returns

    Recommendation: Since this is a private field, the type guard should probably be:

    __webSocketReady(): boolean {
        return (
            !!this.#pegboardWebSocket &&
            this.#pegboardWebSocket.readyState === 1
        );
    }

    Then callers should do their own null checks if needed.

  2. Race Condition in Duplicate Detection (lines 720-729)

    The duplicate WebSocket check happens synchronously before creating the new WebSocket, but there's still a potential race if #openPegboardWebSocket() is called concurrently (e.g., from reconnect timeout and manual reconnect):

    // Thread A checks - no duplicate
    // Thread B checks - no duplicate  
    // Thread A creates WS
    // Thread B creates WS - duplicate!

    Recommendation: Consider adding a connection state flag:

    #connecting: boolean = false;
    
    async #openPegboardWebSocket() {
        if (this.#connecting) {
            this.log?.warn("WebSocket connection already in progress");
            return;
        }
        this.#connecting = true;
        try {
            // ... existing logic
        } finally {
            this.#connecting = false;
        }
    }
  3. Log Level Inconsistency

    Line 726: Uses this.log?.error() for finding a duplicate WebSocket
    Line 1799: Uses this.log?.info() for clearing a duplicate reconnect timeout

    Both are similar duplicate detection scenarios. Consider using consistent log levels (probably warn for both).

Performance Considerations ✅

No significant performance concerns. The duplicate detection adds minimal overhead and prevents resource leaks.

Security Concerns ✅

No security issues identified. The changes are purely internal implementation details.

Test Coverage ⚠️

Missing Tests: No test files found for the runner module. Given that this fixes a reconnection bug, it would be valuable to add tests covering:

  • Normal reconnection flow
  • Preventing duplicate WebSocket connections
  • Preventing duplicate reconnect timeouts
  • WebSocket lifecycle edge cases

Best Practices

Following Repository Conventions: ✅

  • Uses structured logging with tracing
  • Follows lowercase log message convention
  • No glob imports from anyhow (not applicable here)

Code Readability: ✅

  • Clear variable names
  • Reasonable code comments explaining the duplicate detection

Recommendation

Status: Approve with minor changes requested

Required Changes:

  1. Fix the type guard return type at line 1689 (critical)

Suggested Improvements:
2. Add connection state flag to prevent race conditions
3. Standardize log levels for duplicate detection
4. Add test coverage for reconnection scenarios

The core fix (duplicate WebSocket detection) is sound and addresses a real bug. The type guard issue should be fixed before merging to prevent TypeScript compilation issues.

@claude
Copy link

claude bot commented Jan 9, 2026

PR Review: fix(rivetkit): fix runner reconnection bug

Summary

This PR fixes a reconnection bug in the runner by making the following changes:

  1. Changes __pegboardWebSocket from public to private (#pegboardWebSocket)
  2. Adds assertion logic to detect and close duplicate WebSocket connections
  3. Clears previous reconnect timeouts before scheduling new ones

Code Quality ✅

Positive changes:

  • Encapsulation improvement: Converting __pegboardWebSocket from public (__) to private (#) is a good practice that properly encapsulates internal state
  • Defensive programming: The duplicate WebSocket check at mod.ts:720-729 is a good safety measure that prevents resource leaks
  • Timeout management: Clearing previous reconnect timeouts at mod.ts:1797-1802 prevents potential timeout accumulation

Potential Issues 🔍

1. Type Declaration Inconsistency (Critical)

Location: mod.ts:1686-1695

The type guard __webSocketReady() still references the OLD property name in its type signature:

__webSocketReady(): this is this & {
    __pegboardWebSocket: NonNullable<Runner["#pegboardWebSocket"]>;  // ⚠️ Incorrect
}

Problem: This creates a type mismatch. The type guard asserts that __pegboardWebSocket exists, but the actual property is #pegboardWebSocket.

Impact: Code using this type guard expects this.__pegboardWebSocket to be available, but:

  • The property doesn't exist at runtime
  • TypeScript might not catch errors where code tries to access this.__pegboardWebSocket after the type guard

Recommended fix:

__webSocketReady(): this is this & {
    '#pegboardWebSocket': NonNullable<Runner["#pegboardWebSocket"]>;
}

Or better yet, since the WebSocket is now private, remove the type assertion entirely and just make it a simple boolean check:

#webSocketReady(): boolean {
    return !!this.#pegboardWebSocket && this.#pegboardWebSocket.readyState === 1;
}

2. Error Logging Level

Location: mod.ts:725-727

this.log?.error("found duplicate pegboardWebSocket, closing previous");

Consideration: Using error level for this scenario might be appropriate if it's truly unexpected, but if reconnection races are possible during normal operation, consider using warn level instead. The current implementation treats this as an error, but it's handled gracefully, so it might not warrant error-level logging.

3. Missing Context in Logs

Location: mod.ts:1798-1800

this.log?.info("clearing previous reconnect timeout in schedule reconnect");

Suggestion: Add context about why this is happening:

this.log?.info({
    msg: "clearing previous reconnect timeout to avoid duplicates",
    reconnectAttempt: this.#reconnectAttempt,
});

This follows the structured logging pattern used elsewhere in the codebase.

Performance Considerations ✅

No performance concerns. The changes are minimal and only add safety checks.

Security Concerns ✅

Positive: Making #pegboardWebSocket private reduces the attack surface by preventing external access to the WebSocket instance.

Test Coverage ⚠️

Missing: No test changes were included in this PR. Consider adding tests for:

  1. Duplicate WebSocket detection and cleanup
  2. Reconnect timeout clearing behavior
  3. Verify the WebSocket state transitions during reconnection

Additional Observations

Consistency in Property Access

All references to __pegboardWebSocket have been updated to #pegboardWebSocket, which is good. The changes are consistent throughout the file at lines:

  • 204 (declaration)
  • 532, 593 (references in shutdown)
  • 720-728 (duplicate check)
  • 732 (assignment)
  • 1689, 1692-1693 (type guard implementation)
  • 1705 (send method)

Recommendations

  1. Fix the type guard (Critical): Update __webSocketReady() type signature to correctly reference the private field or convert to a simple boolean method
  2. Add tests: Include test coverage for the bug fix scenarios
  3. Improve logging: Use structured logging consistently
  4. Consider log level: Evaluate if duplicate WebSocket should be warn instead of error

Verdict

This PR addresses a real reconnection bug with good defensive programming practices, but has a critical type system issue that should be fixed before merging. The type guard's signature doesn't match the renamed private field, which could lead to runtime errors.


Suggested priority: Fix the type guard issue, then merge. The reconnection fixes are valuable and the encapsulation improvement is good practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants