Skip to content

Conversation

@nicktrn
Copy link
Collaborator

@nicktrn nicktrn commented Sep 18, 2025

Other changes:

  • supervisor: use lightweight mock route when debug logs are disabled

@changeset-bot
Copy link

changeset-bot bot commented Sep 18, 2025

🦋 Changeset detected

Latest commit: 9716038

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 23 packages
Name Type
trigger.dev Patch
@trigger.dev/core Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
@trigger.dev/build Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/zod-worker Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

The PR adds a changeset for patch releases. In the supervisor app, the debug-logs route is gated by an env flag with two route variants (real vs. mock) and minor server construction refactor. In CLI v3, controller reconnection logic is revised to act on runner/supervisor changes and re-subscribe selectively; automatic resubscribe on connect is removed. Execution gains connection-error awareness, retries on restore, and expands processEnvOverrides to return runnerIdChanged/supervisorChanged. Snapshot manager adds de-duplication for deprecated detections and new tests (duplicated block noted). Core workload HTTP client centralizes connection-error detection and adds timeouts to heartbeat/snapshots.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–75 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description is incomplete and does not follow the repository template: it only notes a single "Other changes" line about the supervisor mock route and omits the required "Closes #" line, the checklist, Testing, Changelog, and Screenshots sections, and it fails to summarize other significant edits present in the diff such as CLI restore/recovery behavior, the processEnvOverrides signature change, snapshot de-duplication and tests, and workload client connection-error handling. Please update the PR description to use the repository template: add "Closes #" (or N/A), complete the checklist, provide concrete testing steps and results, include a short changelog summarizing all major changes (CLI restore/reconnect and signature change, snapshot de-duplication and tests, workload client connection-error handling, supervisor mock route, and the changeset file), and attach any relevant test logs or screenshots.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "fix(runner): reduce restore recovery time and deprecated runner false positives" is concise, follows the conventional prefix, and accurately captures the primary intent of the changeset (faster restore/reconnect handling and fixes to avoid deprecated-runner false positives as reflected in the CLI, snapshot, and related modules), so it appropriately summarizes the main change for a reviewer scanning history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/restore-detection

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli-v3/src/entryPoints/managed/controller.ts (1)

504-510: WorkloadHttpClient updates OK; add guarded re-subscribe on socket reconnect

  • WorkloadHttpClient supports dynamic updates (updateApiUrl/updateRunnerId) and execution.processEnvOverrides already calls them — no need to recreate the client or ManagedRunLogger. (packages/cli-v3/src/entryPoints/managed/execution.ts ≈ lines 974–978; ManagedRunLogger holds the httpClient reference in packages/cli-v3/src/entryPoints/managed/logger.ts).
  • createSupervisorSocket's "connect" handler only logs and does not re-subscribe. Add a guarded re-subscribe on connect when an execution is active (e.g. if this.runFriendlyId && this.snapshotFriendlyId then call this.subscribeToRunNotifications(...)) to handle servers that don’t retain subscriptions across reconnects. (packages/cli-v3/src/entryPoints/managed/controller.ts — inside createSupervisorSocket socket.on("connect")).
🧹 Nitpick comments (7)
packages/cli-v3/src/entryPoints/managed/snapshot.ts (2)

52-55: Use a Set for O(1) de‑dup and simpler pruning.

Array scans are fine at size 50, but a Set is cleaner and future‑proof.

Apply:

-  // Track seen deprecated snapshots to prevent false positives
-  private seenDeprecatedSnapshotIds: string[] = [];
-  private readonly maxSeenDeprecatedSnapshotIds = 50;
+  // Track seen deprecated snapshots to prevent false positives
+  private seenDeprecatedSnapshotIds = new Set<string>();
+  private readonly maxSeenDeprecatedSnapshotIds = 50;

291-297: Align filtering and pruning with Set semantics.

Switch membership checks/additions to Set and prune in insertion order.

-        const deprecatedSnapshots = previousSnapshots.filter((snap) => {
-          const isDeprecated = deprecatedStatus.includes(snap.snapshot.executionStatus);
-          const previouslySeen = this.seenDeprecatedSnapshotIds.some(
-            (s) => s === snap.snapshot.friendlyId
-          );
-          return isDeprecated && !previouslySeen;
-        });
+        const deprecatedSnapshots = previousSnapshots.filter((snap) => {
+          const isDeprecated = deprecatedStatus.includes(snap.snapshot.executionStatus);
+          const previouslySeen = this.seenDeprecatedSnapshotIds.has(snap.snapshot.friendlyId);
+          return isDeprecated && !previouslySeen;
+        });
@@
-          // Add the deprecated snapshot IDs to the seen list
-          this.seenDeprecatedSnapshotIds.push(
-            ...deprecatedSnapshots.map((s) => s.snapshot.friendlyId)
-          );
-
-          if (this.seenDeprecatedSnapshotIds.length > this.maxSeenDeprecatedSnapshotIds) {
-            // Only keep the latest maxSeenDeprecatedSnapshotIds
-            this.seenDeprecatedSnapshotIds = this.seenDeprecatedSnapshotIds.slice(
-              -this.maxSeenDeprecatedSnapshotIds
-            );
-          }
+          // Add the deprecated snapshot IDs to the seen set
+          for (const id of deprecatedSnapshots.map((s) => s.snapshot.friendlyId)) {
+            this.seenDeprecatedSnapshotIds.add(id);
+          }
+          // Prune to most recent N (Set keeps insertion order)
+          while (this.seenDeprecatedSnapshotIds.size > this.maxSeenDeprecatedSnapshotIds) {
+            const first = this.seenDeprecatedSnapshotIds.values().next().value as string | undefined;
+            if (!first) break;
+            this.seenDeprecatedSnapshotIds.delete(first);
+          }

Also applies to: 310-320

packages/core/src/v3/runEngineWorker/workload/http.ts (1)

55-69: Extend the wrapper to start/complete as well and add timeouts for parity.

Start/complete failures from network issues should also surface isConnectionError; adding timeouts keeps all calls bounded.

Outside the selected lines, consider:

   async startRunAttempt(
     runId: string,
     snapshotId: string,
     body: WorkloadRunAttemptStartRequestBody
   ) {
-    return wrapZodFetch(
+    return this.withConnectionErrorDetection(() =>
+      wrapZodFetch(
         WorkloadRunAttemptStartResponseBody,
         `${this.apiUrl}/api/v1/workload-actions/runs/${runId}/snapshots/${snapshotId}/attempts/start`,
         {
           method: "POST",
           headers: {
             ...this.defaultHeaders(),
             },
           body: JSON.stringify(body),
+          signal: AbortSignal.timeout(10_000),
         }
-    );
+      )
+    );
   }
@@
   async completeRunAttempt(
     runId: string,
     snapshotId: string,
     body: WorkloadRunAttemptCompleteRequestBody
   ) {
-    return wrapZodFetch(
+    return this.withConnectionErrorDetection(() =>
+      wrapZodFetch(
         WorkloadRunAttemptCompleteResponseBody,
         `${this.apiUrl}/api/v1/workload-actions/runs/${runId}/snapshots/${snapshotId}/attempts/complete`,
         {
           method: "POST",
           headers: {
             ...this.defaultHeaders(),
           },
           body: JSON.stringify(body),
+          signal: AbortSignal.timeout(10_000),
         }
-    );
+      )
+    );
   }
packages/cli-v3/src/entryPoints/managed/execution.ts (2)

918-990: Good: env overrides now signal runner/supervisor changes. Consider immediate re‑poll toggle.

You already support shouldPollForSnapshotChanges; wire it from callers that refresh due to errors to reduce recovery latency.


1234-1243: Trigger an immediate re‑poll after refreshing metadata on snapshot fetch errors.

This shortens recovery when URLs/runner IDs change.

-      // Always trigger metadata refresh on snapshot fetch errors
-      await this.processEnvOverrides("snapshots since error");
+      // Always trigger metadata refresh on snapshot fetch errors, then re-poll once
+      await this.processEnvOverrides("snapshots since error", true);
packages/cli-v3/src/entryPoints/managed/controller.ts (1)

536-553: Current execution is still bound to the old socket after supervisor change

You create a new socket but never rebind currentExecution’s run:notify listeners to it. After removeAllListeners() on the old socket, notifications won’t reach the in-flight execution, which can delay restore recovery and other reactions.

Add a rebind call here and implement it in RunExecution:

           // Create a new socket with the updated URL and headers
           this.socket = this.createSupervisorSocket();

+          // Rebind the in-flight execution to the new socket so run:notify continues to flow
+          this.currentExecution?.rebindSocket(this.socket);

           // Re-subscribe to notifications if we have an active execution
           if (this.runFriendlyId && this.snapshotFriendlyId) {
             this.subscribeToRunNotifications(this.runFriendlyId, this.snapshotFriendlyId);
           }

And in RunExecution (example, adjust to your structure):

// packages/cli-v3/src/entryPoints/managed/execution.ts
export class RunExecution {
  private supervisorSocket: SupervisorSocket;
  // Ensure the notify handler is a stable reference
  private readonly onRunNotify = (payload: unknown) => this.handleRunNotify(payload);

  public rebindSocket(socket: SupervisorSocket) {
    // Detach from old socket if present
    this.supervisorSocket?.off?.("run:notify", this.onRunNotify);
    // Attach to new socket
    this.supervisorSocket = socket;
    this.supervisorSocket.on("run:notify", this.onRunNotify);
  }

  private handleRunNotify(payload: unknown) {
    // existing logic…
  }
}
packages/cli-v3/src/entryPoints/managed/snapshot.test.ts (1)

701-758: Test looks good; add call-count assertions to harden against regressions

Tighten expectations so we catch extra/deferred handler calls and verify a single metadata lookup.

     // First call should be deprecated=false (restore detected)
     expect(onSnapshotChange).toHaveBeenCalledWith(
       expect.objectContaining({ snapshot: expect.objectContaining({ friendlyId: "snapshot-2" }) }),
       false
     );
+    expect(onSnapshotChange).toHaveBeenCalledTimes(1);
+    expect(mockMetadataClient.getEnvOverrides).toHaveBeenCalledTimes(1);

     onSnapshotChange.mockClear();

@@
     // Should call onSnapshotChange with deprecated = false (no new deprecated snapshots)
     expect(onSnapshotChange).toHaveBeenCalledWith(
       expect.objectContaining({ snapshot: expect.objectContaining({ friendlyId: "snapshot-3" }) }),
       false
     );
+    expect(onSnapshotChange).toHaveBeenCalledTimes(1);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3ef6ea and 9716038.

📒 Files selected for processing (7)
  • .changeset/six-cougars-play.md (1 hunks)
  • apps/supervisor/src/workloadServer/index.ts (2 hunks)
  • packages/cli-v3/src/entryPoints/managed/controller.ts (2 hunks)
  • packages/cli-v3/src/entryPoints/managed/execution.ts (6 hunks)
  • packages/cli-v3/src/entryPoints/managed/snapshot.test.ts (1 hunks)
  • packages/cli-v3/src/entryPoints/managed/snapshot.ts (3 hunks)
  • packages/core/src/v3/runEngineWorker/workload/http.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • packages/cli-v3/src/entryPoints/managed/snapshot.ts
  • packages/cli-v3/src/entryPoints/managed/snapshot.test.ts
  • packages/cli-v3/src/entryPoints/managed/execution.ts
  • apps/supervisor/src/workloadServer/index.ts
  • packages/cli-v3/src/entryPoints/managed/controller.ts
  • packages/core/src/v3/runEngineWorker/workload/http.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Our tests are all vitest

Files:

  • packages/cli-v3/src/entryPoints/managed/snapshot.test.ts
**/*.{test,spec}.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{test,spec}.{ts,tsx,js,jsx}: Unit tests must use Vitest
Tests should avoid mocks or stubs and use helpers from @internal/testcontainers when Redis or Postgres are needed
Test files live beside the files under test and should use descriptive describe and it blocks

Files:

  • packages/cli-v3/src/entryPoints/managed/snapshot.test.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • packages/core/src/v3/runEngineWorker/workload/http.ts
🧬 Code graph analysis (4)
packages/cli-v3/src/entryPoints/managed/snapshot.test.ts (1)
packages/cli-v3/src/entryPoints/managed/snapshot.ts (1)
  • SnapshotManager (36-416)
packages/cli-v3/src/entryPoints/managed/execution.ts (1)
packages/cli-v3/src/entryPoints/managed/overrides.ts (1)
  • Metadata (1-13)
apps/supervisor/src/workloadServer/index.ts (3)
packages/core/src/v3/serverOnly/httpServer.ts (1)
  • HttpServer (64-379)
apps/supervisor/src/env.ts (1)
  • env (118-118)
packages/core/src/v3/runEngineWorker/workload/schemas.ts (2)
  • WorkloadDebugLogRequestBody (61-61)
  • WorkloadDebugLogRequestBody (62-62)
packages/core/src/v3/runEngineWorker/workload/http.ts (1)
packages/core/src/v3/apiClient/core.ts (1)
  • wrapZodFetch (727-767)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
.changeset/six-cougars-play.md (1)

1-6: Changeset looks correct for a patch release.

Packages and summary align with the code changes.

apps/supervisor/src/workloadServer/index.ts (1)

389-395: Lightweight mock route is a good trade‑off.

Keeps the same path with minimal overhead when debug logs are disabled.

packages/core/src/v3/runEngineWorker/workload/http.ts (3)

92-107: Heartbeat wrapper + timeout: LGTM.

Consistent shape with isConnectionError and a sane 10s timeout.


123-135: Continue wrapper: LGTM.

Good to surface connection errors here.


175-187: Snapshots since wrapper + timeout: LGTM.

Matches heartbeat behavior and enables upstream recovery.

packages/cli-v3/src/entryPoints/managed/execution.ts (2)

875-893: Restore retry on connection error: LGTM.

Clear, single retry after metadata refresh.


1014-1019: Heartbeat connection‑error refresh: LGTM.

Keeps metadata in sync after transient failures.

@nicktrn nicktrn merged commit db87295 into main Sep 18, 2025
32 checks passed
@nicktrn nicktrn deleted the fix/restore-detection branch September 18, 2025 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants