Skip to content

Conversation

myftija
Copy link
Member

@myftija myftija commented Oct 6, 2025

The Depot build init with depot.build.v1.BuildService.createBuild fails surprisingly often due to transient errors, causing the whole deployment to fail. This PR adds a simple retry mechanism with backoff using p-retry. This should improve the failure rate.

The Depot build init with `depot.build.v1.BuildService.createBuild`
fails surprisingly often due to transient error. This PR adds a simple retry
mechanism with backoff using `p-retry`.
Copy link

changeset-bot bot commented Oct 6, 2025

⚠️ No Changeset found

Latest commit: 0edcc7d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Oct 6, 2025

Walkthrough

  • Adds logging and braces for two error-handling cases in the deployment progress route: logs a warning on "failed_to_extend_deployment_timeout" while keeping a 204 response; logs an error on "failed_to_create_remote_build" and returns a 500 with a generic message.
  • Adds retry logic around remote build creation using p-retry with configured retries and timeouts, including onFailedAttempt logging.
  • Adds an Authorization header to the remote build request.
  • Imports a logger and p-retry in the remote image builder module.
  • Adds p-retry (^4.6.1) as a runtime dependency in apps/webapp/package.json.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description does not follow the repository’s required template and is missing multiple sections including the Closes statement, the ✅ Checklist, Testing steps, Changelog summary, and Screenshots placeholder, making it incomplete for reviewers. Without these elements the description lacks structured context, validation steps, and tracking information. Therefore it fails to meet the description template requirements. Please update the PR description to include the “Closes #” reference, complete the checklist items, add a Testing section with the steps you performed, include a Changelog entry summarizing the change, and provide Screenshots if applicable.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly and accurately describes the primary change by indicating that deployment logic now retries transient failures when initializing Depot builds, and it follows the conventional commit style without extraneous details. It is clear, concise, and directly related to the main change in the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-retry-depot-builds

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
apps/webapp/app/v3/remoteImageBuilder.server.ts (1)

18-36: Consider adding AbortError for non-retryable errors.

The retry logic is sound, but retrying all errors indiscriminately may waste resources on non-transient failures. Authentication failures (401, 403) and validation errors (400, 422) should abort immediately rather than retry.

Additionally, the onFailedAttempt callback could log attemptNumber and retriesLeft for better observability.

Apply this diff to handle non-retryable errors and improve logging:

 import pRetry from "p-retry";
+import { AbortError } from "p-retry";
 import { logger } from "~/services/logger.server";
   const result = await pRetry(
-    () =>
+    async () => {
+      try {
+        return await depot.build.v1.BuildService.createBuild(
-      depot.build.v1.BuildService.createBuild(
           { projectId: builderProjectId },
           {
             headers: {
               Authorization: `Bearer ${env.DEPOT_TOKEN}`,
             },
           }
         );
+      } catch (error: any) {
+        // Don't retry authentication or validation errors
+        if (error.code === "UNAUTHENTICATED" || error.code === "PERMISSION_DENIED" || error.code === "INVALID_ARGUMENT") {
+          throw new AbortError(error);
+        }
+        throw error;
+      }
+    },
     {
       retries: 3,
       minTimeout: 200,
       maxTimeout: 2000,
       onFailedAttempt: (error) => {
-        logger.error("Failed attempt to create remote Depot build", { error });
+        logger.error("Failed attempt to create remote Depot build", {
+          error,
+          attemptNumber: error.attemptNumber,
+          retriesLeft: error.retriesLeft,
+        });
       },
     }
   );

Note: Adjust the error code checks based on the actual error structure returned by the Depot SDK. You may need to inspect error.response?.status for HTTP status codes instead.

Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b90f3e2 and 0edcc7d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (1 hunks)
  • apps/webapp/app/v3/remoteImageBuilder.server.ts (2 hunks)
  • apps/webapp/package.json (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
  • apps/webapp/app/v3/remoteImageBuilder.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
  • apps/webapp/app/v3/remoteImageBuilder.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json

Files:

  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
  • apps/webapp/app/v3/remoteImageBuilder.server.ts
{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly

Files:

  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
  • apps/webapp/app/v3/remoteImageBuilder.server.ts
apps/webapp/app/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead

Files:

  • apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
  • apps/webapp/app/v3/remoteImageBuilder.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (1)
packages/core/src/v3/apps/http.ts (1)
  • json (65-75)
apps/webapp/app/v3/remoteImageBuilder.server.ts (1)
apps/webapp/app/env.server.ts (1)
  • env (1203-1203)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
apps/webapp/package.json (1)

169-169: LGTM! Dependency addition supports retry logic.

The p-retry@^4.6.1 dependency is used in apps/webapp/app/v3/remoteImageBuilder.server.ts to handle transient Depot build failures. The version is stable and the caret range allows safe updates within v4.x.

apps/webapp/app/v3/remoteImageBuilder.server.ts (2)

6-7: LGTM! Imports correctly support retry and logging.

The imports follow the correct pattern: pRetry is imported as a default export (as per p-retry v4 API), and logger follows the project's logging pattern.


23-25: LGTM! Authorization header correctly added.

The Authorization header follows the same pattern used in createBuilderProjectIfNotExists (lines 57-59) and correctly uses env.DEPOT_TOKEN as per coding guidelines.

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (2)

58-61: LGTM! Logging improves observability for timeout extensions.

The warning log with error.cause provides useful debugging context while maintaining the 204 response behavior. Adding braces also improves maintainability if additional statements are needed later.


69-72: LGTM! Logging improves observability for remote build failures.

The error log with error.cause provides useful debugging context for Depot build creation failures. The 500 response correctly signals a server error while maintaining a generic client-facing message.

@myftija myftija merged commit 107f4dc into main Oct 6, 2025
31 checks passed
@myftija myftija deleted the fix-retry-depot-builds branch October 6, 2025 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants