Skip to content

feat(controlplane): implement dual-bucket write for execution configs#2717

Merged
pepol merged 7 commits intomainfrom
peter/eng-8555-implement-failover-cdn
Apr 9, 2026
Merged

feat(controlplane): implement dual-bucket write for execution configs#2717
pepol merged 7 commits intomainfrom
peter/eng-8555-implement-failover-cdn

Conversation

@pepol
Copy link
Copy Markdown
Member

@pepol pepol commented Mar 31, 2026

Summary by CodeRabbit

  • New Features

    • Optional secondary S3-compatible failover storage with dual-write and automatic read failover. Writes are coordinated to both stores with rollback attempts on partial failures; errors aggregate and surface combined failure details.
    • New build/config and environment options to enable and tune failover (URL, endpoint/region, credentials, path-style, and delete behavior).
  • Tests

    • Added tests covering dual-write behavior, rollback on partial failures, read failover, directory removal, deletion behavior, and aggregated error reporting.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 31, 2026

Walkthrough

Adds DualBlobStorage coordinating primary and secondary BlobStorage for reads, writes, deletes; integrates optional S3 failover into build-server and startup via new env vars and BuildConfig; extends env schema; and adds unit tests covering dual-storage semantics.

Changes

Cohort / File(s) Summary
DualBlobStorage implementation
controlplane/src/core/blobstorage/dual.ts, controlplane/src/core/blobstorage/index.ts
New exported DualBlobStorage class. putObject writes to both stores concurrently, attempts rollback on partial failure, and throws aggregated errors; getObject tries primary then falls back to secondary with aggregated errors if both fail; deleteObject and removeDirectory call both stores. Index re-exports the class.
Build server wiring
controlplane/src/core/build-server.ts
Added optional s3StorageFailover to BuildConfig. Builds primary S3 storage as before; when failover is configured, constructs a secondary S3BlobStorage and uses DualBlobStorage(primary, secondary) for blob operations passed to workers/routes.
Environment schema
controlplane/src/core/env.schema.ts
Extended envVariables with optional S3_FAILOVER_* fields: S3_FAILOVER_STORAGE_URL, S3_FAILOVER_ENDPOINT, S3_FAILOVER_REGION (default 'auto'), S3_FAILOVER_ACCESS_KEY_ID, S3_FAILOVER_SECRET_ACCESS_KEY, plus S3_FAILOVER_FORCE_PATH_STYLE and S3_FAILOVER_USE_INDIVIDUAL_DELETES boolean flags derived from strings.
App startup configuration
controlplane/src/index.ts
Parses new S3_FAILOVER_* env vars and conditionally sets options.s3StorageFailover with structured failover config (url, region, endpoint, credentials, flags) when S3_FAILOVER_STORAGE_URL is present.
Tests
controlplane/test/dual-blob-storage.test.ts
New Vitest suite with createMockBlobStorage helper. Tests putObject behavior including rollback and rollback-error aggregation, getObject primary-first then fallback with aggregated errors, deleteObject concurrent deletes and error propagation, and removeDirectory behavior and error handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing dual-bucket write capability (via DualBlobStorage) for execution configs in the controlplane service.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
controlplane/test/dual-blob-storage.test.ts (2)

84-94: Consider documenting the error propagation behavior.

When both storages fail, the secondary error is thrown (line 93), discarding the primary error. This is valid behavior, but the test serves as documentation that this is intentional. You may want to add a brief comment in dual.ts clarifying this design choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/test/dual-blob-storage.test.ts` around lines 84 - 94, Add a
short explanatory comment in the DualBlobStorage implementation clarifying that
when both primary and secondary getObject calls fail the design intentionally
propagates/throws the secondary error (i.e., the error from the fallback), so
callers will receive the secondary error instead of the primary; locate the
getObject method in DualBlobStorage (dual.ts) and insert a concise note above
the error-handling block that documents this behavior and the rationale.

120-135: Test coverage looks good but could verify count discrepancy handling.

The test verifies that removeDirectory returns the primary count (10), but both mocks return the same value. Consider adding a test where primary and secondary return different counts to explicitly document that the secondary count is discarded.

💡 Suggested additional test case
test('returns primary count even when secondary differs', async () => {
  const primary = createMockBlobStorage({
    removeDirectory: vi.fn().mockResolvedValue(10),
  });
  const secondary = createMockBlobStorage({
    removeDirectory: vi.fn().mockResolvedValue(15), // Different count
  });
  const dual = new DualBlobStorage(primary, secondary);

  const count = await dual.removeDirectory({ key: 'dir/' });

  expect(count).toBe(10); // Primary count returned
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/test/dual-blob-storage.test.ts` around lines 120 - 135, Add a
new test to assert that DualBlobStorage.removeDirectory returns the primary
result when primary and secondary counts differ: create mocks via
createMockBlobStorage with primary.removeDirectory resolving to 10 and
secondary.removeDirectory resolving to a different value (e.g., 15), instantiate
new DualBlobStorage(primary, secondary), call dual.removeDirectory({ key: 'dir/'
}) and expect the returned count to equal the primary's value (10); also assert
both removeDirectory mocks were called with the same { key: 'dir/' } argument to
document that the secondary result is invoked but ignored.
controlplane/src/core/blobstorage/dual.ts (2)

15-23: Consider whether partial write failure should trigger cleanup.

putObject uses Promise.all, so if the secondary write fails after the primary succeeds, the primary will have the object but the secondary won't. This creates an inconsistency that could cause issues if the primary later becomes unavailable.

This may be acceptable depending on your consistency requirements, but it's worth documenting this behavior explicitly in the JSDoc.

📝 Suggested documentation update
 /**
  * A BlobStorage implementation that writes to two underlying stores (primary + secondary).
  *
  * - Writes and deletes go to both stores concurrently; both must succeed.
+ *   Note: On partial failure (e.g., primary succeeds, secondary fails), no rollback is performed.
+ *   This may leave stores in an inconsistent state until the operation is retried.
  * - Reads try the primary first, falling back to the secondary on failure.
  */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 15 - 23, The
putObject implementation in dual.ts calls Promise.all on this.primary.putObject
and this.secondary.putObject which can leave a partial write (primary succeeds
but secondary fails); update the JSDoc for async putObject<Metadata> to
explicitly document this failure mode and the chosen semantics (e.g., that
partial writes may remain, whether a retry/compensation/cleanup is attempted or
not, and how callers should handle it), referencing the interaction between
primary.putObject and secondary.putObject; if you prefer automatic cleanup
instead of documentation, describe in the JSDoc that on secondary failure the
method will attempt to delete the primary object (or retry) and implement that
cleanup/retry logic around the Promise.all call.

33-36: Discarding secondary removeDirectory count may hide data inconsistencies.

The secondary storage's deletion count is discarded. If primary deletes 10 objects but secondary deletes 15, this could indicate desynchronization between storages. Consider logging or returning both counts for diagnostics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 33 - 36, The current
removeDirectory implementation discards the secondary result; change it to
capture both results from Promise.all (e.g., const [primaryCount,
secondaryCount] = await Promise.all([this.primary.removeDirectory(data),
this.secondary.removeDirectory(data)]) and then if counts differ log a warning
including data.key and both counts (use this.logger.warn or console.warn if no
logger is available); keep returning primaryCount to preserve the existing
Promise<number> signature, or if you prefer to expose both counts update the
method signature to return an object like { primary: number, secondary: number }
and adjust callers accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 25-31: The getObject method in dual blob storage silently swallows
the error from primary.getObject; update the Dual blob storage class (the method
getObject in controlplane/src/core/blobstorage/dual.ts) to capture the caught
error, log it (or emit a metric/event) before falling back to
secondary.getObject, and if secondary.getObject also fails, re-throw a wrapped
error that includes both the primary error context and the secondary error;
inject a logger/metrics client into the class constructor (or accept one via
dependency injection) and use it in getObject to record the primary failure
prior to failover.

In `@controlplane/src/core/env.schema.ts`:
- Around line 165-180: The schema currently allows S3_FAILOVER_STORAGE_URL
without requiring S3_FAILOVER_REGION, which can cause createS3ClientConfig (in
util.ts) to throw "Missing region in S3 configuration."; fix by either giving
S3_FAILOVER_REGION a sensible default (e.g., the same default as S3_REGION -
'auto') in the env schema declaration for S3_FAILOVER_REGION, or add a Zod
refinement on the exported env schema that checks if S3_FAILOVER_STORAGE_URL is
set then S3_FAILOVER_REGION must be present and non-empty (return false with an
error message like 'S3_FAILOVER_REGION is required when S3_FAILOVER_STORAGE_URL
is set'), referencing the S3_FAILOVER_STORAGE_URL and S3_FAILOVER_REGION symbols
so validation prevents runtime startup errors.

---

Nitpick comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 15-23: The putObject implementation in dual.ts calls Promise.all
on this.primary.putObject and this.secondary.putObject which can leave a partial
write (primary succeeds but secondary fails); update the JSDoc for async
putObject<Metadata> to explicitly document this failure mode and the chosen
semantics (e.g., that partial writes may remain, whether a
retry/compensation/cleanup is attempted or not, and how callers should handle
it), referencing the interaction between primary.putObject and
secondary.putObject; if you prefer automatic cleanup instead of documentation,
describe in the JSDoc that on secondary failure the method will attempt to
delete the primary object (or retry) and implement that cleanup/retry logic
around the Promise.all call.
- Around line 33-36: The current removeDirectory implementation discards the
secondary result; change it to capture both results from Promise.all (e.g.,
const [primaryCount, secondaryCount] = await
Promise.all([this.primary.removeDirectory(data),
this.secondary.removeDirectory(data)]) and then if counts differ log a warning
including data.key and both counts (use this.logger.warn or console.warn if no
logger is available); keep returning primaryCount to preserve the existing
Promise<number> signature, or if you prefer to expose both counts update the
method signature to return an object like { primary: number, secondary: number }
and adjust callers accordingly.

In `@controlplane/test/dual-blob-storage.test.ts`:
- Around line 84-94: Add a short explanatory comment in the DualBlobStorage
implementation clarifying that when both primary and secondary getObject calls
fail the design intentionally propagates/throws the secondary error (i.e., the
error from the fallback), so callers will receive the secondary error instead of
the primary; locate the getObject method in DualBlobStorage (dual.ts) and insert
a concise note above the error-handling block that documents this behavior and
the rationale.
- Around line 120-135: Add a new test to assert that
DualBlobStorage.removeDirectory returns the primary result when primary and
secondary counts differ: create mocks via createMockBlobStorage with
primary.removeDirectory resolving to 10 and secondary.removeDirectory resolving
to a different value (e.g., 15), instantiate new DualBlobStorage(primary,
secondary), call dual.removeDirectory({ key: 'dir/' }) and expect the returned
count to equal the primary's value (10); also assert both removeDirectory mocks
were called with the same { key: 'dir/' } argument to document that the
secondary result is invoked but ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 323fdc89-9b83-4fd0-8703-c142b5b541a6

📥 Commits

Reviewing files that changed from the base of the PR and between 4e2b146 and 4129480.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 54.25532% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.45%. Comparing base (2ca7b28) to head (4aef608).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
controlplane/src/index.ts 0.00% 18 Missing ⚠️
controlplane/src/core/env.schema.ts 0.00% 13 Missing ⚠️
controlplane/src/core/build-server.ts 20.00% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2717      +/-   ##
==========================================
+ Coverage   63.46%   64.45%   +0.98%     
==========================================
  Files         251      306      +55     
  Lines       26767    43621   +16854     
  Branches        0     4690    +4690     
==========================================
+ Hits        16989    28114   +11125     
- Misses       8415    15485    +7070     
+ Partials     1363       22    -1341     
Files with missing lines Coverage Δ
controlplane/src/core/blobstorage/dual.ts 100.00% <100.00%> (ø)
controlplane/src/core/blobstorage/index.ts 76.92% <ø> (ø)
controlplane/src/core/build-server.ts 73.56% <20.00%> (ø)
controlplane/src/core/env.schema.ts 0.00% <0.00%> (ø)
controlplane/src/index.ts 0.00% <0.00%> (ø)

... and 552 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 4129480 to 55c5070 Compare March 31, 2026 13:46
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
controlplane/src/index.ts (1)

152-162: Consider extracting shared S3 mapping to avoid drift.

options.s3Storage and options.s3StorageFailover now duplicate the same field mapping. A small helper would reduce future divergence risk.

♻️ Suggested refactor
+const toS3StorageOptions = (cfg: {
+  url?: string;
+  region?: string;
+  endpoint?: string;
+  accessKeyId?: string;
+  secretAccessKey?: string;
+  forcePathStyle?: boolean;
+  useIndividualDeletes?: boolean;
+}) =>
+  cfg.url
+    ? {
+        url: cfg.url,
+        region: cfg.region,
+        endpoint: cfg.endpoint,
+        username: cfg.accessKeyId,
+        password: cfg.secretAccessKey,
+        forcePathStyle: cfg.forcePathStyle,
+        useIndividualDeletes: cfg.useIndividualDeletes,
+      }
+    : undefined;
+
   s3Storage: {
     url: S3_STORAGE_URL,
     region: S3_REGION,
     endpoint: S3_ENDPOINT,
     username: S3_ACCESS_KEY_ID,
     password: S3_SECRET_ACCESS_KEY,
     forcePathStyle: S3_FORCE_PATH_STYLE,
     useIndividualDeletes: S3_USE_INDIVIDUAL_DELETES,
   },
-  s3StorageFailover: S3_FAILOVER_STORAGE_URL
-    ? {
-        url: S3_FAILOVER_STORAGE_URL,
-        region: S3_FAILOVER_REGION,
-        endpoint: S3_FAILOVER_ENDPOINT,
-        username: S3_FAILOVER_ACCESS_KEY_ID,
-        password: S3_FAILOVER_SECRET_ACCESS_KEY,
-        forcePathStyle: S3_FAILOVER_FORCE_PATH_STYLE,
-        useIndividualDeletes: S3_FAILOVER_USE_INDIVIDUAL_DELETES,
-      }
-    : undefined,
+  s3StorageFailover: toS3StorageOptions({
+    url: S3_FAILOVER_STORAGE_URL,
+    region: S3_FAILOVER_REGION,
+    endpoint: S3_FAILOVER_ENDPOINT,
+    accessKeyId: S3_FAILOVER_ACCESS_KEY_ID,
+    secretAccessKey: S3_FAILOVER_SECRET_ACCESS_KEY,
+    forcePathStyle: S3_FAILOVER_FORCE_PATH_STYLE,
+    useIndividualDeletes: S3_FAILOVER_USE_INDIVIDUAL_DELETES,
+  }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/index.ts` around lines 152 - 162, s3Storage and
s3StorageFailover duplicate the same field mapping; extract a small helper
(e.g., buildS3Config or mapS3Config) that accepts the set of S3 env variables or
a prefix and returns the common object shape, then replace both inline objects
with calls to that helper (use the same property names: url, region, endpoint,
username, password, forcePathStyle, useIndividualDeletes) so future changes
apply in one place; update usages where options.s3Storage and
options.s3StorageFailover are set to call this helper and preserve the ternary
behavior (return undefined when no URL).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@controlplane/src/index.ts`:
- Around line 152-162: s3Storage and s3StorageFailover duplicate the same field
mapping; extract a small helper (e.g., buildS3Config or mapS3Config) that
accepts the set of S3 env variables or a prefix and returns the common object
shape, then replace both inline objects with calls to that helper (use the same
property names: url, region, endpoint, username, password, forcePathStyle,
useIndividualDeletes) so future changes apply in one place; update usages where
options.s3Storage and options.s3StorageFailover are set to call this helper and
preserve the ternary behavior (return undefined when no URL).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 553f97f9-c392-48d7-9647-384a16fdb1b2

📥 Commits

Reviewing files that changed from the base of the PR and between 4129480 and 55c5070.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts
✅ Files skipped from review due to trivial changes (2)
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/test/dual-blob-storage.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • controlplane/src/core/env.schema.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/blobstorage/dual.ts

@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 122d7cb to 95b6be2 Compare March 31, 2026 16:02
@pepol pepol marked this pull request as ready for review March 31, 2026 17:01
@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 95b6be2 to bf26275 Compare April 1, 2026 13:33
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
controlplane/src/core/blobstorage/dual.ts (2)

29-32: Consider documenting or handling divergent counts between stores.

removeDirectory returns only results[0] (primary's deleted count), discarding secondary's result. If the stores become inconsistent (e.g., secondary has more/fewer objects), the returned count may be misleading.

Consider either:

  1. Documenting this behavior explicitly
  2. Returning the maximum or logging a warning if counts differ
💡 Optional: Detect inconsistency
 async removeDirectory(data: { key: string; abortSignal?: AbortSignal }): Promise<number> {
   const results = await Promise.all([this.primary.removeDirectory(data), this.secondary.removeDirectory(data)]);
+  // Optionally warn if counts differ, indicating storage drift
+  if (results[0] !== results[1]) {
+    // Consider logging: `Primary deleted ${results[0]}, secondary deleted ${results[1]}`
+  }
   return results[0];
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 29 - 32, The
removeDirectory implementation currently returns only results[0] (the primary
store's deleted count) which can mask divergences with the secondary; update the
method (removeDirectory in dual.ts) to await both results, compare results[0]
and results[1], and if they differ emit a warning (use the class logger if
available, otherwise console.warn) indicating primary vs secondary counts and
the key, then return a deterministic value (recommend Math.max(results[0],
results[1]) or document the chosen policy). Ensure the warning includes function
name removeDirectory, the key, and both counts so inconsistencies are
discoverable.

6-7: Documentation contradicts implementation; also consider observability on partial failures.

The class documentation states reads "try the primary first, falling back to the secondary on failure" (sequential), but Promise.any races both requests concurrently and returns whichever succeeds first. This has implications:

  1. Documentation mismatch: Update docs to reflect concurrent racing behavior
  2. Cost/Load: Secondary storage is hit on every read, not just failover scenarios
  3. Partial failure observability: If primary fails but secondary succeeds, there's no indication of the primary issue (echoing the past review concern)

If sequential fallback is intended:

🔧 Sequential fallback with observability
- * - Reads try the primary first, falling back to the secondary on failure.
+ * - Reads race both stores concurrently, returning the first successful response.

Or, if you want actual sequential behavior:

 async getObject(data: { key: string; abortSignal?: AbortSignal }): Promise<BlobObject> {
-  return await Promise.any([this.primary.getObject(data), this.secondary.getObject(data)]);
+  try {
+    return await this.primary.getObject(data);
+  } catch (primaryError) {
+    // Consider logging primaryError for observability
+    return await this.secondary.getObject(data);
+  }
 }

Also applies to: 25-27

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 6 - 7, The docs claim
reads fall back sequentially but the read implementation uses Promise.any to
race both stores; change the read method (replace the Promise.any usage) to
perform a sequential fetch: await the primary read first, and only if it throws
or rejects, call the secondary read; when the secondary succeeds after a primary
failure, emit a warning/metric/log (use the existing logger/metrics facility) to
record the partial failure (primary failed, secondary served) so partial
failures are observable; update the JSDoc/comments in dual.ts to describe this
sequential fallback behavior and the added observability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 29-32: The removeDirectory implementation currently returns only
results[0] (the primary store's deleted count) which can mask divergences with
the secondary; update the method (removeDirectory in dual.ts) to await both
results, compare results[0] and results[1], and if they differ emit a warning
(use the class logger if available, otherwise console.warn) indicating primary
vs secondary counts and the key, then return a deterministic value (recommend
Math.max(results[0], results[1]) or document the chosen policy). Ensure the
warning includes function name removeDirectory, the key, and both counts so
inconsistencies are discoverable.
- Around line 6-7: The docs claim reads fall back sequentially but the read
implementation uses Promise.any to race both stores; change the read method
(replace the Promise.any usage) to perform a sequential fetch: await the primary
read first, and only if it throws or rejects, call the secondary read; when the
secondary succeeds after a primary failure, emit a warning/metric/log (use the
existing logger/metrics facility) to record the partial failure (primary failed,
secondary served) so partial failures are observable; update the JSDoc/comments
in dual.ts to describe this sequential fallback behavior and the added
observability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 40c5945e-a7e2-4663-8d6e-a97e310f8224

📥 Commits

Reviewing files that changed from the base of the PR and between 95b6be2 and bf26275.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts
✅ Files skipped from review due to trivial changes (3)
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/test/dual-blob-storage.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • controlplane/src/core/build-server.ts
  • controlplane/src/index.ts

@pepol pepol requested a review from StarpTech April 7, 2026 11:54
@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 07c1968 to 7b817ef Compare April 7, 2026 11:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
controlplane/src/core/build-server.ts (1)

355-378: Consider adding an info log when dual-bucket storage is enabled.

The storage wiring logic is correct and follows established patterns. For operational visibility, consider logging when failover storage is configured, which aids troubleshooting and confirms the feature is active.

🔧 Suggested logging addition
   if (opts.s3StorageFailover?.url) {
     const failoverBucketName = extractS3BucketName(opts.s3StorageFailover);
     const failoverS3Config = createS3ClientConfig(failoverBucketName, opts.s3StorageFailover);
     const failoverS3Client = new S3Client(failoverS3Config);
     const failoverBlobStorage = new S3BlobStorage(failoverS3Client, failoverBucketName, {
       useIndividualDeletes:
         isGoogleCloudStorageUrl(opts.s3StorageFailover.url) ||
         isGoogleCloudStorageUrl(failoverS3Config.endpoint as string)
           ? true
           : (opts.s3StorageFailover.useIndividualDeletes ?? false),
     });
 
     blobStorage = new DualBlobStorage(primaryBlobStorage, failoverBlobStorage);
+    logger.info({ primaryBucket: bucketName, failoverBucket: failoverBucketName }, 'Dual-bucket blob storage enabled');
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/build-server.ts` around lines 355 - 378, Add an info
log when failover storage is configured so operators see dual-bucket wiring:
before assigning blobStorage = new DualBlobStorage(...), call the project's
logger (e.g., processLogger.info or the existing logger used in this module) to
log that failover storage was enabled and include identifying details such as
opts.s3Storage.url, extractS3BucketName(opts.s3Storage),
opts.s3StorageFailover.url and failoverBucketName; place the log near the
DualBlobStorage creation so it's emitted only when opts.s3StorageFailover?.url
is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@controlplane/src/core/build-server.ts`:
- Around line 355-378: Add an info log when failover storage is configured so
operators see dual-bucket wiring: before assigning blobStorage = new
DualBlobStorage(...), call the project's logger (e.g., processLogger.info or the
existing logger used in this module) to log that failover storage was enabled
and include identifying details such as opts.s3Storage.url,
extractS3BucketName(opts.s3Storage), opts.s3StorageFailover.url and
failoverBucketName; place the log near the DualBlobStorage creation so it's
emitted only when opts.s3StorageFailover?.url is present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b57b66cd-e869-45e2-9a14-49ddd73f6824

📥 Commits

Reviewing files that changed from the base of the PR and between 95b6be2 and 7b817ef.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts
✅ Files skipped from review due to trivial changes (3)
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/test/dual-blob-storage.test.ts
  • controlplane/src/core/env.schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • controlplane/src/core/blobstorage/dual.ts

@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 7b817ef to 6b941e0 Compare April 7, 2026 12:13
@pepol pepol enabled auto-merge (squash) April 7, 2026 12:15
@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 1222199 to cc6cf99 Compare April 7, 2026 13:14
@github-actions

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 41-47: The getObject method currently retries on any primary
error; change it to only fail over when the primary error is retriable by
calling a helper like shouldFailoverToSecondary(primaryError) before attempting
the secondary: if shouldFailoverToSecondary(primaryError) is false, rethrow the
primaryError immediately; otherwise proceed to call this.secondary.getObject and
propagate any secondaryError. Update the getObject error handling around the
this.primary.getObject call (catching primaryError) to perform this check and
ensure cancelled/legitimate-miss errors are not retried to the secondary.
- Around line 29-38: The rollback deletes currently call primary.deleteObject
and secondary.deleteObject with the caller-provided data.abortSignal and await
them directly, letting cancellation abort cleanup and letting cleanup rejections
mask the original put errors; change the rollback to run independent of the
caller signal by invoking primary.deleteObject and secondary.deleteObject
without data.abortSignal (or pass an internal cleanup AbortSignal/timeout) and
perform them with Promise.allSettled, collect any rollback errors and combine
them with the original put errors before throwing the AggregateError so the
original write failures are preserved; update the code around
primaryResult/secondaryResult handling to queue the compensating deletes only
for fulfilled puts and use Promise.allSettled to run them and merge their
failures with the put failures.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b191c09-7809-4b26-9968-200eccbcdb80

📥 Commits

Reviewing files that changed from the base of the PR and between 6b941e0 and cc6cf99.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts
✅ Files skipped from review due to trivial changes (2)
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/test/dual-blob-storage.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • controlplane/src/core/build-server.ts

@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 180a793 to 3bc5554 Compare April 7, 2026 14:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
controlplane/src/core/blobstorage/dual.ts (1)

61-64: Clarify return value semantics for dual-storage removal.

The method removes directories from both storages but returns only the primary's count. This might be intentional if primary is authoritative, but the behavior should be documented since the interface contract promises "the number of entries removed."

Consider adding a brief comment clarifying the design decision:

💡 Suggested documentation
   async removeDirectory(data: { key: string; abortSignal?: AbortSignal }): Promise<number> {
+    // Returns the primary storage count; secondary removal is best-effort for consistency
     const results = await Promise.all([this.primary.removeDirectory(data), this.secondary.removeDirectory(data)]);
     return results[0];
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 61 - 64, The
removeDirectory method in dual storage (async removeDirectory) currently calls
both this.primary.removeDirectory and this.secondary.removeDirectory but returns
only the primary's result; add a clear comment above the removeDirectory
implementation stating that the primary's return value is authoritative and
intentionally returned (or alternatively update logic to aggregate results if
desired), and ensure the function signature and any public docs reflect whether
the returned "number of entries removed" refers to primary only or a
combined/validated count; reference the method name removeDirectory and the
this.primary / this.secondary calls when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 61-64: The removeDirectory method in dual storage (async
removeDirectory) currently calls both this.primary.removeDirectory and
this.secondary.removeDirectory but returns only the primary's result; add a
clear comment above the removeDirectory implementation stating that the
primary's return value is authoritative and intentionally returned (or
alternatively update logic to aggregate results if desired), and ensure the
function signature and any public docs reflect whether the returned "number of
entries removed" refers to primary only or a combined/validated count; reference
the method name removeDirectory and the this.primary / this.secondary calls when
making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9390732-6e9b-4936-a7a9-c69fe316fdec

📥 Commits

Reviewing files that changed from the base of the PR and between cc6cf99 and 3bc5554.

📒 Files selected for processing (2)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/test/dual-blob-storage.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • controlplane/test/dual-blob-storage.test.ts

@pepol pepol force-pushed the peter/eng-8555-implement-failover-cdn branch from 3bc5554 to 41f6ab6 Compare April 8, 2026 09:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
controlplane/src/core/blobstorage/dual.ts (1)

61-67: ⚠️ Potential issue | 🟠 Major

Promise.all is unsafe for the delete paths.

These methods short-circuit on the first rejection. If one bucket has already deleted the object and the other later fails or is aborted, the stale copy survives and getObject() can read it back from the fallback store. Please handle these like putObject(): wait for both results with allSettled, retry the failed side without the caller signal, and only then throw an aggregated error if reconciliation still fails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 61 - 67, The delete
paths (removeDirectory and deleteObject) currently use Promise.all which
short-circuits on rejection; change both to mirror putObject: call both backends
with Promise.allSettled, collect failures, retry any failed side(s) without
passing the caller's abortSignal (invoke
primary.removeDirectory/secondary.removeDirectory or
primary.deleteObject/secondary.deleteObject with a fresh/no abortSignal), and if
any side still fails after retry throw an aggregated error containing both
failures; for removeDirectory return the successfully reconciled directory count
(consistent with putObject behavior) instead of blindly returning results[0].
Ensure you reference the existing primary and secondary callers and the
removeDirectory and deleteObject method names when making these changes.
🧹 Nitpick comments (1)
controlplane/src/core/blobstorage/dual.ts (1)

61-63: Define the deleted-count contract for removeDirectory().

Returning results[0] makes the count depend entirely on the primary bucket. If the buckets ever drift, callers get a misleading number and no signal that mirrored state diverged. Either validate both counts match before returning or document why the primary count is authoritative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controlplane/src/core/blobstorage/dual.ts` around lines 61 - 63, The
removeDirectory implementation returns results[0] making the deleted-count
depend only on the primary; change it to compare the counts returned by
this.primary.removeDirectory and this.secondary.removeDirectory (from the
removeDirectory call in dual.ts) and enforce the contract: if both counts are
equal return that count, otherwise throw an Error (or reject) that includes both
counts and the data.key so callers can detect divergence; ensure the thrown
message clearly states primary vs secondary counts and the key so debugging and
reconciliation are possible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 61-67: The delete paths (removeDirectory and deleteObject)
currently use Promise.all which short-circuits on rejection; change both to
mirror putObject: call both backends with Promise.allSettled, collect failures,
retry any failed side(s) without passing the caller's abortSignal (invoke
primary.removeDirectory/secondary.removeDirectory or
primary.deleteObject/secondary.deleteObject with a fresh/no abortSignal), and if
any side still fails after retry throw an aggregated error containing both
failures; for removeDirectory return the successfully reconciled directory count
(consistent with putObject behavior) instead of blindly returning results[0].
Ensure you reference the existing primary and secondary callers and the
removeDirectory and deleteObject method names when making these changes.

---

Nitpick comments:
In `@controlplane/src/core/blobstorage/dual.ts`:
- Around line 61-63: The removeDirectory implementation returns results[0]
making the deleted-count depend only on the primary; change it to compare the
counts returned by this.primary.removeDirectory and
this.secondary.removeDirectory (from the removeDirectory call in dual.ts) and
enforce the contract: if both counts are equal return that count, otherwise
throw an Error (or reject) that includes both counts and the data.key so callers
can detect divergence; ensure the thrown message clearly states primary vs
secondary counts and the key so debugging and reconciliation are possible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24637250-be98-4ff8-bf8e-bd778642e358

📥 Commits

Reviewing files that changed from the base of the PR and between 3bc5554 and 41f6ab6.

📒 Files selected for processing (6)
  • controlplane/src/core/blobstorage/dual.ts
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/src/core/build-server.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/index.ts
  • controlplane/test/dual-blob-storage.test.ts
✅ Files skipped from review due to trivial changes (2)
  • controlplane/src/core/blobstorage/index.ts
  • controlplane/test/dual-blob-storage.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • controlplane/src/index.ts
  • controlplane/src/core/env.schema.ts
  • controlplane/src/core/build-server.ts

Copy link
Copy Markdown
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pepol pepol merged commit 0b0d42d into main Apr 9, 2026
10 checks passed
@pepol pepol deleted the peter/eng-8555-implement-failover-cdn branch April 9, 2026 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants