feat(cron): add build status reconciliation cron job #1641

Merged
riderx merged 7 commits into main from feat/build-status-reconciliation-cron
Feb 16, 2026

@WcaleNieWolny (Contributor) commented Feb 16, 2026

Summary (AI generated)

  • Add a cron job that runs every 15 minutes to reconcile stale build_requests status by querying the builder API directly
  • New trigger handler: cron_reconcile_build_status.ts
  • New migration: creates pgmq queue + two cron_tasks entries (producer every 15m, consumer every 5m)

Motivation (AI generated)

Build status in build_requests can become stale when:

  1. The CLI disconnects before calling /build/status (network issues, user closes terminal, etc.)
  2. Historical builds from before the status persistence fix were never marked as completed/failed

Without this cron, these builds remain stuck at "pending" forever in the dashboard, even though the builder has long since finished them.

The cron handles two cases:

  • Builds with a builder_job_id stale for >5 minutes: fetches real status from the builder API, updates status and last_error, and records build time for billing
  • Builds without a builder_job_id older than 1 hour: marks as failed (never submitted to builder)
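
The two cases above can be sketched as a small pure function. This is an illustration only, not the merged handler: the `BuildRow` shape, the `Action` names, `classifyBuild`, and `STALE_THRESHOLD_MINUTES` are assumptions made for the example (`ORPHAN_THRESHOLD_HOURS` does appear in the reviewed file), and using `updated_at` for staleness and `created_at` for orphan age follows the test plan below.

```typescript
// Hypothetical row shape: only the fields this sketch needs.
interface BuildRow {
  id: string
  builder_job_id: string | null
  created_at: string // ISO timestamp
  updated_at: string // ISO timestamp
}

type Action = 'fetch_from_builder' | 'mark_failed' | 'skip'

const STALE_THRESHOLD_MINUTES = 5 // assumed name; the PR text says ">5 minutes"
const ORPHAN_THRESHOLD_HOURS = 1

function classifyBuild(build: BuildRow, now: Date): Action {
  const nowMs = now.getTime()
  if (build.builder_job_id) {
    // Submitted to the builder: reconcile once the row has gone stale.
    const staleMs = nowMs - Date.parse(build.updated_at)
    return staleMs > STALE_THRESHOLD_MINUTES * 60_000 ? 'fetch_from_builder' : 'skip'
  }
  // Never submitted: treat as orphaned once it is old enough.
  const ageMs = nowMs - Date.parse(build.created_at)
  return ageMs > ORPHAN_THRESHOLD_HOURS * 3_600_000 ? 'mark_failed' : 'skip'
}
```

Keeping the decision separate from the database and HTTP calls makes the thresholds easy to unit test.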

Companion PRs:

  • Cap-go/capgo#1637 — backend fix: use supabaseAdmin for status writes
  • Cap-go/CLI#496 — CLI fix: call /build/status after WebSocket delivers terminal status

Business Impact (AI generated)

  • Dashboard accuracy: All builds will eventually show correct status, even if the CLI never reports back
  • Billing correctness: Build time is recorded via recordBuildTime for completed builds that the CLI missed
  • Operational visibility: Stale/orphaned builds are automatically cleaned up instead of requiring manual DB intervention

Test Plan (AI generated)

  • Run bunx supabase db reset locally to verify migration applies cleanly (pgmq queue + cron_tasks entries created)
  • Verify cron_tasks table has both reconcile_build_status (queue, 15m) and reconcile_build_status_queue (function_queue, 5m) entries
  • Insert a test build_requests row with stale updated_at and a mock builder_job_id, verify the handler fetches from builder and updates the row
  • Insert a test build_requests row without builder_job_id and created_at >1h ago, verify it gets marked as failed
  • Confirm handler returns BRES and logs summary stats

Generated with AI

Summary by CodeRabbit

  • New Features
    • Automated build-status reconciliation endpoint added to detect and update stale/orphaned builds, with improved logging and error handling.
  • Chores
    • Scheduling and queue configuration added: main reconciliation runs every 15 minutes; processing queue runs every 1 minute.

Adds a cron job that runs every 15 minutes to reconcile stale
build_requests rows by querying the builder API directly.

For builds with a builder_job_id that have been stale for >5 minutes:
fetches real status from the builder, updates status and last_error,
and records build time for completed builds.

For builds without a builder_job_id that are older than 1 hour:
marks them as failed since they were never submitted to the builder.

This ensures the dashboard shows accurate build state even when the
CLI disconnects before calling /build/status (e.g., network issues).

@coderabbitai (Contributor) coderabbitai bot commented Feb 16, 2026

Walkthrough

Adds a new Supabase Edge Function HTTP POST endpoint that reconciles stale non-terminal build_requests by querying the DB, contacting an external builder API to fetch job statuses, updating build_requests (including marking orphaned builds), recording build durations/metrics, and registering a cron schedule and router route for periodic execution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build Status Reconciliation Edge Function (supabase/functions/_backend/triggers/cron_reconcile_build_status.ts) | New exported app (Hono) POST handler. Queries stale non-terminal build_requests, fetches builder job statuses via external builder API, updates build rows (status, error, timestamps), marks orphaned builds, records durations for terminal builds, and logs counters/errors. |
| Trigger Router Registration (supabase/functions/triggers/index.ts) | Imports and registers the new trigger route: appGlobal.route('/cron_reconcile_build_status', cron_reconcile_build_status). |
| Cron Job Schedule Definition (supabase/migrations/20260216102420_add_build_status_reconciliation_cron.sql) | Adds cron_reconcile_build_status queue and upserts two cron tasks: reconcile_build_status (queue, 15-minute interval) and reconcile_build_status_queue (function_queue, 1-minute interval) for scheduled execution. |

Sequence Diagram

sequenceDiagram
    participant Scheduler as Cron Scheduler
    participant EdgeFunc as Edge Function (Hono app)
    participant DB as Supabase DB
    participant BuilderAPI as External Builder API

    Scheduler->>EdgeFunc: POST /cron_reconcile_build_status
    EdgeFunc->>DB: Query stale non-terminal build_requests (batch)
    DB-->>EdgeFunc: Return build batch

    alt builds found
        loop for each build
            alt builder_job_id present
                EdgeFunc->>BuilderAPI: GET /jobs/:id/status
                BuilderAPI-->>EdgeFunc: Return job status
                EdgeFunc->>DB: UPDATE build_request with status, error, timestamps
            else orphaned (no builder_job_id)
                EdgeFunc->>DB: UPDATE build_request -> mark failed/orphaned
            end

            alt build reached terminal status with timestamps
                EdgeFunc->>DB: INSERT/RECORD build duration metrics
            end
        end
    end

    EdgeFunc-->>Scheduler: Respond with summary (processed, reconciled, orphaned, errors)
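
The loop in the diagram can be sketched roughly as follows. This is a simplified illustration, not the merged handler: `PendingBuild`, `Deps`, and `reconcileBatch` are hypothetical names, and the builder and database calls are injected as functions so the sketch runs without either service. Only the summary counters (processed, reconciled, orphaned, errors) are taken from the diagram.

```typescript
interface PendingBuild { id: string, builderJobId: string | null }
interface Summary { processed: number, reconciled: number, orphaned: number, errors: number }

// Hypothetical dependencies, injected so the sketch needs no DB or builder.
interface Deps {
  fetchJobStatus: (jobId: string) => Promise<string>
  updateStatus: (buildId: string, status: string) => Promise<void>
}

async function reconcileBatch(builds: PendingBuild[], deps: Deps): Promise<Summary> {
  const summary: Summary = { processed: 0, reconciled: 0, orphaned: 0, errors: 0 }
  for (const build of builds) {
    summary.processed++
    try {
      if (build.builderJobId) {
        // Known to the builder: copy its status onto the row.
        const status = await deps.fetchJobStatus(build.builderJobId)
        await deps.updateStatus(build.id, status)
        summary.reconciled++
      }
      else {
        // Never submitted to the builder: mark as failed.
        await deps.updateStatus(build.id, 'failed')
        summary.orphaned++
      }
    }
    catch {
      // One failing build must not abort the rest of the batch.
      summary.errors++
    }
  }
  return summary
}
```

Catching per build rather than around the whole loop is what lets a single builder error show up in the `errors` counter while the remaining rows are still processed.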

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through stale jobs in the night,
Tapped the builder to see what's right,
Fixed the orphans, tallied the time,
Logged my counts in tidy rhyme,
Now builds hum softly — all polite ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (8 files):

⚔️ .github/workflows/tests.yml (content)
⚔️ bun.lock (content)
⚔️ package.json (content)
⚔️ src/pages/admin/dashboard/plugins.vue (content)
⚔️ supabase/functions/_backend/public/replication.ts (content)
⚔️ supabase/functions/_backend/utils/pg.ts (content)
⚔️ supabase/functions/_backend/utils/version.ts (content)
⚔️ supabase/functions/triggers/index.ts (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a cron job for build status reconciliation, which is the primary feature introduced across all modified files. |
| Description check | ✅ Passed | The description covers the summary, motivation, business impact, and test plan, though it lacks explicit manual testing confirmation and documentation update checklist items from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@supabase/functions/_backend/triggers/cron_reconcile_build_status.ts`:
- Around line 119-121: The code silently maps unknown build.platform values to
'ios' via resolvedPlatform; instead, detect invalid platform values before using
resolvedPlatform (check build.platform strictly equals 'ios' or 'android'), and
if invalid (null/empty/other) log a warning including the build identifier and
the unexpected platform value and skip calling recordBuildTime for that build;
adjust the code paths around resolvedPlatform and recordBuildTime so
recordBuildTime is only invoked for valid 'ios' or 'android' platforms.
- Around line 33-34: The code reads BUILDER_URL and BUILDER_API_KEY via getEnv
into builderUrl and builderApiKey but doesn't validate them; add an early check
after retrieving these values (using the builderUrl and builderApiKey symbols)
to verify they are non-empty and a valid URL (for builderUrl) and non-empty for
builderApiKey, and if invalid either throw/return an error or log a clear
message and exit so downstream fetch calls are never made with empty/malformed
values; use getEnv as the source and fail fast with a descriptive error
mentioning the missing/invalid env var.

In `@supabase/functions/triggers/index.ts`:
- Around line 4-5: Reorder the two imports so they follow alphabetical order:
import cron_email before cron_reconcile_build_status; update the import
statements that currently reference app as cron_reconcile_build_status and app
as cron_email so that the line importing app as cron_email appears first to
satisfy the perfectionist/sort-imports ESLint rule.
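
The fail-fast check on BUILDER_URL and BUILDER_API_KEY requested above might look something like the sketch below. `validateBuilderConfig` is a hypothetical helper written for illustration; how the values are read (`getEnv`) and how the handler reports the failure are left out because they depend on the repository's own utilities.

```typescript
// Hypothetical helper: returns a list of problems, empty when the config is usable.
function validateBuilderConfig(builderUrl: string | undefined, builderApiKey: string | undefined): string[] {
  const problems: string[] = []
  if (!builderUrl) {
    problems.push('BUILDER_URL is missing')
  }
  else {
    try {
      // The URL constructor throws a TypeError on malformed input.
      const parsed = new URL(builderUrl)
      if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:')
        problems.push('BUILDER_URL must use http or https')
    }
    catch {
      problems.push('BUILDER_URL is not a valid URL')
    }
  }
  if (!builderApiKey)
    problems.push('BUILDER_API_KEY is missing')
  return problems
}
```

A caller would log the returned problems and return early when the list is non-empty, so no fetch is ever attempted with an empty or malformed value.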
🧹 Nitpick comments (3)
supabase/functions/_backend/triggers/cron_reconcile_build_status.ts (3)

82-85: Missing timeout on external fetch call.

If the builder API is slow or unresponsive, this fetch will hang indefinitely, potentially causing the cron function to time out or block the queue. Consider adding an AbortSignal.timeout() or similar mechanism.

Proposed fix
-      const response = await fetch(`${builderUrl}/jobs/${build.builder_job_id}`, {
-        method: 'GET',
-        headers: { 'x-api-key': builderApiKey },
-      })
+      const response = await fetch(`${builderUrl}/jobs/${build.builder_job_id}`, {
+        method: 'GET',
+        headers: { 'x-api-key': builderApiKey },
+        signal: AbortSignal.timeout(10_000),
+      })

93-94: Unsafe cast of builder API response with no validation.

response.json() as BuilderStatusResponse does not validate the shape. If the builder returns an unexpected payload (e.g., missing job field), accessing builderJob.job.status will throw a runtime error. The outer catch block handles it, but the error message won't be descriptive.

Consider adding a minimal guard:

const builderJob = await response.json() as BuilderStatusResponse
if (!builderJob?.job?.status) {
  cloudlogErr({ requestId: c.get('requestId'), message: 'Invalid builder response shape', buildId: build.id })
  errors++
  continue
}

44-47: Query failure returns BRES (success) — intentional?

When the query fails, the handler logs the error but returns { status: 'ok' }. This means the cron system won't retry or flag the failure. If this is intentional (to prevent cascading retries), consider at least including the error signal in the response or a metric so it's observable.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@supabase/functions/_backend/triggers/cron_reconcile_build_status.ts`:
- Around line 98-108: The code writes an unvalidated builder API status
(builderJob.job.status stored in jobStatus) directly into the
build_requests.status column; instead map or validate jobStatus against your
known set of allowed statuses (e.g., "pending", "in_progress", "completed",
"failed") before calling supabase.from('build_requests').update, and convert
unknown values to a safe fallback (or reject/update to a normalized enum) while
also saving builderJob.job.error to last_error; update the logic around
builderJob/jobStatus to perform this mapping/validation prior to the DB update
so only allowed status values are written.
- Around line 86-96: The fetch to `${builderUrl}/jobs/${build.builder_job_id}`
can hang; modify the request inside the try where `fetch` is called to use an
AbortSignal timeout (e.g., `AbortSignal.timeout(5000)`) and pass the resulting
`signal` option to `fetch`; ensure the catch block detects an abort/timeout (and
logs via `cloudlogErr` including `c.get('requestId')`, `build.id`,
`build.builder_job_id`) and increments `errors` and `continue`s just like other
failures so a single slow request won't stall the loop.
- Line 24: Replace direct construction of the Hono app with the shared
initializer: change the app initialization from "new
Hono<MiddlewareKeyVariables>()" to calling
"createHono<MiddlewareKeyVariables>()" and update imports to import createHono
from "utils/hono.ts" (remove or keep Hono import only if still needed). This
ensures middleware wired by createHono (e.g., requestId via c.get('requestId'))
is present for the exported "app".

const ORPHAN_THRESHOLD_HOURS = 1
const BATCH_LIMIT = 50

export const app = new Hono<MiddlewareKeyVariables>()

🛠️ Refactor suggestion | 🟠 Major

Use createHono() instead of new Hono<MiddlewareKeyVariables>().

The coding guidelines require using createHono from utils/hono.ts for all Hono app initialization. This likely wires in shared middleware (e.g., request ID generation) that you rely on via c.get('requestId').

Proposed fix
-import { BRES, middlewareAPISecret } from '../utils/hono.ts'
+import { BRES, createHono, middlewareAPISecret } from '../utils/hono.ts'
 ...
-export const app = new Hono<MiddlewareKeyVariables>()
+export const app = createHono()

As per coding guidelines: "Use createHono from utils/hono.ts for all Hono framework application initialization and routing." Based on learnings: "Use createHono from utils/hono.ts for all Hono framework application initialization and routing."

Comment on lines 86 to 96
try {
  const response = await fetch(`${builderUrl}/jobs/${build.builder_job_id}`, {
    method: 'GET',
    headers: { 'x-api-key': builderApiKey },
  })

  if (!response.ok) {
    cloudlogErr({ requestId: c.get('requestId'), message: 'Builder status fetch failed', buildId: build.id, jobId: build.builder_job_id, status: response.status })
    errors++
    continue
  }

⚠️ Potential issue | 🟠 Major

External fetch call has no timeout — risk of the cron job hanging indefinitely.

If the builder API is slow or unresponsive, this fetch will block without limit. Since this runs inside a loop over up to 50 builds, a single hung request can stall the entire cron invocation (and potentially hit the function execution time limit silently).

Consider adding an AbortSignal.timeout:

Proposed fix
       const response = await fetch(`${builderUrl}/jobs/${build.builder_job_id}`, {
         method: 'GET',
         headers: { 'x-api-key': builderApiKey },
+        signal: AbortSignal.timeout(10_000),
       })
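
As a rough sketch of how the catch side of this fix could tell a timeout apart from other failures: when a signal from `AbortSignal.timeout()` fires, `fetch` rejects with an error whose `name` is `TimeoutError`. The helper names `isAbortOrTimeout` and `fetchJobWithTimeout` are hypothetical, and returning `null` on timeout is just one possible convention for the caller to count an error and continue.

```typescript
// True for errors raised by an aborted or timed-out fetch.
function isAbortOrTimeout(err: unknown): boolean {
  if (typeof err !== 'object' || err === null)
    return false
  const name = (err as { name?: unknown }).name
  return name === 'TimeoutError' || name === 'AbortError'
}

// Hypothetical wrapper: resolves to null on timeout, rethrows anything else.
async function fetchJobWithTimeout(url: string, apiKey: string, timeoutMs: number): Promise<Response | null> {
  try {
    return await fetch(url, {
      method: 'GET',
      headers: { 'x-api-key': apiKey },
      signal: AbortSignal.timeout(timeoutMs),
    })
  }
  catch (err) {
    if (isAbortOrTimeout(err))
      return null
    throw err
  }
}
```

With a per-request cap, the worst case for one batch is bounded by the batch size times the timeout, which can be compared against the function's execution limit.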

Comment on lines +98 to +108
const builderJob = await response.json() as BuilderStatusResponse
const jobStatus = builderJob.job.status

const { error: updateError } = await supabase
  .from('build_requests')
  .update({
    status: jobStatus,
    last_error: builderJob.job.error || null,
    updated_at: new Date().toISOString(),
  })
  .eq('id', build.id)

⚠️ Potential issue | 🟡 Minor

Unbounded external status value written directly to the database.

jobStatus (Line 99) is whatever string the builder API returns, and it's written directly into build_requests.status (Line 104). If the builder returns an unexpected status value (e.g., "queued", "running", or a future new status), it will be stored as-is. If the status column has a CHECK constraint or enum, the update will fail (handled). If not, it could introduce values that other parts of the system don't expect.

Consider validating or mapping jobStatus to your known set of statuses before writing.
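
One way to do that validation is an allow-list check before the update, sketched below. The list itself is an assumption: 'succeeded', 'failed', 'cancelled', 'expired', 'released' and 'pending' are mentioned elsewhere in this conversation, 'running' is a guess, and the authoritative set would come from the build_requests schema. `normalizeStatus` is a hypothetical helper.

```typescript
// Assumed allow-list; the real set lives in the build_requests schema.
const KNOWN_STATUSES = ['pending', 'running', 'succeeded', 'failed', 'cancelled', 'expired', 'released'] as const
type KnownStatus = typeof KNOWN_STATUSES[number]

// Returns the normalized status, or null when the builder sent something unexpected.
function normalizeStatus(raw: unknown): KnownStatus | null {
  if (typeof raw !== 'string')
    return null
  const value = raw.trim().toLowerCase()
  return (KNOWN_STATUSES as readonly string[]).includes(value) ? (value as KnownStatus) : null
}
```

On a `null` result the handler could log the raw value and skip the row instead of writing it, which keeps unknown strings out of the status column.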

@riderx riderx marked this pull request as draft February 16, 2026 13:29
@riderx riderx marked this pull request as ready for review February 16, 2026 17:54
Copilot AI review requested due to automatic review settings February 16, 2026 17:54
@riderx riderx merged commit 54c5ad6 into main Feb 16, 2026
15 of 16 checks passed
@riderx riderx deleted the feat/build-status-reconciliation-cron branch February 16, 2026 17:54

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be09942df3

Comment on lines +117 to +119
TERMINAL_STATUSES.has(jobStatus)
&& builderJob.job.started_at
&& builderJob.job.completed_at

P1: Restrict build-time billing to billable terminal statuses

The reconciliation path currently records build time for any status in TERMINAL_STATUSES, so jobs that end as cancelled, expired, or released will be billed whenever they include timestamps. In contrast, the normal /public/build/status.ts flow only calls recordBuildTime for succeeded/failed, so this introduces inconsistent and potentially inflated billing depending on which path updates the row first.
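
A minimal sketch of the stricter condition, assuming the builder reports `started_at` and `completed_at` as ISO-8601 strings (the actual format is not shown in this conversation) and that whole seconds are rounded down. `BuilderJob`, `BILLABLE_STATUSES`, and `billableBuildSeconds` are names invented for the example.

```typescript
// Hypothetical subset of the builder job payload.
interface BuilderJob {
  status: string
  started_at?: string | null
  completed_at?: string | null
}

// Only these statuses are billed, matching the normal /build/status flow described above.
const BILLABLE_STATUSES = new Set(['succeeded', 'failed'])

// Returns billable seconds, or null when the job should not be billed.
function billableBuildSeconds(job: BuilderJob): number | null {
  if (!BILLABLE_STATUSES.has(job.status) || !job.started_at || !job.completed_at)
    return null
  const elapsedMs = Date.parse(job.completed_at) - Date.parse(job.started_at)
  // Guard against unparsable or reversed timestamps.
  if (!Number.isFinite(elapsedMs) || elapsedMs < 0)
    return null
  return Math.floor(elapsedMs / 1000)
}
```

The handler would then call `recordBuildTime` only when the result is not `null`, so cancelled, expired, or released jobs never reach the billing path.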

Copilot AI left a comment

Pull request overview

This PR adds automated reconciliation for stale build statuses in the build_requests table. It implements a cron-based solution to handle cases where the CLI disconnects before reporting final build status, ensuring dashboard accuracy and billing correctness.

Changes:

  • New migration creates a pgmq queue and configures two cron tasks: a producer that triggers every 15 minutes and a consumer that processes the queue
  • New trigger handler cron_reconcile_build_status.ts fetches stale builds, queries the builder API for actual status, updates the database, and records build time for billing
  • Trigger registration in index.ts exposes the new endpoint

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| supabase/migrations/20260216102420_add_build_status_reconciliation_cron.sql | Creates pgmq queue and two cron_tasks entries (producer every 15m, consumer every 1m) |
| supabase/functions/_backend/triggers/cron_reconcile_build_status.ts | Implements reconciliation logic: fetches stale builds from DB, queries builder API, updates status, marks orphans as failed, records build time |
| supabase/functions/triggers/index.ts | Registers the cron_reconcile_build_status route |

Comment on lines +55 to +61
'reconcile_build_status_queue',
'Process build status reconciliation queue',
'function_queue',
'["cron_reconcile_build_status"]',
null,
null,
1,

Copilot AI Feb 16, 2026

The PR description states the consumer runs "every 5m" but the minute_interval is set to 1, which means it will actually run every 1 minute. This is inconsistent with the description and may cause unexpected resource usage. If 5 minutes was intentional, change minute_interval to 5. If 1 minute was intentional, update the PR description to match.

throw new Error(updateError.message)

if (
TERMINAL_STATUSES.has(jobStatus)

Copilot AI Feb 16, 2026

The recordBuildTime function should only be called for 'succeeded' or 'failed' statuses, not for all terminal statuses. The existing pattern in supabase/functions/_backend/public/build/status.ts:128 explicitly checks for these two statuses before calling recordBuildTime. Other terminal statuses like 'expired', 'released', or 'cancelled' may not have valid started_at/completed_at times or should not be billed. Add an explicit status check: if ((jobStatus === 'succeeded' || jobStatus === 'failed') && builderJob.job.started_at && builderJob.job.completed_at)

Suggested change
-        TERMINAL_STATUSES.has(jobStatus)
+        (jobStatus === 'succeeded' || jobStatus === 'failed')

Comment on lines +127 to +135
  await recordBuildTime(
    c,
    build.owner_org,
    build.requested_by,
    build.builder_job_id!,
    build.platform,
    buildTimeSeconds,
  )
}

Copilot AI Feb 16, 2026

Missing logging before recordBuildTime call. The existing pattern in supabase/functions/_backend/public/build/status.ts:129-136 includes a cloudlog call before recordBuildTime to track when build time is being recorded. This is helpful for debugging billing issues and understanding cron job behavior. Add a similar log here with buildId, orgId, platform, and buildTimeSeconds.
