Skip to content

Conversation

@perkinsjr
Copy link
Collaborator

@perkinsjr perkinsjr commented Jan 6, 2026

What does this PR do?

  • Added payment recovery detection system that identifies when successful payments follow recent failures
  • Implemented Slack alerts for subscription events (creation, updates, failures, and recoveries)
  • Enhanced Stripe webhook handler with comprehensive payment state tracking

Changes

  • Payment Recovery Detection: New stateless detector using Stripe webhook events to identify payment recoveries within 24-hour and 7-day patterns. This is when Stripe gives up.
  • Slack Alerts: Added alerting system for subscription lifecycle events including failures and successful recoveries
  • Webhook Handler: Enhanced Stripe webhook route with improved event processing and alerting integration

Files Changed

  • web/apps/dashboard/app/api/webhooks/stripe/route.ts - Enhanced webhook processing
  • web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts - New payment recovery detection logic
  • web/apps/dashboard/lib/utils/slackAlerts.ts - New Slack alerting utilities

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • [] Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

To test login to stripe and use the Stripe sandbox, you can do this all locally:

Join #stripe-test channel in slack

Regression tests:

  1. Use the Stripe CLI to listen to your dashboard. (stripe listen --forward-to localhost:3000/api/webhooks/stripe)
  2. Create a new subscription via the dashboard.
  3. Alert will fire.
  4. Upgrade for another alert
  5. Cancel for another one.

New tests

Payment failed:

  1. Using the dashboard open the portal and add the Credit card 4000000000000341 set it as the default.
  2. Go to the sandbox and select the Customer you created, and make the time in the future 6th of the month.
  3. Payment should fail an a new alert will arrive.

Payment recovered

  1. Update the card in your dashboard to the 42424242424242 card and then charge it again.
  2. A new alert should fire.

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Ran make fmt on /go directory
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

@vercel
Copy link

vercel bot commented Jan 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Review Updated (UTC)
dashboard Ignored Ignored Preview Jan 8, 2026 7:33pm
engineering Ignored Ignored Preview Jan 8, 2026 7:33pm

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

Adds Stripe payment-recovery detection and quota parsing utilities, refactors the Stripe webhook route to validate signatures and handle invoice.payment_failed/payment_succeeded with recovery-aware logic, and adds Slack alerts for payment failures and recoveries.

Changes

Cohort / File(s) Summary
Stripe utilities
web/apps/dashboard/lib/stripe/subscriptionUtils.ts, web/apps/dashboard/lib/stripe/paymentUtils.ts, web/apps/dashboard/lib/stripe/productUtils.ts, web/apps/dashboard/lib/stripe/index.ts
New utilities: subscription-change classifiers (isPaymentFailureRelatedUpdate, isAutomatedBillingRenewal, isCardUpdateOnly, PreviousAttributes), payment-recovery detection (PaymentRecoveryDetector, isPaymentRecoveryUpdate, checkRecentPaymentSuccess, isPaymentRecovery, extractPaymentContext, analyzeRecentFailurePatterns, factory), quota validation (validateAndParseQuotas), and a barrel export.
Webhook handler
web/apps/dashboard/app/api/webhooks/stripe/route.ts
Reworked webhook route: raw-body reading and Stripe signature validation, structured contextual logging, guarded flows for customer.subscription.* events, new handling for invoice.payment_failed and invoice.payment_succeeded with recovery detection, and safer no-op/early-exit responses to avoid retry storms.
Slack alerts
web/apps/dashboard/lib/utils/slackAlerts.ts
Added alertPaymentFailed and alertPaymentRecovered which format amounts and post structured messages to a configured Slack webhook with error handling and logging.

Sequence Diagram(s)

sequenceDiagram
    participant Stripe as Stripe API
    participant Handler as Webhook Route
    participant StripeSDK as Stripe SDK
    participant Detector as PaymentRecoveryDetector
    participant Slack as Slack Alerts

    Stripe->>Handler: POST webhook (event)
    Handler->>Handler: Read raw body & verify signature
    Handler->>StripeSDK: Fetch customer/invoice/subscription as needed
    Handler->>Detector: isPaymentRecovery / isPaymentRecoveryUpdate (for success/events)
    Detector->>StripeSDK: Query invoices/events (recent history)
    Detector-->>Handler: recovery? true/false
    alt payment failed
        Handler->>Slack: alertPaymentFailed(...)
        Slack-->>Handler: response
    else payment succeeded & recovery
        Handler->>Slack: alertPaymentRecovered(...)
        Slack-->>Handler: response
    end
    Handler-->>Stripe: 200 OK (or early-exit OK)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.63% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and concisely describes the main enhancement: adding Stripe webhook payment recovery detection and Slack alerts.
Description check ✅ Passed The PR description covers objectives, changes, testing instructions, and required checklist items; however, it references a non-existent file path (paymentRecoveryDetection.ts instead of the actual paymentUtils.ts) creating some inconsistency with the actual changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@perkinsjr
Copy link
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @web/apps/dashboard/app/api/webhooks/stripe/route.ts:
- Around line 214-229: The webhook handler correctly separates body parsing and
signature validation into two try/catch blocks, but the error response for the
signature-validation failure uses a string with an unintended trailing space;
update the Response returned in the catch that handles
stripe.webhooks.constructEvent to use "Error" (no trailing space) and ensure the
catch still logs the error (console.error("Webhook signature validation
failed:", error)) while returning new Response("Error", { status: 400 })
referencing the same symbols (event, requestBody, signature,
e.STRIPE_WEBHOOK_SECRET).
- Around line 683-686: The code currently sets failureReason from
invoice.last_finalization_error which is for finalization issues; change the
failureReason extraction to prefer payment failure fields: read
invoice.payment_intent?.last_payment_error?.message first, then fall back to
invoice.charge?.failure_message, and only then to
invoice.last_finalization_error?.message or a default; update the variable
assignment for failureReason (where amount and currency are set) so that payment
failures for the invoice.payment_failed event are reported using the correct
fields.
🧹 Nitpick comments (7)
web/apps/dashboard/lib/utils/slackAlerts.ts (2)

182-250: Currency parameter is unused in the alert message.

The currency parameter is accepted but never used when formatting the amount or in the Slack message text. Consider including the currency in the message for clarity, especially for international customers.

Also, logging customerEmail in error objects (lines 230, 245) includes PII in logs. Consider whether this is necessary or if a customer ID would suffice for debugging.

🔎 Proposed fix to include currency in the message
-    const formattedAmount = formatPrice(amount);
+    const formattedAmount = `${formatPrice(amount)} ${currency.toUpperCase()}`;

252-317: Same currency usage and PII logging observations apply here.

The alertPaymentRecovered function has the same issues as alertPaymentFailed:

  1. currency parameter is unused in the formatted message
  2. customerEmail is logged in error objects (lines 299, 312)

The improved error handling with try/catch and response.ok check is a good pattern compared to the older .catch() approach in the existing functions.

web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)

722-738: Returning 200 on errors masks failures in monitoring.

Returning HTTP 200 when errors occur (lines 717, 738) prevents Stripe retries but also makes it impossible to detect failures via HTTP status monitoring. Consider using a structured approach:

  1. Return 200 for transient/recoverable errors you don't want retried
  2. Return 500 for unexpected errors that indicate bugs

Alternatively, implement alerting/metrics on these error paths since the HTTP response won't indicate failure.


859-862: Same concern about returning 200 on errors.

Same observation as the invoice.payment_failed handler - returning 200 on unexpected errors will mask issues. Consider differentiating between expected vs. unexpected error conditions.

web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (3)

61-105: Potential performance concern: fetching all failed payment events.

The stripe.events.list call retrieves ALL invoice.payment_failed events in the time window (up to 100), then filters by customer in memory. In high-volume environments with many customers, this could:

  1. Return mostly irrelevant events for other customers
  2. Miss relevant events if limit is exceeded

Consider using the invoice ID directly when possible, since invoiceId is already available. The charges.list approach in checkInvoicePaymentAttempts is more targeted.

🔎 Alternative approach prioritizing invoice-specific check
  async isRecoveryFromFailure(successEvent: Stripe.Event, invoiceId: string): Promise<boolean> {
    try {
      const invoice = successEvent.data.object as Stripe.Invoice;
+     
+     // First, check invoice-specific indicators (more efficient)
+     const hasMultipleAttempts = await this.checkInvoicePaymentAttempts(invoice);
+     if (hasMultipleAttempts) {
+       return true;
+     }
+
      // Get the customer ID for filtering events
      const customerId =
        typeof invoice.customer === "string" ? invoice.customer : invoice.customer?.id;

223-279: Unused public methods: extractPaymentContext and analyzeRecentFailurePatterns.

These methods are exposed publicly but are not used by isPaymentRecovery or anywhere in the route handler. If they're intended for future use, consider:

  1. Making them private until needed, or
  2. Adding a comment indicating their intended future use

analyzeRecentFailurePatterns has a longer lookback (7 days) vs isRecoveryFromFailure (24 hours) - clarify which should be the primary detection mechanism.

Also applies to: 281-365


160-221: Good fallback mechanism with invoice attempt checks.

The checkInvoicePaymentAttempts method provides a reliable fallback using invoice.attempt_count and charge history. This is a more targeted approach than the events list query.

One edge case: charges.data.length > 1 assumes multiple charges indicate retry attempts, but Stripe incremental authorizations (where a PaymentIntent's authorized amount is increased across multiple attempts) also create multiple Charge objects without being retries. This could cause the method to misidentify incremental authorizations as payment recovery attempts.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 887a854 and 05f6084.

📒 Files selected for processing (3)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts
  • web/apps/dashboard/lib/utils/slackAlerts.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
🧬 Code graph analysis (1)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)
web/apps/dashboard/lib/utils/slackAlerts.ts (3)
  • alertSubscriptionCancelled (148-180)
  • alertPaymentFailed (182-250)
  • alertPaymentRecovered (252-317)
web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (1)
  • isPaymentRecovery (386-394)
🔇 Additional comments (4)
web/apps/dashboard/lib/utils/slackAlerts.ts (1)

1-2: LGTM on the import addition.

The formatPrice import is correctly added to support consistent currency formatting in the new alert functions.

web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)

39-80: Well-designed helper for distinguishing payment failure updates.

The isPaymentFailureRelatedUpdate function correctly identifies subscription updates triggered by payment failures vs. manual changes. The logic to check for past_due, unpaid, or incomplete status changes while excluding manual modifications is sound.


791-805: Good defensive handling of recovery detection failures.

Setting isRecovery = false when detection fails is the correct approach to avoid false positive recovery alerts. The error logging with context is helpful for debugging.

web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (1)

386-394: Clean API design for the main entry point.

The isPaymentRecovery convenience function provides a simple interface for the webhook handler. Creating a new detector instance per call is fine for this stateless design.

@perkinsjr perkinsjr changed the title chore: stripe webhooks improvement v2 DO NOT REVIEW feat: add Stripe webhook payment recovery detection and Slack alerts Jan 6, 2026
- Remove failureReason parameter from payment alert function call
- Simplify alert payload by removing unused variable
- Clean up function arguments to match expected signature
@perkinsjr perkinsjr marked this pull request as ready for review January 6, 2026 20:14
import { stripeEnv } from "@/lib/env";
import { formatPrice } from "@/lib/fmt";
import { freeTierQuotas } from "@/lib/quotas";
import { isPaymentRecovery } from "@/lib/utils/paymentRecoveryDetection";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's weird that this specific one is imported and all other isXxx functions are defined here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because it's another 500 LOC. So I broke it up, I want to extract the others at some point in the future.

@perkinsjr perkinsjr force-pushed the stripe-webhooks-improvement-v2 branch from 1d4bd1b to 951047b Compare January 6, 2026 20:19
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @web/apps/dashboard/lib/utils/slackAlerts.ts:
- Around line 182-247: The currency parameter on alertPaymentFailed is unused
and formatPrice currently hardcodes "USD"/"en-US"; update formatPrice (in
lib/fmt.ts) to accept a currency (and optional locale) argument and use it when
formatting, then call formatPrice(amount, currency) from alertPaymentFailed so
the Slack message displays the actual payment currency; alternatively if you
intend to always use USD remove the currency parameter from alertPaymentFailed
and from its callers — but do one or the other to eliminate the mismatch.
🧹 Nitpick comments (4)
web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (4)

38-144: Consider configuration for hardcoded constants.

The method uses hardcoded values for the time window (24 hours) and event limit (100). For customers with high transaction volumes, the 100-event limit might miss recent failures. Consider making these configurable via environment variables or constants.

🔎 Suggested approach

Extract constants at the module level:

const RECOVERY_DETECTION_WINDOW_HOURS = 24;
const RECOVERY_DETECTION_EVENT_LIMIT = 100;

Then reference them in the method. This makes the values easier to tune without code changes.


236-242: Remove unused method.

The isRecentFailure method is defined but never called anywhere in the class. Consider removing it or documenting why it's kept for future use.

#!/bin/bash
# Verify isRecentFailure is not used anywhere in the codebase
rg -nP 'isRecentFailure' --type=ts

319-369: Handle unexpanded customer objects.

When invoice.customer is a string ID (not expanded), the method sets customerEmail and customerName to empty strings. This could lead to alerts with missing customer information. Consider fetching customer details when only the ID is present, or document that the caller must ensure customer expansion.


380-444: Unused public method - document or remove.

The analyzeRecentFailurePatterns method is public but doesn't appear to be called anywhere in the codebase. If it's intended for future use or external consumption, add documentation explaining its purpose. Otherwise, consider making it private or removing it.

#!/bin/bash
# Verify analyzeRecentFailurePatterns usage
rg -nP 'analyzeRecentFailurePatterns' --type=ts
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05f6084 and 1d4bd1b.

📒 Files selected for processing (3)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts
  • web/apps/dashboard/lib/utils/slackAlerts.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
🧬 Code graph analysis (1)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)
web/apps/dashboard/lib/utils/slackAlerts.ts (3)
  • alertSubscriptionCancelled (148-180)
  • alertPaymentFailed (182-247)
  • alertPaymentRecovered (249-314)
web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (1)
  • isPaymentRecovery (465-473)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Adds emoji reaction to slack message when a PR is closed or reviewed
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (12)
web/apps/dashboard/lib/utils/slackAlerts.ts (2)

1-1: LGTM!

The import is properly used in both new alert functions for consistent price formatting.


249-314: LGTM - Consistent implementation.

The function follows the same robust error handling pattern as alertPaymentFailed. The currency parameter concern applies here as well (see previous comment).

web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (3)

153-227: LGTM - Robust subscription change detection.

The method correctly identifies subscription changes by checking proration lines and mid-cycle timing. Error handling safely defaults to false, avoiding false positives.


250-311: LGTM - Comprehensive payment attempt checking.

The method properly checks multiple indicators of payment retries: attempt_count, multiple charges, and failed-then-successful charge sequences. Error handling is robust.


453-473: LGTM - Clean API surface.

The factory function and isPaymentRecovery utility provide a clean, simple API for webhook handlers. The implementation correctly delegates to the detector class.

web/apps/dashboard/app/api/webhooks/stripe/route.ts (7)

39-80: LGTM - Clear payment failure detection.

The function correctly identifies payment-failure-related updates by checking status changes and invoice updates while excluding manual subscription changes. The logic is well-documented and sound.


143-256: LGTM - Robust payment recovery detection.

The recovery detection logic correctly identifies status transitions from payment failure states to active, while filtering out manual changes. The 2-hour "recent payment" threshold is reasonable for avoiding false positives.


329-344: LGTM - Improved error handling separation.

The separation of body parsing and signature validation into distinct try/catch blocks provides clearer error messages. The trailing space issue from the previous review has been resolved.


369-398: LGTM - Prevents duplicate processing.

The skip logic correctly prevents duplicate processing of payment failures and recoveries that are handled by dedicated invoice webhook handlers. The detailed logging aids debugging.


561-643: LGTM - Resilient subscription deletion handling.

The enhanced error handling ensures that customer retrieval failures don't prevent subscription cancellation processing. Structured logging provides good debugging context.


765-868: LGTM - Correct failure reason extraction.

The failure reason extraction now correctly prioritizes payment_intent?.last_payment_error?.message, then charge?.failure_message, before falling back to last_finalization_error?.message. This addresses the previous review concern about using the appropriate fields for payment failures.


870-991: LGTM - Smart recovery alerting.

The handler correctly uses isPaymentRecovery to determine when to send recovery alerts, avoiding false positives for normal successful payments. Error handling appropriately returns 200 to prevent unnecessary retries.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In @web/apps/dashboard/app/api/webhooks/stripe/route.ts:
- Around line 349-363: Change the three webhook early-return responses that
currently use new Response("OK", { status: 201 }) to return 200 instead of 201:
update the return statements in the branches where
isAutomatedBillingRenewal(sub, previousAttributes),
isPaymentFailureRelatedUpdate(sub, previousAttributes), and the isRecovery
branch after await isPaymentRecoveryUpdate(stripe, sub, previousAttributes,
event) so they use new Response("OK", { status: 200 }) to reflect a
skipped/no-op event rather than a resource creation.

In @web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts:
- Around line 224-230: Remove the dead private method isRecentFailure or
refactor isRecoveryFromFailure to call it; specifically either delete the unused
isRecentFailure(failureEvent: Stripe.Event, successTimestamp: number)
implementation, or replace the inline time-window logic inside
isRecoveryFromFailure with a call to isRecentFailure to centralize the 24-hour
check (referencing isRecentFailure and isRecoveryFromFailure by name so the edit
is easy to locate).
- Around line 335-339: For invoice.payment_failed events, replace the hardcoded
failureReason with logic that checks the event payload for a payment intent
error: look at event.data.object.payment_intent and, if present, inspect
payment_intent.last_payment_error (or
payment_intent.last_payment_error.type/message/code) to construct a more
specific failureReason; if that's not available, fall back to any charge-level
failure info on event.data.object.charge or retain the generic "Payment failed"
as last resort. Update the code around the failureReason variable (referencing
failureReason, event.type, event.data.object.payment_intent, and
payment_intent.last_payment_error) to prefer these detailed fields when
available.
🧹 Nitpick comments (3)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (1)

197-240: Consider adding timeout protection for Stripe API call.

The function has good error handling, but the stripe.invoices.retrieve call on line 212 lacks a timeout. In webhook contexts, long-running API calls can cause timeouts and retries.

🔎 Proposed enhancement with AbortController
 async function checkRecentPaymentSuccess(
   stripe: Stripe,
   sub: Stripe.Subscription,
   event: Stripe.Event,
 ): Promise<boolean> {
   try {
     // Instead of fetching many events, check the subscription's latest invoice directly
     if (!sub.latest_invoice) {
       return false;
     }

     const invoiceId =
       typeof sub.latest_invoice === "string" ? sub.latest_invoice : sub.latest_invoice.id;

-    // Retrieve the latest invoice to check its payment status
-    const invoice = await stripe.invoices.retrieve(invoiceId);
+    // Retrieve the latest invoice to check its payment status with timeout
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 10000); // 10 second timeout
+    
+    try {
+      const invoice = await stripe.invoices.retrieve(invoiceId, {
+        // @ts-expect-error - AbortSignal support
+        signal: controller.signal,
+      });
+      clearTimeout(timeoutId);
web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (2)

66-86: Events list retrieval lacks customer filtering, impacting performance.

The stripe.events.list call retrieves up to 100 invoice.payment_failed events across all customers in the account, then filters by invoice ID locally (line 96). This is inefficient for accounts with high webhook volume.

Unfortunately, the Stripe Events API doesn't support filtering by customer ID directly. Consider these alternatives:

  1. Reduce the time window to minimize events fetched
  2. Reduce the limit since you only need to find failures for one specific invoice
  3. Rely primarily on checkInvoicePaymentAttempts which uses invoice-specific data
🔎 Proposed optimization
       try {
         // Retrieve recent events for this customer to look for payment failures
         recentEvents = await this.stripe.events.list({
           type: "invoice.payment_failed",
           created: {
             gte: earliestFailureTime,
             lte: successTimestamp,
           },
-          limit: 100, // Reasonable limit to check recent failures
+          limit: 25, // Smaller limit since we're filtering for a specific invoice
         });
       } catch (eventsError) {

182-188: Consider removing or demoting the console.info log in production.

This console.info will log for every mid-cycle subscription change, which could be noisy in production. Consider using a debug-level logger or removing it after validation.

🔎 Proposed change
          // If more than 1 day until next billing cycle, a mid-cycle change
          const oneDayInSeconds = 24 * 60 * 60;
          if (timeUntilNextCycle > oneDayInSeconds) {
-            console.info("Invoice created mid-cycle, likely subscription change", {
-              invoiceId: invoice.id,
-              subscriptionId,
-              timeUntilNextCycle,
-            });
            return true;
          }
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4bd1b and 951047b.

📒 Files selected for processing (2)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
🧬 Code graph analysis (1)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)
web/apps/dashboard/lib/utils/slackAlerts.ts (3)
  • alertSubscriptionCancelled (148-180)
  • alertPaymentFailed (182-247)
  • alertPaymentRecovered (249-314)
web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (1)
  • isPaymentRecovery (453-461)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: autofix
🔇 Additional comments (11)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (7)

6-14: LGTM on new imports for payment recovery and alerting.

The imports are well-organized and align with the new functionality for detecting payment recoveries and sending Slack alerts.


39-80: Well-designed detection logic for payment failure updates.

The function correctly distinguishes between payment-failure-related subscription updates and manual user-initiated changes. The logic of checking for status changes to failure states combined with invoice changes, while excluding manual modification keys, provides good accuracy.


143-195: LGTM on payment recovery update detection.

The function properly handles the different recovery scenarios: direct status recovery and invoice-based recovery detection. The exclusion of manual changes prevents false positives.


313-328: LGTM on webhook validation flow.

The separation of body parsing and signature validation into distinct try/catch blocks provides clear error isolation and better debugging. The error responses are now correctly formatted without trailing spaces.


731-828: Well-structured payment failure handler with appropriate error handling.

The handler correctly:

  • Returns 200 on all paths to prevent unnecessary Stripe retries
  • Validates invoice and customer data before processing
  • Logs warnings for edge cases (negative amounts) without failing
  • Wraps the alert in a try/catch to ensure the webhook succeeds even if alerting fails

This aligns well with webhook best practices for payment processing.


872-886: Good defensive handling of recovery detection failures.

The try/catch around isPaymentRecovery with a fallback to false prevents recovery detection errors from affecting normal payment success processing. This avoids false positives when detection fails.


575-593: LGTM on customer retrieval error handling.

The nested try/catch for customer retrieval ensures that a failure to fetch customer details (e.g., deleted customer, network issue) doesn't prevent the core subscription cancellation logic from completing. The comment "Continue without sending alert rather than failing" clearly documents the intent.

web/apps/dashboard/lib/utils/paymentRecoveryDetection.ts (4)

23-28: LGTM on class structure.

Clean constructor with minimal initialization. The private Stripe client storage follows good encapsulation practices.


238-299: LGTM on payment attempts checking logic.

The method correctly identifies recovery scenarios by examining both attempt_count and the underlying charge history. The nested error handling for charge retrieval is appropriate.


453-461: LGTM on the convenience function design.

The isPaymentRecovery function provides a clean, simple API for the webhook handler. Creating a new detector instance per call is appropriate since the class maintains no cached state.


368-432: Public method analyzeRecentFailurePatterns appears unused.

This method is marked public but isn't called anywhere in the codebase. If it's intended for future use, consider documenting the intended usage. Otherwise, consider making it private or removing it.

@perkinsjr perkinsjr marked this pull request as draft January 6, 2026 21:11
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @web/apps/dashboard/app/api/webhooks/stripe/route.ts:
- Around line 89-104: The handler returns 201 ("Created") for skipped webhook
branches which is semantically incorrect; update the early return responses in
the branches that call isAutomatedBillingRenewal(sub, previousAttributes),
isPaymentFailureRelatedUpdate(sub, previousAttributes), and the isRecovery check
from isPaymentRecoveryUpdate(stripe, sub, previousAttributes, event) so that
they return a Response("OK", { status: 200 }) instead of Response("OK", {
status: 201 }); locate and replace the three Response("OK", { status: 201 })
occurrences tied to those checks.

In @web/apps/dashboard/lib/stripe/paymentUtils.ts:
- Around line 171-191: The stripe.events.list call (recentEvents) is fetching
invoice.payment_failed globally and then filtered client-side; update the
request to use Stripe Events V2 filtering by adding object_id: customerId to the
parameters passed into this.stripe.events.list while keeping type, created range
and limit intact so the API filters by customer; keep the catch fallback to
this.checkInvoicePaymentAttempts(invoice) unchanged.
🧹 Nitpick comments (2)
web/apps/dashboard/lib/stripe/productUtils.ts (1)

26-28: Consider adding radix parameter and validating non-negative values.

Number.parseInt without a radix can produce unexpected results if metadata contains leading zeros (interpreted as octal in some contexts). Additionally, negative quota values would pass validation but are likely invalid.

🔎 Proposed fix
-  const requestsPerMonth = Number.parseInt(product.metadata.quota_requests_per_month);
-  const logsRetentionDays = Number.parseInt(product.metadata.quota_logs_retention_days);
-  const auditLogsRetentionDays = Number.parseInt(product.metadata.quota_audit_logs_retention_days);
+  const requestsPerMonth = Number.parseInt(product.metadata.quota_requests_per_month, 10);
+  const logsRetentionDays = Number.parseInt(product.metadata.quota_logs_retention_days, 10);
+  const auditLogsRetentionDays = Number.parseInt(product.metadata.quota_audit_logs_retention_days, 10);

   if (
     Number.isNaN(requestsPerMonth) ||
     Number.isNaN(logsRetentionDays) ||
-    Number.isNaN(auditLogsRetentionDays)
+    Number.isNaN(auditLogsRetentionDays) ||
+    requestsPerMonth < 0 ||
+    logsRetentionDays < 0 ||
+    auditLogsRetentionDays < 0
   ) {
-    console.error(`Invalid quota metadata - parsed to NaN for product: ${product.id}`);
+    console.error(`Invalid quota metadata - parsed to NaN or negative for product: ${product.id}`);
     return { valid: false };
   }
web/apps/dashboard/app/api/webhooks/stripe/route.ts (1)

509-511: Type narrowing after deleted check.

After confirming !customer.deleted, the type should be narrowed to Stripe.Customer. However, the additional !("email" in customer) check is redundant since non-deleted customers always have the email property (though it may be null). The !customer.email check handles the null case.

🔎 Proposed fix
-        if (customer.deleted || !("email" in customer) || !customer.email) {
+        if (customer.deleted || !customer.email) {
           return new Response("OK", { status: 200 });
         }
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 951047b and 8aa92dd.

📒 Files selected for processing (5)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/stripe/index.ts
  • web/apps/dashboard/lib/stripe/paymentUtils.ts
  • web/apps/dashboard/lib/stripe/productUtils.ts
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/lib/stripe/productUtils.ts
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
📚 Learning: 2025-06-02T11:08:56.397Z
Learnt from: ogzhanolguncu
Repo: unkeyed/unkey PR: 3292
File: apps/dashboard/lib/vault.ts:80-97
Timestamp: 2025-06-02T11:08:56.397Z
Learning: The vault.ts file in apps/dashboard/lib/vault.ts is a duplicate of the vault package from the `api` directory and should be kept consistent with that original implementation.

Applied to files:

  • web/apps/dashboard/lib/stripe/index.ts
🧬 Code graph analysis (3)
web/apps/dashboard/lib/stripe/productUtils.ts (1)
web/apps/dashboard/lib/stripe/index.ts (1)
  • validateAndParseQuotas (19-19)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (4)
web/apps/dashboard/lib/stripe/index.ts (4)
  • isAutomatedBillingRenewal (4-4)
  • isPaymentFailureRelatedUpdate (3-3)
  • isPaymentRecoveryUpdate (10-10)
  • isPaymentRecovery (14-14)
web/apps/dashboard/lib/stripe/subscriptionUtils.ts (2)
  • isAutomatedBillingRenewal (78-130)
  • isPaymentFailureRelatedUpdate (32-67)
web/apps/dashboard/lib/stripe/paymentUtils.ts (2)
  • isPaymentRecoveryUpdate (27-73)
  • isPaymentRecovery (570-578)
web/apps/dashboard/lib/utils/slackAlerts.ts (3)
  • alertSubscriptionCancelled (148-180)
  • alertPaymentFailed (182-247)
  • alertPaymentRecovered (249-314)
web/apps/dashboard/lib/stripe/paymentUtils.ts (1)
web/apps/dashboard/lib/stripe/subscriptionUtils.ts (1)
  • PreviousAttributes (132-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (17)
web/apps/dashboard/lib/stripe/productUtils.ts (1)

1-24: Clean validation structure with early returns.

The function correctly validates required metadata fields before parsing and provides useful error messages with the product ID for debugging.

web/apps/dashboard/lib/stripe/subscriptionUtils.ts (3)

3-23: Well-structured interface for Stripe subscription changes.

The PreviousAttributes interface accurately models the fields that Stripe includes in previous_attributes during subscription updates. Good documentation with comments explaining each field category.


32-67: Clear logic for detecting payment-failure related updates.

The function correctly identifies updates that are due to payment issues rather than manual subscription changes, enabling proper routing to the invoice.payment_failed webhook handler.


104-121: Code pattern is correct per Stripe subscription design.

The function correctly checks only the first subscription item (data[0]), which represents the current product. Stripe subscriptions have one active price per subscription at any given time, making the first item the relevant one to inspect. This pattern is consistent throughout the codebase and confirmed by prior work in this project. The early return on unexpected structure (line 106) provides a safe fallback.

web/apps/dashboard/lib/stripe/index.ts (1)

1-19: Clean barrel file consolidating Stripe utilities.

The re-exports are well-organized with section comments and correctly expose the public API surface for subscription, payment, and product utilities.

web/apps/dashboard/app/api/webhooks/stripe/route.ts (4)

143-149: Good integration of quota validation with early return.

The validateAndParseQuotas usage correctly validates product metadata before database updates. Returning 200 OK on invalid quotas prevents retries while logging the error in the utility function.


267-349: Robust subscription deletion handling with graceful degradation.

The handler properly resets workspace quotas to free tier, logs audit events, and gracefully handles customer retrieval failures for alerting without failing the webhook.


570-684: Well-structured payment success handler with recovery detection.

The handler correctly uses isPaymentRecovery to detect and alert on payment recoveries. The conservative approach of defaulting to false on detection errors avoids false positive alerts. Same minor redundancy in the type check at line 608 as noted for the payment_failed handler.


250-263: Consistent structured error logging pattern.

The error serialization pattern (message, stack, name for Error instances, raw value otherwise) with contextual IDs (eventId, eventType, subscriptionId) is applied consistently across all handlers, improving observability and debugging.

web/apps/dashboard/lib/stripe/paymentUtils.ts (8)

7-18: Clear PaymentContext interface for webhook data extraction.

The interface captures essential payment information needed for recovery detection and alerting, with appropriate optional fields for context-dependent data.


27-73: Comprehensive recovery detection for subscription updates.

The function correctly identifies payment recovery by checking status transitions from failure states to active, while excluding manual subscription changes. The fallback to checkRecentPaymentSuccess handles edge cases where only the invoice changed.


97-102: Reasonable 2-hour threshold with clock skew protection.

The 2-hour window for considering a payment "recent" is appropriate for payment processing delays. The timeSincePayment >= 0 check correctly handles potential clock synchronization issues between Stripe and the application.


250-347: Robust heuristics for distinguishing subscription changes from recoveries.

The method uses multiple signals (proration lines, invoice timing relative to billing period, attempt count) to avoid false positive recovery alerts on upgrade/downgrade payments. The informational logging aids debugging.


355-416: Thorough payment attempt analysis using multiple signals.

The method correctly checks attempt_count, charge counts, and charge status combinations to detect retry scenarios. Error handling ensures graceful degradation if Stripe API calls fail.


452-456: Generic failure reason is intentional per previous discussion.

The hardcoded "Payment failed" message was previously discussed - the author confirmed they don't need reason-based logic, so this is acceptable. The context extraction handles various customer object shapes correctly.


570-578: New detector instance created per call.

isPaymentRecovery creates a new PaymentRecoveryDetector on each invocation. Since the class only holds a Stripe client reference with no caching, this is acceptable. If detection becomes a hot path, consider reusing the detector instance.


485-549: Invoice-specific failure pattern analysis.

The analyzeRecentFailurePatterns method correctly filters events by invoice ID rather than just customer ID, preventing false positives from unrelated invoice failures. The 7-day lookback window is reasonable for catching delayed payment resolutions.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (3)

90-111: Consider HTTP status code semantics and async call performance.

Two observations:

  1. Status code 201 vs 200: Lines 91, 97, 104, and 110 return status code 201 ("Created"), but these cases represent successfully processed events with no action taken. Status 200 ("OK") would be more semantically accurate.

  2. Async recovery check performance: Line 102 calls await isPaymentRecoveryUpdate(stripe, sub, previousAttributes, event), which likely makes Stripe API calls to detect recovery patterns. This async operation runs on every subscription.updated event and could impact webhook response times or hit rate limits during high event volume.

Consider:

  • Using 200 instead of 201 for consistency
  • Profiling the recovery check to ensure it doesn't cause timeouts
  • Adding a timeout wrapper around the async call with graceful degradation if it takes too long
Status code consistency fix
     // Skip database updates and notifications for automated billing renewals
     if (isAutomatedBillingRenewal(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for payment failure related updates
     // Payment failures are handled by the invoice.payment_failed webhook
     if (isPaymentFailureRelatedUpdate(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for payment recovery scenarios
     // Payment recoveries are handled by the invoice.payment_succeeded webhook
     const isRecovery = await isPaymentRecoveryUpdate(stripe, sub, previousAttributes, event);
     if (isRecovery) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for card/payment method updates only
     // These don't affect subscription pricing, quotas, or other business logic
     if (isCardUpdateOnly(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

521-531: Clarify negative amount handling.

Lines 525-531 validate that amount >= 0 and log a warning for negative amounts, but processing continues and the alert is sent with the potentially negative value.

In normal Stripe operations, invoice amounts should not be negative (refunds have separate events). If a negative amount occurs, it likely indicates a data integrity issue. Consider:

  1. Returning early if amount is negative to prevent incorrect alerts
  2. Using a fallback value (e.g., 0 or Math.abs(amount))
  3. Documenting expected behavior if negative amounts are valid in your workflow
Proposed: Return early on negative amounts
     // Validate amount and currency
     if (amount < 0) {
       console.warn("Payment failed event with negative amount", {
         amount,
         invoiceId: invoice.id,
         eventId: event.id,
       });
+      return new Response("OK", { status: 200 });
     }

Note: The same issue exists at lines 641-647 in the invoice.payment_succeeded handler.


90-111: Consider webhook event ordering and observability.

The architecture now splits payment lifecycle handling between subscription and invoice events. This is sound, but introduces operational considerations:

  1. Event ordering: Stripe webhooks may arrive out of order during high volume. If invoice.payment_succeeded arrives before the related subscription.updated, recovery detection could miss the context. Consider whether event ordering guarantees are needed for correct detection.

  2. Observability: Payment failures and recoveries are only captured via Slack alerts. Consider adding database records or metrics for:

    • Payment failure counts per customer/workspace
    • Recovery patterns and timing
    • Alert delivery failures (which currently fail silently per lines 552, 666)
  3. Monitoring webhook latency: The multiple async Stripe API calls (lines 102, 623) could cause webhook timeouts. Monitor P95/P99 latency of webhook handlers, especially during payment volume spikes.

Also applies to: 478-575, 577-691

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa92dd and 7ee2f85.

📒 Files selected for processing (2)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
🧬 Code graph analysis (1)
web/apps/dashboard/lib/stripe/subscriptionUtils.ts (1)
web/apps/dashboard/lib/stripe/index.ts (3)
  • PreviousAttributes (5-5)
  • isPaymentFailureRelatedUpdate (3-3)
  • isAutomatedBillingRenewal (4-4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Dashboard / Test Dashboard
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (9)
web/apps/dashboard/lib/stripe/subscriptionUtils.ts (4)

3-26: LGTM! Well-structured interface.

The PreviousAttributes interface is comprehensive and well-documented, covering all relevant subscription change scenarios.


35-70: LGTM! Payment failure detection logic is sound.

The function correctly identifies payment-failure-related updates by checking for status changes to failure states or invoice changes, while excluding manual subscription modifications. The allowlist approach for manual changes is appropriate.


81-133: LGTM! Comprehensive automated renewal detection.

The function properly distinguishes automated billing renewals from manual changes through careful validation of changed attributes. The nested items validation at lines 107-124 correctly ensures that only period dates changed, not pricing or quantities.


141-157: LGTM! Clean card-update detection.

The function correctly identifies payment method updates by checking for a single change to default_payment_method with no other modifications.

web/apps/dashboard/app/api/webhooks/stripe/route.ts (5)

54-69: LGTM! Robust request validation.

The defensive error handling for request body reading and webhook signature validation properly guards against malformed requests while providing clear error context.


258-270: LGTM! Excellent structured error logging.

The enhanced error context with eventId, eventType, and proper Error serialization (lines 260-265) significantly improves debuggability. This pattern is consistently applied throughout the webhook handlers.


323-340: LGTM! Graceful degradation for customer retrieval.

The nested try-catch at lines 324-339 properly handles customer retrieval failures. Continuing without sending an alert (rather than failing the entire webhook) is the right choice since the database updates have already succeeded.


478-575: Verify error response strategy for webhook reliability.

The invoice.payment_failed handler implements defensive error handling with graceful degradation. However, the error response strategy raises questions:

  1. 400 vs 200 responses: Line 485 returns 400 for invalid invoice data, which triggers Stripe retries. Lines 513, 552, and 573 return 200 even on errors to prevent retries. Consider whether all non-critical errors should return 200 for consistency.

  2. No database state updates: The handler only sends Slack alerts without recording payment failures. Should payment failure events be persisted for audit trails or analytics?

  3. Silent failure on errors: Line 573 returns 200 even when the outer catch block is triggered, which could mask unexpected errors. Consider logging these at a higher severity level (e.g., with alerting to engineering) to ensure visibility into webhook processing issues.

Consider whether this approach aligns with your operational requirements for webhook processing and error visibility.


619-633: Recovery detection trade-offs: performance and false negatives.

The payment recovery detection logic has two trade-offs to consider:

  1. Performance: Line 623 calls await isPaymentRecovery(stripe, event) for every successful payment, which may involve Stripe API calls to check recent payment history. This could be expensive during high payment volume and impact webhook response times.

  2. False negatives: Lines 631-632 default to isRecovery = false if detection fails, which prevents false positive alerts but could miss genuine recoveries when the detection fails due to transient errors or rate limits.

Consider:

  • Monitoring the latency of the isPaymentRecovery calls to ensure they don't approach Stripe's webhook timeout threshold
  • Whether missed recovery alerts (false negatives) are acceptable vs the alternative of potential false positives
  • Adding metrics to track detection failures so you have visibility into the false negative rate
Optional: Add timeout protection
     let isRecovery = false;

     try {
-      // Use recovery detection logic to determine if success follows failure
-      isRecovery = await isPaymentRecovery(stripe, event);
+      // Use recovery detection logic with timeout protection
+      const recoveryPromise = isPaymentRecovery(stripe, event);
+      const timeoutPromise = new Promise((resolve) => setTimeout(() => resolve(false), 3000));
+      isRecovery = await Promise.race([recoveryPromise, timeoutPromise]) as boolean;
     } catch (recoveryError) {

This ensures recovery detection doesn't delay webhook responses beyond 3 seconds.

Copy link
Collaborator

@MichaelUnkey MichaelUnkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (2)

91-111: Consider using status 200 instead of 201 for no-op scenarios.

Lines 91, 97, 104, and 110 return HTTP 201 (Created) for skip/no-op scenarios. Since no resource is created in these cases, status 200 (OK) would be more semantically appropriate per HTTP conventions.

♻️ Proposed status code adjustment
     // Skip database updates and notifications for automated billing renewals
     if (isAutomatedBillingRenewal(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for payment failure related updates
     // Payment failures are handled by the invoice.payment_failed webhook
     if (isPaymentFailureRelatedUpdate(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for payment recovery scenarios
     // Payment recoveries are handled by the invoice.payment_succeeded webhook
     const isRecovery = await isPaymentRecoveryUpdate(stripe, sub, previousAttributes, event);
     if (isRecovery) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

     // Skip database updates and notifications for card/payment method updates only
     // These don't affect subscription pricing, quotas, or other business logic
     if (isCardUpdateOnly(sub, previousAttributes)) {
-      return new Response("OK", { status: 201 });
+      return new Response("OK", { status: 200 });
     }

478-575: Stripe webhook best practices confirmed — refactor to return 2xx immediately and defer alert work asynchronously.

Your error handling strategy (returning 200 to prevent Stripe retries) is correct per Stripe's documented best practices. Negative amounts in invoice.payment_failed events are also expected and normal; they represent credits or refunds from prior payment attempts.

However, refactor to return the 200 response immediately rather than awaiting the alert. Per Stripe's guidance, webhooks should "acknowledge quickly (return a 2xx) and do heavy work asynchronously." Currently, the handler awaits alertPaymentFailed (line 543) before returning, which delays the acknowledgment.

Move the alert sending to a background job queue or async task to return 200 immediately:

try {
  // Acknowledge webhook immediately
  const response = new Response("OK", { status: 200 });
  
  // Queue alert for async processing (background job, queue, etc.)
  // Don't await here
  queuePaymentFailureAlert(
    (customer as Stripe.Customer).email,
    (customer as Stripe.Customer).name || "Unknown",
    amount,
    currency,
  );
  
  return response;
} catch (error) {
  // Log and return 200 immediately
  return new Response("OK", { status: 200 });
}

This keeps webhook reliability while following Stripe's optimal async pattern.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ee2f85 and 0093a7a.

📒 Files selected for processing (2)
  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/apps/dashboard/lib/stripe/subscriptionUtils.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-08T15:33:04.290Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 2120
File: apps/dashboard/app/(app)/settings/billing/stripe/success/page.tsx:19-19
Timestamp: 2024-10-08T15:33:04.290Z
Learning: In the `StripeSuccess` function, TypeScript's type-checking of the `new_plan` parameter ensures that only "free", "pro", or undefined values are accepted, so additional runtime validation is not necessary.

Applied to files:

  • web/apps/dashboard/app/api/webhooks/stripe/route.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Dashboard / Test Dashboard
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
web/apps/dashboard/app/api/webhooks/stripe/route.ts (3)

54-69: LGTM: Robust webhook validation.

The explicit separation of body parsing and signature validation with dedicated error handling improves resilience and debugging. Returning 400 for client-side validation failures is semantically correct.


577-691: LGTM: Solid payment success handler with recovery detection.

The handler correctly:

  • Validates invoice and customer data with appropriate guards
  • Uses isPaymentRecovery to detect recovery scenarios
  • Defaults to false on detection failures to avoid false positives (lines 631-632)
  • Treats alert failures as non-fatal to prevent unnecessary retries

The trade-off of defaulting to non-recovery on detection errors is reasonable and well-documented.


80-84: Excellent: Context-rich error logging throughout.

The consistent addition of event IDs, subscription IDs, invoice IDs, and structured error details across all error handlers significantly improves debuggability and incident response capabilities.

Also applies to: 236-241, 258-270, 333-337, 393-396, 462-474

Copy link
Collaborator

@mcstepp mcstepp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@perkinsjr perkinsjr merged commit e3222c1 into main Jan 8, 2026
20 checks passed
@perkinsjr perkinsjr deleted the stripe-webhooks-improvement-v2 branch January 8, 2026 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants