batmn-dev · Mar 2, 2026 · Feb 27, 2026
diff --git a/.agents/plans/payment-status-replay-hardening-multi-agent.md b/.agents/plans/payment-status-replay-hardening-multi-agent.md
diff --git a/.agents/troubleshooting/payment-guardrail-rollout.md b/.agents/troubleshooting/payment-guardrail-rollout.md
@@ -0,0 +1,197 @@
+# Payment Guardrail Rollout Runbook
+
+## Purpose
+
+Use this runbook when operating the payment-status and replay-hardening feature flags, diagnosing guardrail-related incidents, or evaluating readiness for enforcement escalation.
+
+This guide covers:
+- rollout order and feature flag controls
+- symptom patterns and log tags
+- likely root causes and remediation
+- decision gates for enforcement escalation
+
+## Feature Flags
+
+| Flag | Values | Purpose |
+|------|--------|---------|
+| `PAYMENT_STATUS_GUARDRAILS_V1` | `"1"` / `"true"` to enable | Master switch for intent classification + tool policy |
+| `PAYMENT_CHAT_STATE_V1` | `"1"` / `"true"` to enable | Canonical `chatToolState` ledger reads/writes |
+| `PAYMENT_GUARDRAIL_MODE` | `"observe"` (default) / `"enforce"` | `observe` only logs policy decisions; `enforce` removes denied tools from the request |
+| `PAYMENT_CHAT_STATE_BACKFILL_V1` | `"1"` / `"true"` to enable | Legacy chat lazy backfill from `toolCallLog` |
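A minimal sketch of how a service might read these flags, assuming `process.env`-style string values passed in as a plain record. The helper names (`isFlagOn`, `parseMode`, `readPaymentFlags`, `isEnforcing`) are hypothetical. Two behaviors are also assumptions: any unrecognized `PAYMENT_GUARDRAIL_MODE` value is treated as `observe`, so a typo never turns on blocking, and enforcement requires the master switch.

```typescript
type GuardrailMode = "observe" | "enforce"

interface PaymentFlags {
  guardrailsV1: boolean
  chatStateV1: boolean
  mode: GuardrailMode
  backfillV1: boolean
}

// Hypothetical helper: a boolean flag is on only for the exact strings "1" or "true".
function isFlagOn(value: string | undefined): boolean {
  return value === "1" || value === "true"
}

// Hypothetical helper: anything other than "enforce" (including unset) falls back to "observe".
function parseMode(value: string | undefined): GuardrailMode {
  return value === "enforce" ? "enforce" : "observe"
}

function readPaymentFlags(env: Record<string, string | undefined>): PaymentFlags {
  return {
    guardrailsV1: isFlagOn(env["PAYMENT_STATUS_GUARDRAILS_V1"]),
    chatStateV1: isFlagOn(env["PAYMENT_CHAT_STATE_V1"]),
    mode: parseMode(env["PAYMENT_GUARDRAIL_MODE"]),
    backfillV1: isFlagOn(env["PAYMENT_CHAT_STATE_BACKFILL_V1"]),
  }
}

// Assumption: blocking requires the master switch as well as enforce mode.
function isEnforcing(flags: PaymentFlags): boolean {
  return flags.guardrailsV1 && flags.mode === "enforce"
}

// Example: the Stage 1 (observe-only) values.
const stage1 = readPaymentFlags({
  PAYMENT_STATUS_GUARDRAILS_V1: "1",
  PAYMENT_CHAT_STATE_V1: "1",
  PAYMENT_GUARDRAIL_MODE: "observe",
  PAYMENT_CHAT_STATE_BACKFILL_V1: "0",
})
console.log(stage1, isEnforcing(stage1))
```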
+
+## Rollout Order
+
+### Stage 1: Observe-Only (Low Risk)
+
+Enable logging without blocking any tool calls.
+
+```env
+PAYMENT_STATUS_GUARDRAILS_V1=1
+PAYMENT_CHAT_STATE_V1=1
+PAYMENT_GUARDRAIL_MODE=observe
+PAYMENT_CHAT_STATE_BACKFILL_V1=0
+```
+
+Monitor for 48+ hours. Confirm:
+- [ ] `payment_intent_classified` logs appear with expected intent distributions.
+- [ ] `payment_policy_decision` logs show correct allow/deny reasoning.
+- [ ] No unexpected errors in chat streaming.
+- [ ] `chatToolState` rows are created for new payment tool calls.
+
+### Stage 2: Observe + Backfill (Medium Risk)
+
+Keep the Stage 1 flags and enable lazy backfill for legacy chats.
+
+```env
+PAYMENT_CHAT_STATE_BACKFILL_V1=1
+```
+
+Monitor for 24+ hours. Confirm:
+- [ ] Backfill creates state rows for legacy chats on first payment interaction.
+- [ ] Backfill does not create state that incorrectly enables `pay_purchase`.
+- [ ] No increase in error rates for legacy chats.
+
+### Stage 3: Enforce for Canary (High Risk — Gate B)
+
+Enable hard guardrails for a canary cohort.
+
+```env
+PAYMENT_GUARDRAIL_MODE=enforce
+```
+
+Deploy to canary/staging first. Confirm:
+- [ ] `pay_purchase` is removed from tool list when intent is `status_check`.
+- [ ] `pay_purchase` remains available when intent is `new_purchase` and no active job exists.
+- [ ] Users can still complete legitimate purchase flows.
+- [ ] `payment_guardrail_block` rate is within expected bounds.
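The two tool-list checks above can be expressed as a small pure function, which helps when reasoning about expected `payment_guardrail_block` volume. This is a hypothetical sketch. `applyPaymentPolicy`, its types, and the placeholder tool `lookup_status` are illustrative. The deny reasons reuse the `reason` values from the troubleshooting section. Allowing `pay_purchase` for `unknown` intent with no active job is an assumption.

```typescript
type PaymentIntent = "status_check" | "new_purchase" | "unknown"
type GuardrailMode = "observe" | "enforce"

interface PolicyInput {
  tools: string[]
  intent: PaymentIntent
  hasActiveJob: boolean
  mode: GuardrailMode
}

interface PolicyResult {
  tools: string[] // tool list actually offered to the model
  denyTools: string[] // tools the policy would deny (reported in both modes)
  reason: string | null // deny reason, or null when nothing is denied
}

// Hypothetical deny rule for pay_purchase, checked in this order.
function purchaseDenyReason(intent: PaymentIntent, hasActiveJob: boolean): string | null {
  if (intent === "status_check") return "status_check_intent"
  if (hasActiveJob) return "active_job_exists"
  return null
}

function applyPaymentPolicy(input: PolicyInput): PolicyResult {
  const reason = input.tools.includes("pay_purchase")
    ? purchaseDenyReason(input.intent, input.hasActiveJob)
    : null
  const denyTools: string[] = reason === null ? [] : ["pay_purchase"]
  // Observe mode keeps every tool and only reports; enforce mode removes denied tools.
  const tools =
    input.mode === "enforce"
      ? input.tools.filter((t) => !denyTools.includes(t))
      : [...input.tools]
  return { tools, denyTools, reason }
}

// "lookup_status" is a placeholder name for a read-only status tool.
console.log(
  applyPaymentPolicy({
    tools: ["pay_purchase", "lookup_status"],
    intent: "status_check",
    hasActiveJob: true,
    mode: "enforce",
  }),
)
```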
+
+### Stage 4: Global Enforcement (Gate B + Gate C)
+
+Roll out globally after stable canary metrics.
+
+- **Gate B (hard enforcement)**: canary `payment_guardrail_block` rate is stable, with no false-positive reports.
+- **Gate C (backfill confidence)**: backfill failure rate is below 1% of legacy-chat payment interactions.
+
+## Log Tags to Filter
+
+Search logs by tag or key:
+
+- `payment_intent_classified`
+  - Emitted after intent classification for every chat request with payment tools.
+  - Key fields: `intentClass`, `hasActiveJob`, `latestStatus`, `isTerminal`.
+- `payment_policy_decision`
+  - Emitted after policy override computation.
+  - Key fields: `denyTools`, `allowTools`, `reason`, `mode` (`observe` / `enforce`).
+- `payment_guardrail_block`
+  - Emitted when enforce mode removes a tool from the request.
+  - Key fields: `toolName`, `intentClass`, `reason`.
+- `payment_state_upsert`
+  - Emitted when `chatToolState` is created or updated.
+  - Key fields: `chatId`, `chatVersion`, `activePurchaseJobId`, `latestStatus`.
+- `payment_backfill_attempt`
+  - Emitted when lazy backfill runs for a legacy chat.
+  - Key fields: `chatId`, `toolCallLogCount`, `success`.
+- `payment_purchase_blocked_execute`
+  - Emitted by defense-in-depth `isPurchaseBlocked` callback at tool execution time.
+  - Key fields: `chatId`, `toolCallId`.
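When logs are shipped as JSON lines, a filter like the following can pull the relevant tags for one chat. It assumes each entry has a top-level `tag` field plus the key fields listed above. The entry shape and `filterPaymentLogs` are hypothetical, so adapt field names to your log backend. When you filter by chat, entries without a `chatId` are dropped, which may happen if a tag such as `payment_intent_classified` omits that field.

```typescript
// Hypothetical entry shape: a top-level `tag` plus arbitrary key fields.
interface LogEntry {
  tag: string
  chatId?: string
  [key: string]: unknown
}

function filterPaymentLogs(lines: string[], tags: string[], chatId?: string): LogEntry[] {
  const wanted = new Set(tags)
  const matches: LogEntry[] = []
  for (const line of lines) {
    let parsed: unknown
    try {
      parsed = JSON.parse(line)
    } catch {
      continue // not JSON (plain-text or truncated line)
    }
    if (typeof parsed !== "object" || parsed === null) continue
    const entry = parsed as Record<string, unknown>
    const tag = entry["tag"]
    if (typeof tag !== "string" || !wanted.has(tag)) continue
    // When a chatId is given, entries that lack it or carry a different one are skipped.
    if (chatId !== undefined && entry["chatId"] !== chatId) continue
    matches.push(entry as LogEntry)
  }
  return matches
}

const sample = [
  '{"tag":"payment_guardrail_block","chatId":"c2","toolName":"pay_purchase","reason":"active_job_exists"}',
  "not json",
]
console.log(filterPaymentLogs(sample, ["payment_guardrail_block"], "c2"))
```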
+
+## Symptom -> Likely Root Cause -> Remediation
+
+### 1) User cannot initiate a new purchase (false positive block)
+
+Symptoms:
+- User reports `pay_purchase` tool is unavailable.
+- `payment_guardrail_block` logs show `reason: "active_job_exists"` or `reason: "status_check_intent"`.
+
+Likely root causes:
+- `chatToolState` has a stale `activePurchaseJobId` from a previous (completed) purchase that was not marked terminal.
+- Intent classifier misclassified `new_purchase` as `status_check` due to keyword overlap.
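To see how keyword overlap becomes a false-positive block, consider a simplified keyword classifier that reproduces the cases in `payment-intent.test.ts`. The keyword lists and `classifyIntentSketch` are hypothetical, ignore `hasAnyJob`, and are not the real `classifyPaymentIntent`.

```typescript
type PaymentIntent = "status_check" | "new_purchase" | "unknown"

interface IntentResult {
  intent: PaymentIntent
  confidence: "high" | "low"
}

// Hypothetical vocabularies; "order" counts as a purchase keyword, as in the tests.
const STATUS_KEYWORDS = ["status", "where is", "track", "delivery", "shipped"]
const PURCHASE_KEYWORDS = ["buy", "purchase", "order"]

function classifyIntentSketch(message: string, hasActiveJob: boolean): IntentResult {
  const text = message.toLowerCase()
  const hasStatus = STATUS_KEYWORDS.some((k) => text.includes(k))
  const hasPurchase = PURCHASE_KEYWORDS.some((k) => text.includes(k))

  if (hasActiveJob && (hasStatus || hasPurchase)) {
    // Safety-first: while a purchase is in flight, any payment wording is a status check.
    // Confidence is high only when status wording is actually present.
    return { intent: "status_check", confidence: hasStatus ? "high" : "low" }
  }
  if (hasPurchase) {
    // No active job here; mixed wording still reads as a purchase, but less confidently.
    return { intent: "new_purchase", confidence: hasStatus ? "low" : "high" }
  }
  // Status wording with no job to check, or no payment wording at all.
  return { intent: "unknown", confidence: "low" }
}

const message = "I'd like to track down and buy a new charger"
console.log(classifyIntentSketch(message, true)) // with a stale active job
console.log(classifyIntentSketch(message, false))
```

With a stale `activePurchaseJobId`, this message matches both lists, so the safety-first branch returns `status_check` and `pay_purchase` is denied even though the user wants a new purchase.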
+
+Remediation:
+1. Check `chatToolState` for the affected `chatId`: is `latestStatusIsTerminal` set correctly?
+2. If the state is stale, the user can start a new chat or edit a message to trigger `truncateFromVersion`.
+3. If intent misclassification, capture the user message text and add a regression test case.
+
+Rollback: Set `PAYMENT_GUARDRAIL_MODE=observe` to stop blocking while investigating.
+
+### 2) Duplicate purchase triggered on edit/resend
+
+Symptoms:
+- User edits a message and a new `pay_purchase` call fires when only a status check was intended.
+- `payment_intent_classified` shows `intentClass: "new_purchase"` for what should have been a status check.
+
+Likely root causes:
+- `chatToolState` was truncated by the edit (correct behavior) but the new message didn't contain status keywords.
+- Backfill failed silently, so the edit started with no state context.
+
+Remediation:
+1. Check `payment_backfill_attempt` logs for the chat.
+2. Verify that `chatVersion` in the request body matches the expected message count.
+3. If there is a backfill gap, investigate the `toolCallLog` entries for the chat.
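Step 2 can be scripted against a captured request body. This sketch assumes the body carries `chatVersion` alongside a `messages` array and that the expected value equals `messages.length`. Both the body shape and `checkChatVersion` are hypothetical, so confirm the real versioning rule before relying on it.

```typescript
// Hypothetical body shape; only the fields this check needs.
interface ChatRequestBody {
  chatId: string
  chatVersion?: number
  messages: unknown[]
}

type VersionCheck =
  | { ok: true }
  | { ok: false; problem: "missing" | "mismatch"; chatVersion?: number; messageCount: number }

// Assumption: the expected chatVersion equals messages.length.
function checkChatVersion(body: ChatRequestBody): VersionCheck {
  const messageCount = body.messages.length
  if (body.chatVersion === undefined) {
    return { ok: false, problem: "missing", messageCount }
  }
  if (body.chatVersion !== messageCount) {
    return { ok: false, problem: "mismatch", chatVersion: body.chatVersion, messageCount }
  }
  return { ok: true }
}

// Example: a body whose chatVersion is ahead of its message list.
console.log(checkChatVersion({ chatId: "chat_123", chatVersion: 6, messages: [{}, {}, {}] }))
```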
+
+Rollback: Set `PAYMENT_STATUS_GUARDRAILS_V1=0` to fully disable.
+
+### 3) Replay continuity summary missing after model switch
+
+Symptoms:
+- After switching providers, the model has no context about prior payment state.
+- `replay_compile_stage` shows platform tool blocks were dropped without continuity text.
+
+Likely root causes:
+- `platformToolContext` was not populated during normalization (tool name not in `PLATFORM_PAYMENT_TOOLS` set).
+- The compiler's `synthesizePlatformToolFallback` path was not reached.
+
+Remediation:
+1. Check `replay_normalize_stage` for `tool_non_replayable` warnings on payment tools.
+2. Verify the tool name in `toolCallLog` matches an entry in `PLATFORM_PAYMENT_TOOLS`.
+3. Add the missing tool-name variant and expand the matrix test coverage.
+
+### 4) High backfill failure rate
+
+Symptoms:
+- `payment_backfill_attempt` logs show `success: false` for many legacy chats.
+- Legacy chats don't get guardrail protection.
+
+Likely root causes:
+- `toolCallLog.outputPreview` doesn't contain parseable JSON for `jobId` extraction.
+- Tool names in `toolCallLog` don't match expected payment tool name patterns.
+
+Remediation:
+1. Sample failed backfill chats and inspect their `toolCallLog` entries.
+2. If the `outputPreview` format changed, update the backfill parser.
+3. If tool names diverged, expand the `BACKFILL_TOOL_NAMES` set in `chatToolStateBackfill.ts`.
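For step 1, a small triage helper can sort sampled entries into the two root causes above. The entry shape, `extractJobId`, `diagnoseEntry`, and `BACKFILL_TOOL_NAMES_EXAMPLE` are hypothetical stand-ins. The real `BACKFILL_TOOL_NAMES` set and parser live in `chatToolStateBackfill.ts` and may differ.

```typescript
// Hypothetical shapes and names; the real ones live in chatToolStateBackfill.ts.
interface ToolCallLogEntry {
  toolName: string
  outputPreview: string | null
}

type BackfillDiagnosis = "ok" | "unrecognized_tool_name" | "unparseable_preview"

const BACKFILL_TOOL_NAMES_EXAMPLE = new Set<string>(["pay_purchase"])

// Returns a non-empty string jobId from the preview JSON, or null if the preview
// is missing, is not valid JSON (e.g. truncated), or has no usable jobId.
function extractJobId(outputPreview: string | null): string | null {
  if (!outputPreview) return null
  try {
    const parsed: unknown = JSON.parse(outputPreview)
    if (typeof parsed === "object" && parsed !== null) {
      const jobId = (parsed as Record<string, unknown>)["jobId"]
      if (typeof jobId === "string" && jobId.length > 0) return jobId
    }
  } catch {
    // Fall through: the preview is not parseable JSON.
  }
  return null
}

// Classifies an entry as ok or as one of the two root causes listed above.
// Non-payment tools also show up as unrecognized, so review those names by hand.
function diagnoseEntry(entry: ToolCallLogEntry): BackfillDiagnosis {
  if (!BACKFILL_TOOL_NAMES_EXAMPLE.has(entry.toolName)) return "unrecognized_tool_name"
  return extractJobId(entry.outputPreview) === null ? "unparseable_preview" : "ok"
}

const sampled: ToolCallLogEntry[] = [
  { toolName: "pay_purchase", outputPreview: '{"jobId":"job_1","status":"queued"}' },
  { toolName: "pay_purchase", outputPreview: '{"jobId":"job_2","sta' }, // truncated preview
  { toolName: "payPurchase", outputPreview: '{"jobId":"job_3"}' }, // name variant
]
console.log(sampled.map(diagnoseEntry))
```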
+
+Rollback: Set `PAYMENT_CHAT_STATE_BACKFILL_V1=0` to disable backfill only.
+
+## SLO Monitors
+
+| Metric | Source Tag | Threshold | Action |
+|--------|-----------|-----------|--------|
+| Guardrail block rate | `payment_guardrail_block` | < 5% of payment-tool requests | Investigate false positives if exceeded |
+| Replay compile fallback rate | `replay_compile_fallback_activated` (payment tools) | < 2% of cross-provider replays | Verify continuity summaries are generating |
+| Status-intent tool mismatch | `payment_policy_decision` where `intentClass` != expected | < 1% of classified requests | Add regression test cases |
+| Backfill failure rate | `payment_backfill_attempt` with `success: false` | < 1% of legacy chat interactions | Expand parser or disable backfill |
+| Defense-in-depth block rate | `payment_purchase_blocked_execute` | Should be ~0 in enforce mode | If nonzero, primary policy layer has a gap |
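Once counts are pulled from the log tags, the thresholds can be checked mechanically. This is a hypothetical sketch: `evaluateSlo` and the counts in the example are illustrative, and the aggregation query depends on your log backend. A threshold of `0` is treated as "any occurrence is a breach" to match the defense-in-depth row.

```typescript
interface SloMonitor {
  name: string
  numerator: number // e.g. count of payment_guardrail_block entries
  denominator: number // e.g. count of payment-tool requests in the same window
  maxRate: number // threshold as a fraction; 0 means "any occurrence breaches"
}

interface SloResult {
  name: string
  rate: number
  breached: boolean
}

function evaluateSlo(m: SloMonitor): SloResult {
  // With no traffic in the window, the rate is reported as 0.
  const rate = m.denominator > 0 ? m.numerator / m.denominator : 0
  // The table's thresholds are strict upper bounds ("< 5%"), so reaching the limit is a breach.
  const breached = m.maxRate === 0 ? m.numerator > 0 : rate >= m.maxRate
  return { name: m.name, rate, breached }
}

// Illustrative counts for one monitoring window.
const monitors: SloMonitor[] = [
  { name: "Guardrail block rate", numerator: 30, denominator: 1000, maxRate: 0.05 },
  { name: "Backfill failure rate", numerator: 12, denominator: 800, maxRate: 0.01 },
  { name: "Defense-in-depth block rate", numerator: 1, denominator: 5000, maxRate: 0 },
]
for (const r of monitors.map(evaluateSlo)) {
  console.log(`${r.name}: ${(r.rate * 100).toFixed(2)}% ${r.breached ? "BREACHED" : "ok"}`)
}
```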
+
+## On-Call Checklist
+
+- [ ] Confirm incident scope (affected chat IDs, user IDs, first seen time).
+- [ ] Check current feature flag values in environment.
+- [ ] Filter logs for `payment_intent_classified` and `payment_policy_decision` for affected chats.
+- [ ] Check `chatToolState` records for stale or missing state.
+- [ ] If false positive blocks: set `PAYMENT_GUARDRAIL_MODE=observe` immediately.
+- [ ] If duplicate purchases: set `PAYMENT_STATUS_GUARDRAILS_V1=0` and investigate.
+- [ ] Verify `chatVersion` values in request bodies match the expected message counts.
+- [ ] Check `payment_backfill_attempt` logs if legacy chats are affected.
+- [ ] Validate replay continuity is working via `replay_compile_stage` logs.
+- [ ] Open follow-up for any classifier or backfill parser improvements needed.
+
+## Escalation Notes
+
+Include in incident handoff:
+- affected `chatId` values and user scope
+- current feature flag configuration
+- top log tags and counts
+- whether observe or enforce mode was active
+- `chatToolState` snapshots for affected chats (redact user PII)
+- sanitized intent classification inputs (never include raw API keys or full payloads)
diff --git a/app/api/chat/intent/__tests__/payment-intent.test.ts b/app/api/chat/intent/__tests__/payment-intent.test.ts
@@ -0,0 +1,88 @@
+import { describe, test, expect } from "vitest"
+import {
+  classifyPaymentIntent,
+  type PaymentIntentContext,
+} from "../payment-intent"
+
+describe("classifyPaymentIntent", () => {
+  const baseContext: PaymentIntentContext = {
+    userMessage: "",
+    hasActiveJob: false,
+    hasAnyJob: false,
+  }
+
+  test("status keywords with active job -> status_check", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "What's the status of my order?",
+      hasActiveJob: true,
+      hasAnyJob: true,
+    })
+    expect(result.intent).toBe("status_check")
+    expect(result.confidence).toBe("high")
+  })
+
+  test("purchase keywords without active job -> new_purchase", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "I want to buy the ergonomic mouse",
+      hasActiveJob: false,
+      hasAnyJob: false,
+    })
+    expect(result.intent).toBe("new_purchase")
+    expect(result.confidence).toBe("high")
+  })
+
+  test("purchase keywords WITH active job -> status_check (safety-first)", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "Buy me another one",
+      hasActiveJob: true,
+      hasAnyJob: true,
+    })
+    // Safety-first: reclassified to status_check when active job exists
+    expect(result.intent).toBe("status_check")
+  })
+
+  test("ambiguous message with active job -> unknown", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "Hello, can you help me?",
+      hasActiveJob: true,
+      hasAnyJob: true,
+    })
+    expect(result.intent).toBe("unknown")
+  })
+
+  test("status keywords without any job -> unknown", () => {
+    // Use a message with only status keywords (no purchase keywords like "order")
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "Where is my delivery?",
+      hasActiveJob: false,
+      hasAnyJob: false,
+    })
+    expect(result.intent).toBe("unknown")
+    expect(result.confidence).toBe("low")
+  })
+
+  test("both status and purchase keywords with active job -> status_check", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage:
+        "Buy me a new one and check the status of the previous order",
+      hasActiveJob: true,
+      hasAnyJob: true,
+    })
+    expect(result.intent).toBe("status_check")
+  })
+
+  test("no matching keywords -> unknown with low confidence", () => {
+    const result = classifyPaymentIntent({
+      ...baseContext,
+      userMessage: "Tell me about the weather today",
+    })
+    expect(result.intent).toBe("unknown")
+    expect(result.confidence).toBe("low")
+  })
+})