Commit 0e904b1

feat: enhance agent stop node, projector, and diagnostic tooling
- Update skills documentation for backend debugging and webapp testing
- Improve agent machine factory and stop node implementation
- Enhance graph projector functionality
- Update frontend app-info page
- Add diagnostic scripts for agent state and run inspection
- Document BUG-010 regressions with RCA
- Add FR-021 for stop node metrics validation
1 parent 0509a89 commit 0e904b1

16 files changed: +1321 -45 lines changed

.claude-skills/backend-debugging_skill/SKILL.md

Lines changed: 198 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -82,3 +82,201 @@ encore test
8282
- Never use `console.log` (use `encore.dev/log`)
8383
- Always include structured context
8484

85+
---
86+
87+
## BUG-010 Case Study: Advanced Debugging Techniques
88+
89+
### Diagnostic Scripts Arsenal
90+
91+
Created during BUG-010 investigation (all in `backend/scripts/`):
92+
93+
1. **`inspect-run.ts`** - Complete run event timeline
94+
```bash
95+
bunx tsx backend/scripts/inspect-run.ts <runId>
96+
# Shows: events, graph outcomes, cursor state, run record
97+
```
98+
99+
2. **`check-agent-state.ts`** - Agent state snapshots
100+
```bash
101+
bunx tsx backend/scripts/check-agent-state.ts <runId>
102+
# Shows: nodeName, status, counters, budgets, timestamps
103+
```
104+
105+
3. **`check-cursor-ordering.ts`** - Projector cursor health
106+
```bash
107+
bunx tsx backend/scripts/check-cursor-ordering.ts
108+
# Reveals: cursor limit issues, stuck cursors, ordering problems
109+
```
110+
111+
4. **`find-completed-runs.ts`** / **`find-latest-run.ts`**
112+
```bash
113+
bunx tsx backend/scripts/find-completed-runs.ts # Successful runs
114+
bunx tsx backend/scripts/find-latest-run.ts # Recent runs (any status)
115+
```
116+
117+
5. **`test-projector.ts`** - Isolated projector testing
118+
```bash
119+
bunx tsx backend/scripts/test-projector.ts <runId>
120+
# Tests: cursor hydration, event fetch, screen projection
121+
```
122+
123+
### Git Forensics for Regressions
124+
125+
**Timeline Method:**
126+
```bash
127+
# 1. Find last successful run
128+
bunx tsx backend/scripts/find-completed-runs.ts
129+
# Example: 01K9G8YXY6MG7J7875A5AM9Z4H at 2025-11-07 17:03
130+
131+
# 2. Find first failed run
132+
bunx tsx backend/scripts/find-latest-run.ts
133+
# Example: 01K9GDQF9JQFM8A4Q5WGMARPAT at 2025-11-07 18:26
134+
135+
# 3. Identify commits in regression window
136+
git log --oneline --since="2025-11-07 17:00" --until="2025-11-07 19:00"
137+
138+
# 4. Examine suspect commits
139+
git show <commit_hash> --stat # Files changed
140+
git show <commit_hash> -- <file_path>    # Detailed diff for one file
141+
git show <commit_hash>~1:<file_path> # Before version
142+
```
143+
144+
**Binary Search Method:**
145+
```bash
146+
git bisect start
147+
git bisect bad HEAD # Current broken state
148+
git bisect good <last_known_good_commit> # From timeline
149+
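# Mark each checked-out commit with `git bisect good` or `git bisect bad` (or automate with `git bisect run <test_command>`) until git names the first bad commit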
150+
git bisect reset # Exit bisect mode
151+
```
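To automate the bisect, `git bisect run <command>` reads the command's exit code: 0 marks the commit good, 125 skips it, and any other code from 1 to 127 marks it bad. The sketch below shows one way a check script could map a run's outcome onto that contract. `CheckOutcome`, `bisectExitCode`, and `classifyRun` are hypothetical names, and treating `agent.run.finished` as the health signal is an assumption based on the BUG-010 symptoms.

```typescript
// Hypothetical helpers for a check script driven by `git bisect run`.
// Exit-code contract documented by git: 0 = good, 125 = skip this commit,
// any other code from 1 to 127 = bad.
type CheckOutcome = "passed" | "failed" | "cannot-test";

function bisectExitCode(outcome: CheckOutcome): number {
  switch (outcome) {
    case "passed":
      return 0;
    case "cannot-test":
      return 125;
    case "failed":
      return 1;
  }
}

// Assumption: a healthy run ends with an "agent.run.finished" event.
// `null` stands for "no run could be produced at this commit".
function classifyRun(eventKinds: string[] | null): CheckOutcome {
  if (eventKinds === null) return "cannot-test";
  return eventKinds.some((kind) => kind === "agent.run.finished") ? "passed" : "failed";
}

console.log(bisectExitCode(classifyRun(["agent.node.finished", "agent.run.finished"]))); // 0
console.log(bisectExitCode(classifyRun(["agent.node.finished"]))); // 1
console.log(bisectExitCode(classifyRun(null))); // 125
```

A real script would finish with `process.exit(bisectExitCode(...))` after collecting the event kinds for a fresh run at the checked-out commit.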
152+
153+
### Database Query Analysis
154+
155+
**Stop Node Hang (BUG-010 Example):**
156+
```typescript
157+
// PROBLEM: Query inside node execution blocks XState machine
158+
const rows = await db.query`SELECT COUNT(*) FROM graph_persistence_outcomes WHERE run_id = ${runId}`;
159+
160+
// SYMPTOMS:
161+
// - Worker times out after 30s lease
162+
// - Agent state shows "running" but stuck
163+
// - No "agent.node.finished" event emitted
164+
165+
// DIAGNOSIS:
166+
// 1. Check worker lease timeout logs
167+
// 2. Inspect agent state (last snapshot shows incomplete node)
168+
// 3. Test query in isolation (encore exec bunx tsx test-query.ts)
169+
// 4. Profile query execution time
170+
171+
// FIX:
172+
// Move heavy queries OUTSIDE critical execution path
173+
// Use lightweight operations in terminal nodes
174+
```
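When testing a suspect query in isolation (diagnosis steps 3 and 4), it helps to bound how long the test waits so that a hang surfaces as an error instead of a stalled script. The following is a minimal sketch using `Promise.race`. `withTimeout` is a hypothetical helper, not part of Encore or this codebase, and it only stops waiting: the underlying query is not cancelled.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms`.
// It does NOT cancel the underlying operation; it only stops waiting for it.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a fast result does not leave a pending timeout behind.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Stand-in for a slow database call (assumption: the real call returns a promise).
function simulatedQuery(delayMs: number): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(42), delayMs));
}

withTimeout(simulatedQuery(200), 20, "count query").then(
  (value) => console.log("finished:", value),
  (err: Error) => console.log("timed out:", err.message),
);
```

Choosing a limit well below the 30s worker lease makes a slow query visible in the test before it could cost a lease in a real run.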
175+
176+
### Cursor Limit Investigation
177+
178+
**Projector Stalling Pattern:**
179+
```typescript
180+
// SYMPTOM: Recent runs never get graph_persistence_outcomes
181+
// CHECK: backend/graph/projector.ts
182+
const CURSOR_LIMIT = 50; // ❌ Only processes 50 oldest cursors
183+
184+
// DIAGNOSIS (shell):
185+
// $ bunx tsx backend/scripts/check-cursor-ordering.ts
186+
// Output: 75 total cursors, positions 51-75 never processed
187+
188+
// VALIDATION (SQL):
189+
// SELECT COUNT(*) FROM graph_projection_cursors; -- Shows 75
190+
// SELECT * FROM graph_projection_cursors ORDER BY updated_at ASC LIMIT 50; -- Oldest 50 (processed)
191+
// SELECT * FROM graph_projection_cursors ORDER BY updated_at DESC LIMIT 10; -- Most recent (excluded)
192+
193+
// FIX (replaces the declaration above):
194+
// const CURSOR_LIMIT = 200; // Scale with concurrent runs
195+
```
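The starvation effect can be reproduced without a database. The sketch below assumes the projector picks cursors oldest-first (as the validation queries suggest) and that `updated_at` does not advance for cursors that are never picked. `Cursor`, `selectBatch`, and `starvedCursors` are illustrative names, not the projector's actual code.

```typescript
interface Cursor {
  runId: string;
  updatedAt: number; // stand-in for the updated_at column
}

// Mirrors `ORDER BY updated_at ASC LIMIT n`: oldest cursors first, capped at `limit`.
function selectBatch(cursors: Cursor[], limit: number): Cursor[] {
  return [...cursors].sort((a, b) => a.updatedAt - b.updatedAt).slice(0, limit);
}

// Cursors that the capped batch never reaches.
function starvedCursors(cursors: Cursor[], limit: number): Cursor[] {
  const picked = new Set(selectBatch(cursors, limit).map((c) => c.runId));
  return cursors.filter((c) => !picked.has(c.runId));
}

// 75 cursors, as in the diagnosis output above.
const cursors: Cursor[] = Array.from({ length: 75 }, (_, i) => ({
  runId: `run-${i + 1}`,
  updatedAt: i + 1,
}));

const starvedAt50 = starvedCursors(cursors, 50);
const starvedAt200 = starvedCursors(cursors, 200);
console.log(starvedAt50.length, starvedAt200.length); // 25 0
```

Raising the limit removes the starvation only while the cursor count stays below it, so the same pattern can reappear once more than 200 cursors exist.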
196+
197+
### Worker State Inspection
198+
199+
**Understanding Worker Lifecycle:**
200+
```sql
201+
-- 1. Check run claim status
202+
SELECT processing_by, processing_started_at FROM runs WHERE run_id = '<runId>';
203+
204+
-- 2. Verify lease heartbeat
205+
-- Watch Encore logs for "extending lease" messages
206+
207+
-- 3. Inspect final disposition
208+
SELECT status, stop_reason FROM runs WHERE run_id = '<runId>';
209+
-- status=failed indicates worker crash/timeout before Stop node
210+
```
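The claim columns and the heartbeat log can be combined into a single reading of a run's lease. The sketch below is one hedged way to do that. `RunClaim`, `classifyClaim`, and the `lastLeaseExtensionAt` field are hypothetical (the last one stands for a timestamp taken from the "extending lease" log lines), and the 30s lease duration is the value quoted in this guide.

```typescript
type ClaimStatus = "unclaimed" | "active" | "lease-expired";

interface RunClaim {
  processingBy: string | null; // runs.processing_by
  processingStartedAt: Date | null; // runs.processing_started_at
  lastLeaseExtensionAt: Date | null; // hypothetical: taken from "extending lease" logs
}

const LEASE_MS = 30000; // 30s lease quoted in this guide

function classifyClaim(claim: RunClaim, now: Date, leaseMs: number = LEASE_MS): ClaimStatus {
  if (claim.processingBy === null || claim.processingStartedAt === null) {
    return "unclaimed";
  }
  // The most recent sign of life is the last extension, or the claim itself.
  const lastSeen = claim.lastLeaseExtensionAt ?? claim.processingStartedAt;
  return now.getTime() - lastSeen.getTime() > leaseMs ? "lease-expired" : "active";
}

const startedAt = new Date("2025-11-07T18:26:00Z");
const stuck: RunClaim = { processingBy: "worker-1", processingStartedAt: startedAt, lastLeaseExtensionAt: null };
console.log(classifyClaim(stuck, new Date("2025-11-07T18:26:45Z"))); // lease-expired
```

A claim that reads `lease-expired` while the run status is still `running` points at a blocked worker and not at a clean failure.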
211+
212+
### Phase 11: Advanced Regression Analysis (NEW)
213+
214+
When standard phases 1-10 don't reveal the issue:
215+
216+
1. **Compare successful vs failed run events side-by-side**
217+
```bash
218+
diff <(bunx tsx backend/scripts/inspect-run.ts <good_run>) \
219+
<(bunx tsx backend/scripts/inspect-run.ts <bad_run>)
220+
```
221+
222+
2. **Identify missing events in sequence**
223+
- Successful run: 19 events (includes Stop at step 6)
224+
- Failed run: 15 events (stops at WaitIdle step 5)
225+
- Missing: `agent.node.started Stop`, `agent.run.finished`
226+
227+
3. **Trace XState machine transitions**
228+
- Add logging to guards and actions in `agent.machine.factory.ts`
229+
- Monitor which guards evaluate true/false
230+
- Identify unexpected state transitions
231+
232+
4. **Test node execution in isolation**
233+
```typescript
234+
// scripts/test-node-isolation.ts
235+
import { stop } from "../agent/nodes/terminal/Stop/node";
236+
const input = { /* build input from failed run state */ };
237+
const result = await stop(input);
238+
console.log("Node output:", result);
239+
```
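Step 2 above (identifying missing events) can be done mechanically once both timelines are reduced to ordered lists of event labels. The sketch below computes which baseline events have no counterpart in the failed run. `missingEvents` is a hypothetical helper and the label format is an assumption; the actual output of `inspect-run.ts` may need parsing first.

```typescript
// Returns the baseline events that the candidate run never produced,
// in baseline order. Repeated labels are matched one-for-one.
function missingEvents(baseline: string[], candidate: string[]): string[] {
  const remaining = new Map<string, number>();
  for (const label of candidate) {
    remaining.set(label, (remaining.get(label) ?? 0) + 1);
  }
  const missing: string[] = [];
  for (const label of baseline) {
    const left = remaining.get(label) ?? 0;
    if (left > 0) {
      remaining.set(label, left - 1);
    } else {
      missing.push(label);
    }
  }
  return missing;
}

// Shortened, illustrative timelines (not the real 19- and 15-event runs).
const goodRun = [
  "agent.node.started WaitIdle",
  "agent.node.finished WaitIdle",
  "agent.node.started Stop",
  "agent.run.finished",
];
const badRun = ["agent.node.started WaitIdle", "agent.node.finished WaitIdle"];

console.log(missingEvents(goodRun, badRun));
// [ 'agent.node.started Stop', 'agent.run.finished' ]
```

Matching repeated labels one-for-one matters for nodes that run several times per run, where a plain set difference would hide a missing final iteration.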
240+
241+
### Common Backend Regression Patterns
242+
243+
| Issue | Symptom | Investigation | Common Cause |
244+
|-------|---------|---------------|--------------|
245+
| **Cursor Limit** | Recent runs stuck at seq=1 | `check-cursor-ordering.ts` | `CURSOR_LIMIT` too low |
246+
| **Node Hangs** | Agent state "running" indefinitely | `check-agent-state.ts` | DB query blocks execution |
247+
| **Lease Timeout** | Run fails after 30s | Worker logs, database `processing_by` | Heavy sync operations |
248+
| **Missing Events** | Timeline incomplete | `inspect-run.ts`, compare with baseline | Event not emitted or lost |
249+
| **State Machine Stuck** | No transitions after event | XState logs, guard evaluation | Guard logic error |
250+
251+
### Lesson: Avoid Heavy Operations in Critical Path
252+
253+
**Bad Pattern (BUG-010):**
254+
```typescript
255+
export async function stop(input: StopInput) {
256+
// ❌ DB query inside terminal node execution
257+
const rows = await db.query`SELECT COUNT(*) ...`;
258+
// If query hangs, entire machine stalls
259+
}
260+
```
261+
262+
**Good Pattern:**
263+
```typescript
264+
export async function stop(input: StopInput) {
265+
// ✅ Use pre-computed metrics from input
266+
const metrics = input.finalRunMetrics;
267+
// Terminal nodes must be lightweight and deterministic
268+
}
269+
```
270+
271+
**Rationale:**
272+
- Terminal nodes finalize run state → must complete reliably
273+
- Heavy queries → post-run analytics layer
274+
- Critical path → optimized for reliability and latency; exact counts can be reconciled after the run
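One way to keep the accuracy check without putting it back into the Stop node is to compare the reported counter with the database count after the run has finished. The sketch below shows only the comparison step. `checkScreenCount` and `MetricsCheck` are hypothetical names, and the actual count is assumed to come from a separate post-run query that is not shown here.

```typescript
interface FinalRunMetrics {
  uniqueScreensDiscoveredCount: number;
}

interface MetricsCheck {
  reported: number;
  actual: number;
  drift: number; // actual minus reported
  consistent: boolean;
}

// Pure comparison: safe to run outside the agent's critical path.
function checkScreenCount(metrics: FinalRunMetrics, actualDistinctScreens: number): MetricsCheck {
  const reported = metrics.uniqueScreensDiscoveredCount;
  const drift = actualDistinctScreens - reported;
  return { reported, actual: actualDistinctScreens, drift, consistent: drift === 0 };
}

const check = checkScreenCount({ uniqueScreensDiscoveredCount: 4 }, 6);
console.log(check.consistent, check.drift); // false 2
```

Because the function is pure, a slow or failing count query can only delay the report, never the run's terminal state.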
275+
276+
---
277+
278+
## References
279+
- BUG-010 RCA: `jira/bugs/BUG-010-run-page-regressions/RCA.md`
280+
- Diagnostic Scripts: `backend/scripts/`
281+
- Encore Debugging: `backend_coding_rules.mdc`
282+

.claude-skills/webapp-testing_skill/SKILL.md

Lines changed: 100 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -229,9 +229,107 @@ Reusable helpers live in `./lib/playwright-helpers.ts` (launch settings, safe cl
229229

230230
---
231231

232-
## 5. References
232+
## 5. Regression Debugging Playbook (BUG-010 Case Study)
233+
234+
### Systematic RCA for UI Regressions
235+
236+
**Real Example:** Three regressions on `/run` page (Nov 2025)
237+
- Graph events missing
238+
- Screenshots not visible
239+
- Stop node not executing
240+
241+
**Investigation Flow:**
242+
243+
1. **Visual Comparison**
244+
```text
245+
# Capture current state
246+
browser_take_screenshot({ fullPage: true, filename: "current-state.png" })
247+
248+
# Compare with baseline
249+
# .playwright-mcp/drift-detection-with-screenshot.png (working)
250+
# vs current broken state
251+
```
252+
253+
2. **Browser MCP Diagnostics**
254+
```text
255+
browser_navigate("http://localhost:5173")
256+
browser_click("Detect My First Drift")
257+
browser_snapshot() # Check UI tree for missing elements
258+
browser_console_messages() # Catch JS errors
259+
browser_network_requests() # Verify SSE streams
260+
```
261+
262+
3. **Timeline Forensics**
263+
```bash
264+
# Find last successful run
265+
bunx tsx backend/scripts/find-completed-runs.ts
266+
267+
# Compare with failed run
268+
bunx tsx backend/scripts/inspect-run.ts <run_id>
269+
270+
# Look for missing events (e.g., Stop node at step 6)
271+
```
272+
273+
4. **Git Forensics (Regression Window)**
274+
```bash
275+
# Identify regression window
276+
git log --oneline --since="<last_success_time>" --until="<first_failure_time>"
277+
278+
# Examine suspect commits
279+
git show <commit_hash> --stat
280+
git show <commit_hash> -- <specific_file>
281+
```
282+
283+
5. **Backend State Inspection**
284+
```bash
285+
# Check agent state
286+
bunx tsx backend/scripts/check-agent-state.ts <run_id>
287+
288+
# Verify graph projector cursor
289+
bunx tsx backend/scripts/check-cursor-ordering.ts
290+
291+
# Test projector functions in isolation
292+
bunx tsx backend/scripts/test-projector.ts <run_id>
293+
```
294+
295+
6. **Root Cause Validation**
296+
- Remove suspect code changes
297+
- Restart services
298+
- Run fresh test
299+
- Compare events sequence with baseline
300+
301+
### Key Diagnostic Scripts Created
302+
- `backend/scripts/inspect-run.ts` - Full run event timeline
303+
- `backend/scripts/check-agent-state.ts` - Agent state snapshots
304+
- `backend/scripts/check-cursor-ordering.ts` - Projector cursor health
305+
- `backend/scripts/find-completed-runs.ts` - Identify successful runs for comparison
306+
- `backend/scripts/test-projector.ts` - Isolated projector function testing
307+
308+
### Common Regression Patterns
309+
| Symptom | Check | Common Cause |
310+
|---------|-------|--------------|
311+
| Graph events missing | Cursor limit, projector logs | `CURSOR_LIMIT` too low, cursor stuck |
312+
| Screenshots not rendering | dataUrl in stream, CORS | Missing field in projection output |
313+
| Stop node not executing | Agent state, XState logs | Node execution error, budget exhaustion |
314+
| Run fails prematurely | Worker logs, lease timeout | Database query hangs, lease expired |
315+
316+
### Evidence Collection Checklist
317+
- [ ] Screenshot comparison (baseline vs current)
318+
- [ ] Browser console logs
319+
- [ ] Network tab (SSE streams)
320+
- [ ] Backend logs (Encore dashboard)
321+
- [ ] Database state (run_events, outcomes, cursors)
322+
- [ ] Git diff of regression window
323+
- [ ] Agent state snapshots
324+
325+
**See:** `jira/bugs/BUG-010-run-page-regressions/RCA.md` for complete case study
326+
327+
---
328+
329+
## 6. References
233330
- Playwright Docs: https://playwright.dev/docs/intro
234331
- Encore/Svelte debugging: see `backend_coding_rules.mdc` and `frontend_engineer.mdc`
235332
- Automation commands: `.cursor/commands/start-services`, `.cursor/commands/run-default-test`, `task founder:rules:check`
333+
- BUG-010 RCA: `jira/bugs/BUG-010-run-page-regressions/RCA.md`
236334

237-
Use this playbook whenever you need reproducible UI testing. Playwright gives you deterministic coverage; Cursors tools remain on standby for exploratory analysis.
335+
Use this playbook whenever you need reproducible UI testing. Playwright gives you deterministic coverage; Cursor's tools remain on standby for exploratory analysis.

backend/agent/engine/xstate/agent.machine.factory.ts

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -77,11 +77,17 @@ export class AgentMachineFactory {
7777
clearPendingStop: assign(() => ({ pendingStop: null } satisfies Partial<AgentMachineContext>)),
7878

7979
// Stores the execution result and updates machine context with new state
80-
storeExecutionResult: assign(({ event }) => {
80+
storeExecutionResult: assign(({ event, context }) => {
8181
const output = "output" in event ? (event.output as RunNodeActorOutput | undefined) : undefined;
8282
if (!output) {
83+
dependencies.logger.warn("storeExecutionResult: no output in event", { event });
8384
return {} satisfies Partial<AgentMachineContext>;
8485
}
86+
dependencies.logger.info("storeExecutionResult", {
87+
nodeName: output.execution.nodeName,
88+
decision: output.decision.kind,
89+
nextNode: output.decision.kind === "advance" ? output.decision.nextNode : null,
90+
});
8591
return {
8692
agentState: output.nextState,
8793
latestExecution: output.execution,
@@ -321,6 +327,8 @@ export class AgentMachineFactory {
321327
nodeName: executionResult.execution.nodeName,
322328
outcome: executionResult.execution.outcome,
323329
decision: decision.kind,
330+
nextNode: decision.kind === "advance" ? decision.nextNode : null,
331+
budgetExhausted: budgetStopReason !== null,
324332
});
325333

326334
return {

backend/agent/nodes/terminal/Stop/node.ts

Lines changed: 3 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -51,35 +51,15 @@ export async function stop(
5151
}
5252
});
5353

54-
// Query actual discovered screens from graph_persistence_outcomes
55-
const discoveredScreensRows = await db.query<{ count: number }>`
56-
SELECT COUNT(DISTINCT screen_id) as count
57-
FROM graph_persistence_outcomes
58-
WHERE run_id = ${input.runId}
59-
AND outcome_kind = 'discovered'
60-
`;
61-
62-
let actualDiscoveredScreens = 0;
63-
for await (const row of discoveredScreensRows) {
64-
actualDiscoveredScreens = row.count;
65-
}
54+
// Use metrics from input (DB query removed to fix regression)
55+
const correctedMetrics = input.finalRunMetrics;
6656

6757
logger.info("Stop node details", {
68-
actualDiscoveredScreens,
69-
reportedScreens: input.finalRunMetrics.uniqueScreensDiscoveredCount,
70-
totalIterationsExecuted: input.finalRunMetrics.totalIterationsExecuted,
71-
uniqueActionsPersistedCount: input.finalRunMetrics.uniqueActionsPersistedCount,
72-
runDurationInMilliseconds: input.finalRunMetrics.runDurationInMilliseconds,
58+
metrics: correctedMetrics,
7359
stepOrdinal: input.stepOrdinal,
7460
iterationOrdinalNumber: input.iterationOrdinalNumber,
7561
});
7662

77-
// Override the counter with actual database count
78-
const correctedMetrics = {
79-
...input.finalRunMetrics,
80-
uniqueScreensDiscoveredCount: actualDiscoveredScreens,
81-
};
82-
8363
const output: StopOutput = {
8464
runId: input.runId,
8565
confirmedTerminalDisposition: input.intendedTerminalDisposition,

backend/graph/projector.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ import type {
1616
} from "./types";
1717

1818
const POLL_INTERVAL_MS = 300;
19-
const CURSOR_LIMIT = 50;
19+
const CURSOR_LIMIT = 200; // Increased from 50 to handle more concurrent runs
2020
const HYDRATE_LIMIT = 20;
2121
const EVENT_BATCH_SIZE = 100;
2222
