fix(lifecycle): implement stuck detection using agent-stuck threshold#376
Conversation
The agent-stuck reaction config supported a threshold field (e.g. "10m"), but determineStatus() never returned "stuck" — there was no code path that consumed the threshold or transitioned sessions based on idle time. Sessions would stay parked at pr_open/working forever even when the agent had been idle for hours. Added idle-time check in determineStatus(): when getActivityState() reports "idle" or "blocked" with a timestamp, compare the idle duration against the agent-stuck.threshold config. If exceeded, return "stuck" so the reaction system can fire notifications. Also removed the priority !== "info" guard on transition notifications, so all priority levels (including info) are routed through notificationRouting. This lets the config control which notifiers receive each priority level, rather than silently dropping info-level transition events.
The original stuck check in step 2 (before PR checks) can be bypassed when getActivityState() returns null (session file not found, cache miss, I/O failure). When this happens, the code falls through to the PR path which returns 'pr_open' without ever checking idle duration. Fix: extract isIdleBeyondThreshold() helper and call it in three places: 1. Step 2: before PR checks (fast path, catches most cases) 2. Step 4b: after PR checks return 'pr_open' (safety net) 3. Step 5: after all checks, for agents that finish without a PR This ensures stuck detection fires even when the JSONL activity detection fails to return idle state. Sessions can no longer get permanently stuck at 'pr_open' when the agent has been idle beyond the threshold. Also removes the debug console.error calls from the previous commit.
…diness PRs with no required reviewers never reached 'mergeable' status because getReviewDecision returned 'none', which was not handled. The lifecycle poll fell through to 'review_pending' or the default, so merge.ready never fired and the approved-and-green reaction never triggered. Also: skip stuck short-circuit when session has an open PR so merge readiness checks in step 4 can still run. Without this, idle agents with open PRs get stuck status and never transition to mergeable. Closes composio#0 (internal fix)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Stuck threshold lookup skips global config when project overrides exist
- isIdleBeyondThreshold now reads the merged per-session reaction via getReactionConfigForSession so a global threshold is preserved when project overrides omit it.
Or push these changes by commenting:
@cursor push bde1c87f0c
Preview (bde1c87f0c)
diff --git a/packages/core/src/lifecycle-manager.ts b/packages/core/src/lifecycle-manager.ts
--- a/packages/core/src/lifecycle-manager.ts
+++ b/packages/core/src/lifecycle-manager.ts
@@ -182,9 +182,7 @@
/** Check if idle time exceeds the agent-stuck threshold. */
function isIdleBeyondThreshold(session: Session, idleTimestamp: Date): boolean {
- const stuckReaction =
- config.projects[session.projectId]?.reactions?.["agent-stuck"] ??
- config.reactions["agent-stuck"];
+ const stuckReaction = getReactionConfigForSession(session, "agent-stuck");
const thresholdStr = (stuckReaction as Record<string, unknown> | undefined)?.threshold;
if (typeof thresholdStr !== "string") return false;
const stuckThresholdMs = parseDuration(thresholdStr);This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
|
@cursor review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
harsh-batheja
left a comment
There was a problem hiding this comment.
All bot findings addressed; local lifecycle/core and opencode verifications pass.
|
All automated review findings are resolved and CI checks are green. Added reviewer requests for maintainer approval to clear branch protection. |
|
Added additional collaborator review requests to satisfy branch protection approval requirements. |


Problem
The
agent-stuckreaction config supported athresholdfield:But
determineStatus()never returned"stuck". There was no code path that consumed the threshold or transitioned sessions based on idle time. TheSESSION_STATUS.STUCKconstant existed, the reaction config schema accepted it, the event mapping was wired (session.stuck→agent-stuck), but the actual detection was missing.Sessions would stay parked at
pr_openorworkingindefinitely, even when the agent had been idle for hours. No webhook, no notification, no reaction.Fix
1. Stuck detection in
determineStatus()After the existing activity state checks (waiting_input → needs_input, exited → killed), added:
This respects both project-level and global
agent-stuckreaction configs, and usesparseDuration()(already in the file) to parse the threshold string.2. Remove info-priority notification suppression
The previous code had a
priority !== "info"guard that silently dropped all info-level transition notifications. This prevented legitimate info events (session spawned, PR opened, CI passed) from reaching configured notifiers. Removed the guard so all priorities route throughnotificationRouting, letting the config control delivery.Testing
threshold: "10m"immediately transitionedpr_open → stuckand fired the webhook notificationDepends on
This fix works independently, but reaches full effectiveness when combined with #375 (Codex session file matching fix), which ensures
getActivityState()returns actual idle timestamps instead ofnull.