fix(lifecycle): implement stuck detection using agent-stuck threshold #376

Merged
harsh-batheja merged 7 commits into ComposioHQ:main from sigvardt:fix/lifecycle-stuck-detection on Mar 13, 2026

Conversation

sigvardt (Contributor) commented Mar 8, 2026

Problem

The agent-stuck reaction config supported a threshold field:

reactions:
  agent-stuck:
    threshold: "10m"
    action: notify
    priority: urgent

But determineStatus() never returned "stuck". No code path consumed the threshold or transitioned sessions based on idle time. The SESSION_STATUS.STUCK constant existed, the reaction config schema accepted the field, and the event mapping was wired (session.stuck → agent-stuck), but the detection itself was missing.

Sessions would stay parked at pr_open or working indefinitely, even when the agent had been idle for hours. No webhook, no notification, no reaction.

Fix

1. Stuck detection in determineStatus()

After the existing activity state checks (waiting_input → needs_input, exited → killed), added:

if ((activityState.state === "idle" || activityState.state === "blocked") && activityState.timestamp) {
  // Look up agent-stuck threshold from project or global reaction config
  // If idle duration exceeds threshold, return "stuck"
}

This respects both project-level and global agent-stuck reaction configs, and uses parseDuration() (already in the file) to parse the threshold string.
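The check described above could be fleshed out roughly as follows. This is a minimal sketch, not the merged code: the ActivityState shape, the isStuck name, and the parseDuration stand-in (assumed here to return milliseconds and 0 for unparseable input) are assumptions for illustration.

```typescript
// Hypothetical shape; the real activity state type in lifecycle-manager.ts may differ.
interface ActivityState {
  state: "active" | "idle" | "blocked" | "waiting_input" | "exited";
  timestamp?: Date;
}

// Stand-in for the file's existing parseDuration(). Assumed to accept "30s", "10m", "2h"
// and to return milliseconds, or 0 when the string cannot be parsed.
function parseDuration(value: string): number {
  const match = /^(\d+)([smh])$/.exec(value.trim());
  if (!match) return 0;
  const amount = Number(match[1]);
  const unit = match[2];
  const unitMs = unit === "s" ? 1_000 : unit === "m" ? 60_000 : 3_600_000;
  return amount * unitMs;
}

// Returns true when the agent has been idle or blocked for longer than the threshold.
function isStuck(
  activity: ActivityState | null,
  threshold: unknown,
  now: Date = new Date(),
): boolean {
  if (!activity || !activity.timestamp) return false;
  if (activity.state !== "idle" && activity.state !== "blocked") return false;
  if (typeof threshold !== "string") return false; // no threshold configured
  const thresholdMs = parseDuration(threshold);
  if (thresholdMs <= 0) return false; // unparseable threshold disables the check
  return now.getTime() - activity.timestamp.getTime() > thresholdMs;
}
```

Treating a missing or unparseable threshold as "never stuck" keeps the check opt-in, so projects without an agent-stuck reaction behave as before.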

2. Remove info-priority notification suppression

The previous code had a priority !== "info" guard that silently dropped all info-level transition notifications. This prevented legitimate info events (session spawned, PR opened, CI passed) from reaching configured notifiers. Removed the guard so all priorities route through notificationRouting, letting the config control delivery.
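To illustrate the effect of removing the guard, here is a hedged sketch of priority-based routing. The NotificationRouting shape and the notifiersFor helper are hypothetical; only the notificationRouting name and the "info" and "urgent" priorities come from this PR.

```typescript
// Hypothetical routing table: priority name -> notifier names that should receive it.
type NotificationRouting = Record<string, string[]>;

// After the change: every priority, including "info", is looked up in the routing table.
// A missing or empty entry is how the config opts a priority out of delivery.
function notifiersFor(priority: string, routing: NotificationRouting): string[] {
  return routing[priority] ?? [];
}

// Before the change (for contrast): "info" was dropped regardless of the config.
function notifiersForWithGuard(priority: string, routing: NotificationRouting): string[] {
  if (priority === "info") return [];
  return routing[priority] ?? [];
}
```

With a table such as `{ urgent: ["webhook", "desktop"], info: ["webhook"] }`, the guarded version never delivers info events to the webhook, while the unguarded version does.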

Testing

  • All 398 core tests pass (including 24 lifecycle-manager tests)
  • Verified end-to-end: a session idle for 2+ hours with threshold: "10m" immediately transitioned pr_open → stuck and fired the webhook notification

Depends on

This fix works independently, but reaches full effectiveness when combined with #375 (Codex session file matching fix), which ensures getActivityState() returns actual idle timestamps instead of null.

sigvardt and others added 5 commits March 8, 2026 17:40
The agent-stuck reaction config supported a threshold field (e.g.
"10m"), but determineStatus() never returned "stuck" — there was no
code path that consumed the threshold or transitioned sessions based
on idle time. Sessions would stay parked at pr_open/working forever
even when the agent had been idle for hours.

Added idle-time check in determineStatus(): when getActivityState()
reports "idle" or "blocked" with a timestamp, compare the idle
duration against the agent-stuck.threshold config. If exceeded,
return "stuck" so the reaction system can fire notifications.

Also removed the priority !== "info" guard on transition
notifications, so all priority levels (including info) are routed
through notificationRouting. This lets the config control which
notifiers receive each priority level, rather than silently dropping
info-level transition events.

The original stuck check in step 2 (before PR checks) can be bypassed
when getActivityState() returns null (session file not found, cache miss,
I/O failure). When this happens, the code falls through to the PR path
which returns 'pr_open' without ever checking idle duration.

Fix: extract isIdleBeyondThreshold() helper and call it in three places:
1. Step 2: before PR checks (fast path, catches most cases)
2. Step 4b: after PR checks return 'pr_open' (safety net)
3. Step 5: after all checks, for agents that finish without a PR

This ensures stuck detection fires even when the JSONL activity detection
fails to return idle state. Sessions can no longer get permanently stuck
at 'pr_open' when the agent has been idle beyond the threshold.

Also removes the debug console.error calls from the previous commit.
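The three call sites listed in this commit message can be pictured with the simplified sketch below. It is an illustration only: the Probe shape, the determineStatusSketch name, and in particular the lastActivityAt fallback timestamp are assumptions, since the commit message does not say which timestamp the later checks use when getActivityState() returns null.

```typescript
type Status = "working" | "pr_open" | "stuck";

interface Probe {
  // null models a failed getActivityState() (file not found, cache miss, I/O failure).
  activity: { state: string; timestamp?: Date } | null;
  hasOpenPr: boolean;
  // Hypothetical fallback timestamp; not named in the commit message.
  lastActivityAt: Date;
}

function determineStatusSketch(probe: Probe, thresholdMs: number, now: Date): Status {
  const idleBeyond = (since: Date): boolean =>
    now.getTime() - since.getTime() > thresholdMs;

  // Step 2: fast path, used when activity detection returned an idle/blocked state.
  const a = probe.activity;
  if (a && a.timestamp && (a.state === "idle" || a.state === "blocked") && idleBeyond(a.timestamp)) {
    return "stuck";
  }

  // Step 4b: safety net after the PR path would otherwise return 'pr_open'.
  if (probe.hasOpenPr) {
    return idleBeyond(probe.lastActivityAt) ? "stuck" : "pr_open";
  }

  // Step 5: agents that finished without opening a PR.
  return idleBeyond(probe.lastActivityAt) ? "stuck" : "working";
}
```

The point of the repeated check is that a null activity state no longer lets the session fall through to 'pr_open' without its idle duration ever being examined.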
…diness

PRs with no required reviewers never reached 'mergeable' status because
getReviewDecision returned 'none', which was not handled. The lifecycle
poll fell through to 'review_pending' or the default, so merge.ready
never fired and the approved-and-green reaction never triggered.

Also: skip stuck short-circuit when session has an open PR so merge
readiness checks in step 4 can still run. Without this, idle agents
with open PRs get stuck status and never transition to mergeable.

Closes composio#0 (internal fix)
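The review-decision fix in this commit can be sketched as below. The ReviewDecision values other than 'none', the ciPassing flag, and the prStatus name are assumptions for illustration; only getReviewDecision returning 'none' and the 'mergeable' and 'review_pending' statuses are taken from the commit message.

```typescript
// Hypothetical set of decisions; only "none" is named in the commit message.
type ReviewDecision = "approved" | "changes_requested" | "pending" | "none";

type PrStatus = "mergeable" | "review_pending" | "pr_open";

function prStatus(ciPassing: boolean, decision: ReviewDecision): PrStatus {
  if (!ciPassing) return "pr_open";
  // "none" means no review is required, so it satisfies the review gate
  // in the same way "approved" does.
  if (decision === "approved" || decision === "none") return "mergeable";
  // Simplification: every other decision is treated as still awaiting review.
  return "review_pending";
}
```

Without the "none" branch, a green PR in a repository with no required reviewers would stay at 'review_pending' and merge.ready would never fire.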
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Stuck threshold lookup skips global config when project overrides exist
    • isIdleBeyondThreshold now reads the merged per-session reaction via getReactionConfigForSession so a global threshold is preserved when project overrides omit it.

To push these changes, comment:

@cursor push bde1c87f0c
Preview (bde1c87f0c)
diff --git a/packages/core/src/lifecycle-manager.ts b/packages/core/src/lifecycle-manager.ts
--- a/packages/core/src/lifecycle-manager.ts
+++ b/packages/core/src/lifecycle-manager.ts
@@ -182,9 +182,7 @@
 
   /** Check if idle time exceeds the agent-stuck threshold. */
   function isIdleBeyondThreshold(session: Session, idleTimestamp: Date): boolean {
-    const stuckReaction =
-      config.projects[session.projectId]?.reactions?.["agent-stuck"] ??
-      config.reactions["agent-stuck"];
+    const stuckReaction = getReactionConfigForSession(session, "agent-stuck");
     const thresholdStr = (stuckReaction as Record<string, unknown> | undefined)?.threshold;
     if (typeof thresholdStr !== "string") return false;
     const stuckThresholdMs = parseDuration(thresholdStr);
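
The difference between the removed `??` lookup and a merged lookup can be shown with a small sketch. The Config and ReactionConfig shapes and the mergedReaction helper are assumptions standing in for getReactionConfigForSession, whose real implementation is not shown in this PR.

```typescript
interface ReactionConfig {
  threshold?: string;
  action?: string;
  priority?: string;
}

interface Config {
  reactions: Record<string, ReactionConfig>;
  projects: Record<string, { reactions?: Record<string, ReactionConfig> }>;
}

// Old behaviour: a project entry replaces the global entry wholesale,
// so a global threshold is lost if the project override omits it.
function shadowedReaction(config: Config, projectId: string, key: string): ReactionConfig | undefined {
  return config.projects[projectId]?.reactions?.[key] ?? config.reactions[key];
}

// Merged behaviour (assumed): project fields override global fields one by one.
function mergedReaction(config: Config, projectId: string, key: string): ReactionConfig | undefined {
  const globalReaction = config.reactions[key];
  const projectReaction = config.projects[projectId]?.reactions?.[key];
  if (!globalReaction && !projectReaction) return undefined;
  return { ...globalReaction, ...projectReaction };
}
```

For a global `agent-stuck` entry with `threshold: "10m"` and a project override that sets only `priority`, the shadowed lookup yields no threshold, while the merged lookup keeps "10m".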

@harsh-batheja (Collaborator)

@cursor review

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

harsh-batheja (Collaborator) left a comment

All bot findings addressed; local lifecycle/core and opencode verifications pass.

@harsh-batheja (Collaborator)

All automated review findings are resolved and CI checks are green. Added reviewer requests for maintainer approval to clear branch protection.

@harsh-batheja (Collaborator)

Added additional collaborator review requests to satisfy branch protection approval requirements.

@harsh-batheja harsh-batheja merged commit a64c051 into ComposioHQ:main Mar 13, 2026
9 of 10 checks passed

2 participants