fix: Improve background agent stability and completion detection #715

Gladdonilli · 2026-01-12T14:21:07Z

Summary

Complete overhaul of background agent stability, combining improvements from closed PRs #697, #655, and #714.

Changes

Agent Safety Guards (from #697)

explore agent: max_steps: 25 to prevent infinite loops
librarian agent: max_steps: 30 + blocks task, sisyphus_task, call_omo_agent tools
Prevents subagents from spawning their own subagents

Background Completion Detection (from #655, #697)

Push-based notification: Fire-and-forget prompt() instead of awaiting (fixes JSON parse errors with thinking models)
MIN_IDLE_TIME_MS (5s): Minimum time before considering session idle
MAX_RUN_TIME_MS (15min): Global timeout to prevent stuck tasks
Simplified idle handler: Removed validateSessionHasOutput and checkSessionTodos guards that were causing stuck tasks
Timeout cleanup: Clear timeoutTimer on task completion

Memory Leak Prevention (from #714)

Release concurrency key immediately on session.idle completion (not just on session.deleted)
Prevent double-release: Set concurrencyKey = undefined after release
Clean up on resume error: Proper cleanup of timeout and concurrency on resume failures

Other Fixes

Category model fix: Pass categoryModel for category-based tasks in sync mode
JSONC config support: Priority check for .jsonc before .json
Reset startedAt on resume: Proper timing for resumed tasks

Testing

TypeScript type check passes
Build succeeds
Manual stress test with 30+ parallel background agents

Related PRs

Supersedes fix: Improve background agent stability and completion detection #697 (closed to retrigger reviews)
Incorporates fix(sisyphus-task): complete overhaul of background agent model handling and sync mode #655 changes (closed as outdated by refactor)
Includes fix(background-agent): prevent memory leaks on task completion #714 memory leak fixes (closed)

References

Pattern inspired by kdcokenny/opencode-background-agents

Summary by cubic

Improves background agent stability and reliable completion by adding safety limits, simplifying idle handling, and fixing cleanup. Tasks now complete predictably, avoid leaks, and subagents can’t spiral.

Bug Fixes
- Added 15-minute global timeout for background tasks; clear/unref timers on complete, timeout, resume, and error.
- Treat session.idle as completion only after 5s runtime; removed guards that caused stuck tasks.
- Switched background prompt to fire-and-forget to prevent JSON parse errors with thinking models.
- Always release concurrency keys on completion/timeout/error and set them to undefined to prevent double-release.
- Reset startedAt on resume; added timeout for resumed tasks; proper cleanup on resume errors.
- Subagent safety: max_steps (explore: 25, librarian: 30); librarian blocks task, sisyphus_task, and call_omo_agent tools.
- Fixed category-based sync tasks by passing categoryModel; agent-based tasks use the agent’s configured model.
- Config: prefer .jsonc over .json for user/project paths.

^{Written for commit d59f890. Summary will update on new commits.}

- Add max_steps limit to explore (25) and librarian (30) agents - Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning - Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS) - Simplify session.idle handler - remove validateSessionHasOutput/checkSessionTodos guards - Add JSONC config file support (.jsonc checked before .json) - Fix categoryModel passing in sisyphus_task sync mode Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents

- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks) - Previous commit already fixed timeout.unref() and double-release prevention

- Set concurrencyKey = undefined after every release() call to prevent double-release when multiple code paths try to release the same key - Add 15-minute timeout timer for resumed tasks (was missing) - Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications

Address P2 feedback: resume() error handler now properly: - Clears the timeout timer created for the resumed task - Releases concurrency key to unblock queued tasks

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

cubic-dev-ai

2 issues found across 6 files

Confidence score: 3/5

Late prompt rejections in src/tools/sisyphus-task/tools.ts are dropped after the 100 ms fire-and-forget check, so users can see a misleading “No assistant response found” after 10 minutes of polling when the prompt never succeeded.
Early returns on prompt errors leave subagentSessions entries uncleared in src/tools/sisyphus-task/tools.ts, risking stale session tracking and incorrect future runs.
Pay close attention to src/tools/sisyphus-task/tools.ts – prompt error handling needs to avoid losing failures and leaking session entries.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/tools/sisyphus-task/tools.ts">

<violation number="1" location="src/tools/sisyphus-task/tools.ts:426">
P2: Late prompt rejections are dropped: fire-and-forget prompt is only checked for 100 ms, so failures after that window lead to 10‑minute polling and a misleading "No assistant response found" when the prompt was never sent.</violation>

<violation number="2" location="src/tools/sisyphus-task/tools.ts:445">
P2: Prompt error early return leaves subagentSessions entry uncleared, causing stale session tracking</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-01-12T14:36:55Z

src/tools/sisyphus-task/tools.ts

+        // For category-based tasks, pass the model from category config
+        // For agent-based tasks, use agent's configured model (don't pass model in body)
+        let promptError: Error | undefined
+        client.session.prompt({


P2: Late prompt rejections are dropped: fire-and-forget prompt is only checked for 100 ms, so failures after that window lead to 10‑minute polling and a misleading "No assistant response found" when the prompt was never sent.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/sisyphus-task/tools.ts, line 426: <comment>Late prompt rejections are dropped: fire-and-forget prompt is only checked for 100 ms, so failures after that window lead to 10‑minute polling and a misleading "No assistant response found" when the prompt was never sent.</comment> <file context> @@ -419,21 +419,30 @@ System notifies on completion. Use \`background_output\` with task_id="${task.id + // For category-based tasks, pass the model from category config + // For agent-based tasks, use agent's configured model (don't pass model in body) + let promptError: Error | undefined + client.session.prompt({ + path: { id: sessionID }, + body: { </file context>

cubic-dev-ai · 2026-01-12T14:36:55Z

src/tools/sisyphus-task/tools.ts

+        if (promptError) {
          if (toastManager && taskId !== undefined) {
            toastManager.removeTask(taskId)
          }


P2: Prompt error early return leaves subagentSessions entry uncleared, causing stale session tracking

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/sisyphus-task/tools.ts, line 445: <comment>Prompt error early return leaves subagentSessions entry uncleared, causing stale session tracking</comment> <file context> @@ -419,21 +419,30 @@ System notifies on completion. Use \`background_output\` with task_id="${task.id + // Small delay to let the prompt start + await new Promise(resolve => setTimeout(resolve, 100)) + + if (promptError) { if (toastManager && taskId !== undefined) { toastManager.removeTask(taskId) </file context>

Gladdonilli · 2026-01-12T16:11:45Z

adding dynamic approach for "noreply", updating tormorrow in new pr

kdcokenny · 2026-01-13T03:53:20Z

adding dynamic approach for "noreply", updating tormorrow in new pr

if you're making a new pr please close this one

Gladdonilli added 5 commits January 12, 2026 22:19

fix: clear timeout timer on task completion to prevent memory leak

260d49d

fix: address PR review issues - reset startedAt on resume

440d9bc

- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks) - Previous commit already fixed timeout.unref() and double-release prevention

fix: clean up timeout and concurrency on resume error

d59f890

Address P2 feedback: resume() error handler now properly: - Clears the timeout timer created for the resumed task - Releases concurrency key to unblock queued tasks

greptile-apps bot reviewed Jan 12, 2026

View reviewed changes

cubic-dev-ai bot reviewed Jan 12, 2026

View reviewed changes

Gladdonilli closed this Jan 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Improve background agent stability and completion detection #715

fix: Improve background agent stability and completion detection #715

Uh oh!

Gladdonilli commented Jan 12, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

cubic-dev-ai bot Jan 12, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai bot Jan 12, 2026 •

edited

Loading

Uh oh!

Gladdonilli commented Jan 12, 2026

Uh oh!

kdcokenny commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: Improve background agent stability and completion detection #715

fix: Improve background agent stability and completion detection #715

Uh oh!

Conversation

Gladdonilli commented Jan 12, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Agent Safety Guards (from #697)

Background Completion Detection (from #655, #697)

Memory Leak Prevention (from #714)

Other Fixes

Testing

Related PRs

References

Summary by cubic

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Gladdonilli commented Jan 12, 2026

Uh oh!

kdcokenny commented Jan 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Gladdonilli commented Jan 12, 2026 •

edited by cubic-dev-ai bot

Loading

cubic-dev-ai bot Jan 12, 2026 •

edited

Loading

cubic-dev-ai bot Jan 12, 2026 •

edited

Loading