fix: Improve background agent stability and completion detection #697

Gladdonilli · 2026-01-11T17:43:21Z

Summary

This PR improves background agent reliability by adding safety guards and simplifying completion detection to prevent stuck tasks.

Changes

1. Agent Safety Guards

- max_steps limit: Added max_steps: 25 to the explore agent and max_steps: 30 to the librarian agent to prevent infinite loops
- Tool blocking: Librarian now blocks the sisyphus_task, call_omo_agent, and task tools to prevent unintended child spawning
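
A rough sketch of how these two guards could be expressed. The `AgentGuardConfig` shape and the helper names are assumptions for illustration and may not match the plugin's real agent config types; only the limits (25/30) and the blocked tool names come from this PR.

```typescript
// Hypothetical config shape; the plugin's actual agent config may differ.
interface AgentGuardConfig {
  max_steps: number;
  // A tool mapped to `false` is blocked for this agent.
  tools?: Record<string, boolean>;
}

const exploreGuards: AgentGuardConfig = { max_steps: 25 };

const librarianGuards: AgentGuardConfig = {
  max_steps: 30,
  tools: { sisyphus_task: false, call_omo_agent: false, task: false },
};

// A tool is allowed unless it is explicitly mapped to false.
function isToolAllowed(cfg: AgentGuardConfig, tool: string): boolean {
  return cfg.tools?.[tool] !== false;
}

// The agent may take another step only while it is under its cap.
function canTakeStep(cfg: AgentGuardConfig, stepsTaken: number): boolean {
  return stepsTaken < cfg.max_steps;
}
```

With this shape, a librarian that tries to call `task` is refused, and an explore agent stops after its 25th step instead of looping.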

2. Background Agent Completion Detection (PR #655 Implementation)

- Global timeout: Added MAX_RUN_TIME_MS (15 minutes) to prevent tasks from running forever
- Simplified idle handler: Removed validateSessionHasOutput() and checkSessionTodos() guards that were causing stuck tasks
- Minimum idle time: MIN_IDLE_TIME_MS (5 seconds) prevents premature completion from early session.idle events
- Timeout cleanup: Timer is properly cleared on task completion to prevent memory leaks
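
To make the timeout and its cleanup concrete, here is a minimal sketch, assuming a simplified `BackgroundTask` type. `armTimeout` and `completeTask` are illustrative names, not the functions in manager.ts.

```typescript
const MAX_RUN_TIME_MS = 15 * 60 * 1000; // 15 minutes, per the PR description

interface BackgroundTask {
  id: string;
  status: "running" | "completed" | "error";
  error?: string;
  timeoutTimer?: ReturnType<typeof setTimeout>;
}

// Arm a global timeout that fails the task if it is still running.
function armTimeout(task: BackgroundTask, maxRunTimeMs: number = MAX_RUN_TIME_MS): void {
  const timer = setTimeout(() => {
    if (task.status !== "running") return;
    task.status = "error";
    task.error = `Task exceeded ${maxRunTimeMs} ms`;
    task.timeoutTimer = undefined;
  }, maxRunTimeMs);
  // Node and Bun timers expose unref(); browser timers are plain numbers,
  // so the call is optional. unref() keeps the timer from holding the process open.
  (timer as unknown as { unref?: () => void }).unref?.();
  task.timeoutTimer = timer;
}

// Clear the pending timer before marking the task complete.
function completeTask(task: BackgroundTask): void {
  if (task.timeoutTimer !== undefined) {
    clearTimeout(task.timeoutTimer);
    task.timeoutTimer = undefined;
  }
  task.status = "completed";
}
```

Clearing the timer on every completion path is what keeps a finished task from leaving a 15-minute timer scheduled.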

3. Configuration Improvements

- JSONC support: Config paths now check for .jsonc files before .json, enabling comments in config files
- Category model fix: sisyphus_task sync mode now correctly passes categoryModel for category-based tasks
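
The .jsonc-before-.json lookup could look roughly like the sketch below. `resolveConfigPath` and the injected `exists` callback are assumptions made so the example stays runtime-independent; the real loader presumably checks the filesystem directly.

```typescript
// Return the first existing candidate, preferring .jsonc over .json.
// `exists` is injected (hypothetical) instead of calling a specific fs API.
function resolveConfigPath(
  basePath: string,
  exists: (candidate: string) => boolean,
): string | undefined {
  for (const ext of [".jsonc", ".json"]) {
    const candidate = basePath + ext;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

When both files are present the .jsonc one wins, so adding a commented config next to an existing .json file takes effect without deleting the old one.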

Completion Detection Flow

┌─────────────────────────────────────────────────────────────┐
│                    COMPLETION FLOW                          │
├─────────────────────────────────────────────────────────────┤
│  1. Agent runs, does work (tool calls, thinking, etc.)      │
│  2. Agent goes idle (no more tool calls/responses)          │
│  3. OpenCode SDK fires session.idle event                   │
│  4. GUARD: Check if elapsed time >= MIN_IDLE_TIME_MS (5s)   │
│     - If < 5s: IGNORE (too early, agent still starting)     │
│     - If >= 5s: ACCEPT as complete                          │
│  5. Clear timeout timer (prevent memory leak)               │
│  6. Mark task.status = "completed"                          │
│  7. Notify parent via noReply batching pattern              │
└─────────────────────────────────────────────────────────────┘
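
The flow above can be sketched as a single handler. This is an illustration under assumptions: `IdleTask`, `handleSessionIdle`, and the `notifyParent` callback are hypothetical names, and the noReply batching itself is not modelled.

```typescript
const MIN_IDLE_TIME_MS = 5_000; // 5 seconds, per the PR description

interface IdleTask {
  status: "running" | "completed";
  startedAt: number; // epoch milliseconds
  timeoutTimer?: ReturnType<typeof setTimeout>;
}

// Returns true when the idle event is accepted as completion.
function handleSessionIdle(
  task: IdleTask,
  now: number,
  notifyParent: (task: IdleTask) => void,
): boolean {
  if (task.status !== "running") return false;

  // Step 4: ignore idle events that arrive before the minimum idle time.
  if (now - task.startedAt < MIN_IDLE_TIME_MS) return false;

  // Step 5: clear the global timeout so it does not outlive the task.
  if (task.timeoutTimer !== undefined) {
    clearTimeout(task.timeoutTimer);
    task.timeoutTimer = undefined;
  }

  // Steps 6 and 7: mark complete, then notify the parent.
  task.status = "completed";
  notifyParent(task);
  return true;
}
```

Passing `now` in explicitly keeps the guard easy to test; a real handler would likely read the clock itself.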

Why Guards Were Removed

The previous guards (validateSessionHasOutput, checkSessionTodos) caused tasks to get stuck:

| Guard | Problem |
| --- | --- |
| `validateSessionHasOutput` | If model returns empty (config issue), waits forever |
| `checkSessionTodos` | If agent creates todos but never completes them, waits forever |

The new approach: Fail fast, surface errors, don't hide them. An empty result is a visible signal of a problem the user can debug.

Safety Nets

| Edge Case | Protection |
| --- | --- |
| Early completion | `MIN_IDLE_TIME_MS` (5s minimum) |
| Infinite loop | `max_steps: 25/30` |
| Total hang | `MAX_RUN_TIME_MS` (15 min timeout) |
| Bad model config | Empty result surfaces issue |
| Unintended spawning | Tool blocking in librarian |
| Memory leak | Timeout timer cleared on completion |

Testing

✅ bun run typecheck passes
✅ bun run build succeeds
✅ All 45 background-agent tests pass
✅ Manual testing: explore (background), librarian (background), explore (sync), librarian (sync) all complete successfully

References

Follows pattern from kdcokenny/opencode-background-agents
Implements the simplified completion approach from PR #655 (fix(sisyphus-task): complete overhaul of background agent model handling and sync mode)

Summary by cubic

Improves background agent stability and completion detection to prevent stuck tasks and runaway sessions. Adds model variant support, a workdir-constrained sisyphus_task, and a version-aware installer.

Bug Fixes
- Background tasks: complete on session.idle after 5s; add 15‑min timeout (also on resume) with clear/unref; prevent double-release; reset startedAt on resume; clean session tracking.
- Agent safety: cap max_steps (explore 25, librarian 30); block task-spawning tools in librarian.
- Parent agent: fix parentAgent resolution for background tools to stop Prometheus→Build fallback.
- Sisyphus: pass category model (including variant) in sync mode.
New Features
- Config: JSONC support and a "variant" field for agents/categories; variant applied to messages and task models.
- Sisyphus task: new workdir parameter to constrain agents to a directory (validated and injected into system/prompt and resume).
- Installer: write a version-aware plugin entry (@latest/@beta or pinned) and replace existing entry.

Written for commit ec554fc. Summary will update on new commits.

cubic-dev-ai

3 issues found across 6 files

Confidence score: 2/5

Double-releasing the concurrency key in src/features/background-agent/manager.ts risks oversubscribing background tasks and breaking the intended queue limits, making the change high risk.
Resumed tasks failing to reset startedAt in src/features/background-agent/manager.ts means long-running work can be wrongly marked complete on idle, potentially cutting off user sessions.
Global task timeouts in src/features/background-agent/manager.ts are left pending after completion, keeping long-lived timers around and threatening overall process stability.
Pay close attention to src/features/background-agent/manager.ts - concurrency handling, resume logic, and timeout cleanup all need fixes.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:189">
P2: Global task timeout is created but never cleared/unref’d on most completion paths, leaving 15‑minute timers alive after tasks finish and potentially keeping the event loop running.</violation>

<violation number="2" location="src/features/background-agent/manager.ts:197">
P1: Concurrency key is released twice on timeout (timeout handler and cleanup), but `release` is not idempotent—double-release can oversubscribe queued tasks</violation>

<violation number="3" location="src/features/background-agent/manager.ts:423">
P2: Resumed tasks can be prematurely completed: startedAt isn’t reset on resume, so the simplified session.idle handler will immediately complete any long-lived resumed task on the first idle event without verifying output/todos.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:361">
P2: Resume error path does not clear the newly created timeout or release the concurrency key, holding resources until retention cleanup and keeping the timer scheduled</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

motonari728 · 2026-01-11T22:12:21Z

Love this PR ❤️ I hope it gets merged soon.

- Add max_steps limit to explore (25) and librarian (30) agents
- Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning
- Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS)
- Simplify session.idle handler: remove validateSessionHasOutput/checkSessionTodos guards
- Add JSONC config file support (.jsonc checked before .json)
- Fix categoryModel passing in sisyphus_task sync mode

Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents

- Set concurrencyKey = undefined after every release() call to prevent double-release when multiple code paths try to release the same key
- Add 15-minute timeout timer for resumed tasks (was missing)
- Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications
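
A minimal sketch of the "clear the key after release" idea, assuming a simple counting limiter. `ConcurrencyLimiter`, `KeyedTask`, and `releaseOnce` are hypothetical names; the real manager's `release` may track more state.

```typescript
// Simplified limiter: release() decrements on every call, so it is not idempotent.
class ConcurrencyLimiter {
  private active = new Map<string, number>();

  acquire(key: string): void {
    this.active.set(key, (this.active.get(key) ?? 0) + 1);
  }

  release(key: string): void {
    const current = this.active.get(key) ?? 0;
    this.active.set(key, Math.max(0, current - 1));
  }

  activeCount(key: string): number {
    return this.active.get(key) ?? 0;
  }
}

interface KeyedTask {
  concurrencyKey?: string;
}

// Release at most once per task: the key is cleared before releasing,
// so any later code path sees no key and becomes a no-op.
function releaseOnce(limiter: ConcurrencyLimiter, task: KeyedTask): boolean {
  const key = task.concurrencyKey;
  if (key === undefined) return false;
  task.concurrencyKey = undefined;
  limiter.release(key);
  return true;
}
```

If both the timeout handler and a cleanup path call `releaseOnce` for the same task, only the first call changes the limiter's count, which avoids the oversubscription flagged in review.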

Previously, the installer always wrote 'oh-my-opencode' without a version, causing users who installed beta versions (e.g., bunx oh-my-opencode@beta) to unexpectedly load the stable version on next OpenCode startup.

Now the installer queries npm dist-tags and writes:
- @latest when current version matches the latest tag
- @beta when current version matches the beta tag
- @<version> when no tag matches (pins to specific version)

This ensures:
- bunx oh-my-opencode install → @latest (tracks stable)
- bunx oh-my-opencode@beta install → @beta (tracks beta tag)
- bunx oh-my-opencode@3.0.0-beta.2 install → @3.0.0-beta.2 (pinned)
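
The tag selection described here could be sketched as a pure function. `resolvePluginEntry` and `upsertPluginEntry` are illustrative names, and the dist-tags are passed in as a plain object rather than fetched from npm.

```typescript
const PACKAGE_NAME = "oh-my-opencode";

type DistTags = Record<string, string>;

// Pick @latest, @beta, or a pinned version based on the npm dist-tags.
function resolvePluginEntry(currentVersion: string, distTags: DistTags): string {
  if (distTags["latest"] === currentVersion) return `${PACKAGE_NAME}@latest`;
  if (distTags["beta"] === currentVersion) return `${PACKAGE_NAME}@beta`;
  return `${PACKAGE_NAME}@${currentVersion}`;
}

// Replace an existing oh-my-opencode entry in place, or append if absent.
function upsertPluginEntry(plugins: string[], entry: string): string[] {
  const isOurs = (p: string): boolean =>
    p === PACKAGE_NAME || p.startsWith(`${PACKAGE_NAME}@`);
  const index = plugins.findIndex(isOurs);
  if (index === -1) return [...plugins, entry];
  return plugins.map((p, i) => (i === index ? entry : p));
}
```

Keeping the decision separate from the network call makes it easy to check the three cases (stable, beta, pinned) without hitting the registry.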

Add optional `workdir` parameter that injects strict directory constraints into spawned agents' system content. Validates absolute paths, existence, and directory type. Enables orchestrators to delegate work to specific git worktrees or project subdirectories.

- Validation: absolute path, exists, is directory
- Injection: system content (sync/background) or prompt prepend (resume)
- Documentation: updated tool description and schema
- Tests: validation, injection, and combination scenarios
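
The three validation checks might look like the following sketch. `validateWorkdir`, the `WorkdirCheck` result type, and the error strings are assumptions; the actual tool may report failures differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type WorkdirCheck =
  | { ok: true; workdir: string }
  | { ok: false; error: string };

// Validate in the order listed: absolute path, exists, is a directory.
function validateWorkdir(workdir: string): WorkdirCheck {
  if (!path.isAbsolute(workdir)) {
    return { ok: false, error: `workdir must be an absolute path: ${workdir}` };
  }
  if (!fs.existsSync(workdir)) {
    return { ok: false, error: `workdir does not exist: ${workdir}` };
  }
  if (!fs.statSync(workdir).isDirectory()) {
    return { ok: false, error: `workdir is not a directory: ${workdir}` };
  }
  return { ok: true, workdir };
}
```

Returning a tagged result rather than throwing lets the caller turn a bad path into a tool error message before any agent is spawned.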

- Created findFirstMessageWithAgent() to read original session agent from oldest message
- Updated parentAgent resolution in sisyphus_task, call_omo_agent, background_task
- Fixed message.updated handler to only track agent from user messages (not assistant/system)
- Added debug logging for parentAgent resolution troubleshooting

Fixes issue where Prometheus agent would switch to Build when background task notifications were injected, caused by OpenCode writing 'agent: build' to message files mid-session.
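
The "oldest message wins" idea can be illustrated over an in-memory list. This is not the real findFirstMessageWithAgent(), which reads message files; `StoredMessage` and `firstAgentInSession` are hypothetical stand-ins.

```typescript
// Hypothetical message shape; the real message files may carry other fields.
interface StoredMessage {
  role: "user" | "assistant" | "system";
  agent?: string;
  createdAt: number; // epoch milliseconds
}

// Return the agent recorded on the oldest message that has one.
function firstAgentInSession(messages: StoredMessage[]): string | undefined {
  const oldestFirst = [...messages].sort((a, b) => a.createdAt - b.createdAt);
  return oldestFirst.find((m) => m.agent !== undefined)?.agent;
}
```

Because later messages are never consulted once an earlier one carries an agent, a mid-session 'agent: build' write cannot override the agent the session started with.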

code-yeongyu · 2026-01-12T04:44:09Z

@Gladdonilli I once modified it this way before, but I'm kinda worried about double-releasing, as the cubic bot flagged. Do you have any opinions on this?

Gladdonilli · 2026-01-12T12:45:55Z

> @Gladdonilli I once modified it this way before, but I'm kinda worried about double-releasing, as the cubic bot flagged. Do you have any opinions on this?

I'll polish this up more today; I might have a more elegant solution.

Gladdonilli · 2026-01-12T13:25:18Z

Making a new PR to retrigger reviews.

greptile-apps bot reviewed Jan 11, 2026

cubic-dev-ai bot reviewed Jan 11, 2026

cubic-dev-ai bot reviewed Jan 11, 2026

Gladdonilli and others added 12 commits January 12, 2026 12:24

fix: clear timeout timer on task completion to prevent memory leak

276cc21

fix: address PR review issues - reset startedAt on resume

6cd626d

- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks)
- Previous commit already fixed timeout.unref() and double-release prevention
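
A small sketch of why the reset matters, assuming the idle check compares against `startedAt`. `ResumableTask`, `resumeTask`, and `isIdleAccepted` are illustrative names only.

```typescript
const MIN_IDLE_TIME_MS = 5_000; // 5 seconds, per the PR description

interface ResumableTask {
  status: "running" | "completed";
  startedAt: number; // epoch milliseconds
}

// The idle guard: accept completion only after the minimum idle time.
function isIdleAccepted(task: ResumableTask, now: number): boolean {
  return task.status === "running" && now - task.startedAt >= MIN_IDLE_TIME_MS;
}

// Restart the idle clock on resume; with a stale startedAt the guard
// above would pass on the very first idle event.
function resumeTask(task: ResumableTask, now: number): void {
  task.status = "running";
  task.startedAt = now;
}
```

For a task first started ten minutes earlier, an idle event one second after resume is ignored with the reset, whereas the stale timestamp would have completed it immediately.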

fix: clean up timeout and concurrency on resume error

35005be

Address P2 feedback: resume() error handler now properly:
- Clears the timeout timer created for the resumed task
- Releases concurrency key to unblock queued tasks

fix(sisyphus-orchestrator): preserve subagent response in output transformation

246a944

feat(config): add model variant support

b38cb24

Allow optional model variant config for agents and categories. Propagate category variants into task model payloads so category-driven runs inherit provider-specific variants. Closes: code-yeongyu#647

fix(cli): update existing plugin entry instead of skipping

3318bac

Addresses cubic review feedback: installer now replaces existing oh-my-opencode entries with the new version-aware entry, allowing users to switch between @latest, @beta, or pinned versions.

docs(zh-CN): fix typo and clarify google_auth configuration

d64af50

- Fixed a spelling error.
- Clarify when to use google_auth: true vs false based on plugin choice

fix(background-agent): prevent memory leaks on task completion

ec554fc

- Release concurrency key immediately in session.idle handler
- Clean up subagentSessions Set on normal completion
- Clean up sessionAgentMap on both completion paths
- Add documentation for validateSessionHasOutput usage

Gladdonilli force-pushed the fix/subagent-safety-minimal branch from 4833ebb to ec554fc Compare January 12, 2026 13:17

Gladdonilli closed this Jan 12, 2026

Gladdonilli mentioned this pull request Jan 12, 2026

fix: Improve background agent stability and completion detection #715

Closed

3 tasks
