-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: Improve background agent stability and completion detection #697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Improve background agent stability and completion detection #697
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 6 files
Confidence score: 2/5
- Double-releasing the concurrency key in
src/features/background-agent/manager.tsrisks oversubscribing background tasks and breaking the intended queue limits, making the change high risk. - Resumed tasks failing to reset
startedAtinsrc/features/background-agent/manager.tsmeans long-running work can be wrongly marked complete on idle, potentially cutting off user sessions. - Global task timeouts in
src/features/background-agent/manager.tsare left pending after completion, keeping long-lived timers around and threatening overall process stability. - Pay close attention to
src/features/background-agent/manager.ts- concurrency handling, resume logic, and timeout cleanup all need fixes.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/features/background-agent/manager.ts">
<violation number="1" location="src/features/background-agent/manager.ts:189">
P2: Global task timeout is created but never cleared/unref’d on most completion paths, leaving 15‑minute timers alive after tasks finish and potentially keeping the event loop running.</violation>
<violation number="2" location="src/features/background-agent/manager.ts:197">
P1: Concurrency key is released twice on timeout (timeout handler and cleanup), but `release` is not idempotent—double-release can oversubscribe queued tasks</violation>
<violation number="3" location="src/features/background-agent/manager.ts:423">
P2: Resumed tasks can be prematurely completed: startedAt isn’t reset on resume, so the simplified session.idle handler will immediately complete any long-lived resumed task on the first idle event without verifying output/todos.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/features/background-agent/manager.ts">
<violation number="1" location="src/features/background-agent/manager.ts:361">
P2: Resume error path does not clear the newly created timeout or release the concurrency key, holding resources until retention cleanup and keeping the timer scheduled</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
|
Love this PR ❤️ I hope early merge. |
- Add max_steps limit to explore (25) and librarian (30) agents - Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning - Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS) - Simplify session.idle handler - remove validateSessionHasOutput/checkSessionTodos guards - Add JSONC config file support (.jsonc checked before .json) - Fix categoryModel passing in sisyphus_task sync mode Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents
- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks) - Previous commit already fixed timeout.unref() and double-release prevention
- Set concurrencyKey = undefined after every release() call to prevent double-release when multiple code paths try to release the same key - Add 15-minute timeout timer for resumed tasks (was missing) - Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications
Address P2 feedback: resume() error handler now properly: - Clears the timeout timer created for the resumed task - Releases concurrency key to unblock queued tasks
Allow optional model variant config for agents and categories. Propagate category variants into task model payloads so category-driven runs inherit provider-specific variants. Closes: code-yeongyu#647
Previously, the installer always wrote 'oh-my-opencode' without a version, causing users who installed beta versions (e.g., bunx oh-my-opencode@beta) to unexpectedly load the stable version on next OpenCode startup. Now the installer queries npm dist-tags and writes: - @latest when current version matches the latest tag - @beta when current version matches the beta tag - @<version> when no tag matches (pins to specific version) This ensures: - bunx oh-my-opencode install → @latest (tracks stable) - bunx oh-my-opencode@beta install → @beta (tracks beta tag) - bunx [email protected] install → @3.0.0-beta.2 (pinned)
Add optional `workdir` parameter that injects strict directory constraints into spawned agents' system content. Validates absolute paths, existence, and directory type. Enables orchestrators to delegate work to specific git worktrees or project subdirectories. - Validation: absolute path, exists, is directory - Injection: system content (sync/background) or prompt prepend (resume) - Documentation: updated tool description and schema - Tests: validation, injection, and combination scenarios
- Created findFirstMessageWithAgent() to read original session agent from oldest message - Updated parentAgent resolution in sisyphus_task, call_omo_agent, background_task - Fixed message.updated handler to only track agent from user messages (not assistant/system) - Added debug logging for parentAgent resolution troubleshooting Fixes issue where Prometheus agent would switch to Build when background task notifications were injected, caused by OpenCode writing 'agent: build' to message files mid-session.
- Fixed a spelling error. - Clarify when to use google_auth: true vs false based on plugin choice
|
@Gladdonilli i once modified in this way before but kinda worried about double-releasing as cubit bot, do you have any opinions regarding this |
Ill polish this up more today, might have a more elegant solution |
- Release concurrency key immediately in session.idle handler - Clean up subagentSessions Set on normal completion - Clean up sessionAgentMap on both completion paths - Add documentation for validateSessionHasOutput usage
4833ebb to
ec554fc
Compare
|
making a new pr to retrigger reviews. |
Summary
This PR improves background agent reliability by adding safety guards and simplifying completion detection to prevent stuck tasks.
Changes
1. Agent Safety Guards
max_stepslimit: Addedmax_steps: 25to explore agent andmax_steps: 30to librarian agent to prevent infinite loopssisyphus_task,call_omo_agent, andtasktools to prevent unintended child spawning2. Background Agent Completion Detection (PR #655 Implementation)
MAX_RUN_TIME_MS(15 minutes) to prevent tasks from running forevervalidateSessionHasOutput()andcheckSessionTodos()guards that were causing stuck tasksMIN_IDLE_TIME_MS(5 seconds) prevents premature completion from earlysession.idleevents3. Configuration Improvements
.jsoncfiles before.json, enabling comments in config filessisyphus_tasksync mode now correctly passescategoryModelfor category-based tasksCompletion Detection Flow
Why Guards Were Removed
The previous guards (
validateSessionHasOutput,checkSessionTodos) caused tasks to get stuck:validateSessionHasOutputcheckSessionTodosThe new approach: Fail fast, surface errors, don't hide them. An empty result is a visible signal of a problem the user can debug.
Safety Nets
MIN_IDLE_TIME_MS(5s minimum)max_steps: 25/30MAX_RUN_TIME_MS(15 min timeout)Testing
bun run typecheckpassesbun run buildsucceedsReferences
Summary by cubic
Improves background agent stability and completion detection to prevent stuck tasks and runaway sessions. Adds model variant support, a workdir-constrained sisyphus_task, and a version-aware installer.
Bug Fixes
New Features
Written for commit ec554fc. Summary will update on new commits.