fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code)#171
Merged
Merged
Conversation
Four findings from the codec_agents.py review: 1. Parallel crews no longer sink on one agent's failure (codec_agents.py:872) asyncio.gather(*coros) had no return_exceptions=True, so if ONE agent raised (e.g. a stuck-detection abort), every other agent's result was discarded and the whole crew failed. Sequential mode isolates per-agent; parallel now does too — a failed agent contributes an error marker, the rest still return + are joined. Each failure gets its own agent_finish audit (outcome=error), matching sequential. 2. Per-tool wall-clock budget (codec_agents.py:_execute_tool_with_hooks) A skill tool with no internal timeout (input(), a no-timeout network call, a deadlock) would hang the agent — and its default-thread-pool worker — forever. The executor call is now wrapped in asyncio.wait_for(120s); on timeout the agent gets a recoverable error string (+ a tool_result/timeout audit) and its ReAct loop continues. Documented residual: the abandoned worker thread can't be killed and stays parked until the blocking call returns — the generous budget keeps that rare. 3. Dead tautology removed (codec_agents.py:838) `if agent.tools != agent.tools or before != len(agent.tools)` — the first operand compares a list to itself (always False); the condition collapsed to the inner `if stripped:`. Simplified; allowlist scoping behavior is unchanged (pinned by test). 4. Dead eager module global removed (codec_agents.py:72) `SERPER_API_KEY = _serper_api_key()` was never read (web_search routes through codec_search.search, which fetches the key itself) yet did a Keychain shellout at every import. Dropped; the _serper_api_key() getter stays for live reads (+ satisfies the test_remaining_secrets source scan). Test surface: tests/test_agents_reliability_l1.py (7 tests) — parallel degrade-vs-all-succeed, hanging-tool-bounded (coroutine returns at the budget, not the orphaned thread's finish), fast-tool-still-works, allowlist-strips + no-allowlist-keeps-all, dead-global-gone. Existing 236 agent/crew tests still pass. Full suite: 2,062 passed / 77 skipped. ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four findings from a focused review of
codec_agents.py(the self-contained multi-agent runtime). All reliability/correctness — no behavior change for the happy path.:872gathernowreturn_exceptions=True_execute_tool_with_hooks:838agent.tools != agent.tools):72SERPER_API_KEYglobal removed (Keychain shellout at import, never read)Detail
1. Parallel crew isolation.
asyncio.gather(*coros)had noreturn_exceptions=True. If one agent raised (e.g. a stuck-detectionabort), every other agent's result was discarded and the whole crew failed — while sequential mode isolates each agent with try/except. Parallel now degrades gracefully: a failed agent contributes[Name failed: Type: msg], the rest still return and join, and each failure gets its ownagent_finish(outcome=error) audit.2. Per-tool timeout. Built-in tools have internal timeouts (web_fetch 30s, shell 30s) but arbitrary skill tools invoked via
_make_lazy_fnhad none —input(), a no-timeout network call, or a deadlock would hang the agent and its default-thread-pool worker indefinitely. The executor call is nowasyncio.wait_for(..., 120s); on timeout the agent gets a recoverable error string + atool_result/timeout audit and its ReAct loop continues. Documented residual: the abandoned worker thread can't be killed (no thread.kill in Python) and stays parked until the blocking call returns — the generous 120s budget keeps that rare and tools doing real work finish well inside it.3 & 4. Pure cleanup — the tautology collapsed to the existing
if stripped:(allowlist scoping unchanged, pinned by test); theSERPER_API_KEYglobal was never read (web_search fetches its own key viacodec_search.search). The_serper_api_key()getter stays (live reads + satisfies thetest_remaining_secretssource scan).Test plan
tests/test_agents_reliability_l1.py— 7 tests: parallel degrade vs all-succeed, hanging-tool-bounded (measures the coroutine's return at the budget, not the orphaned thread's 2s finish), fast-tool-still-works, allowlist-strips + no-allowlist-keeps-all, dead-global-gonepython3.13 -m pytest --ignore=tests/test_skills.py -q→ 2,062 passed, 77 skippedruff check: 0 issues🤖 Generated with Claude Code