fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code) by AVADSA25 · Pull Request #171 · AVADSA25/codec

AVADSA25 · 2026-05-31T15:59:02Z

Summary

Four findings from a focused review of codec_agents.py (the self-contained multi-agent runtime). All reliability/correctness — no behavior change for the happy path.

#	Severity	Line	Fix
1	High	`:872`	Parallel crews no longer sink on one agent's failure — `gather` now `return_exceptions=True`
2	High	`_execute_tool_with_hooks`	Per-tool 120s wall-clock budget — a hanging skill can't hang the agent forever
3	Medium	`:838`	Dead tautology removed (`agent.tools != agent.tools`)
4	Low	`:72`	Dead eager `SERPER_API_KEY` global removed (Keychain shellout at import, never read)

Detail

1. Parallel crew isolation. asyncio.gather(*coros) had no return_exceptions=True. If one agent raised (e.g. a stuck-detection abort), every other agent's result was discarded and the whole crew failed — while sequential mode isolates each agent with try/except. Parallel now degrades gracefully: a failed agent contributes [Name failed: Type: msg], the rest still return and join, and each failure gets its own agent_finish (outcome=error) audit.

2. Per-tool timeout. Built-in tools have internal timeouts (web_fetch 30s, shell 30s) but arbitrary skill tools invoked via _make_lazy_fn had none — input(), a no-timeout network call, or a deadlock would hang the agent and its default-thread-pool worker indefinitely. The executor call is now asyncio.wait_for(..., 120s); on timeout the agent gets a recoverable error string + a tool_result/timeout audit and its ReAct loop continues. Documented residual: the abandoned worker thread can't be killed (no thread.kill in Python) and stays parked until the blocking call returns — the generous 120s budget keeps that rare and tools doing real work finish well inside it.

3 & 4. Pure cleanup — the tautology collapsed to the existing if stripped: (allowlist scoping unchanged, pinned by test); the SERPER_API_KEY global was never read (web_search fetches its own key via codec_search.search). The _serper_api_key() getter stays (live reads + satisfies the test_remaining_secrets source scan).

Test plan

tests/test_agents_reliability_l1.py — 7 tests: parallel degrade vs all-succeed, hanging-tool-bounded (measures the coroutine's return at the budget, not the orphaned thread's 2s finish), fast-tool-still-works, allowlist-strips + no-allowlist-keeps-all, dead-global-gone
Existing 236 agent/crew tests still pass (no regression)
python3.13 -m pytest --ignore=tests/test_skills.py -q → 2,062 passed, 77 skipped
ruff check: 0 issues

Branches off main; touches only codec_agents.py — independent of the other queued PRs.

🤖 Generated with Claude Code

Four findings from the codec_agents.py review: 1. Parallel crews no longer sink on one agent's failure (codec_agents.py:872) asyncio.gather(*coros) had no return_exceptions=True, so if ONE agent raised (e.g. a stuck-detection abort), every other agent's result was discarded and the whole crew failed. Sequential mode isolates per-agent; parallel now does too — a failed agent contributes an error marker, the rest still return + are joined. Each failure gets its own agent_finish audit (outcome=error), matching sequential. 2. Per-tool wall-clock budget (codec_agents.py:_execute_tool_with_hooks) A skill tool with no internal timeout (input(), a no-timeout network call, a deadlock) would hang the agent — and its default-thread-pool worker — forever. The executor call is now wrapped in asyncio.wait_for(120s); on timeout the agent gets a recoverable error string (+ a tool_result/timeout audit) and its ReAct loop continues. Documented residual: the abandoned worker thread can't be killed and stays parked until the blocking call returns — the generous budget keeps that rare. 3. Dead tautology removed (codec_agents.py:838) `if agent.tools != agent.tools or before != len(agent.tools)` — the first operand compares a list to itself (always False); the condition collapsed to the inner `if stripped:`. Simplified; allowlist scoping behavior is unchanged (pinned by test). 4. Dead eager module global removed (codec_agents.py:72) `SERPER_API_KEY = _serper_api_key()` was never read (web_search routes through codec_search.search, which fetches the key itself) yet did a Keychain shellout at every import. Dropped; the _serper_api_key() getter stays for live reads (+ satisfies the test_remaining_secrets source scan). Test surface: tests/test_agents_reliability_l1.py (7 tests) — parallel degrade-vs-all-succeed, hanging-tool-bounded (coroutine returns at the budget, not the orphaned thread's finish), fast-tool-still-works, allowlist-strips + no-allowlist-keeps-all, dead-global-gone. Existing 236 agent/crew tests still pass. Full suite: 2,062 passed / 77 skipped. ruff clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

AVADSA25 merged commit 5e94289 into main May 31, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code)#171

fix(agents): L1 — crew/agent runtime reliability (parallel gather, per-tool timeout, dead code)#171
AVADSA25 merged 1 commit into
mainfrom
agents-reliability-fixes

AVADSA25 commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AVADSA25 commented May 31, 2026

Summary

Detail

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants