fix: add file reading guardrails to drafter and verifier sub-agents #84

Merged
derekmisler merged 1 commit into docker:main from derekmisler:fix/subagent-hallucination-guardrails
Mar 12, 2026
derekmisler commented Mar 11, 2026

Closes: https://github.com/docker/gordon/issues/199

Summary

  • Adds circuit breaker instructions and file read caps to the drafter (20 files max, 3-consecutive-not-found stop) and verifier (10 files max) sub-agents
  • Adds list_directory to both sub-agent toolsets so they can discover files instead of guessing paths
  • Updates root agent delegation to pass directory listings to sub-agents, eliminating path guessing upfront
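
The PR expresses these limits as instructions in the sub-agent prompts, not as code. Purely as an illustration, the same rule (a read cap plus a stop after three consecutive not-found results) could be sketched in Python as follows. The names `ReadGuard`, `ReadBudgetExceeded`, and `record` are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


class ReadBudgetExceeded(Exception):
    """Raised when the guard decides the agent should stop reading files."""


@dataclass
class ReadGuard:
    """Hypothetical guard mirroring the limits described in this PR."""

    max_reads: int                   # e.g. 20 for the drafter, 10 for the verifier
    max_consecutive_misses: int = 3  # the 3-consecutive-not-found stop rule
    reads: int = 0
    consecutive_misses: int = 0

    def record(self, found: bool) -> None:
        """Record one read_file result and raise once a limit is crossed."""
        self.reads += 1
        # A successful read resets the miss streak; a miss extends it.
        self.consecutive_misses = 0 if found else self.consecutive_misses + 1
        if self.consecutive_misses >= self.max_consecutive_misses:
            raise ReadBudgetExceeded(
                f"{self.consecutive_misses} consecutive not-found reads"
            )
        if self.reads > self.max_reads:
            raise ReadBudgetExceeded(f"more than {self.max_reads} reads")
```

In this sketch the guard allows exactly `max_reads` reads and raises on the next one, and a single successful read resets the not-found streak, which matches the "consecutive" wording of the stop rule.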

Context

PR #77 added guardrails to the root agent, but the drafter and verifier had none. In docker/sandboxes run 22930756843, the verifier made 868 read_file calls (432 not-found) brute-forcing /tmp/*.json path permutations until the 40-minute timeout killed the process. The drafter contributed another 330 reads (146 not-found).

This is the same class of bug as the drafter loop in docker/docker-agent#2038 (from #77), but this time the verifier was the primary offender — trying paths like /tmp/verify.json, /tmp/verifier.json, /tmp/spec.json, /tmp/payload.json, etc.

Test plan

  • Trigger a review on a large refactoring PR (many cross-file references) and verify:
    • Drafter completes within reasonable time
    • No more than ~20 read_file calls in the verbose log
    • Zero degenerate path-guessing sequences
    • Review is actually posted
  • Verify verifier stays under 10 source file reads
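
The checks above could be automated once the read_file outcomes have been extracted from the verbose log. The log format is not shown in this PR, so the sketch below assumes the outcomes are already available as a sequence of booleans (`True` when the file was found). `summarize_reads` and `ReadStats` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class ReadStats(NamedTuple):
    total: int                # number of read_file calls
    not_found: int            # how many of them returned not-found
    longest_miss_streak: int  # longest run of consecutive not-found results


def summarize_reads(results: Iterable[bool]) -> ReadStats:
    """Summarize read_file outcomes; each item is True if the file was found."""
    total = not_found = streak = longest = 0
    for found in results:
        total += 1
        if found:
            streak = 0
        else:
            not_found += 1
            streak += 1
            longest = max(longest, streak)
    return ReadStats(total, not_found, longest)
```

A drafter run would then pass the test plan if `stats.total` is at most about 20 and `stats.longest_miss_streak` stays below 3. The run cited in the Context section (868 reads, 432 not-found) would fail the first check immediately.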

PR docker#77 added guardrails to the root agent but the drafter and verifier
had none, causing them to enter degenerate read_file loops that exhaust
the 40-minute timeout. Evidence: docker/sandboxes run 22930756843 showed
the verifier making 868 read_file calls (432 not-found) brute-forcing
/tmp/*.json path permutations.

Changes:
- Add circuit breaker instructions to drafter (20-file cap) and verifier
  (10-file cap) with 3-consecutive-not-found stop rule
- Add list_directory to both sub-agent toolsets so they can discover
  files instead of guessing paths
- Root agent now passes directory listings to sub-agents in delegation
  messages to eliminate path guessing upfront
@derekmisler derekmisler requested a review from a team as a code owner March 11, 2026 14:31
@derekmisler derekmisler self-assigned this Mar 11, 2026

@docker-agent docker-agent bot left a comment

Review Summary

Assessment: 🟢 APPROVE

Summary

The changes add comprehensive file reading guardrails to both the drafter and verifier sub-agents, effectively addressing the path-guessing vulnerability documented in the PR description. The implementation is clean, consistent, and well-documented.

Key improvements:

  • Circuit breaker pattern (3 consecutive not-found → stop)
  • File read caps (drafter: 20 files, verifier: 10 files)
  • Adds list_directory to both agent toolsets
  • Root agent now provides directory listings to sub-agents

No bugs detected in the added code. The guardrails mirror the approach used in PR #77 for the root agent and should prevent the degenerate behavior seen in docker/sandboxes run 22930756843.

Findings

No issues found. The code changes are safe to merge.

@derekmisler derekmisler merged commit 0635d86 into docker:main Mar 12, 2026
11 checks passed
derekmisler added a commit to derekmisler/cagent-action that referenced this pull request Mar 16, 2026
The drafter sub-agent can enter a degenerate loop where it calls
read_file on the same diff chunk path hundreds of times without
producing any analysis output. This consumed the entire 40-minute
timeout in docker/pinata#40026 (629 identical read_file calls).

The existing guardrails from PR docker#84 (circuit breakers, read caps)
only catch file-not-found loops. This adds explicit instructions to
never re-read the same path and to immediately produce output if
about to duplicate a read.
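
The follow-up commit also handles this with prompt instructions. As an illustration only, a "never re-read the same path" rule could be sketched as a small guard that remembers normalized paths. `DuplicateReadGuard` and `should_read` are hypothetical names, and normalizing with `posixpath.normpath` is an assumption about how equivalent paths should be compared.

```python
import posixpath


class DuplicateReadGuard:
    """Hypothetical guard that refuses a second read of the same path."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_read(self, path: str) -> bool:
        """Return True the first time a path is seen, False on any repeat."""
        # Normalize so that "/tmp/./a" and "/tmp/a" count as the same file.
        key = posixpath.normpath(path)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

When `should_read` returns `False`, a caller would skip the read and move on to producing output, which is the behavior the commit message asks of the drafter.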