
Add file reading guardrails to prevent review agent hallucination loops #77

Merged
derekmisler merged 1 commit into docker:main from derekmisler:fix-review-agent-hallucination-loop
Mar 10, 2026


@derekmisler (Contributor) commented Mar 6, 2026

Closes: https://github.com/docker/gordon/issues/190

Summary

  • Adds explicit file reading guardrails to the root review agent prompt to prevent it from entering infinite loops of guessing non-existent file paths
  • Removes directory_tree from root agent toolset to discourage deep repo exploration

Problem

The root review agent was entering degenerate loops of hallucinated read_file calls, burning through the entire 40-minute timeout without ever posting a review. In one observed failure, the agent made 500+ consecutive read_file calls to paths that don't exist, including completely unrelated concepts like gordon/raft.md, gordon/byzantine-fault-tolerance.md, gordon/six-sigma.md, gordon/kaizen.md, etc.

The PR being reviewed was a simple 349-line rebrand diff. The agent read the diff successfully, then started speculatively exploring the repo and never stopped.

Changes

New "File Reading Guardrails" section in root agent instructions:

  1. Only read directly relevant files — diff, AGENTS.md/CLAUDE.md, and files referenced in the diff
  2. Never guess file paths — use list_directory first
  3. Circuit breaker — stop after 3 consecutive "not found" reads
  4. Cap total reads — max 10 files for root agent (drafter/verifier handle deeper analysis)
  5. Never enumerate topics as paths — prevents the hallucinated concept-name loop

Also removes directory_tree from root agent toolset since the drafter and verifier handle file-level analysis.
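The PR enforces rules 3 and 4 only through prompt wording. To make their intended behavior concrete, here is a minimal Python sketch of a hypothetical wrapper around a file-read tool that trips after 3 consecutive not-found reads or after 10 total read attempts. `ReadGuard` and `GuardTripped` are illustrative names, not part of cagent, and counting a not-found attempt toward the cap is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


class GuardTripped(Exception):
    """Raised when the read budget or the not-found circuit breaker is exhausted."""


@dataclass
class ReadGuard:
    # Hypothetical limits mirroring rules 3 and 4 of the prompt.
    max_reads: int = 10
    max_consecutive_missing: int = 3
    reads: int = 0
    consecutive_missing: int = 0

    def read_file(self, path: str) -> Optional[str]:
        """Return the file's text, or None if it does not exist.

        Every attempt, found or not, counts toward max_reads (an assumption).
        """
        # Circuit breaker: refuse further reads after too many misses in a row.
        if self.consecutive_missing >= self.max_consecutive_missing:
            raise GuardTripped(
                f"{self.consecutive_missing} consecutive not-found reads; stop guessing paths"
            )
        # Total cap on read attempts.
        if self.reads >= self.max_reads:
            raise GuardTripped(f"read cap of {self.max_reads} files reached")
        self.reads += 1
        target = Path(path)
        if not target.is_file():
            self.consecutive_missing += 1
            return None
        self.consecutive_missing = 0  # a successful read resets the breaker
        return target.read_text(encoding="utf-8")
```

A wrapper like this would turn the limits from advisory into hard stops; the PR itself relies on the prompt text plus removing directory_tree from the toolset.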

Test plan

  • Re-run review on the same PR that triggered the failure
  • Verify the agent completes within a reasonable time and posts a review
  • Run on a large diff (>1500 lines) to confirm chunked review still works
  • Run existing evals to check for regressions

@derekmisler force-pushed the fix-review-agent-hallucination-loop branch from b1d698d to 0ad0342 on March 6, 2026 22:22
@derekmisler marked this pull request as ready for review March 6, 2026 22:22
@derekmisler requested a review from a team as a code owner March 6, 2026 22:22

@docker-agent (bot) left a comment


🟢 Code Review: APPROVED

Summary

This PR adds file reading guardrails to prevent the review agent from entering hallucination loops. The changes include:

  • New "CRITICAL: File Reading Guardrails" section with 5 rules to limit file exploration
  • Removal of directory_tree from the root agent's filesystem toolset

Analysis: All potential issues identified during review were related to the enforceability of prompt-based rules. However, these are standard and appropriate mechanisms for AI agent configuration in YAML-based agent frameworks. The guardrails use clear, imperative language with high salience ("MUST NOT", "CRITICAL") to guide agent behavior, which is the expected design pattern. The PR also includes technical enforcement by removing directory_tree from the toolset, demonstrating understanding of both prompt-based and technical controls.

Verdict: No bugs found in the changed code. The implementation is sound and follows best practices for AI agent configuration. ✅

Test Coverage

As noted in the PR description, please verify:

  • Re-run on the original failing PR
  • Test with large diffs (>1500 lines)
  • Run existing evals for regression checks

@derekmisler self-assigned this Mar 9, 2026
The root review agent was entering infinite loops of guessing file paths
that don't exist, burning through the entire 40-minute timeout without
ever posting a review. In one observed case, the agent made 500+
consecutive read_file calls to hallucinated paths like "gordon/raft.md",
"gordon/six-sigma.md", etc.

This adds explicit guardrails to the root agent instructions:
- Circuit breaker: stop after 3 consecutive "not found" reads
- Cap total file reads at 10 for the root agent
- Prohibit guessing file paths (use list_directory first)
- Prohibit enumerating general concepts as file paths
- Remove directory_tree from root agent toolset to discourage deep
  exploration (drafter/verifier handle file analysis)
@derekmisler force-pushed the fix-review-agent-hallucination-loop branch from 1814f19 to 3f02969 on March 9, 2026 20:07
@derekmisler merged commit 942f87d into docker:main Mar 10, 2026
7 checks passed

@krissetto left a comment


Lots of "do this, don't do that" in this bot in general... hope it keeps behaving. We might need to change the prompting strategy eventually.


The root agent MUST NOT exhaustively explore the repository. Follow these rules strictly:

1. **Only read files that are directly relevant**: the diff, AGENTS.md/CLAUDE.md, and files

Why AGENTS.md/CLAUDE.md?

Contributor Author


Just following the trends I've noticed. These are the files people tend to add the most, but that prompt-files flag appends the contents of the prompt files to every request, which seems like a waste of tokens. So I'm trying out just telling the agents to read these files once.

derekmisler added a commit to derekmisler/cagent-action that referenced this pull request Mar 11, 2026
PR docker#77 added guardrails to the root agent but the drafter and verifier
had none, causing them to enter degenerate read_file loops that exhaust
the 40-minute timeout. Evidence: docker/sandboxes run 22930756843 showed
the verifier making 868 read_file calls (432 not-found) brute-forcing
/tmp/*.json path permutations.

Changes:
- Add circuit breaker instructions to drafter (20-file cap) and verifier
  (10-file cap) with 3-consecutive-not-found stop rule
- Add list_directory to both sub-agent toolsets so they can discover
  files instead of guessing paths
- Root agent now passes directory listings to sub-agents in delegation
  messages to eliminate path guessing upfront
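The last change (passing directory listings in delegation messages) can be illustrated with a short Python sketch. The 20-file drafter cap, 10-file verifier cap, and 3-miss rule come from the commit message above; `build_delegation_message`, `FILE_CAPS`, and the message layout are hypothetical, not the format cagent-action actually uses.

```python
import os

# Caps from the follow-up commit message; the mapping itself is illustrative.
FILE_CAPS = {"drafter": 20, "verifier": 10}
NOT_FOUND_LIMIT = 3


def build_delegation_message(role: str, task: str, directory: str) -> str:
    """Build a hypothetical delegation message that embeds a directory listing,
    so the sub-agent chooses from real paths instead of guessing them."""
    lines = []
    for name in sorted(os.listdir(directory)):
        # Mark subdirectories with a trailing slash.
        suffix = "/" if os.path.isdir(os.path.join(directory, name)) else ""
        lines.append(f"- {name}{suffix}")
    listing = "\n".join(lines) if lines else "- (empty)"
    return (
        f"Task: {task}\n"
        f"Files in {directory}:\n{listing}\n"
        f"Only read paths from this listing. Read at most {FILE_CAPS[role]} files "
        f"and stop after {NOT_FOUND_LIMIT} consecutive not-found reads."
    )
```

With the listing in hand, a verifier looking for a JSON file under /tmp can pick the real name instead of brute-forcing /tmp/*.json permutations.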