Add file reading guardrails to prevent review agent hallucination loops by derekmisler · Pull Request #77 · docker/cagent-action

derekmisler · 2026-03-06T22:20:30Z

Closes: https://github.com/docker/gordon/issues/190

Summary

Adds explicit file reading guardrails to the root review agent prompt to prevent it from entering infinite loops of guessing non-existent file paths
Removes directory_tree from root agent toolset to discourage deep repo exploration

Problem

The root review agent was entering degenerate loops of hallucinated read_file calls, burning through the entire 40-minute timeout without ever posting a review. In this observed failure, the agent made 500+ consecutive read_file calls to paths that don't exist, including completely unrelated concepts like gordon/raft.md, gordon/byzantine-fault-tolerance.md, gordon/six-sigma.md, gordon/kaizen.md, etc.

The PR being reviewed was a simple 349-line rebrand diff. The agent read the diff successfully, then started speculatively exploring the repo and never stopped.

Changes

New "File Reading Guardrails" section in root agent instructions:

Only read directly relevant files — diff, AGENTS.md/CLAUDE.md, and files referenced in the diff
Never guess file paths — use list_directory first
Circuit breaker — stop after 3 consecutive "not found" reads
Cap total reads — max 10 files for root agent (drafter/verifier handle deeper analysis)
Never enumerate topics as paths — prevents the hallucinated concept-name loop

Also removes directory_tree from root agent toolset since the drafter and verifier handle file-level analysis.

Test plan

Re-run review on the same PR that triggered the failure
Verify the agent completes within a reasonable time and posts a review
Run on a large diff (>1500 lines) to confirm chunked review still works
Run existing evals to check for regressions

docker-agent

🟢 Code Review: APPROVED

Summary

This PR adds file reading guardrails to prevent the review agent from entering hallucination loops. The changes include:

New "CRITICAL: File Reading Guardrails" section with 5 rules to limit file exploration
Removal of directory_tree from the root agent's filesystem toolset

Analysis: All potential issues identified during review were related to the enforceability of prompt-based rules. However, these are standard and appropriate mechanisms for AI agent configuration in YAML-based agent frameworks. The guardrails use clear, imperative language with high salience ("MUST NOT", "CRITICAL") to guide agent behavior, which is the expected design pattern. The PR also includes technical enforcement by removing directory_tree from the toolset, demonstrating understanding of both prompt-based and technical controls.

Verdict: No bugs found in the changed code. The implementation is sound and follows best practices for AI agent configuration. ✅

Test Coverage

As noted in the PR description, please verify:

Re-run on the original failing PR
Test with large diffs (>1500 lines)
Run existing evals for regression checks

The root review agent was entering infinite loops of guessing file paths that don't exist, burning through the entire 40-minute timeout without ever posting a review. In one observed case, the agent made 500+ consecutive read_file calls to hallucinated paths like "gordon/raft.md", "gordon/six-sigma.md", etc. This adds explicit guardrails to the root agent instructions: - Circuit breaker: stop after 3 consecutive "not found" reads - Cap total file reads at 10 for the root agent - Prohibit guessing file paths (use list_directory first) - Prohibit enumerating general concepts as file paths - Remove directory_tree from root agent toolset to discourage deep exploration (drafter/verifier handle file analysis)

krissetto

lots of "do this, dont do that", in general in this bot..hope it keeps behaving, we might need to change prompting strat eventually

krissetto · 2026-03-10T13:00:02Z

review-pr/agents/pr-review.yaml

+
+      The root agent MUST NOT exhaustively explore the repository. Follow these rules strictly:
+
+      1. **Only read files that are directly relevant**: the diff, AGENTS.md/CLAUDE.md, and files


why agents/claude.md?

Just following the trends I've noticed. These are the files people tend to add the most, but that prompt-files flag appends the contents of the prompt files to every request, which seems like a waste of tokens. So I'm trying out just telling the agents to read these files once.

PR docker#77 added guardrails to the root agent but the drafter and verifier had none, causing them to enter degenerate read_file loops that exhaust the 40-minute timeout. Evidence: docker/sandboxes run 22930756843 showed the verifier making 868 read_file calls (432 not-found) brute-forcing /tmp/*.json path permutations. Changes: - Add circuit breaker instructions to drafter (20-file cap) and verifier (10-file cap) with 3-consecutive-not-found stop rule - Add list_directory to both sub-agent toolsets so they can discover files instead of guessing paths - Root agent now passes directory listings to sub-agents in delegation messages to eliminate path guessing upfront

derekmisler force-pushed the fix-review-agent-hallucination-loop branch from b1d698d to 0ad0342 Compare March 6, 2026 22:22

derekmisler marked this pull request as ready for review March 6, 2026 22:22

derekmisler requested a review from a team as a code owner March 6, 2026 22:22

docker-agent bot reviewed Mar 6, 2026

View reviewed changes

derekmisler self-assigned this Mar 9, 2026

derekmisler force-pushed the fix-review-agent-hallucination-loop branch from 1814f19 to 3f02969 Compare March 9, 2026 20:07

aheritier approved these changes Mar 10, 2026

View reviewed changes

derekmisler merged commit 942f87d into docker:main Mar 10, 2026
7 checks passed

krissetto approved these changes Mar 10, 2026

View reviewed changes

derekmisler mentioned this pull request Mar 11, 2026

fix: add file reading guardrails to drafter and verifier sub-agents #84

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add file reading guardrails to prevent review agent hallucination loops#77

Add file reading guardrails to prevent review agent hallucination loops#77
derekmisler merged 1 commit intodocker:mainfrom
derekmisler:fix-review-agent-hallucination-loop

derekmisler commented Mar 6, 2026 •

edited

Loading

Uh oh!

docker-agent bot left a comment

Uh oh!

Uh oh!

krissetto left a comment

Uh oh!

krissetto Mar 10, 2026

Uh oh!

derekmisler Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		The root agent MUST NOT exhaustively explore the repository. Follow these rules strictly:

		1. Only read files that are directly relevant: the diff, AGENTS.md/CLAUDE.md, and files

Conversation

derekmisler commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Changes

Test plan

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

🟢 Code Review: APPROVED

Summary

Test Coverage

Uh oh!

Uh oh!

krissetto left a comment

Choose a reason for hiding this comment

Uh oh!

krissetto Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

derekmisler Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

derekmisler commented Mar 6, 2026 •

edited

Loading