Add file reading guardrails to prevent review agent hallucination loops#77
Conversation
b1d698d to
0ad0342
Compare
There was a problem hiding this comment.
🟢 Code Review: APPROVED
Summary
This PR adds file reading guardrails to prevent the review agent from entering hallucination loops. The changes include:
- New "CRITICAL: File Reading Guardrails" section with 5 rules to limit file exploration
- Removal of
directory_treefrom the root agent's filesystem toolset
Analysis: All potential issues identified during review were related to the enforceability of prompt-based rules. However, these are standard and appropriate mechanisms for AI agent configuration in YAML-based agent frameworks. The guardrails use clear, imperative language with high salience ("MUST NOT", "CRITICAL") to guide agent behavior, which is the expected design pattern. The PR also includes technical enforcement by removing directory_tree from the toolset, demonstrating understanding of both prompt-based and technical controls.
Verdict: No bugs found in the changed code. The implementation is sound and follows best practices for AI agent configuration. ✅
Test Coverage
As noted in the PR description, please verify:
- Re-run on the original failing PR
- Test with large diffs (>1500 lines)
- Run existing evals for regression checks
The root review agent was entering infinite loops of guessing file paths that don't exist, burning through the entire 40-minute timeout without ever posting a review. In one observed case, the agent made 500+ consecutive read_file calls to hallucinated paths like "gordon/raft.md", "gordon/six-sigma.md", etc. This adds explicit guardrails to the root agent instructions: - Circuit breaker: stop after 3 consecutive "not found" reads - Cap total file reads at 10 for the root agent - Prohibit guessing file paths (use list_directory first) - Prohibit enumerating general concepts as file paths - Remove directory_tree from root agent toolset to discourage deep exploration (drafter/verifier handle file analysis)
1814f19 to
3f02969
Compare
krissetto
left a comment
There was a problem hiding this comment.
lots of "do this, dont do that", in general in this bot..hope it keeps behaving, we might need to change prompting strat eventually
|
|
||
| The root agent MUST NOT exhaustively explore the repository. Follow these rules strictly: | ||
|
|
||
| 1. **Only read files that are directly relevant**: the diff, AGENTS.md/CLAUDE.md, and files |
There was a problem hiding this comment.
Just following the trends I've noticed. These are the files people tend to add the most, but that prompt-files flag appends the contents of the prompt files to every request, which seems like a waste of tokens. So I'm trying out just telling the agents to read these files once.
PR docker#77 added guardrails to the root agent but the drafter and verifier had none, causing them to enter degenerate read_file loops that exhaust the 40-minute timeout. Evidence: docker/sandboxes run 22930756843 showed the verifier making 868 read_file calls (432 not-found) brute-forcing /tmp/*.json path permutations. Changes: - Add circuit breaker instructions to drafter (20-file cap) and verifier (10-file cap) with 3-consecutive-not-found stop rule - Add list_directory to both sub-agent toolsets so they can discover files instead of guessing paths - Root agent now passes directory listings to sub-agents in delegation messages to eliminate path guessing upfront
Closes: https://github.com/docker/gordon/issues/190
Summary
directory_treefrom root agent toolset to discourage deep repo explorationProblem
The root review agent was entering degenerate loops of hallucinated
read_filecalls, burning through the entire 40-minute timeout without ever posting a review. In this observed failure, the agent made 500+ consecutiveread_filecalls to paths that don't exist, including completely unrelated concepts likegordon/raft.md,gordon/byzantine-fault-tolerance.md,gordon/six-sigma.md,gordon/kaizen.md, etc.The PR being reviewed was a simple 349-line rebrand diff. The agent read the diff successfully, then started speculatively exploring the repo and never stopped.
Changes
New "File Reading Guardrails" section in root agent instructions:
list_directoryfirstAlso removes
directory_treefrom root agent toolset since the drafter and verifier handle file-level analysis.Test plan