Add explicit rules to prevent modifying existing models in AGENTS.md#2288
Add explicit rules to prevent modifying existing models in AGENTS.md#2288juanmichelini merged 2 commits intomainfrom
Conversation
Problem: Agents using AGENTS.md were making unnecessary changes to existing model entries (reformatting, reordering, 'improving' existing configs). Solution: Added prominent 'Critical Rules' section at the top that explicitly prohibits: 1. Modifying existing model entries 2. Reformatting existing code 3. Reordering models 4. 'Improving' or 'fixing' existing entries 5. Adding anything beyond the new model entry This ensures agents only add their new model and leave all existing content untouched, resulting in cleaner PRs and reduced risk. Fixes #2287 Co-authored-by: openhands <openhands@all-hands.dev>
API breakage checks (Griffe)Result: Failed Log excerpt (first 1000 characters) |
Agent server REST API breakage checks (OpenAPI)Result: Passed |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Pragmatic solution to a real problem.
This solves the actual issue of agents making noisy changes when adding models. Simple, direct documentation that tells them to stop doing that. No over-engineering, no theoretical nonsense—just explicit rules addressing observed behavior.
Verdict: ✅ Worth merging
Key insight: Sometimes the simplest solution is just telling people (or agents) exactly what not to do.
Problem: Agent in PR #2286 made unnecessary changes beyond adding the new model: - Changed existing test assertions (claude-sonnet -> gpt-4) - Replaced real model tests with mocked tests - "Fixed" test_model to check_model import - Claimed to fix "incorrect assertions" that were actually correct - Modified test approach fundamentally Solution: Added specific prohibitions and examples to prevent these issues: 1. Expanded "What NOT to Do" with 8 explicit rules including: - Never modify existing tests or test assertions - Never replace real model tests with mocked tests - Never fix import names that aren't broken - Never change test assertions even if they "look wrong" 2. Added "What These Rules Prevent" section with real examples from PRs 3. Enhanced test section with: - Clear warnings about not modifying existing test functions - Explicit "What NOT to do in tests" list - Template showing exactly what to add and nothing more 4. Added PR description guidance: - What NOT to claim (no false "fixes") - Examples of inappropriate claims vs appropriate ones - Only describe what was actually added These changes address reviewer concerns that agents were making unnecessary modifications under the guise of "fixes" when code was already working. Related: #2286 (review feedback), #2287 Co-authored-by: openhands <openhands@all-hands.dev>
Additional Improvements Based on PR #2286Analyzed PR #2286 where the agent made unnecessary changes beyond adding the new model. Added specific rules to prevent these issues. Issues Found in PR #2286The agent:
Reviewer's concerns:
New Rules Added1. Expanded "What NOT to Do" section (8 specific rules): 1. Never modify existing model entries
2. Never modify existing tests - especially assertions, mocks, expected values
3. Never reformat existing code
4. Never reorder models or imports
5. Never "fix" existing code - if tests pass, it works
6. Never change test assertions - even if they "look wrong"
7. Never replace real model tests with mocked tests
8. Never fix import names - if test_model exists, don't change it2. Added real violation examples: - Changing assert result[0]["id"] == "claude-sonnet-4-5-20250929" to "gpt-4" ❌
- Replacing real model config tests with mocked/custom model tests ❌
- "Fixing" from resolve_model_config import test_model to check_model ❌
- Adding "Fixed incorrect assertions" without explaining what was incorrect ❌3. Enhanced test section:
4. Added PR description guidance: What NOT to claim:
What TO describe:
ImpactThese additions specifically target the behaviors seen in PR #2286:
This should significantly reduce unnecessary changes in future model addition PRs. |
…penHands#2288) Cherry-pick from upstream 8dc35fd
Summary
Adds a prominent "Critical Rules" section to AGENTS.md that explicitly prohibits agents from modifying existing model entries when adding new models.
Problem
When using AGENTS.md to add new models, agents were making unnecessary changes:
This creates noisy PRs and increases risk of breaking production models.
Solution
Added "Critical Rules" section at the very top (before any instructions) that explicitly states:
Changes
.github/run-eval/AGENTS.mdExpected Impact
Testing
Fixes #2287
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:acc8e6b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
acc8e6b-python) is a multi-arch manifest supporting both amd64 and arm64acc8e6b-python-amd64) are also available if needed