
Add explicit rules to prevent modifying existing models in AGENTS.md #2288

Merged
juanmichelini merged 2 commits into main from improve-agents-md-no-modify-existing on Mar 3, 2026

Conversation

@juanmichelini (Collaborator) commented Mar 3, 2026

Summary

Adds a prominent "Critical Rules" section to AGENTS.md that explicitly prohibits agents from modifying existing model entries when adding new models.

Problem

When following AGENTS.md to add new models, agents made unnecessary changes beyond the new entry:

  • ✅ New model added correctly
  • ❌ Existing models reformatted
  • ❌ Quotes, spacing, or order changed
  • ❌ "Improvements" to working configurations

This produces noisy PRs and increases the risk of breaking models already in production.

Solution

Added a "Critical Rules" section at the very top of the file (before any instructions) that explicitly states:

  1. Never modify existing model entries - they are production code
  2. Never reformat existing code - preserve exact formatting
  3. Never reorder models - maintain dictionary order
  4. Never "improve" existing entries - if it's there, it works
  5. Only add your new model - one entry, one test, minimal changes
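The five rules can be expressed as a mechanical check on the model registry. The sketch below is hypothetical: it assumes the models live in a plain Python dict keyed by model name (the "maintain dictionary order" rule hints at this, but the PR does not show the file), and `added_model_keys` is an illustrative name, not a function in the repository.

```python
from typing import Any, Mapping


def added_model_keys(old: Mapping[str, Any], new: Mapping[str, Any]) -> list[str]:
    """Return the keys added in `new`, raising if any existing entry was touched.

    Hypothetical helper: assumes the registry is a dict keyed by model name.
    """
    # Rules 1 and 4: every existing entry must survive with an identical value.
    for key, value in old.items():
        if key not in new:
            raise ValueError(f"existing model removed: {key}")
        if new[key] != value:
            raise ValueError(f"existing model modified: {key}")

    # Rule 3: existing models must keep their relative order.
    surviving_order = [key for key in new if key in old]
    if surviving_order != list(old):
        raise ValueError("existing models were reordered")

    # Rule 5: whatever is left over is the new content.
    return [key for key in new if key not in old]
```

For example, passing an old mapping with two models and a new mapping that appends a third returns only the third key. Comparing values cannot catch rule 2 (reformatting, quote or spacing changes), since those differences are purely textual; a line-level diff review is still needed for that.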

Changes

  • Added 16 lines at the top of .github/run-eval/AGENTS.md
  • Placed before "Files to Modify" section for maximum visibility
  • Clear, explicit prohibitions against common agent behaviors
  • Emphasis on "ONLY ADD NEW CONTENT"

Expected Impact

  • ✅ Cleaner PRs with only necessary changes
  • ✅ Reduced review burden (only review new model)
  • ✅ Lower risk of breaking production models
  • ✅ Faster merge cycle
  • ✅ Agents understand they should only add, not modify

Testing

  • Rules added at prominent location (top of document)
  • Clear, explicit language
  • Covers observed problem behaviors
  • Provides positive guidance ("When adding a model")

Fixes #2287


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:acc8e6b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-acc8e6b-python \
  ghcr.io/openhands/agent-server:acc8e6b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:acc8e6b-golang-amd64
ghcr.io/openhands/agent-server:acc8e6b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:acc8e6b-golang-arm64
ghcr.io/openhands/agent-server:acc8e6b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:acc8e6b-java-amd64
ghcr.io/openhands/agent-server:acc8e6b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:acc8e6b-java-arm64
ghcr.io/openhands/agent-server:acc8e6b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:acc8e6b-python-amd64
ghcr.io/openhands/agent-server:acc8e6b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:acc8e6b-python-arm64
ghcr.io/openhands/agent-server:acc8e6b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:acc8e6b-golang
ghcr.io/openhands/agent-server:acc8e6b-java
ghcr.io/openhands/agent-server:acc8e6b-python

About Multi-Architecture Support

  • Each variant tag (e.g., acc8e6b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., acc8e6b-python-amd64) are also available if needed
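The tag names above follow a regular pattern. The sketch below reproduces it under an assumption inferred only from this list: in the base-image tags, "/" is encoded as `_s_` and ":" as `_tag_`. The `image_tags` helper is hypothetical and is not the project's build script.

```python
def image_tags(sha: str, variant: str, base_image: str,
               archs: tuple[str, ...] = ("amd64", "arm64")) -> list[str]:
    """Rebuild the per-variant tag list (hypothetical; pattern inferred from the list above)."""
    repo = "ghcr.io/openhands/agent-server"
    # Assumed encoding: "/" -> "_s_", ":" -> "_tag_" (e.g. nikolaik_s_python-nodejs_tag_...).
    base_slug = base_image.replace("/", "_s_").replace(":", "_tag_")
    tags = []
    for arch in archs:
        # Single-architecture tags, named both by variant and by base image.
        tags.append(f"{repo}:{sha}-{variant}-{arch}")
        tags.append(f"{repo}:{sha}-{base_slug}-{arch}")
    # The bare variant tag is the multi-arch manifest.
    tags.append(f"{repo}:{sha}-{variant}")
    return tags
```

For instance, `image_tags("acc8e6b", "java", "eclipse-temurin:17-jdk")` yields the same five java tags listed above, including `ghcr.io/openhands/agent-server:acc8e6b-eclipse-temurin_tag_17-jdk-amd64`.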

Problem: Agents using AGENTS.md were making unnecessary changes to existing
model entries (reformatting, reordering, 'improving' existing configs).

Solution: Added prominent 'Critical Rules' section at the top that explicitly
prohibits:
1. Modifying existing model entries
2. Reformatting existing code
3. Reordering models
4. 'Improving' or 'fixing' existing entries
5. Adding anything beyond the new model entry

This ensures agents only add their new model and leave all existing content
untouched, resulting in cleaner PRs and reduced risk.

Fixes #2287

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot (Contributor) commented Mar 3, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================
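The failing check applies a simple version rule, quoted in the last log line: when breaking changes are detected, the new version must be at least a minor bump over the old one. Here the single flagged change is the `LLM.top_p` default moving from `1.0` to `None`, and 1.11.4 to 1.11.5 is only a patch bump. The following is a minimal sketch of that rule as the log phrases it; `bump_is_sufficient` is a hypothetical name, and this is not the repository's actual Griffe check script.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain "MAJOR.MINOR.PATCH" string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_is_sufficient(old: str, new: str, breaking_changes: int) -> bool:
    """Rule as stated in the log: breaking changes need at least a minor bump."""
    if breaking_changes == 0:
        return True
    old_major, old_minor, _ = parse(old)
    new_major, new_minor, _ = parse(new)
    # A minor or major bump means (major, minor) strictly increased.
    return (new_major, new_minor) > (old_major, old_minor)
```

Under this rule, 1.11.4 to 1.11.5 with one breaking change fails, while 1.11.4 to 1.12.0 would pass. Strict SemVer would require a major bump for breaking changes once the major version is 1 or higher; the log asks only for a minor bump, so the sketch follows the log.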


github-actions bot (Contributor) commented Mar 3, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed


@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Pragmatic solution to a real problem.

This solves the actual issue of agents making noisy changes when adding models. Simple, direct documentation that tells them to stop doing that. No over-engineering, no theoretical nonsense—just explicit rules addressing observed behavior.

Verdict: ✅ Worth merging

Key insight: Sometimes the simplest solution is just telling people (or agents) exactly what not to do.

Problem: Agent in PR #2286 made unnecessary changes beyond adding the new model:
- Changed existing test assertions (claude-sonnet -> gpt-4)
- Replaced real model tests with mocked tests
- "Fixed" test_model to check_model import
- Claimed to fix "incorrect assertions" that were actually correct
- Modified test approach fundamentally

Solution: Added specific prohibitions and examples to prevent these issues:

1. Expanded "What NOT to Do" with 8 explicit rules including:
   - Never modify existing tests or test assertions
   - Never replace real model tests with mocked tests
   - Never fix import names that aren't broken
   - Never change test assertions even if they "look wrong"

2. Added "What These Rules Prevent" section with real examples from PRs

3. Enhanced test section with:
   - Clear warnings about not modifying existing test functions
   - Explicit "What NOT to do in tests" list
   - Template showing exactly what to add and nothing more

4. Added PR description guidance:
   - What NOT to claim (no false "fixes")
   - Examples of inappropriate claims vs appropriate ones
   - Only describe what was actually added

These changes address reviewer concerns that agents were making unnecessary
modifications under the guise of "fixes" when code was already working.

Related: #2286 (review feedback), #2287

Co-authored-by: openhands <openhands@all-hands.dev>
@juanmichelini juanmichelini enabled auto-merge (squash) March 3, 2026 21:02
@juanmichelini (Collaborator, Author) commented:

Additional Improvements Based on PR #2286

Analyzed PR #2286 where the agent made unnecessary changes beyond adding the new model. Added specific rules to prevent these issues.

Issues Found in PR #2286

The agent:

  • ✅ Added gpt-5.3-codex correctly
  • ❌ Changed existing test assertions ("claude-sonnet-4-5-20250929" -> "gpt-4")
  • ❌ Replaced real model tests with mocked/custom model tests
  • ❌ "Fixed" test_model to check_model import (wasn't broken)
  • ❌ Claimed to fix "incorrect assertions" without explaining why they were incorrect

Reviewer's concerns:

  • Why change working tests?
  • Why weaken tests by replacing real data with mocked data?
  • "Fixes" need explanation - if tests pass, don't change them

New Rules Added

1. Expanded "What NOT to Do" section (8 specific rules):

1. Never modify existing model entries
2. Never modify existing tests - especially assertions, mocks, expected values  
3. Never reformat existing code
4. Never reorder models or imports
5. Never "fix" existing code - if tests pass, it works
6. Never change test assertions - even if they "look wrong"
7. Never replace real model tests with mocked tests
8. Never fix import names - if test_model exists, don't change it

2. Added real violation examples:

- Changing assert result[0]["id"] == "claude-sonnet-4-5-20250929" to "gpt-4" ❌
- Replacing real model config tests with mocked/custom model tests ❌
- "Fixing" from resolve_model_config import test_model to check_model ❌
- Adding "Fixed incorrect assertions" without explaining what was incorrect ❌

3. Enhanced test section:

  • Clear warning: "Do not modify any existing test functions"
  • Explicit list of what NOT to do in tests
  • Test template showing exactly what to add
  • Comment: "Only add assertions for parameters YOU added"
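The test template described above can be sketched as a single new test function that touches nothing else. Everything in this example is a hypothetical stand-in: the PR names `gpt-5.3-codex` and `REASONING_EFFORT_MODELS`, but `MODEL_CONFIGS`, the `reasoning_effort` field, its value, and the data structures here are assumptions, not the repository's actual layout.

```python
# Hypothetical stand-ins for the real registry; the actual module layout,
# names, and config fields are not shown in this PR.
MODEL_CONFIGS = {
    "gpt-5.3-codex": {"reasoning_effort": "high"},  # the one new entry (assumed fields)
}
REASONING_EFFORT_MODELS = {"gpt-5.3-codex"}  # assumed to be a set of model names


def test_gpt_5_3_codex_config():
    # A new test function only; existing test functions stay exactly as they were.
    config = MODEL_CONFIGS["gpt-5.3-codex"]
    # Only assert on parameters this PR added.
    assert config["reasoning_effort"] == "high"
    assert "gpt-5.3-codex" in REASONING_EFFORT_MODELS
```

The point of the shape is that the diff for the test file is purely additive: one new function, no edits to existing assertions, imports, or fixtures.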

4. Added PR description guidance:

What NOT to claim:

  • ❌ "Fixed test_model import issue" (if tests pass, no issue)
  • ❌ "Fixed incorrect assertions in existing tests" (they were correct)
  • ❌ "Improved test coverage" (unless you actually added new cases)
  • ❌ "Cleaned up code" (you shouldn't be cleaning anything)

What TO describe:

  • ✅ "Added gpt-5.3-codex model configuration"
  • ✅ "Added test for gpt-5.3-codex"
  • ✅ "Added gpt-5.3-codex to REASONING_EFFORT_MODELS"

Impact

These additions specifically target the behaviors seen in PR #2286:

  • Agents now have explicit examples of what NOT to do (with ❌ markers)
  • Real PR violations are shown as cautionary examples
  • Test section has strong warnings against modifying existing tests
  • PR description guidance prevents claiming false "fixes"

This should significantly reduce unnecessary changes in future model addition PRs.

@juanmichelini juanmichelini merged commit 8dc35fd into main Mar 3, 2026
21 checks passed
@juanmichelini juanmichelini deleted the improve-agents-md-no-modify-existing branch March 3, 2026 21:05
zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026


Development

Successfully merging this pull request may close these issues.

AGENTS.md: Add explicit rules to prevent modifying existing models

3 participants