Add explicit rules to prevent modifying existing models in AGENTS.md by juanmichelini · Pull Request #2288 · OpenHands/software-agent-sdk

juanmichelini · 2026-03-03T21:00:03Z

Summary

Adds a prominent "Critical Rules" section to AGENTS.md that explicitly prohibits agents from modifying existing model entries when adding new models.

Problem

When using AGENTS.md to add new models, agents were making unnecessary changes:

✅ New model added correctly
❌ Existing models reformatted
❌ Quotes, spacing, or order changed
❌ "Improvements" to working configurations

This creates noisy PRs and increases the risk of breaking production models.

Solution

Added a "Critical Rules" section at the very top (before any instructions) that explicitly states:

Never modify existing model entries - they are production code
Never reformat existing code - preserve exact formatting
Never reorder models - maintain dictionary order
Never "improve" existing entries - if it's there, it works
Only add your new model - one entry, one test, minimal changes
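The rules above amount to an "append-only" constraint on the model dictionary. A minimal sketch of how a reviewer might check that constraint mechanically is shown below. It is not part of this PR: `check_append_only` and the example entries are hypothetical, and the check only compares parsed dictionaries, so it cannot catch pure formatting changes such as quote style or spacing.

```python
def check_append_only(old: dict, new: dict) -> list[str]:
    """Return rule violations; an empty list means only one new entry was added.

    Hypothetical helper for illustration, not code from the repository.
    """
    problems = []
    old_keys = list(old)
    new_keys = list(new)

    # Existing models must all survive and keep their relative order.
    if [k for k in new_keys if k in old] != old_keys:
        problems.append("existing models were removed or reordered")

    # Existing models must keep exactly the same configuration.
    for key in old_keys:
        if key in new and new[key] != old[key]:
            problems.append(f"existing model modified: {key}")

    # Exactly one new model is expected per PR.
    added = [k for k in new_keys if k not in old]
    if len(added) != 1:
        problems.append(f"expected exactly one new model, found {len(added)}")

    return problems


# Placeholder entries; real model configs will have different fields.
before = {"model-a": {"id": "model-a"}, "model-b": {"id": "model-b"}}
good = {**before, "model-c": {"id": "model-c"}}
bad = {"model-b": {"id": "model-b"}, "model-a": {"id": "changed"}, "model-c": {"id": "model-c"}}

print(check_append_only(before, good))
print(check_append_only(before, bad))
```

The first call reports no problems; the second reports both the reordering and the modified entry.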

Changes

Added 16 lines at the top of .github/run-eval/AGENTS.md
Placed before "Files to Modify" section for maximum visibility
Clear, explicit prohibitions against common agent behaviors
Emphasis on "ONLY ADD NEW CONTENT"

Expected Impact

✅ Cleaner PRs with only necessary changes
✅ Reduced review burden (only review new model)
✅ Lower risk of breaking production models
✅ Faster merge cycle
✅ Agents understand they should only add, not modify

Testing

Rules added at prominent location (top of document)
Clear, explicit language
Covers observed problem behaviors
Provides positive guidance ("When adding a model")

Fixes #2287

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | `eclipse-temurin:17-jdk` | Link |
| python | amd64, arm64 | `nikolaik/python-nodejs:python3.12-nodejs22` | Link |
| golang | amd64, arm64 | `golang:1.21-bookworm` | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:acc8e6b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-acc8e6b-python \
  ghcr.io/openhands/agent-server:acc8e6b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:acc8e6b-golang-amd64
ghcr.io/openhands/agent-server:acc8e6b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:acc8e6b-golang-arm64
ghcr.io/openhands/agent-server:acc8e6b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:acc8e6b-java-amd64
ghcr.io/openhands/agent-server:acc8e6b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:acc8e6b-java-arm64
ghcr.io/openhands/agent-server:acc8e6b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:acc8e6b-python-amd64
ghcr.io/openhands/agent-server:acc8e6b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:acc8e6b-python-arm64
ghcr.io/openhands/agent-server:acc8e6b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:acc8e6b-golang
ghcr.io/openhands/agent-server:acc8e6b-java
ghcr.io/openhands/agent-server:acc8e6b-python

About Multi-Architecture Support

Each variant tag (e.g., acc8e6b-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., acc8e6b-python-amd64) are also available if needed

Problem: Agents using AGENTS.md were making unnecessary changes to existing model entries (reformatting, reordering, 'improving' existing configs).

Solution: Added prominent 'Critical Rules' section at the top that explicitly prohibits:

1. Modifying existing model entries
2. Reformatting existing code
3. Reordering models
4. 'Improving' or 'fixing' existing entries
5. Adding anything beyond the new model entry

This ensures agents only add their new model and leave all existing content untouched, resulting in cleaner PRs and reduced risk.

Fixes #2287

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-03T21:00:30Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log
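The failure above follows from the rule stated in the log's `::error` line: a breaking change (here, the changed default of `LLM.top_p`) requires at least a minor version bump, but the comparison was 1.11.4 against 1.11.5. The sketch below illustrates that rule only. It is not the CI script, the function names are hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH' into integers (assumes three numeric parts)."""
    return tuple(int(part) for part in version.split("."))


def bump_kind(old: str, new: str) -> str:
    """Classify the version change as 'major', 'minor', 'patch', or 'none'."""
    o, n = parse(old), parse(new)
    if n[0] > o[0]:
        return "major"
    if n[0] == o[0] and n[1] > o[1]:
        return "minor"
    if n[:2] == o[:2] and n[2] > o[2]:
        return "patch"
    return "none"


def satisfies_policy(old: str, new: str, breaking_changes: int) -> bool:
    """Breaking changes need at least a minor bump; otherwise nothing is required."""
    if breaking_changes == 0:
        return True
    return bump_kind(old, new) in ("minor", "major")


# Values taken from the log excerpt: one breaking change, 1.11.4 -> 1.11.5.
print(satisfies_policy("1.11.4", "1.11.5", breaking_changes=1))
```

Under this reading, a patch-level bump with one breaking change does not satisfy the policy, which matches the reported result.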

github-actions · 2026-03-03T21:00:43Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

all-hands-bot

🟢 Good taste - Pragmatic solution to a real problem.

This solves the actual issue of agents making noisy changes when adding models. Simple, direct documentation that tells them to stop doing that. No over-engineering, no theoretical nonsense—just explicit rules addressing observed behavior.

Verdict: ✅ Worth merging

Key insight: Sometimes the simplest solution is just telling people (or agents) exactly what not to do.

Problem: Agent in PR #2286 made unnecessary changes beyond adding the new model:

- Changed existing test assertions (claude-sonnet -> gpt-4)
- Replaced real model tests with mocked tests
- "Fixed" test_model to check_model import
- Claimed to fix "incorrect assertions" that were actually correct
- Modified test approach fundamentally

Solution: Added specific prohibitions and examples to prevent these issues:

1. Expanded "What NOT to Do" with 8 explicit rules including:
   - Never modify existing tests or test assertions
   - Never replace real model tests with mocked tests
   - Never fix import names that aren't broken
   - Never change test assertions even if they "look wrong"
2. Added "What These Rules Prevent" section with real examples from PRs
3. Enhanced test section with:
   - Clear warnings about not modifying existing test functions
   - Explicit "What NOT to do in tests" list
   - Template showing exactly what to add and nothing more
4. Added PR description guidance:
   - What NOT to claim (no false "fixes")
   - Examples of inappropriate claims vs appropriate ones
   - Only describe what was actually added

These changes address reviewer concerns that agents were making unnecessary modifications under the guise of "fixes" when code was already working.

Related: #2286 (review feedback), #2287

Co-authored-by: openhands <openhands@all-hands.dev>

juanmichelini · 2026-03-03T21:02:24Z

Additional Improvements Based on PR #2286

I analyzed PR #2286, where the agent made unnecessary changes beyond adding the new model, and added specific rules to prevent these issues.

Issues Found in PR #2286

The agent:

✅ Added gpt-5.3-codex correctly
❌ Changed existing test assertions ("claude-sonnet-4-5-20250929" → "gpt-4")
❌ Replaced real model tests with mocked/custom model tests
❌ "Fixed" test_model to check_model import (wasn't broken)
❌ Claimed to fix "incorrect assertions" without explaining why they were incorrect

Reviewer's concerns:

Why change working tests?
Why weaken tests by replacing real data with mocked data?
"Fixes" need explanation - if tests pass, don't change them

New Rules Added

1. Expanded "What NOT to Do" section (8 specific rules):

1. Never modify existing model entries
2. Never modify existing tests - especially assertions, mocks, expected values  
3. Never reformat existing code
4. Never reorder models or imports
5. Never "fix" existing code - if tests pass, it works
6. Never change test assertions - even if they "look wrong"
7. Never replace real model tests with mocked tests
8. Never fix import names - if test_model exists, don't change it

2. Added real violation examples:

- Changing assert result[0]["id"] == "claude-sonnet-4-5-20250929" to "gpt-4" ❌
- Replacing real model config tests with mocked/custom model tests ❌
- "Fixing" from resolve_model_config import test_model to check_model ❌
- Adding "Fixed incorrect assertions" without explaining what was incorrect ❌

3. Enhanced test section:

Clear warning: "Do not modify any existing test functions"
Explicit list of what NOT to do in tests
Test template showing exactly what to add
Comment: "Only add assertions for parameters YOU added"
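To make the "add one test, touch nothing else" idea concrete, here is a rough sketch of what such an addition could look like. It is not the template from AGENTS.md: the `MODELS` dictionary, its fields, and both test names are hypothetical stand-ins, since the actual registry and test helpers are not shown in this PR.

```python
# Hypothetical stand-in for the real model registry; fields are illustrative only.
MODELS = {
    "existing-model": {"id": "existing-model", "display_name": "Existing Model"},
    # The single entry added by the PR.
    "new-model": {"id": "new-model", "display_name": "New Model", "temperature": 0.0},
}


def test_existing_model():
    # Pre-existing test: left exactly as it was, including its assertions.
    assert MODELS["existing-model"]["id"] == "existing-model"


def test_new_model():
    # The only test function the PR adds.
    config = MODELS["new-model"]
    assert config["id"] == "new-model"
    assert config["display_name"] == "New Model"
    # Only add assertions for parameters YOU added.
    assert config["temperature"] == 0.0


if __name__ == "__main__":
    test_existing_model()
    test_new_model()
    print("ok")
```

The diff for a PR following this pattern would contain one new dictionary entry and one new test function, with no changed lines elsewhere.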

4. Added PR description guidance:

What NOT to claim:

❌ "Fixed test_model import issue" (if tests pass, no issue)
❌ "Fixed incorrect assertions in existing tests" (they were correct)
❌ "Improved test coverage" (unless you actually added new cases)
❌ "Cleaned up code" (you shouldn't be cleaning anything)

What TO describe:

✅ "Added gpt-5.3-codex model configuration"
✅ "Added test for gpt-5.3-codex"
✅ "Added gpt-5.3-codex to REASONING_EFFORT_MODELS"

Impact

These additions specifically target the behaviors seen in PR #2286:

Agents now have explicit examples of what NOT to do (with ❌ markers)
Real PR violations are shown as cautionary examples
Test section has strong warnings against modifying existing tests
PR description guidance prevents claiming false "fixes"

This should significantly reduce unnecessary changes in future model addition PRs.

all-hands-bot approved these changes Mar 3, 2026

View reviewed changes

juanmichelini enabled auto-merge (squash) March 3, 2026 21:02

juanmichelini merged commit 8dc35fd into main Mar 3, 2026
21 checks passed

juanmichelini deleted the improve-agents-md-no-modify-existing branch March 3, 2026 21:05

zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026

Add explicit rules to prevent modifying existing models in AGENTS.md (O…

4659289

…penHands#2288) Cherry-pick from upstream 8dc35fd

juanmichelini merged 2 commits into main from improve-agents-md-no-modify-existing
