-
Simplicity and Clarity "The best solutions often come from looking at problems from a different angle, where special cases disappear and become normal cases." • Prefer solutions that eliminate edge cases rather than adding conditional checks • Good design patterns emerge from experience and careful consideration • Simple, clear code is easier to maintain and debug
-
Backward Compatibility "Stability is a feature, not a constraint." • Changes should not break existing functionality • Consider the impact on users and existing integrations • Compatibility enables trust and adoption
-
Pragmatic Problem-Solving "Focus on solving real problems with practical solutions." • Address actual user needs rather than theoretical edge cases • Prefer proven, straightforward approaches over complex abstractions • Code should serve real-world requirements
-
Maintainable Architecture "Keep functions focused and code readable." • Functions should be short and have a single responsibility • Avoid deep nesting - consider refactoring when indentation gets complex • Clear naming and structure reduce cognitive load
• **Constructive**: Focus on helping improve code and solutions
• **Collaborative**: Work together as partners toward better outcomes
• **Clear**: Provide specific, actionable feedback
• **Respectful**: Maintain a supportive tone while being technically rigorous
When reviewing a requirement, confirm understanding by restating it clearly:
"Based on your description, I understand you need: [clear restatement of the requirement]. Is this correct?"
"Well-designed data structures often lead to simpler code." • What are the core data elements and their relationships? • How does data flow through the system? • Are there opportunities to simplify data handling?
"Let's look for ways to simplify this." • What's the essential functionality we need to implement? • Which parts of the current approach add unnecessary complexity? • How can we make this more straightforward?
"Let's make sure this doesn't break existing functionality." • What existing features might be affected? • How can we implement this change safely? • What migration path do users need?
"Let's focus on the real-world use case." • Does this solve an actual problem users face? • Is the complexity justified by the benefit? • What's the simplest approach that meets the need?
After analysis, provide feedback in this format:
Assessment: [Clear evaluation of the approach]
Key Observations:
- Data Structure: [insights about data organization]
- Complexity: [areas where we can simplify]
- Compatibility: [potential impact on existing code]
Suggested Approach: If the solution looks good:
- Start with the simplest data structure that works
- Eliminate special cases where possible
- Implement clearly and directly
- Ensure backward compatibility
If there are concerns: "I think we might be able to simplify this. The core issue seems to be [specific problem]. What if we tried [alternative approach]?"
When reviewing code, provide constructive feedback:
Overall Assessment: [Helpful evaluation]
Specific Suggestions:
- [Concrete improvements with explanations]
- [Alternative approaches to consider]
- [Ways to reduce complexity]
Next Steps: [Clear action items]
- Programmatic settings live in
openhands-sdk/openhands/sdk/settings/. TreatAgentSettingsandexport_settings_schema()as the canonical structured settings surface in the SDK, and keep that schema focused on neutral config semantics rather than client-specific presentation details. SettingsFieldSchemaintentionally does not export arequiredflag. If a consumer needs nullability semantics, inspect the underlying Python typing rather than inferring from SDK defaults.AgentSettings.toolsis part of the exported settings schema so the schema stays aligned with the settings payload that round-trips throughAgentSettingsand drivescreate_agent().AgentSettings.mcp_confignow uses FastMCP's typedMCPConfigat runtime. When serializing settings back to plain data (e.g.model_dump()orcreate_agent()), keep the output compact withexclude_none=True, exclude_defaults=Trueso callers still see the familiar.mcp.json-style dict shape.- Anthropic malformed tool-use/tool-result history errors (for example, missing or duplicated
tool_resultblocks) are intentionally mapped to a dedicatedLLMMalformedConversationHistoryErrorand caught separately inAgent.step(), so recovery can still use condensation while logs preserve that this was malformed history rather than a true context-window overflow. - AgentSkills progressive disclosure goes through
AgentContext.get_system_message_suffix()into<available_skills>, andopenhands.sdk.context.skills.to_prompt()truncates each prompt description to 1024 characters because the AgentSkills specification capsdescriptionat 1-1024 characters. - Workspace-wide uv resolver guardrails belong in the repository root
[tool.uv]table. Whenexclude-neweris configured there,uv lockpersists it into the rootuv.lock[options]section as both an absolute cutoff andexclude-newer-span, anduv sync --frozencontinues to use that locked workspace state. pr-review-by-openhandsdelegates toOpenHands/extensions/plugins/pr-review@main. Repo-specific reviewer instructions live in.agents/skills/custom-codereview-guide.md, and because task-trigger matching is substring-based, that/codereviewskill is also auto-injected for the workflow's/codereview-roastedprompt.- Auto-title generation should not re-read
ConversationState.eventsfrom a background task triggered by a freshly receivedMessageEvent; extract message text synchronously from the incoming event and then reuse shared title helpers (extract_message_text,generate_title_from_message) to avoid persistence-order races. - The agent-server REST API breakage workflow now does two comparisons when a PR base ref is available: it still checks the latest published baseline from PyPI for release compatibility, and it also diffs the current OpenAPI schema against the PR base branch to catch unreleased endpoint removals or incompatible contract edits already present on
main.
When reviewing or modifying code, read the closest AGENTS file for the package(s) containing the changed files. If a PR spans multiple packages, consult each relevant package-level AGENTS.md.
- SDK: openhands-sdk/openhands/sdk/AGENTS.md
- Subagents: openhands-sdk/openhands/sdk/subagent/AGENTS.md
- Tools: openhands-tools/openhands/tools/AGENTS.md
- Workspace: openhands-workspace/openhands/workspace/AGENTS.md
- Agent server: openhands-agent-server/AGENTS.md
- Eval config: .github/run-eval/AGENTS.md
- For SDK Python API deprecation/removal policy, read openhands-sdk/openhands/sdk/AGENTS.md. Public API removals require deprecation before removal, and breaking SDK API changes require at least a MINOR SemVer bump.
- The SDK API breakage checker should treat metadata-only changes to
Pydantic
Field(...)declarations as non-breaking, including adding, removing, or editingdescription,title,examples,json_schema_extra, anddeprecatedkwargs. - For public REST APIs, read openhands-agent-server/AGENTS.md. REST contract breaks need a deprecation notice and a runway of 5 minor releases before removing the old contract or making an incompatible replacement mandatory.
<DEV_SETUP>
- Make sure you
make buildto configure the dependencies first - We use pre-commit hooks
.pre-commit-config.yamlthat includes:- type check through pyright
- linting and formatter with
uv ruff
- NEVER USE
mypy! - Do NOT commit ALL the file, just commit the relevant file you've changed!
- In every commit message, you should add "Co-authored-by: openhands openhands@all-hands.dev"
- You can run pytest with
uv run pytest
- If it is just code, you can modify it so it spans multiple lines.
- If it is a single-line string, you can break it into a multi-line string by doing "ABC" -> ("A"\n"B"\n"C")
- If it is a long multi-line string (e.g., docstring), you should just add type ignore AFTER the ending """. You should NEVER ADD IT INSIDE the docstring.
</DEV_SETUP>
<PR_ARTIFACTS>
When working on a PR that requires design documents, scripts meant for development-only, or other temporary artifacts that should NOT be merged to main, store them in a .pr/ directory at the repository root.
# Create the directory if it doesn't exist
mkdir -p .pr
# Add your PR-specific documents
.pr/
├── design.md # Design decisions and architecture notes
├── analysis.md # Investigation or debugging notes
└── notes.md # Any other PR-specific content- Notification: When
.pr/exists, a single comment is posted to the PR conversation alerting reviewers - Auto-cleanup: When the PR is approved, the
.pr/directory is automatically removed via commit - Fork PRs: Auto-cleanup cannot push to forks, so manual removal is required before merging
- Do NOT put anything in
.pr/that needs to be preserved - The
.pr/check passes (green ✅) during development - it only posts a notification, not a blocking error - For fork PRs: You must manually remove
.pr/before the PR can be merged
- Complex refactoring that benefits from written design rationale
- Debugging sessions where you want to document your investigation
- Feature implementations that need temporary planning docs
- Temporary script that are intended to show reviewers that the feature works
- Any analysis that helps reviewers understand the PR but isn't needed long-term </PR_ARTIFACTS>
<REVIEW_HANDLING>
- Critically evaluate each review comment before acting on it. Not all feedback is worth implementing:
- Does it fix a real bug or improve clarity significantly?
- Does it align with the project's engineering principles (simplicity, maintainability)?
- Is the suggested change proportional to the benefit, or does it add unnecessary complexity?
- It's acceptable to respectfully decline suggestions that add verbosity without clear benefit, over-engineer for hypothetical edge cases, or contradict the project's pragmatic approach.
- After addressing (or deciding not to address) inline review comments, mark the corresponding review threads as resolved.
- Before resolving a thread, leave a reply comment that either explains the reason for dismissing the feedback or references the specific commit (e.g., commit SHA) that addressed the issue.
- Prefer resolving threads only once fixes are pushed or a clear decision is documented.
- Use the GitHub GraphQL API to reply to and resolve review threads (see below).
The CI check Review Thread Gate/unresolved-review-threads will fail if there are unresolved review threads. To resolve threads programmatically:
- Get the thread IDs (replace
<OWNER>,<REPO>,<PR_NUMBER>):
gh api graphql -f query='
{
repository(owner: "<OWNER>", name: "<REPO>") {
pullRequest(number: <PR_NUMBER>) {
reviewThreads(first: 20) {
nodes {
id
isResolved
comments(first: 1) {
nodes { body }
}
}
}
}
}
}'- Reply to the thread explaining how the feedback was addressed:
gh api graphql -f query='
mutation {
addPullRequestReviewThreadReply(input: {
pullRequestReviewThreadId: "<THREAD_ID>"
body: "Fixed in <COMMIT_SHA>"
}) {
comment { id }
}
}'- Resolve the thread:
gh api graphql -f query='
mutation {
resolveReviewThread(input: {threadId: "<THREAD_ID>"}) {
thread { isResolved }
}
}'- Get the failed workflow run ID and rerun it:
# Find the run ID from the failed check URL, or use:
gh run list --repo <OWNER>/<REPO> --branch <BRANCH> --limit 5
# Rerun failed jobs
gh run rerun <RUN_ID> --repo <OWNER>/<REPO> --failed</REVIEW_HANDLING>
- Avoid hacky trick like `sys.path.insert` when resolving package dependency
- Use existing packages/libraries instead of implementing yourselves whenever possible.
- Avoid using # type: ignore. Treat it only as a last resort. In most cases, issues should be resolved by improving type annotations, adding assertions, or adjusting code/tests—rather than silencing the type checker.
- Please AVOID using # type: ignore[attr-defined] unless absolutely necessary. If the issue can be addressed by adding a few extra assert statements to verify types, prefer that approach instead!
- For issue like # type: ignore[call-arg]: if you discover that the argument doesn’t actually exist, do not try to mock it again in tests. Instead, simply remove it.
- Avoid doing in-line imports unless absolutely necessary (e.g., circular dependency).
- Avoid getattr/hasattr guards and instead enforce type correctness by relying on explicit type assertions and proper object usage, ensuring functions only receive the expected Pydantic models or typed inputs. Prefer type hints and validated models over runtime shape checks.
- Prefer accessing typed attributes directly. If necessary, convert inputs up front into a canonical shape; avoid purely hypothetical fallbacks.
- Use real newlines in commit messages; do not write literal "\n".
- AFTER you edit ONE file, you should run pre-commit hook on that file via `uv run pre-commit run --files [filepath]` to make sure you didn't break it.
- Don't write TOO MUCH test, you should write just enough to cover edge cases.
- Check how we perform tests in .github/workflows/tests.yml
- Put unit tests under the corresponding domain folder in `tests/` (e.g., `tests/sdk`, `tests/tools`, `tests/workspace`). For example, changes to `openhands-sdk/openhands/sdk/tool/tool.py` should be covered in `tests/sdk/tool/test_tool.py`.
- DON'T write TEST CLASSES unless absolutely necessary!
- If you find yourself duplicating logics in preparing mocks, loading data etc, these logic should be fixtures in conftest.py!
- Please test only the logic implemented in the current codebase. Do not test functionality (e.g., BaseModel.model_dumps()) that is not implemented in this repository.
- For changes to prompt templates, tool descriptions, or agent decision logic, add the `integration-test` label to trigger integration tests and verify no unexpected impact on benchmark performance.
Behavior tests (prefix b##_*) in tests/integration/tests/ are designed to verify that agents exhibit desired behaviors in realistic scenarios. These tests are distinct from functional tests (prefix t##_*) and have specific requirements.
Before adding or modifying behavior tests, review tests/integration/BEHAVIOR_TESTS.md for the latest workflow, expectations, and examples.
<AGENT_TMP_DIRECTORY>
When tools need to store observation files (e.g., browser session recordings, task tracker data), use .agent_tmp as the directory name for consistency.
The browser session recording tool saves recordings to .agent_tmp/observations/recording-{timestamp}/.
This convention ensures tool-generated observation files are stored in a predictable location that can be easily:
- Added to
.gitignore - Cleaned up after agent sessions
- Identified as agent-generated artifacts
Note: This is separate from persistence_dir which is used for conversation state persistence.
</AGENT_TMP_DIRECTORY>
<QUICK_COMMANDS>
- Set up the dev environment:
make build(runsuv sync --devand installs pre-commit; requires uv >= 0.8.13) - Lint/format:
make lint,make format - Run tests:
uv run pytest - Build agent-server:
make build-server(output:dist/agent-server/) - Clean caches:
make clean - Run SDK examples: see openhands-sdk/openhands/sdk/AGENTS.md.
- The example workflow runs
uv run pytest tests/examples/test_examples.py --run-examples; each successful example must print anEXAMPLE_COST: ...line to stdout (useEXAMPLE_COST: 0for non-LLM examples). - Conversation plugins passed via
plugins=[...]are lazy-loaded on the firstsend_message()orrun(), so example code should inspect plugin-added skills orresolved_pluginsonly after that first interaction. - Programmatic settings live in
openhands-sdk/openhands/sdk/settings/. Keep the exported schema focused on neutral config structure and semantics; downstream apps should own client-specific ordering, icons, widgets, and slash-command presentation. </QUICK_COMMANDS>
<REPO_CONFIG_NOTES>
- Ruff:
line-length = 88,target-version = "py312"(seepyproject.toml). - Ruff ignores
ARG(unused arguments) undertests/**/*.pyto allow pytest fixtures. - Repository guidance lives in the project root AGENTS.md (loaded as a third-party skill file). </REPO_CONFIG_NOTES>