test: use default agent preset for integration tests#1996
test: use default agent preset for integration tests#1996
Conversation
Update integration tests to use the default agent preset from openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI). Changes: - Update BaseIntegrationTest to use get_default_tools() by default - Add enable_browser property (default False) for tests to override - Remove manual tool registration from t01-t08 tests - Update behavior test helpers to use default preset - Add enable_browser=True for browsing tests (t05, t06) This ensures integration tests validate the production agent configuration, catching any issues before they reach users. Fixes #372 Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Good Refactoring with Minor Design Smell
Verdict: ✅ Worth merging - Solid refactoring that removes duplication and aligns tests with production config.
Key Insight: The -85 line reduction and single source of truth for tool configuration is excellent. The enable_browser flag works but represents a scaling problem for future tool configurations.
See inline comments for specific feedback.
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
enyst
left a comment
There was a problem hiding this comment.
Unless you'd like to negotiate with the reviewer agent, LGTM 😅
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
@OpenHands please check the github workflow in this PR https://github.com/OpenHands/software-agent-sdk/actions/runs/21925641027/job/63317348622 Tell me why the integration test in this PR never finish running? |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Analysis: Why the Integration Test Never FinishesAfter analyzing the GitHub workflow logs and the PR changes, I've identified the root cause of why the integration tests are hanging for 6+ hours before being cancelled. Root CauseThe PR changes The problem is:
Evidence from Logs
Recommended FixThe tests Alternatively, if browser testing is desired:
|
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
@OpenHands review https://github.com/OpenHands/software-agent-sdk/actions/runs/22763604507/job/66025865386?pr=1996 - why it gets stuck there? I think we need to setup chromium for the integration test too, similar to the test-examples tests |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Browser-enabled integration tests (t05, t06) require Chromium to be available. Without a system-installed Chromium, the browser tool's auto-install via uvx playwright hangs or installs without system deps, causing the entire test run to get stuck indefinitely. This mirrors the Chromium setup already present in run-examples.yml. Co-authored-by: openhands <openhands@all-hands.dev>
SummaryProblem: The integration tests in PR #1996 were getting stuck indefinitely because the Fix: Added an "Install Chromium" step to the - name: Install Chromium
run: |
sudo apt-get update
sudo apt-get install -y chromium-browserStatus: The change (1 file, +5 lines) has already been committed and pushed to the |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
@OpenHands please check the workflow https://github.com/OpenHands/software-agent-sdk/actions/runs/22766812391/job/66036717130?pr=1996 and the PR context, why it is still broken after adding chromium |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
After browser-enabled tests (t05, t06) complete, the Chrome process and its async event loop remain alive in ProcessPoolExecutor workers because LocalConversation.close() is never explicitly called. This prevents the worker processes from exiting, causing the CI job to hang indefinitely. Call self.conversation.close() in teardown() to ensure all tool executors (including BrowserToolExecutor) are properly shut down after each test. Co-authored-by: openhands <openhands@all-hands.dev>
SummaryProblemThe integration tests in PR #1996 were hanging indefinitely (3+ hours until GitHub Actions timeout) even after adding chromium-browser installation to the CI workflow. Root CauseThe browser tests (t05, t06) were passing successfully — chromium was working fine. The hang occurred after all tests completed because:
Evidence from the cancelled run's cleanup confirmed orphan Fix (commit
|
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
4 similar comments
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
@OpenHands pls understand the context of this pr, then understand why https://github.com/OpenHands/software-agent-sdk/actions/runs/23972109855/job/69930673480?pr=1996 failed and fix it |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
- Rename t01-t04, t07-t08 test files back from _t*.py to t*.py so the test runner discovers them again (they were temporarily disabled for debugging but never re-enabled). - Change ProcessPoolExecutor to use shutdown(wait=False) after all results are collected, preventing the runner from hanging indefinitely when browser/Chrome child processes survive worker teardown. - Clear BrowserToolSet._shared_executor class variable on close() so stale references don't prevent worker process exit. Co-authored-by: openhands <openhands@all-hands.dev>
Summary of ChangesThe PR comment asked me to understand the PR context, diagnose why the CI workflow failed, and fix it. Was the request fully addressed?
Are the changes concise?Yes, all three changes directly address the failure:
No extraneous changes were made. |
Coverage Report •
|
||||||||||||||||||||
Summary
Update integration tests to use the default agent preset from
openhands.tools.preset.default, ensuring tests validate the same agent configuration shipped to production (GUI/CLI).Changes:
tests/integration/base.py: UpdatedBaseIntegrationTestto:get_default_tools()from the presetenable_browserproperty (defaultFalse) for tests to overridetoolsproperty non-abstract with default implementation using presetIntegration tests (t01-t08): Removed manual tool registration, now using default preset:
t01_fix_simple_typo.pyt02_add_bash_hello.pyt03_jupyter_write_file.pyt04_git_staging.pyt05_simple_browsing.py(withenable_browser=True)t06_github_pr_browsing.py(withenable_browser=True)t07_interactive_commands.pyt08_image_file_viewing.pyBehavior tests: Updated
default_behavior_tools()inbehavior_helpers.pyto use presetb05_do_not_create_redundant_files.py: Removed manual tool registrationNet change: -85 lines of code
This ensures integration tests validate the production agent configuration, catching any issues before they reach users.
Fixes #372
Checklist
@xingyaoww can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:1f4e52c-pythonRun
All tags pushed for this build
About Multi-Architecture Support
1f4e52c-python) is a multi-arch manifest supporting both amd64 and arm641f4e52c-python-amd64) are also available if needed