Commit 03fa67e

Sync repro guide: add example test tracker and fix workflow link
Co-authored-by: openhands <openhands@all-hands.dev>
1 parent 926d29c commit 03fa67e

1 file changed: +3 -2 lines changed

mlsys_repro_guide.md

Lines changed: 3 additions & 2 deletions
@@ -203,8 +203,9 @@ These tests exercise real agent-LLM interactions including tool invocation, file

- **Unit tests** (run on every commit): [github.com/OpenHands/software-agent-sdk/actions/workflows/tests.yml](https://github.com/OpenHands/software-agent-sdk/actions/workflows/tests.yml)
- **Integration tests** (run nightly across multiple models): [github.com/OpenHands/software-agent-sdk/actions/workflows/integration-runner.yml](https://github.com/OpenHands/software-agent-sdk/actions/workflows/integration-runner.yml)
-- **Example tests** (run periodically): [github.com/OpenHands/software-agent-sdk/actions/workflows/test-examples.yml](https://github.com/OpenHands/software-agent-sdk/actions/workflows/test-examples.yml)
-- **Nightly results tracker** (pass rates, costs, and links to detailed agent logs): [github.com/OpenHands/software-agent-sdk/issues/2078](https://github.com/OpenHands/software-agent-sdk/issues/2078)
+- **Example tests** (run periodically): [github.com/OpenHands/software-agent-sdk/actions/workflows/run-examples.yml](https://github.com/OpenHands/software-agent-sdk/actions/workflows/run-examples.yml)
+- **Integration test results tracker** (pass rates, costs, and links to detailed agent logs): [github.com/OpenHands/software-agent-sdk/issues/2078](https://github.com/OpenHands/software-agent-sdk/issues/2078)
+- **Example test results tracker** (per-example status, duration, and cost): [github.com/OpenHands/software-agent-sdk/issues/976](https://github.com/OpenHands/software-agent-sdk/issues/976)

Each workflow run includes full logs, and the nightly tracker issue aggregates results with per-model breakdowns. This provides an independent, continuously-updated record of the SDK's testing methodology in action — no API keys required to inspect.
