Commit 5065d14

docs(.pr): document smoke test behavior and cleanup workflow
Co-authored-by: openhands <openhands@all-hands.dev>
1 parent 1e4aec9 commit 5065d14

File tree: 2 files changed (+81 -0 lines)


.pr/local_plugin_install_test/README.md

Lines changed: 4 additions & 0 deletions
@@ -27,3 +27,7 @@ OPENHANDS_EXAMPLE_ARTIFACT_DIR=.pr/local_plugin_install_test \

- `persistence/`: persisted `base_state.json` and `events/` from the smoke-test conversation

> Note: `.pr/` content is automatically cleaned up on PR approval.
>
> Cleanup workflow: `.github/workflows/pr-artifacts.yml` (job: `cleanup-on-approval`).
>
> See `console_log.md` for a detailed explanation + the console output.

.pr/local_plugin_install_test/console_log.md

Lines changed: 77 additions & 0 deletions
@@ -3,6 +3,83 @@

This file is **temporary** and lives under `.pr/`, so it will be automatically
cleaned up by a workflow on PR approval.

## Cleanup guarantee

- This `.pr/` directory is **temporary** and will be automatically removed when the PR is approved.
- Cleanup is handled by **`.github/workflows/pr-artifacts.yml`** (job: `cleanup-on-approval`).
## How the TestLLM smoke test works (verbatim explanation)

With **TestLLM**, the “live test” worked like this:

### What we were trying to prove

We wanted an end-to-end-ish smoke test that:

1) installs a *local* plugin directory into an `installed_dir` (no real `~/.openhands/` writes)
2) loads that installed plugin as a `Plugin` object
3) merges the plugin’s `skills/` into an `Agent`’s `agent_context`
4) runs a real `Conversation` loop and confirms the skill actually **triggers**
5) persists the conversation to disk (so reviewers can inspect `base_state.json` + `events/*.json`)
### Why TestLLM is enough here

`TestLLM` is a real `LLM` subclass in the SDK (`openhands.sdk.testing.TestLLM`). Instead of calling an external provider, it returns **scripted** assistant messages.

In the example we used:

```py
TestLLM.from_messages([
    Message(role="assistant", content=[TextContent(text="Done")])
])
```

So when the conversation calls the LLM once, it deterministically returns `"Done"` (and cost stays 0). No network calls, no API keys.
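The scripted-replay idea is easy to see outside the SDK. Here is a minimal stand-in (the class and method names below are illustrative, not the SDK's `TestLLM` API): a fixed script of assistant replies is returned in order, deterministically, with cost pinned at zero.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a scripted LLM test double: it replays a fixed
# list of assistant replies instead of calling an external provider.
@dataclass
class ScriptedLLM:
    script: list
    cost: float = 0.0   # no provider call ever happens, so cost stays 0
    _i: int = field(default=0, repr=False)

    @classmethod
    def from_messages(cls, messages):
        return cls(script=list(messages))

    def complete(self, prompt: str) -> str:
        # Deterministic: return the next scripted reply, ignoring the prompt.
        reply = self.script[self._i]
        self._i += 1
        return reply


llm = ScriptedLLM.from_messages(["Done"])
print(llm.complete("hello"))  # -> Done
print(llm.cost)               # -> 0.0
```

This is the property the smoke test relies on: the conversation loop exercises real SDK machinery while the model's reply is a known constant.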
### The important “real behavior” that still happens

Even though the LLM response is scripted, the SDK still does the real steps around it:

- The installed plugin’s `skills/hello/SKILL.md` is loaded as a `Skill` object (via `Plugin.load_all()` → `Plugin.load()` → `_load_skills()`).
- We merge it into the agent via `plugin.add_skills_to(...)`.
- When we do:

```py
conversation.send_message("hello")
conversation.run()
```

the `AgentContext` keyword-trigger logic runs, and we actually see:

- the skill gets activated (`conversation.state.activated_knowledge_skills` becomes `['hello']`)
- logs show: `Skill 'hello' triggered by keyword 'hello'`
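The keyword-trigger behavior described above can be sketched in a few lines. This is a simplified model of the mechanism, not the SDK's `AgentContext` implementation; the class and attribute names mirror the log output quoted above but are otherwise assumptions.

```python
# Simplified sketch of keyword-based skill activation: a skill declares
# trigger keywords, and a user message containing one of them activates it.

class Skill:
    def __init__(self, name: str, triggers: list[str]):
        self.name = name
        self.triggers = triggers


class AgentContext:
    def __init__(self, skills: list[Skill]):
        self.skills = skills
        self.activated_knowledge_skills: list[str] = []

    def on_user_message(self, text: str) -> None:
        lowered = text.lower()
        for skill in self.skills:
            if skill.name in self.activated_knowledge_skills:
                continue  # already active; don't trigger twice
            for kw in skill.triggers:
                if kw in lowered:
                    print(f"Skill '{skill.name}' triggered by keyword '{kw}'")
                    self.activated_knowledge_skills.append(skill.name)
                    break


ctx = AgentContext([Skill("hello", triggers=["hello"])])
ctx.on_user_message("hello")
print(ctx.activated_knowledge_skills)  # -> ['hello']
```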
That’s the key proof that “installed plugin → loaded plugin → skill in agent context → skill triggers in a conversation” works.

### What got persisted

Because `Conversation(..., persistence_dir=...)` was set, it wrote:

- `base_state.json`
- `events/event-*.json`

Those events include (at minimum) the user message event and the agent message event (from TestLLM). In artifact mode, we also delete `events/.eventlog.lock` so the persisted folder is clean for review.
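The on-disk layout above can be sketched with a hypothetical writer (the function name and event filename pattern are illustrative; only the file names `base_state.json`, `events/event-*.json`, and `events/.eventlog.lock` come from the text):

```python
import json
import tempfile
from pathlib import Path


def persist_conversation(persistence_dir, state: dict, events: list[dict]):
    """Write base_state.json plus one JSON file per event, then drop the lock."""
    root = Path(persistence_dir)
    events_dir = root / "events"
    events_dir.mkdir(parents=True, exist_ok=True)

    (root / "base_state.json").write_text(json.dumps(state))
    for i, event in enumerate(events):
        (events_dir / f"event-{i:04d}.json").write_text(json.dumps(event))

    # Pretend the event log held a lock during the run, then remove it
    # (artifact mode) so the persisted folder is clean for review.
    lock = events_dir / ".eventlog.lock"
    lock.write_text("")
    lock.unlink(missing_ok=True)

    return sorted(p.name for p in events_dir.iterdir())


with tempfile.TemporaryDirectory() as d:
    names = persist_conversation(
        d,
        state={"activated_knowledge_skills": ["hello"]},
        events=[{"role": "user", "text": "hello"},
                {"role": "assistant", "text": "Done"}],
    )
    print(names)  # -> ['event-0000.json', 'event-0001.json']
```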
### What this does *not* test

It doesn’t test:

- external LLM behavior (reasoning quality, tool calls, etc.)
- git fetching from GitHub (we used a local path source)
- plugin hooks, MCP, etc.

But it **does** test the installed-plugin utilities plus the skill loading/activation path in a “real Conversation”.

If you still want a “real-world-like” run with a cheap real model (e.g. `gpt-5-nano`), we can swap `TestLLM` for `LLM(...)` in the example and keep the same artifact-directory behavior. Just note that the example would then require `OPENAI_API_KEY` (or `LLM_API_KEY` + a base URL) and would incur a small cost.
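One way to keep both modes in a single example is to gate the swap on the environment variables mentioned above. This is a sketch only; the function name and returned config shape are illustrative, and only the env var names and model name come from the text.

```python
import os


def choose_llm_config() -> dict:
    """Pick a real model only when credentials are present, else the test double."""
    api_key = os.getenv("OPENAI_API_KEY") or os.getenv("LLM_API_KEY")
    if api_key:
        # Real run: small cost, needs network + credentials.
        return {"mode": "real", "model": "gpt-5-nano"}
    # Default: scripted TestLLM-style run, free and deterministic.
    return {"mode": "test", "model": "scripted"}


print(choose_llm_config()["mode"])
```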
## Result from this run

From the log below, you can see:

- `Skill 'hello' triggered by keyword 'hello'`
- `Activated skills: ['hello']`
- persistence written to: `.../persistence/<conversation-id>`

Command used:
```bash
