Commit 89a7f30

Optimize tool inventory: merge redundant tools, improve descriptions (#289)
* Optimize tool inventory: merge redundant tools, improve descriptions

Reduce agent tool count from 44 to 41 by merging and removing redundant tools that caused LLM selection confusion:

- Merge memory_recall into memory_search: add `category` parameter for structured fact DB filtering, eliminating the confusion between two tools that both "search memory"
- Merge set_heartbeat into set_cron: add `heartbeat` boolean parameter to find-and-update existing heartbeat jobs
- Remove vault_status: fully redundant with vault_list
- Remove list_custom_skills: achievable with list_files

Additional improvements:

- Add wait_for_subagent skill for blocking on subagent completion instead of polling read_shared_state
- Clarify browser_solve_captcha description (auto-detection vs manual)
- Improve list_shared_state description (previews vs full values)
- Improve read_agent_history description (workspace logs, not history)
- Add wait_for_subagent to _UNSAFE_SKILLS (no self-waiting)
- Update all documentation and tests

* Fix lint: break long line in wait_for_subagent return

* Fix bugs and edge cases found in principal review

- Fix wait_for_subagent: check `bb is not None` instead of `bb.get("exists")`. read_blackboard returns a raw BlackboardEntry dict without an "exists" field, so the success path was dead code
- Fix memory_search category path: return explicit error when category is set but memory_store is None, even if workspace_manager is available (previously fell through to workspace search, ignoring the caller's category filter intent)
- Fix memory_search category path: distinguish empty results (search worked, nothing matched) from search failure (facts is None)
- Fix set_cron: validate that message is non-empty for non-heartbeat cron jobs (was previously required, became optional as a side effect of the heartbeat merge)
- Fix vault_tool.py docstring: remove "check" reference to removed vault_status

Tests added:

- memory_search category + no store + workspace (no fallthrough)
- memory_search category with no matching facts (empty, not error)
- set_cron heartbeat=True creates/updates (3 tests)
- set_cron regular requires non-empty message
- wait_for_subagent with no mesh_client
- wait_for_subagent with blackboard read failure
- wait_for_subagent with blackboard 404 (returns None)
- Clone registry excludes wait_for_subagent
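
The `wait_for_subagent` fix described above can be illustrated with a minimal sketch. The real skill is not shown in this part of the diff, so everything here is an assumption for illustration: the `FakeMeshClient` class, the `wait_for_result` helper, the key name, and the polling interval are all hypothetical. Only the `read_blackboard` name, the "returns `None` on 404" behavior, and the `bb is not None` check come from the commit message.

```python
import asyncio
import time


class FakeMeshClient:
    """Hypothetical stand-in for the mesh client.

    read_blackboard returns None (like a 404) until the entry "appears",
    then returns a raw entry dict that has no "exists" field.
    """

    def __init__(self, ready_after: int, entry: dict):
        self.calls = 0
        self._ready_after = ready_after
        self._entry = entry

    async def read_blackboard(self, key: str):
        self.calls += 1
        return self._entry if self.calls >= self._ready_after else None


async def wait_for_result(mesh_client, key: str, timeout: float = 5.0,
                          poll: float = 0.01) -> dict:
    """Poll the blackboard until an entry exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        bb = await mesh_client.read_blackboard(key)
        # Presence is signalled by a non-None entry. Checking
        # bb.get("exists") would never succeed, because the raw
        # entry dict carries no such field.
        if bb is not None:
            return {"completed": True, "result": bb}
        await asyncio.sleep(poll)
    return {"completed": False, "error": f"Timed out after {timeout}s"}
```

With an entry such as `{"key": "subagents/a1", "value": "done"}`, `entry.get("exists")` evaluates to `None`, which is why the original truthiness check made the success path unreachable.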
1 parent c840357 commit 89a7f30

20 files changed: +404 -257 lines changed

README.md

Lines changed: 3 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -414,18 +414,17 @@ canonicalized parameters and results over a 15-call sliding window.
414414
| `save_artifact` | Save deliverable file and register on blackboard |
415415
| `update_workspace` | Update identity files (HEARTBEAT.md, USER.md) |
416416
| `notify_user` | Send notification to user across all connected channels |
417-
| `set_cron` | Schedule a recurring job |
418-
| `set_heartbeat` | Enable autonomous monitoring with probes |
417+
| `set_cron` | Schedule a recurring job (set `heartbeat=true` for autonomous wakeups) |
419418
| `list_cron` / `remove_cron` | Manage scheduled jobs |
420419
| `create_skill` | Write a new Python skill at runtime |
421-
| `list_custom_skills` | List all custom skills the agent has created |
422420
| `reload_skills` | Hot-reload all skills |
423421
| `spawn_agent` | Spawn an ephemeral sub-agent in a new container |
424422
| `spawn_subagent` | Spawn a lightweight in-container subagent for parallel subtasks |
425423
| `list_subagents` | List active subagents and their status |
424+
| `wait_for_subagent` | Wait for a subagent to complete and return its result |
426425
| `vault_generate_secret` | Generate and store a random secret (returns opaque handle) |
427426
| `vault_capture_from_page` | Capture text from browser element and store as credential |
428-
| `vault_list` / `vault_status` | List credential names or check if a credential exists |
427+
| `vault_list` | List credential names (names only, never values) |
429428
| `introspect` | Query own runtime state: permissions, budget, fleet, cron, health |
430429
| `read_agent_history` | Read another agent's conversation logs |
431430

docs/agent-tools.md

Lines changed: 3 additions & 6 deletions
@@ -45,9 +45,8 @@ All agents use a single browser architecture: **Chrome + KasmVNC**. A Chromium i
4545

4646
| Tool | Parameters | Description |
4747
|------|-----------|-------------|
48-
| `memory_search` | `query`, `max_results` | Hybrid search across workspace files and structured DB |
48+
| `memory_search` | `query`, `category`, `max_results` | Hybrid search across workspace files and structured DB. Provide `category` to search only the fact database filtered to that category. |
4949
| `memory_save` | `content` | Save a fact to workspace daily log and structured memory |
50-
| `memory_recall` | `query`, `category`, `max_results` | Semantic search with optional category filtering |
5150

5251
### Mesh / Fleet
5352

@@ -73,8 +72,7 @@ All agents use a single browser architecture: **Chrome + KasmVNC**. A Chromium i
7372

7473
| Tool | Parameters | Description |
7574
|------|-----------|-------------|
76-
| `set_cron` | `schedule`, `message` | Schedule a recurring job (cron expression or interval) |
77-
| `set_heartbeat` | `schedule` | Enable autonomous monitoring (probes run automatically) |
75+
| `set_cron` | `schedule`, `message`, `heartbeat` | Schedule a recurring job (cron expression or interval). Set `heartbeat=true` to update your autonomous wakeup schedule. |
7876
| `list_cron` | -- | List scheduled jobs |
7977
| `remove_cron` | `job_id` | Remove a scheduled job |
8078

@@ -89,7 +87,6 @@ All agents use a single browser architecture: **Chrome + KasmVNC**. A Chromium i
8987
| Tool | Parameters | Description |
9088
|------|-----------|-------------|
9189
| `create_skill` | `name`, `code` | Write a new Python skill at runtime |
92-
| `list_custom_skills` | -- | List all custom skills the agent has created |
9390
| `reload_skills` | -- | Hot-reload all skills from disk |
9491
| `spawn_agent` | `role`, `system_prompt`, `ttl` | Spawn an ephemeral sub-agent in a new container (default TTL: 3600s) |
9592

@@ -103,6 +100,7 @@ Lightweight subagents that run inside the same process as the parent agent, shar
103100
|------|-----------|-------------|
104101
| `spawn_subagent` | `task`, `role`, `ttl_seconds` | Spawn a lightweight subagent for parallel subtask execution |
105102
| `list_subagents` | -- | List active subagents spawned by this agent and their status |
103+
| `wait_for_subagent` | `subagent_id`, `timeout` | Wait for a subagent to complete and return its result |
106104

107105
### Credential Vault
108106

@@ -113,7 +111,6 @@ Agents never see credential values. All operations return opaque `$CRED{name}` h
113111
| `vault_generate_secret` | `name`, `length`, `charset` | Generate a random secret and store it (returns handle only) |
114112
| `vault_capture_from_page` | `name`, `selector` or `ref` | Read text from a browser element and store as credential |
115113
| `vault_list` | -- | List credential names the agent can access (names only, filtered by permissions) |
116-
| `vault_status` | `name` | Check if an accessible credential exists in the vault |
117114

118115
### System Introspection
119116

docs/memory.md

Lines changed: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ Session 1: User says "My timezone is PST"
170170
=== Chat Reset ===
171171
172172
Session 2: User asks "What timezone am I in?"
173-
Agent: memory_recall("user timezone")
173+
Agent: memory_search("user timezone")
174174
-> Returns "PST" via semantic search
175175
```
176176

docs/triggering.md

Lines changed: 2 additions & 2 deletions
@@ -114,7 +114,7 @@ Cron tick
114114

115115
```
116116
Agent: I'll monitor the system every 30 minutes.
117-
set_heartbeat(schedule="every 30m")
117+
set_cron(schedule="every 30m", heartbeat=true)
118118
```
119119

120120
The probes (disk_usage, pending_signals, pending_tasks) run automatically -- you don't need to specify them. Define your escalation rules in `HEARTBEAT.md` instead.
@@ -294,6 +294,6 @@ The watcher polls at a configurable interval, tracks file modification times, an
294294
| `src/host/server.py` | Webhook endpoints, cron management API |
295295
| `src/host/orchestrator.py` | Workflow executor (triggered by pub/sub and webhooks) |
296296
| `src/host/mesh.py` | PubSub system for event-driven triggering |
297-
| `src/agent/builtins/mesh_tool.py` | Agent-side `set_cron`, `set_heartbeat`, `list_cron`, `remove_cron` tools |
297+
| `src/agent/builtins/mesh_tool.py` | Agent-side `set_cron`, `list_cron`, `remove_cron` tools |
298298
| `src/host/watchers.py` | Polling-based file watchers for Docker volume compatibility |
299299
| `config/cron.json` | Persisted job state |
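
The merged `set_cron` behavior referenced in these docs (heartbeat find-and-update, plus the non-empty `message` check for regular jobs) can be sketched as follows. This mirrors the branching visible in the `mesh_tool.py` hunk of this commit, but `FakeMesh` and `set_cron_sketch` are hypothetical names, and the job-dict shape returned by the fake client is an assumption rather than the real mesh API.

```python
import asyncio


class FakeMesh:
    """Hypothetical in-memory mesh client used only for this sketch."""

    def __init__(self):
        self.jobs = []

    async def list_cron(self):
        return list(self.jobs)

    async def create_cron(self, schedule, message, heartbeat=False):
        job = {
            "id": f"job-{len(self.jobs) + 1}",
            "schedule": schedule,
            "message": message,
            "heartbeat": heartbeat,
        }
        self.jobs.append(job)
        return dict(job)

    async def update_cron(self, job_id, schedule):
        job = next(j for j in self.jobs if j["id"] == job_id)
        job["schedule"] = schedule
        return dict(job)


async def set_cron_sketch(mesh, schedule, message="", heartbeat=False):
    if heartbeat:
        # Reuse the existing heartbeat job if there is one.
        jobs = await mesh.list_cron()
        existing = next((j for j in jobs if j.get("heartbeat")), None)
        if existing:
            result = await mesh.update_cron(existing["id"], schedule=schedule)
            return {"updated": True, "type": "heartbeat", **result}
        result = await mesh.create_cron(
            schedule=schedule, message=message or "heartbeat", heartbeat=True,
        )
        return {"created": True, "type": "heartbeat", **result}
    # Regular jobs still need a message, even though the parameter
    # now has a default for the heartbeat case.
    if not message:
        return {"error": "message is required for non-heartbeat cron jobs"}
    result = await mesh.create_cron(schedule=schedule, message=message)
    return {"created": True, **result}
```

Calling the sketch twice with `heartbeat=True` leaves a single heartbeat job whose schedule reflects the second call, which is the find-and-update behavior the commit message describes.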

skills/README.md

Lines changed: 2 additions & 2 deletions
@@ -49,9 +49,9 @@ Every agent automatically has access to built-in tools defined in
4949
- `read_file`, `write_file`, `list_files` — File I/O
5050
- `http_request` — HTTP requests
5151
- `browser_navigate`, `browser_screenshot`, etc. — Browser automation
52-
- `memory_search`, `memory_save`, `memory_recall` — Persistent memory
52+
- `memory_search`, `memory_save` — Persistent memory
5353
- `list_agents`, `spawn_agent`, `spawn_subagent`, `notify_user`, `publish_event` — Team coordination
5454
- `read_shared_state`, `write_shared_state`, `list_shared_state` — Shared blackboard
55-
- `vault_generate_secret`, `vault_capture_from_page`, `vault_list`, `vault_status` — Credential vault
55+
- `vault_generate_secret`, `vault_capture_from_page`, `vault_list` — Credential vault
5656
- `introspect` — Runtime state queries
5757
- `create_skill`, `reload_skills` — Self-extension

src/agent/builtins/browser_tool.py

Lines changed: 6 additions & 3 deletions
@@ -1127,9 +1127,12 @@ async def browser_evaluate(script: str, *, mesh_client=None) -> dict:
11271127
@skill(
11281128
name="browser_solve_captcha",
11291129
description=(
1130-
"Detect and solve a CAPTCHA on the current page. Supports reCAPTCHA "
1131-
"v2/v3/Enterprise, hCaptcha, and Cloudflare Turnstile. Requires a "
1132-
"CAPTCHA API key in the vault (2captcha_key or capsolver_key)."
1130+
"Manually trigger CAPTCHA detection and solving on the current page. "
1131+
"Usually NOT needed — browser_navigate auto-detects and solves CAPTCHAs. "
1132+
"Use this only when a CAPTCHA appears AFTER navigation (e.g. after clicking "
1133+
"a button or on a lazy-loaded challenge). Supports reCAPTCHA v2/v3/Enterprise, "
1134+
"hCaptcha, and Cloudflare Turnstile. Requires 2captcha_key or capsolver_key "
1135+
"in the vault."
11331136
),
11341137
parameters={},
11351138
)

src/agent/builtins/memory_tool.py

Lines changed: 42 additions & 59 deletions
@@ -48,31 +48,65 @@ def _parse_fact(content: str) -> tuple[str, str]:
4848
@skill(
4949
name="memory_search",
5050
description=(
51-
"Search your long-term memory for relevant information. "
52-
"Searches both workspace files (BM25) and structured fact database (vector+BM25). "
53-
"Use this when you need to recall facts, preferences, or past events."
51+
"Search your long-term memory. By default searches both workspace files "
52+
"and your structured fact database. Provide a category to search only "
53+
"the fact database filtered to that category. Use this to recall facts, "
54+
"preferences, decisions, or past events before answering questions."
5455
),
5556
parameters={
5657
"query": {"type": "string", "description": "What to search for"},
58+
"category": {
59+
"type": "string",
60+
"description": (
61+
"Optional: filter to a fact category (e.g. 'user_preferences', "
62+
"'decisions'). When set, searches only the structured fact database."
63+
),
64+
"default": "",
65+
},
5766
"max_results": {
5867
"type": "integer",
5968
"description": "Maximum results to return (default 5)",
6069
"default": 5,
6170
},
6271
},
6372
)
64-
async def memory_search(query: str, max_results: int = 5, *, workspace_manager=None, memory_store=None) -> dict:
73+
async def memory_search(
74+
query: str, category: str = "", max_results: int = 5,
75+
*, workspace_manager=None, memory_store=None,
76+
) -> dict:
6577
"""Search workspace memory files and structured fact database."""
6678
results = []
6779

68-
# Workspace BM25 search
80+
# Category-filtered search: only the structured fact DB
81+
if category:
82+
if memory_store is None:
83+
return {"error": "No memory_store available for category search", "results": []}
84+
# Over-fetch when filtering by category since post-fetch filtering
85+
# may discard many results
86+
fetch_k = max_results * 3
87+
facts = await _search_with_fallback(memory_store, query, fetch_k)
88+
if facts is None:
89+
return {"error": "Memory search failed", "results": []}
90+
for fact in facts:
91+
if fact.category.lower() != category.lower():
92+
continue
93+
results.append({
94+
"key": fact.key,
95+
"value": fact.value,
96+
"category": fact.category,
97+
"confidence": fact.confidence,
98+
"access_count": fact.access_count,
99+
"source": "memory_db",
100+
})
101+
return {"results": results, "count": len(results)}
102+
103+
# Default: search both workspace and DB
69104
if workspace_manager is not None:
70105
ws_hits = workspace_manager.search(query, max_results=max_results)
71106
for hit in ws_hits:
72107
hit["source"] = "workspace"
73108
results.append(hit)
74109

75-
# Structured memory DB search (vector + BM25)
76110
if memory_store is not None:
77111
db_facts = await _search_with_fallback(memory_store, query, max_results)
78112
if db_facts:
@@ -96,7 +130,7 @@ async def memory_search(query: str, max_results: int = 5, *, workspace_manager=N
96130
description=(
97131
"Save an important fact or note to long-term memory. "
98132
"Saved to both the daily session log and the structured fact database, "
99-
"so it can be recalled later with memory_recall or memory_search. "
133+
"so it can be recalled later with memory_search. "
100134
"Examples: user preferences, decisions made, key findings."
101135
),
102136
parameters={
@@ -116,7 +150,7 @@ async def memory_save(content: str, *, workspace_manager=None, memory_store=None
116150
workspace_manager.append_daily_log(content)
117151
saved_workspace = True
118152

119-
# 2. Structured memory DB (searchable via memory_recall)
153+
# 2. Structured memory DB (searchable via memory_search)
120154
if memory_store is not None:
121155
try:
122156
# Parse content into key/value — use first sentence or clause as key
@@ -132,54 +166,3 @@ async def memory_save(content: str, *, workspace_manager=None, memory_store=None
132166
return {"error": "No memory backends available"}
133167

134168
return {"saved": True, "saved_workspace": saved_workspace, "saved_db": saved_db, "content": content}
135-
136-
137-
@skill(
138-
name="memory_recall",
139-
description=(
140-
"Search your structured fact database using semantic similarity. "
141-
"Better than memory_search for recalling specific facts, preferences, and decisions. "
142-
"Supports optional category filtering."
143-
),
144-
parameters={
145-
"query": {"type": "string", "description": "What to recall"},
146-
"category": {
147-
"type": "string",
148-
"description": "Optional: filter by category name",
149-
"default": "",
150-
},
151-
"max_results": {
152-
"type": "integer",
153-
"description": "Max results (default 5)",
154-
"default": 5,
155-
},
156-
},
157-
)
158-
async def memory_recall(
159-
query: str, category: str = "", max_results: int = 5, *, memory_store=None,
160-
) -> dict:
161-
"""Search structured fact database with optional category filter."""
162-
if memory_store is None:
163-
return {"error": "No memory_store available", "results": []}
164-
165-
# Over-fetch when filtering by category since post-fetch filtering
166-
# may discard many results
167-
fetch_k = max_results * 3 if category else max_results
168-
169-
facts = await _search_with_fallback(memory_store, query, fetch_k)
170-
if facts is None:
171-
return {"error": "Memory search failed", "results": []}
172-
173-
results = []
174-
for fact in facts:
175-
if category and fact.category.lower() != category.lower():
176-
continue
177-
results.append({
178-
"key": fact.key,
179-
"value": fact.value,
180-
"category": fact.category,
181-
"confidence": fact.confidence,
182-
"access_count": fact.access_count,
183-
})
184-
185-
return {"results": results, "count": len(results)}
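
The category path added to `memory_search` (over-fetch, then filter case-insensitively, with distinct outcomes for "no store", "search failed", and "nothing matched") can be sketched in isolation. `Fact`, `FakeStore`, and `search_by_category` are hypothetical names, and the fake store simply returns its first `k` facts. One deliberate difference: the sketch caps the filtered list at `max_results`, which the hunk above does not do, so treat that line as a suggestion rather than the commit's behavior.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Fact:
    key: str
    value: str
    category: str


class FakeStore:
    """Hypothetical store: returns the first k facts, ignoring the query."""

    def __init__(self, facts):
        self.facts = facts

    async def search(self, query, k):
        return None if self.facts is None else self.facts[:k]


async def search_by_category(store, query, category, max_results=5):
    if store is None:
        # Explicit error instead of silently falling back to another backend.
        return {"error": "No memory_store available for category search",
                "results": []}
    # Over-fetch because filtering happens after retrieval.
    facts = await store.search(query, max_results * 3)
    if facts is None:
        return {"error": "Memory search failed", "results": []}
    results = [
        {"key": f.key, "value": f.value, "category": f.category,
         "source": "memory_db"}
        for f in facts
        if f.category.lower() == category.lower()
    ]
    # Assumption, not in the diff: trim to the requested size.
    results = results[:max_results]
    return {"results": results, "count": len(results)}
```

An empty `results` list with no `error` key means the search ran and nothing matched, which is the distinction the follow-up fix in this commit introduces.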

src/agent/builtins/mesh_tool.py

Lines changed: 43 additions & 48 deletions
@@ -145,10 +145,10 @@ async def write_shared_state(key: str, value: str, *, mesh_client=None) -> dict:
145145
@skill(
146146
name="list_shared_state",
147147
description=(
148-
"List all entries on the shared blackboard matching a key prefix. "
149-
"Use this to see what other agents have shared: list_shared_state(prefix='goals/') "
150-
"to see all goals, or list_shared_state(prefix='context/') for shared context. "
151-
"The blackboard is for agent-to-agent coordination."
148+
"Discover what's on the shared blackboard by listing entries matching a key "
149+
"prefix. Returns key names, authors, timestamps, and value previews — but "
150+
"NOT full values (use read_shared_state for that). Use when you don't know "
151+
"the exact key: prefix='tasks/' to find tasks, prefix='' to see everything."
152152
),
153153
parameters={
154154
"prefix": {
@@ -362,7 +362,8 @@ async def save_artifact(
362362
"Schedule a recurring job for yourself. The mesh will send you the "
363363
"specified message on the given schedule. Use cron syntax "
364364
"(e.g. '0 9 * * 1-5' for weekdays at 9 AM) or natural intervals "
365-
"(e.g. 'every 30m'). Returns the job ID."
365+
"(e.g. 'every 30m'). Set heartbeat=true to update your autonomous "
366+
"wakeup schedule instead."
366367
),
367368
parameters={
368369
"schedule": {
@@ -372,57 +373,51 @@ async def save_artifact(
372373
"message": {
373374
"type": "string",
374375
"description": "Message the mesh will send you on each trigger",
376+
"default": "",
377+
},
378+
"heartbeat": {
379+
"type": "boolean",
380+
"description": (
381+
"If true, sets your autonomous heartbeat schedule (finds and "
382+
"updates existing heartbeat, or creates one). Only change if "
383+
"the USER explicitly asks — each heartbeat costs API credits."
384+
),
385+
"default": False,
375386
},
376387
},
377388
)
378-
async def set_cron(schedule: str, message: str, *, mesh_client=None) -> dict:
389+
async def set_cron(
390+
schedule: str, message: str = "", heartbeat: bool = False,
391+
*, mesh_client=None,
392+
) -> dict:
379393
if mesh_client is None:
380394
return {"error": "No mesh_client available"}
381395
try:
396+
if heartbeat:
397+
# Check for existing heartbeat job and update it
398+
jobs = await mesh_client.list_cron()
399+
existing = next((j for j in jobs if j.get("heartbeat")), None)
400+
if existing:
401+
result = await mesh_client.update_cron(
402+
existing["id"], schedule=schedule,
403+
)
404+
return {"updated": True, "type": "heartbeat", **result}
405+
# No existing heartbeat — create one
406+
result = await mesh_client.create_cron(
407+
schedule=schedule,
408+
message=message or "heartbeat",
409+
heartbeat=True,
410+
)
411+
return {"created": True, "type": "heartbeat", **result}
412+
# Regular cron job
413+
if not message:
414+
return {"error": "message is required for non-heartbeat cron jobs"}
382415
result = await mesh_client.create_cron(schedule=schedule, message=message)
383416
return {"created": True, **result}
384417
except Exception as e:
385418
return {"error": f"Failed to create cron job: {e}"}
386419

387420

388-
@skill(
389-
name="set_heartbeat",
390-
description=(
391-
"Set or update your heartbeat schedule. You are automatically given a "
392-
"heartbeat on startup — only change it if the USER explicitly asks you "
393-
"to. Each heartbeat costs API credits, so NEVER increase frequency on "
394-
"your own. The mesh will periodically wake you to work toward your "
395-
"goals and check for pending tasks. Define your autonomous rules in "
396-
"HEARTBEAT.md in your workspace."
397-
),
398-
parameters={
399-
"schedule": {
400-
"type": "string",
401-
"description": "How often to wake (e.g. 'every 15m', 'every 1h', '*/30 * * * *')",
402-
},
403-
},
404-
)
405-
async def set_heartbeat(schedule: str, *, mesh_client=None) -> dict:
406-
if mesh_client is None:
407-
return {"error": "No mesh_client available"}
408-
try:
409-
# Check for existing heartbeat job and update it
410-
jobs = await mesh_client.list_cron()
411-
existing = next((j for j in jobs if j.get("heartbeat")), None)
412-
if existing:
413-
result = await mesh_client.update_cron(
414-
existing["id"], schedule=schedule,
415-
)
416-
return {"updated": True, "type": "heartbeat", **result}
417-
# No existing heartbeat — create one
418-
result = await mesh_client.create_cron(
419-
schedule=schedule, message="heartbeat", heartbeat=True,
420-
)
421-
return {"created": True, "type": "heartbeat", **result}
422-
except Exception as e:
423-
return {"error": f"Failed to set heartbeat: {e}"}
424-
425-
426421
@skill(
427422
name="list_cron",
428423
description=(
@@ -502,10 +497,10 @@ async def spawn_agent(
502497
@skill(
503498
name="read_agent_history",
504499
description=(
505-
"Read another agent's conversation history (daily logs). Use this to "
506-
"understand what another agent has been doing, what it learned, and "
507-
"what context it has. Permission-checked — you can only read agents "
508-
"you're allowed to message."
500+
"Read another agent's workspace daily logs to understand their recent "
501+
"activity — tasks worked on, tools called, and facts learned. Use this "
502+
"to get context before coordinating with another agent. Permission-checked: "
503+
"you can only read agents you're allowed to message."
509504
),
510505
parameters={
511506
"agent_id": {
