Read this before adding or changing a tool definition, tool result, or tool error that will be consumed by a Thane loop. If a tool may be used by an orchestrator, a delegate, a wake handler, a planning run, or any other model-driven path, the audience is a model.
This document is a sibling to Model-Facing Context. That page is about context injection and prompt assembly. This page is about tool surfaces and the data contracts around them. Assume the conventions in the companion document still apply here. This page covers the tool-specific rules that are easiest to get wrong.
Tool use has to survive three layers:
- the semantic tool catalog
- the prompt/tool advertisement rendered for the model
- the provider/model runtime that turns model output into actual tool calls
Do not assume every model family emits native tool-call structures just because one premium model does. Some local/open models emit raw JSON or fenced pseudo-tool payloads instead. Thane should centralize those compatibility rules instead of scattering parser heuristics across providers.
When model families differ:
- keep one semantic tool catalog
- add a shared model-family adapter/profile layer
- centralize raw-text tool-call recovery in one runtime seam
- vary prompt-side tool contracts only where the model family actually needs it
Tool names should explain action, object, and scope when ambiguity is possible.
Good:
- `ha_automation_list`
- `ha_automation_update`
- `ha_registry_search`
- `add_context_entity`
If platform or source matters, keep it in the name, but do not drop the verb to make room for it. Prefer names like `macos_calendar_list_events` over names like `macos_calendar_events`.
Avoid:
- bare generic names like `list`, `get`, or `update` with no object or scope
- synonyms that hide the true object being manipulated
- names that require knowledge of internal package structure
If two tools would be hard to distinguish from the name alone, the names are not done yet.
Models often reason in names, not IDs. Upstreams often require IDs, not names.
When deterministic resolution is possible and ambiguity is bounded, tools should accept either human names or canonical IDs, then normalize immediately to the canonical form before making the upstream call.
Example:
- accept `Kitchen` or `area_kitchen_1`
- resolve both to the canonical area ID before calling upstream
- if resolution fails, return a structured not-found or ambiguity error instead of guessing
Good fit for dual acceptance:
- Home Assistant area IDs or area names
- category IDs or category names
- label IDs or label names
Bad fit:
- friendly names that are likely to be duplicated across a large graph
- values where fuzzy matching would silently guess
When ambiguity exists, fail fast and return the candidates.
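A minimal sketch of that fail-fast dual acceptance, with a made-up area table and function name (neither is Thane's real API):

```python
# Hypothetical area registry: canonical ID -> human name.
AREAS = {
    "area_kitchen_1": "Kitchen",
    "area_office_1": "Office",
}

def resolve_area(name_or_id: str) -> str:
    """Normalize a human name or canonical ID to the canonical area ID.

    Fails fast with a structured message instead of guessing."""
    if name_or_id in AREAS:  # already canonical
        return name_or_id
    matches = [aid for aid, name in AREAS.items()
               if name.lower() == name_or_id.lower()]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f'area "{name_or_id}" not found')
    raise LookupError(f'area "{name_or_id}" is ambiguous; matches {matches}')
```

The tool body then calls upstream with the returned canonical ID only.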
Tool results should include the canonical IDs needed for follow-up calls and the names or labels needed for interpretation.
The purpose of names in results is not friendliness. It is semantic value. If a model sees only a category ID like `01JSPY2KHMDFXMSDFXJNKZWX2V`, it may invent a story about what that category means. If it also sees the resolved category name `Door Locks`, it can reason with less guesswork.
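As a tiny illustration, a result can carry both forms side by side; the field names here are assumptions for the sketch:

```python
# Illustrative result shape: the canonical ID is what the next call needs,
# the resolved name is what the model reasons with.
result = {
    "category_id": "01JSPY2KHMDFXMSDFXJNKZWX2V",
    "category_name": "Door Locks",
}
```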
A delegate should be able to recover from a good tool error in one more attempt.
Bad errors force the model to speculate:
- `no data found`
- `invalid request`
- `multiple accounts; specify account`
Better errors tell the model what to change next:
- `no calendar events matched account="aimee" between start=... and end=...; try widening the window or removing query`
- `multiple accounts have connected platform providers for macos.calendar/list_events; retry with account set to one of ["aimee", "nugget"]`
- `label "critical" is ambiguous; matches [{"id":"label_a","name":"Critical"},{"id":"label_b","name":"Critical"}]`
An actionable error usually includes:
- which argument needs to change
- the valid choices when the set is small
- whether the failure was ambiguity, absence, gating, or upstream error
- enough echoed context to understand the failed selection
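One way to sketch those four ingredients as structured data; the helper and its field names (`kind`, `parameter`, `choices`, `message`) are assumptions, not Thane's actual error schema:

```python
def ambiguity_error(parameter: str, choices: list[str], context: str) -> dict:
    """Build a structured error a delegate can act on in one more attempt."""
    return {
        "kind": "ambiguity",      # vs "not_found", "empty", "gated", "upstream"
        "parameter": parameter,   # which argument to change
        "choices": choices,       # valid values, when the set is small
        "message": (f"{context}; retry with {parameter} "
                    f"set to one of {choices}"),
    }
```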
If a tool can target multiple accounts, clients, or tenants, ambiguity errors should name the parameter to supply and list the valid choices when the set is small.
Good:
`multiple accounts have connected platform providers for macos.calendar/list_events; retry with account set to one of ["aimee", "nugget"]`
When the result shape allows it, include the choices as structured data as well as in the error text.
"No data" is not one thing.
These are meaningfully different:
- the selection was ambiguous
- the target could not be found
- the query was valid but returned zero results
- the upstream was unavailable
- the tool was gated or unavailable in this loop
Do not collapse them into the same string.
Models can recover from a precise empty result. They struggle when every miss looks identical.
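The distinct miss cases above could be named explicitly; this enum is a hypothetical sketch, not Thane's actual taxonomy:

```python
from enum import Enum

# Each miss kind gets its own name so a precise empty result is
# distinguishable from ambiguity, absence, gating, or upstream failure.
class MissKind(Enum):
    AMBIGUOUS = "ambiguous"          # selection matched more than one target
    NOT_FOUND = "not_found"          # target does not exist
    EMPTY = "empty"                  # valid query, zero results
    UPSTREAM_DOWN = "upstream_down"  # upstream unavailable
    GATED = "gated"                  # tool unavailable in this loop
```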
Models work best when they can inspect before they change.
Prefer tool families that make the progression obvious through descriptive names:
- `*_search` or `*_list`
- `*_get`
- `*_create`, `*_update`, `*_delete`
Mutation tools should return enough canonical data for the next step:
- created or updated IDs
- normalized metadata
- enabled or disabled state
- a compact summary of what changed
If the upstream supports validation or dry-run behavior, expose it.
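A sketch of a mutation result that carries those pieces; the helper and field names are assumptions for illustration:

```python
def update_result(entity_id: str, enabled: bool, changed: list[str]) -> dict:
    """Return what the next step needs after a mutation."""
    fields = sorted(changed)
    return {
        "entity_id": entity_id,      # canonical ID for follow-up calls
        "enabled": enabled,          # normalized state after the write
        "changed_fields": fields,    # normalized metadata
        "summary": f"updated {entity_id}: {', '.join(fields)}",
    }
```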
Human names and semantic labels are metadata. They are not the source of truth.
When writing back to upstream systems:
- use canonical IDs after normalization
- keep raw configs faithful when the point is exact system state
- avoid inventing a Thane-only schema when the upstream object is already the right abstraction
This keeps the model-friendly layer separate from the machine-authority layer.
If multiple targets are plausible, say so and enumerate them when the set is small.
Good:
- return the two matching accounts
- return the matching label IDs and names
- return the matching entities with area and domain
Avoid silent best guesses unless the tool contract explicitly promises a deterministic winner.
The output of one tool should naturally support the input of the next.
Examples:
- list results include IDs needed for get or update
- search results include names that explain opaque IDs
- activity summaries include enough context to decide whether to inspect a specific item
- validation results point at the exact field or path that failed
If the model has to re-run a broad search just to recover the argument for a follow-up call, the first tool did not expose enough.
When the model needs generated information about available tools, capabilities, or activation affordances, default to compact structured data over narrative explanation.
The model should be able to tell:
- which tools are actually callable
- which capabilities are loaded vs merely available
- which exact tool names are valid
- which recovery path applies if a tool family uses raw-text fallback
Do not hide those distinctions in prose if a stable JSON shape would make them obvious.
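For instance, a capability snapshot as a stable JSON shape rather than prose; the keys and tool names here are invented for the sketch:

```python
import json

# Hypothetical compact snapshot: callable vs merely available vs
# raw-text-fallback families are distinct top-level keys, so the model
# never has to parse the distinction out of narrative text.
snapshot = {
    "loaded": ["ha_automation_list", "ha_automation_update"],
    "available": ["macos_calendar_list_events"],
    "raw_text_fallback": ["local_llm_tools"],
}
print(json.dumps(snapshot, indent=2))
```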
If a tool expects an integer, path, enum, or boolean, say so explicitly in the schema and description. Do not assume the model will infer the type from the parameter name alone.
Bad:
`forge_issue_get(number: "352")`
Better:
`forge_issue_get(number: 352)`
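To make the type explicit at the schema level, a JSON Schema fragment for this tool might read as follows; the constant name and exact wording are assumptions, not Thane's real definition:

```python
# Hypothetical parameter schema: the type is stated in both the machine
# field and the human-readable description, so the model never has to
# infer "integer" from the name "number".
FORGE_ISSUE_GET_SCHEMA = {
    "type": "object",
    "properties": {
        "number": {
            "type": "integer",
            "description": "Issue number (integer, not string), e.g. 352",
        },
    },
    "required": ["number"],
}
```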
Delegates burn iterations when a tool design forces a discovery step for something that could have been accepted literally.
Prefer tool shapes that accept exact file paths, IDs, or other canonical selectors directly when the caller already has them.
Stable path abstractions are also valid when they resolve deterministically before execution. This is the lesson of #377 and PR #378: prefixes like `kb:`, `scratchpad:`, and `temp:` are good abstractions when they are backed by a shared resolver and collapse to concrete paths without making the model search the filesystem.
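A resolver under those assumptions can be sketched in a few lines; the root paths below are invented for illustration and the function name is hypothetical:

```python
from pathlib import PurePosixPath

# Assumed prefix roots; the real mapping lives in the shared resolver.
PREFIX_ROOTS = {
    "kb": "/home/thane/kb",
    "scratchpad": "/home/thane/scratchpad",
    "temp": "/tmp/thane",
}

def resolve_path(path: str) -> str:
    """Collapse a prefixed path to a concrete path, or pass it through.

    Resolution is deterministic: no filesystem search is involved."""
    prefix, sep, rest = path.partition(":")
    if sep and prefix in PREFIX_ROOTS:
        return str(PurePosixPath(PREFIX_ROOTS[prefix]) / rest)
    return path
```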
Bad:
- requiring a search step before every file read or entity mutation
Better:
- `read_file(path="/home/thane/config.yaml")`
- `read_file(path="kb:reference/architecture.md")`
- `ha_automation_get(entity_id="automation.low_battery_warning")`
Delegates are literal. They do not have human intuition about what an error "probably means."
Write tool contracts and errors so a delegate can:
- understand what failed
- identify the argument to change
- issue one better follow-up call
Tool descriptions should help with that. Include:
- required parameter names and types when they are easy to misuse
- what the tool returns when that is important for the next step
- literal examples when a value shape is non-obvious
- common failure modes when there is a known trap
For example, if a tool expects an integer issue number, say `number (integer, not string)` instead of just `number`.
If a delegate would need to ask itself "what do I try next?", the tool surface can probably be improved.
Before merging a new or changed tool, ask:
- From the tool name alone, can a model tell what it acts on?
- Are the parameters shaped the way a model is likely to think about the problem?
- Does the result include the canonical IDs needed for the next step?
- Does it also include the semantic names needed for interpretation?
- Are ambiguity and empty-result cases distinguished cleanly?
- Could a delegate recover from the error in one more attempt?
- Does the description call out parameter types where misuse is likely?
- Would multi-account or multi-target ambiguity be obvious to recover from?
- Is the output compact structure instead of human-oriented narrative?
- Are we preserving exact upstream state where exactness matters?
If those answers are good, the tool surface is probably on the right track.