Summary
The current agent-native-architecture skill takes a strong position that tools should be pure primitives with all behavior defined in prompts. After auditing CORA against this principle, I believe we need a more nuanced stance that acknowledges valid use cases for logic in tool descriptions.
Current Position (SKILL.md lines 26-33)
### Tools Provide Capability, Not Behavior
Tools should be primitives that enable capability. The prompt defines what to do with that capability.
**Wrong:** `generate_dashboard(data, layout, filters)` — agent executes your workflow
**Right:** `read_file`, `write_file`, `list_files` — agent figures out how to build a dashboard

The Case FOR Logic in Tool Descriptions
1. Smaller Master Prompts
Tool descriptions are modular context. When you include search strategy in SearchEmails:
```ruby
description <<~DESC
  TRIPLE SEARCH PHILOSOPHY: LITERAL → CONTEXTUAL → INTERPRETIVE
  • 🔍 LITERAL: Use plain keywords...
  • 🧠 CONTEXTUAL: Add synonyms, related terms...
  ...
DESC
```

This content is only loaded when the tool is available. If you move it to the master prompt, it's always present, even when searching isn't relevant to the conversation.
Trade-off: Tool descriptions = JIT context loading. Master prompt = always-present context.
2. Tools as Composable Behavior Modules
Tools can be enabled/disabled dynamically. This means:
- Include `SearchEmails` with its search strategy → agent knows HOW to search
- Exclude `SearchEmails` → agent doesn't have search capability OR strategy
- Swap in `SearchEmailsSimple` → different strategy, no code change
The tool + its description is a composable unit. Separating them (primitive tool + prompt behavior) breaks this composability.
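As a concrete illustration of that composability, here is a minimal Ruby sketch. All names (`Tool`, `SEARCH_FULL`, `enabled_tools`) are hypothetical, not CORA's real API; the point is only that a tool bundles capability (handler) with behavior (description), so swapping the tool swaps both:

```ruby
# Hypothetical sketch: a tool carries its own behavioral guidance,
# so enabling a different tool changes both capability and strategy.
Tool = Struct.new(:name, :description, :handler, keyword_init: true)

SEARCH_FULL = Tool.new(
  name: "SearchEmails",
  description: "TRIPLE SEARCH: literal, then contextual, then interpretive.",
  handler: ->(query) { "broad results for #{query}" }
)

SEARCH_SIMPLE = Tool.new(
  name: "SearchEmailsSimple",
  description: "Plain keyword search only.",
  handler: ->(query) { "exact matches for #{query}" }
)

# Swapping the enabled tool changes what the agent can do AND how it
# is told to do it -- no master-prompt edit, no other code change.
def enabled_tools(simple_mode: false)
  simple_mode ? [SEARCH_SIMPLE] : [SEARCH_FULL]
end
```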
3. Harness Pattern: Code as Guardrails
Sometimes you WANT the tool to enforce behavior:
```ruby
# ChangeCategory tool in CORA
def execute(email_id:, category_id:, new_category_name:)
  ActiveRecord::Base.transaction do
    # Step 1: Change category
    # Step 2: Create sender rule
    # Step 3: Reclassify brief emails
    # Step 4: Broadcast refresh
  end
end
```

This orchestration is intentional. When a user changes a category, we ALWAYS want a rule created. This is a business invariant, not a decision the agent should make.
Principle: Use code harnesses for business invariants. Use prompts for flexible behavior.
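To make the harness idea concrete without Rails, here is a self-contained sketch using an in-memory hash as a stand-in for the database (all names are illustrative, and `Marshal` copying is a crude substitute for a real transaction). The invariant — a rule is always created when a category changes — lives in code, and the steps succeed or fail together:

```ruby
# Hypothetical harness: four steps that must happen atomically.
class ChangeCategory
  def initialize(store)
    @store = store # in-memory stand-in for the database
  end

  def execute(email_id:, category_id:)
    snapshot = Marshal.load(Marshal.dump(@store)) # crude rollback point
    @store[:emails].fetch(email_id)[:category] = category_id    # 1. change category
    sender = @store[:emails][email_id][:sender]
    @store[:rules] << { sender: sender, category: category_id } # 2. ALWAYS create rule
    @store[:reclassified] << email_id                           # 3. reclassify brief emails
    @store[:events] << :refresh                                 # 4. broadcast refresh
    true
  rescue => e
    @store.replace(snapshot) # all-or-nothing, like a failed transaction
    raise e
  end
end
```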
4. Portability
A tool with its behavior in the description is self-contained. You can:
- Share it between projects
- Version it independently
- Test it in isolation
If behavior lives in the master prompt, the tool is incomplete without its corresponding prompt section.
The Deeper Question: What IS a Tool?
Here's where the current guidance breaks down: it conflates the technical mechanism of a tool call with the conceptual role tools play in agent systems.
A tool call is just a function invocation. A "tool" is whatever you wire up to that call.
Consider what happens when Claude Code uses the Task tool to launch a sub-agent:
```js
Task({
  subagent_type: "explore",
  prompt: "Find all files that handle authentication"
})
```

On the wire, this is a tool call. The API sees `tool_use` with name `Task` and some parameters. But conceptually? You've just spawned an autonomous agent that will run for minutes, make its own decisions, call its own tools, and return a synthesized answer.
Is that "a tool"?
Technically: yes.
Philosophically: it's an agent with agency.
Tools as JIT Context Injection
Think of tool descriptions not as documentation, but as just-in-time context injection.
When the model sees a tool, it doesn't just learn "I can call this function." It absorbs the description as part of its working context. A 200-line tool description is 200 lines of context that:
- Only loads when that tool is available
- Can be swapped by swapping the tool
- Disappears when the tool is removed
This is modular prompting. The "master prompt" becomes smaller because domain-specific guidance lives in the tools that need it.
┌─────────────────────────────────────┐
│ Master Prompt │ ← Core identity, general behavior
│ (small, stable, universal) │
└─────────────────────────────────────┘
↓ tool injection ↓
┌───────────┐ ┌───────────┐ ┌───────────┐
│ SearchTool│ │ WriteTool │ │AnalyzeTool│ ← Each carries its own context
│ + strategy│ │ + style   │ │ + method  │
└───────────┘ └───────────┘ └───────────┘
The current "primitives only" guidance would have all that strategy, style, and method in the master prompt. But that means:
- You can't swap search strategies by swapping tools
- Removing a tool doesn't remove its behavioral guidance
- The master prompt grows with every capability
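The diagram above can be sketched as simple context assembly. The prompt text and tool names here are hypothetical placeholders; the mechanism is what matters: the context the model sees is the small master prompt plus the descriptions of whichever tools are currently enabled, so removing a tool removes its guidance too:

```ruby
# JIT context injection, sketched: context = master prompt + enabled
# tool descriptions. Disabling a tool drops its guidance automatically.
MASTER_PROMPT = "You are CORA. Core identity and general behavior only."

TOOL_DESCRIPTIONS = {
  "SearchEmails" => "TRIPLE SEARCH: literal, then contextual, then interpretive.",
  "WriteDraft"   => "Match the user's tone; keep drafts under 150 words.",
  "AnalyzeInbox" => "Summarize by sender volume before recommending cleanup."
}

def build_context(enabled_tool_names)
  sections = enabled_tool_names.map { |n| "#{n}: #{TOOL_DESCRIPTIONS.fetch(n)}" }
  [MASTER_PROMPT, *sections].join("\n\n")
end
```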
Sub-Agents Complicate Everything
The real breakdown: sub-agents triggered via tool calls.
```js
// This is a tool call...
Task({
  subagent_type: "security-sentinel",
  prompt: "Audit this code for vulnerabilities"
})
// ...that spawns THIS (an agent with its own system prompt, tools, and loop)
```

The security-sentinel agent might run for 5 minutes. It has its own personality, its own tools, its own judgment. It's not a "primitive" in any meaningful sense.
Yet it's invoked via tool_use. The API doesn't distinguish between "read a file" and "spawn an autonomous agent that will make 50 decisions."
Implication: When we say "tools should be primitives," we're ignoring that tool calls are the mechanism for EVERYTHING—including spawning complex, stateful, multi-turn agents.
A More Precise Framework
Instead of "tools should be primitives," consider:
| What you're building | Mechanism | Guidance |
|---|---|---|
| Data access | Tool call | Pure primitive (read/write/list) |
| Domain operation | Tool call | May encode business invariants |
| Behavior guidance | Tool description | JIT context injection—often preferable to master prompt |
| Sub-agent | Tool call | Definitely not a "primitive"—embrace the complexity |
| Workflow | Tool call | OK if it's a business invariant; avoid if it's flexible behavior |
The Real Question to Ask
Don't ask "Is this a primitive?"
Ask: "Who should own this decision?"
| Owner | Implementation |
|---|---|
| Business rule (always this way) | Code in tool execution |
| Tool-specific strategy (usually this way) | Tool description |
| Conversation-dependent (depends on context) | Master prompt |
| Agent judgment (trust the model) | Neither—let it decide |
Proposed Refinement
Replace Binary "Primitives vs Workflows" with a Spectrum
| Pattern | When to Use | Example |
|---|---|---|
| Pure Primitive | Maximum flexibility needed | read_file, write_file |
| Guided Primitive | Tool + description with strategy | SearchEmails with search philosophy |
| Domain Primitive | Storage/retrieval with domain schema | StoreMemory, CreateCategory |
| Orchestrated Action | Business invariants that must always happen together | ChangeCategory (always creates rule) |
| Sub-Agent Trigger | Complex autonomous task | Task with agent type |
| Workflow Tool | Complex multi-step with decisions | Avoid unless intentionally constraining |
New Guidance
### Tools: More Than Just Functions
A tool call is a mechanism. What you wire to it determines its nature.
**Pure primitives** (read, write, list): Maximum flexibility. Agent decides everything.
**Guided primitives** (tool + rich description): The description is JIT context injection.
Use this when behavior is tool-specific and you want it portable with the tool.
**Domain operations** (create_category, change_status): May encode business invariants.
If something must ALWAYS happen together, put it in code.
**Sub-agent triggers** (Task tool, specialized agents): Not primitives at all.
These spawn autonomous agents with their own judgment. Embrace the complexity.
**Decision framework:**
- Who should own this decision?
- Is this always true (code) or context-dependent (prompt)?
- Should this guidance travel with the tool (description) or stay global (master prompt)?

CORA Examples
Good: Logic in Tool Description (SearchEmails)
The 213-line description teaches search strategy. This is appropriate because:
- Strategy is specific to email searching
- Portable with the tool
- Reduces master prompt size
- Can swap for different strategy by swapping tool
Good: Harness Pattern (ChangeCategory)
4-step orchestration is appropriate because:
- Business invariant: category changes SHOULD create rules
- Atomic: all steps must succeed or fail together
- Not a decision point for the agent
Potentially Problematic: GetGmailLabelStats
Encodes "HIGH_VOLUME" vs "NEEDS_CLEANUP" categorization in code:
```ruby
def inbox_state(count)
  case count
  when 0..100 then "READY"
  when 101..500 then "NEEDS_CLEANUP"
  else "HIGH_VOLUME"
  end
end
```

This IS a decision the agent could make contextually. A user with 200 emails who checks daily ≠ a user with 200 emails who checks weekly.
Recommendation: Return raw stats, let agent (or prompt) interpret.
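A hedged sketch of that recommendation (the method name and return shape are illustrative, not CORA's real code): expose raw numbers and let the agent, or the master prompt, decide what "needs cleanup" means for this particular user:

```ruby
# Return raw stats; interpretation moves to the agent or the prompt.
def inbox_stats(counts_by_label)
  {
    total: counts_by_label.values.sum,
    by_label: counts_by_label,
    largest_label: counts_by_label.max_by { |_, count| count }&.first
  }
end
```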
Proposed Changes
- Update `SKILL.md` `essential_principles` to reflect the spectrum, not a binary
- Add "What IS a Tool?" section clarifying the technical vs conceptual distinction
- Add "JIT Context Injection" as a valid pattern for tool descriptions
- Acknowledge sub-agents as a special case that breaks the "primitive" model
- Add "Decision Framework" based on ownership, not tool structure
- Rename principle from "Tools Provide Capability, Not Behavior" to something more nuanced
Discussion
- Is the JIT context injection framing compelling?
- Should we acknowledge sub-agents as a distinct category?
- Are there cases where the current "primitives only" guidance is clearly better?
- How do we balance simplicity of guidance vs accuracy of the nuance?