| version-requirements | version-last-verified | measurement-claims | status | last-verified |
|---|---|---|---|---|
| | 2026-02-27 | | PRODUCTION | 2026-02-16 |
🔗 Vendor Documentation: Anthropic Engineering Blog (November 24, 2025)
Evidence Tier: A (Primary vendor documentation)
Beta Header: advanced-tool-use-2025-11-20
Meta-Guide Note: This pattern synthesizes vendor documentation from Anthropic's Engineering Blog and provides local Claude Code examples with SDD integration. For authoritative feature documentation and official updates, refer to the Anthropic Engineering Blog. For practical implementation guidance and SDD methodology integration, continue here.
⚠️ Status Inconsistency: Pattern marked as "PRODUCTION" but references beta header from Nov 2025 (3+ months old). Verify: Has this feature graduated from beta? Is beta header still required? Check Anthropic Changelog for graduation date or updated header.
Traditional tool calling has three fundamental bottlenecks: tool definitions consume tokens up front, intermediate results flood the context window, and JSON Schema cannot express usage conventions.
| Server | Tools | Tokens |
|---|---|---|
| GitHub | 35 | ~26K |
| Slack | 11 | ~21K |
| Jira | - | ~17K |
| Total | 58+ | ~72K |
Example: Analyzing 10MB log file
- Traditional: Entire file enters context window
- Optimized: Only summary of error frequencies
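The "optimized" path above can be sketched in plain Python: code running in the execution environment aggregates the log and returns only a small summary, so the raw file never enters the context window. The log format and the `summarize_errors` helper are illustrative assumptions, not part of any Anthropic API.

```python
# Sketch of "summarize before it enters context": aggregate error
# frequencies in code and emit only the top counts. The log layout
# ("TIMESTAMP LEVEL message") is an assumed example format.
import json
from collections import Counter

def summarize_errors(log_lines):
    """Count each distinct error message; return the five most common."""
    errors = Counter()
    for line in log_lines:
        if " ERROR " in line:
            errors[line.split(" ERROR ", 1)[1].strip()] += 1
    return errors.most_common(5)

log = [
    "2026-01-01T00:00:01 ERROR db connection refused",
    "2026-01-01T00:00:02 INFO request served",
    "2026-01-01T00:00:03 ERROR db connection refused",
]
print(json.dumps(summarize_errors(log)))  # only this summary enters context
```

For a 10MB log this prints a few hundred bytes instead of forwarding megabytes of raw text.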
JSON Schema defines structural validity but cannot express:
- When to include optional parameters
- Which parameter combinations make sense
- API conventions and format expectations
Problem: 50K+ tokens in tool definitions before conversation starts
Solution: Mark tools with `defer_loading: true` - Claude discovers them on-demand via tool search
Results:
- Traditional: ~77K tokens before work begins
- With Tool Search: ~8.7K tokens (85% reduction)
- Accuracy: 79.5% → 88.1% (Opus 4.5)
When to use:
- Tool definitions >10K tokens
- 10+ tools available
- MCP systems with multiple servers
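A request with deferred tools might look like the sketch below, assuming the beta header above. The tool-search tool type string (`tool_search_tool_regex_20251119`) and the exact payload shape are assumptions; verify them against the current Anthropic docs before use.

```python
# Sketch of a Messages API request body using deferred tool loading.
# Type strings and field names are assumptions based on the Nov 2025 beta;
# check the Anthropic documentation for the authoritative shapes.
request = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "tools": [
        # The search tool itself is always loaded.
        {"type": "tool_search_tool_regex_20251119",
         "name": "tool_search_tool_regex"},
        # Specialized tools are deferred: full definitions stay out of
        # context until Claude searches for and expands them.
        {
            "name": "create_ticket",
            "description": "Create an issue tracker ticket",
            "input_schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
            "defer_loading": True,
        },
    ],
    "messages": [
        {"role": "user", "content": "File a ticket for the login outage"}
    ],
}
```

The request would also need the `anthropic-beta: advanced-tool-use-2025-11-20` header noted at the top of this pattern.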
Problem: Each tool invocation requires full inference pass
Solution: Claude writes orchestration code that runs in sandboxed environment
```python
# Claude writes this orchestration code; get_team_members, get_expenses,
# and budget_limit are provided by the sandboxed execution environment.
import asyncio
import json

team = await get_team_members("engineering")

# Parallel fetch all expenses: one gather instead of one inference pass per call
expenses = await asyncio.gather(*[
    get_expenses(m["id"], "Q3") for m in team
])

# Aggregate in code, return only the summary
exceeded = []
for member, exp in zip(team, expenses):
    total = sum(e["amount"] for e in exp)
    if total > budget_limit:
        exceeded.append({"name": member["name"], "spent": total})

print(json.dumps(exceeded))  # Only this summary enters Claude's context
```

Results: Token usage reduced by 37%
When to use:
- Processing large datasets needing aggregates
- Multi-step workflows (3+ dependent calls)
- Filtering/transforming results before Claude sees them
- Parallel operations across many items
Problem: JSON Schema can't express usage patterns
Solution: Provide concrete invocations
```json
{
  "name": "create_ticket",
  "input_examples": [
    {
      "title": "Login page returns 500 error",
      "priority": "critical",
      "labels": ["bug", "authentication", "production"]
    },
    {
      "title": "Add dark mode support",
      "labels": ["feature-request", "ui"]
    }
  ]
}
```

What Claude learns from examples:
- Format conventions (dates, IDs)
- Nested structure patterns
- Optional parameter correlations
Results: Accuracy improved from 72% → 90%
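Wiring this into a request means attaching `input_examples` alongside the tool's `input_schema` in the `tools` parameter. The sketch below assumes the shape shown in the JSON above plus the beta header; the schema details are illustrative.

```python
# Sketch of a tool definition carrying input_examples, as passed in the
# "tools" array of a Messages API request (beta header required; exact
# field support should be verified against current Anthropic docs).
create_ticket = {
    "name": "create_ticket",
    "description": "Create an issue tracker ticket",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
    "input_examples": [
        # Critical bug: priority and labels both set
        {"title": "Login page returns 500 error",
         "priority": "critical",
         "labels": ["bug", "authentication", "production"]},
        # Feature request: priority intentionally omitted
        {"title": "Add dark mode support",
         "labels": ["feature-request", "ui"]},
    ],
}
```

Note that the second example omits `priority` on purpose: the contrast between the two examples is what teaches the optional-parameter correlation.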
Skills that would benefit from input examples:
- `contradiction-detector`
  - Gap: "absolute statements" is ambiguous
  - Examples needed: Strong claim vs. hedged claim vs. established fact
- `hypothesis-validator`
  - Gap: Passive reading vs. active formulation unclear
  - Examples needed: Research claim vs. hypothesis proposal
- `publication-quality-checker`
  - Gap: Draft review vs. quality gate vs. casual feedback
  - Examples needed: Publication request vs. internal review
- `cybersecurity-concept-analyzer` - "Analyze" vs. "quick summary"
- `expert-communication-writer` - Composing vs. reviewing vs. discussing
- For Skills: Add explicit activation examples in SKILL.md
- For MCP Servers: Consider `defer_loading` for specialized tools
- For Research Workflows: Use programmatic calling for batch operations
Claude Code includes a Language Server Protocol (LSP) tool providing IDE-like code intelligence capabilities.
| Operation | Purpose | Use Case |
|---|---|---|
| Go-to-definition | Navigate to where symbol is defined | Understanding function/class implementation |
| Find references | Locate all usages of a symbol | Impact analysis before refactoring |
| Hover documentation | Get type info and docs for symbol | Quick reference without leaving context |
| Scenario | Why LSP Helps |
|---|---|
| Refactoring prep | Find all usages before renaming |
| Understanding unfamiliar code | Jump to definitions, see type signatures |
| API exploration | Hover for documentation without searching |
| Impact analysis | Find references before modifying |
| Task | LSP | Grep/Glob |
|---|---|---|
| Find function definition | ✅ Precise (language-aware) | ⚠️ Text match only |
| Find all usages | ✅ Semantic (knows imports, aliases) | ⚠️ Misses aliases, matches comments/strings |
| Type information | ✅ Available | ❌ Not available |
| Cross-file navigation | ✅ Handles imports correctly | ❌ No import resolution |
- Language support varies: TypeScript/JavaScript excellent, others may be limited
- Project setup required: LSP needs proper project configuration (tsconfig, etc.)
- Not for text search: Use Grep for pattern matching, LSP for semantic operations
1. Grep/Glob: Find candidate files by pattern
2. LSP: Navigate precisely within those files
3. Read: Examine full context when needed
| Claim | Source | Tier |
|---|---|---|
| 85% token reduction (Tool Search) | Anthropic measurements | A |
| 37% token reduction (PTC) | Anthropic measurements | A |
| 72% → 90% accuracy (Examples) | Anthropic internal testing | A |
Problem: Including all tool definitions in every context
Symptom: 50K+ tokens consumed before conversation starts, context exhaustion
Solution: Use `defer_loading: true` for specialized tools, enable tool search
Problem: Letting full tool results enter context without summarization
Symptom: Large files (logs, data) consume entire context window
Solution: Use programmatic tool calling to filter/aggregate before results enter context
Problem: Relying solely on JSON Schema without usage examples
Symptom: 72% accuracy when examples would yield 90%
Solution: Add input_examples showing correct parameter patterns
Problem: Making tool calls one at a time when parallelization is possible
Symptom: Slow responses, unnecessary inference passes
Solution: Use programmatic calling for batch operations (asyncio.gather pattern)
- Context Engineering - Correctness over compression philosophy
- Progressive Disclosure - Token-efficient skill architecture
Last updated: January 2026