Advanced tool use transforms Claude from a simple function-calling agent into an intelligent orchestrator capable of working with hundreds of tools, processing massive datasets, and executing complex multi-step workflows. Anthropic's three beta features—Tool Search Tool, Programmatic Tool Calling, and Tool Use Examples—solve the fundamental bottlenecks preventing production-scale agent deployments.
Traditional tool calling hits three critical walls: context bloat from loading tool definitions (55K+ tokens for basic MCP setups), context pollution from intermediate results (50KB+ of expense data for simple budget checks), and inference overhead (19+ separate model calls for 20-tool workflows). Advanced tool use eliminates these bottlenecks with dynamic tool discovery, code-based orchestration, and example-driven parameter guidance.
This playbook provides production implementation patterns for building agents that scale to enterprise tool libraries, handle multi-step research workflows, and process large datasets without context exhaustion.
- Core Problems and Solutions
- Feature 1: Tool Search Tool
- Feature 2: Programmatic Tool Calling
- Feature 3: Tool Use Examples
- Architecture Patterns
- Implementation Guide
- Performance Optimization
- Error Handling and Monitoring
- Best Practices
- Summary
Scenario: You have 50+ MCP tools across GitHub, Slack, Sentry, Grafana, Splunk, and Jira integrations. Loading all tool definitions consumes 72K tokens before the conversation even starts. Adding one more server (Jira alone: 17K tokens) pushes overhead beyond 100K tokens.
Traditional Approach:
// Load ALL tools upfront - massive context consumption
const tools = [
githubTools, // 15K tokens
slackTools, // 12K tokens
sentryTools, // 10K tokens
grafanaTools, // 8K tokens
splunkTools, // 10K tokens
jiraTools // 17K tokens
]; // Total: 72K tokens BEFORE any conversationSolution: Tool Search Tool - Dynamic discovery loads only relevant tools on-demand, reducing overhead by 85%.
Scenario: Analyzing employee expenses across 20 team members. Each has 50-100 expense line items. Traditional tool calling forces all 2,000+ expense records into Claude's context (50KB+) just to find 2-3 budget violations.
Traditional Flow:
1. Fetch team members → 20 records (1KB)
2. Fetch expenses for each → 2,000 line items (50KB)
3. Fetch budget limits → 20 records (1KB)
4. Claude manually sums and compares → 52KB in context
Solution: Programmatic Tool Calling - Claude writes Python code that processes data in-flight, returning only the 2-3 violations (1KB total).
Traditional Multi-Step Workflow:
Inference 1: Call get_team_members()
Inference 2: Call get_budget_limits()
Inference 3: Call get_expenses(user_1)
Inference 4: Call get_expenses(user_2)
... (18 more inferences)
Inference 22: Synthesize results
Each tool call requires a full model inference pass. A 20-tool workflow demands 20+ separate model runs, creating latency (minutes) and accuracy problems (context drift).
Solution: Programmatic Tool Calling - One code block orchestrates all calls, eliminating 19+ inference passes.
Scenario: Support ticket API with complex nested structures and format conventions. JSON schema defines structure but cannot express "when to include optional escalation parameter" or "reporter contact format."
Schema Alone:
interface CreateTicketInput {
title: string;
description: string;
priority?: "low" | "medium" | "high" | "critical";
escalation?: {
level: number;
reason: string;
};
reporter?: {
email?: string;
slack?: string;
};
}
// Schema says WHAT but not WHEN or HOWSolution: Tool Use Examples - 1-5 realistic invocation examples show format conventions, parameter correlations, and usage patterns.
Rather than loading all tool definitions upfront, the Tool Search Tool discovers tools dynamically when needed. Developers mark tools with defer_loading: true, making them discoverable without consuming context initially. Claude searches for relevant tools, which expand into full definitions only when needed.
Before (Traditional Approach):
// All tools loaded upfront
const mcpSetup = {
github: 15000, // tokens
slack: 12000,
sentry: 10000,
grafana: 8000,
splunk: 10000,
jira: 17000
};
// Total: 72,000 tokens consumed before conversation startsAfter (Tool Search Tool):
// Only search tool + discovered tools loaded
const optimizedSetup = {
toolSearchTool: 500, // Search capability
discoveredTools: 3000 // 3-5 relevant tools
};
// Total: 3,500 tokens (85% reduction)// tools.config.ts
interface ToolConfiguration {
name: string;
description: string;
defer_loading: boolean; // Key parameter
input_schema: object;
}
// Mark tools for deferred loading
const toolConfigs: ToolConfiguration[] = [
{
name: "github_create_pr",
description: "Create GitHub pull requests with diff analysis",
defer_loading: true, // Discovered on-demand
input_schema: githubPRSchema
},
{
name: "slack_send_message",
description: "Send messages to Slack channels and users",
defer_loading: false, // Always loaded (frequently used)
input_schema: slackMessageSchema
},
// ... 48 more tools marked defer_loading: true
];
// Claude searches when needed
// User: "Create a PR for the authentication fix"
// Claude: Searches for "pull request creation" → discovers github_create_pr
// Only then does the full tool definition loadKeep Always-Loaded (defer_loading: false):
- Tools used in >50% of sessions
- Low-token tools (<500 tokens)
- Critical infrastructure (logging, error reporting)
- Frequently-paired tools (git + github, slack + notify)
Mark Deferred (defer_loading: true):
- Specialized tools (database migrations, ML model training)
- Domain-specific tools (legal document analysis, medical coding)
- Rarely-used integrations (legacy systems, backup services)
- Large tool libraries (>10 tools per service)
Help Claude discover tools efficiently with context hints:
const systemPrompt = `
You have access to tools across multiple domains:
**Communication:** Slack messaging, email sending, SMS notifications
**Version Control:** GitHub PRs, commits, issue tracking, code review
**Monitoring:** Sentry error tracking, Grafana dashboards, log analysis
**Project Management:** Jira tickets, sprint planning, roadmap updates
**Data:** Database queries, analytics, reporting, ETL pipelines
Use the Tool Search Tool to discover specific capabilities as needed.
Focus on the user's immediate task rather than loading all tools upfront.
`;Internal Anthropic testing revealed significant accuracy gains:
- Claude Opus 4: 49% → 74% (51% improvement)
- Claude Opus 4.5: 79.5% → 88.1% (11% improvement)
The improvement comes from reduced context noise—Claude focuses on relevant tools rather than scanning hundreds of definitions.
✅ Use When:
- Tool definitions exceed 10K tokens
- Large MCP setups with 10+ tools
- Many similar tool names causing confusion
- Accuracy concerns with tool selection
- Building multi-tenant systems with tenant-specific tools
❌ Skip When:
- Small libraries (under 10 tools)
- All tools used in every session
- Compact definitions (<5K total tokens)
- Real-time latency critical (discovery adds ~200ms)
Tool Search Tool maintains prompt caching benefits because deferred tools are excluded from the initial prompt entirely. They're only added to context after Claude searches for them, so cached prompts remain stable.
Instead of requesting tools sequentially, Claude writes Python code orchestrating multiple tools. Code runs in a sandboxed environment, pausing when tool results are needed. Results process within the execution context—never entering Claude's context window—until only final output returns.
Traditional Multi-Step Scenario:
// User: "Which employees exceeded their quarterly budget?"
// Step 1: Fetch team members
const teamMembers = await getTeamMembers(); // 20 people
// Context: 1KB
// Step 2-21: Fetch expenses for each (20 sequential inferences)
for (const member of teamMembers) {
const expenses = await getExpenses(member.id); // 50-100 items each
// Context grows: 1KB → 50KB (2,000 expense line items)
}
// Step 22: Fetch budget limits
const budgets = await getBudgetLimits();
// Context: 51KB
// Step 23: Claude manually sums and compares
// Final answer: "Alice, Bob, and Charlie exceeded budgets"
// Total: 23 inferences, 51KB context consumptionWith Programmatic Tool Calling:
# Claude writes this code (1 inference)
import asyncio
async def find_budget_violations():
# Parallel fetches (no context pollution)
teams, budgets = await asyncio.gather(
get_team_members(),
get_budget_limits()
)
# Fetch all expenses in parallel
expense_tasks = [get_expenses(m.id) for m in teams]
all_expenses = await asyncio.gather(*expense_tasks)
# Process in Python (results never enter Claude's context)
violations = []
for member, expenses, budget in zip(teams, all_expenses, budgets):
total = sum(e.amount for e in expenses)
if total > budget.limit:
violations.append({
'name': member.name,
'spent': total,
'limit': budget.limit,
'overage': total - budget.limit
})
return violations # Only 2-3 records (1KB)
# Code executes outside Claude's context
# Only final result returns: [Alice: $12K/$10K, Bob: $8.5K/$8K, Charlie: $15K/$12K]
# Total: 1 inference, 1KB context consumptionToken Reduction:
- Average usage dropped from 43,588 → 27,297 tokens (37% reduction)
- Complex research tasks see 50%+ reductions
Latency Improvement:
- Eliminates 19+ inference passes
- Parallel execution (20 sequential calls → 1 async batch)
- Typical workflow: 45 seconds → 8 seconds
Accuracy Boost:
- Internal knowledge retrieval: 25.6% → 28.5%
- GIA benchmarks: 46.5% → 51.2%
- Reduced context drift from intermediate results
1. Enable Tools for Code Execution:
// tool-config.ts
interface ToolConfig {
name: string;
description: string;
input_schema: object;
allowed_callers?: string[]; // Key parameter
}
const expenseTools: ToolConfig[] = [
{
name: "get_team_members",
description: "Fetch all team members with IDs",
input_schema: { type: "object", properties: {} },
allowed_callers: ["code_execution_20250825"] // Opt-in
},
{
name: "get_expenses",
description: "Fetch expense records for a user",
input_schema: {
type: "object",
properties: {
user_id: { type: "string", description: "User ID" }
},
required: ["user_id"]
},
allowed_callers: ["code_execution_20250825"] // Opt-in
},
{
name: "get_budget_limits",
description: "Fetch quarterly budget limits",
input_schema: { type: "object", properties: {} },
allowed_callers: ["code_execution_20250825"] // Opt-in
}
];2. Claude Generates Orchestration Code:
Claude automatically writes async Python with:
asyncio.gather()for parallel execution- Data filtering and transformation
- Aggregation before returning results
3. Tool Execution Outside Context:
// Execution flow
class ProgrammaticExecutor {
async executeCode(pythonCode: string): Promise<any> {
const sandbox = new CodeSandbox();
// Parse code for tool calls
const toolCalls = this.extractToolCalls(pythonCode);
// Execute with tool pausing
let result = null;
for (const toolCall of toolCalls) {
// Tool pauses execution, returns to API
const toolResult = await this.invokeTool(toolCall);
// Result goes back to code execution (NOT to Claude)
sandbox.injectResult(toolCall.id, toolResult);
}
// Only final processed output returns
return sandbox.getFinalResult();
}
}4. Only Final Output Returns to Claude:
# Code processes 2,000 expense records internally
# Claude's context only receives:
[
{"name": "Alice", "spent": 12000, "limit": 10000, "overage": 2000},
{"name": "Bob", "spent": 8500, "limit": 8000, "overage": 500},
{"name": "Charlie", "spent": 15000, "limit": 12000, "overage": 3000}
]
# 1KB vs 50KB (98% reduction)Pattern 1: Multi-Stage Data Pipeline
async def analyze_customer_churn():
# Stage 1: Fetch raw data in parallel
customers, transactions, support_tickets = await asyncio.gather(
get_customers(status="active"),
get_transactions(days=90),
get_support_tickets(days=90)
)
# Stage 2: Enrich and filter (in Python, not context)
enriched = []
for customer in customers:
tx = [t for t in transactions if t.customer_id == customer.id]
tickets = [t for t in support_tickets if t.customer_id == customer.id]
# Churn risk scoring
revenue_trend = calculate_trend([t.amount for t in tx])
ticket_sentiment = analyze_sentiment([t.description for t in tickets])
if revenue_trend < -0.3 or ticket_sentiment < 0.4:
enriched.append({
'customer': customer.name,
'risk_score': calculate_risk(revenue_trend, ticket_sentiment),
'last_purchase': tx[-1].date if tx else None,
'unresolved_tickets': len([t for t in tickets if not t.resolved])
})
# Stage 3: Return only high-risk customers
return sorted(enriched, key=lambda x: x['risk_score'], reverse=True)[:10]Pattern 2: Iterative Refinement
async def find_optimal_pricing():
# Fetch historical data
sales_data = await get_sales_history(months=12)
competitor_prices = await get_competitor_pricing()
# Binary search for optimal price point
low, high = 10, 100
best_price = None
best_revenue = 0
while high - low > 1:
mid = (low + high) // 2
projected_revenue = simulate_revenue(sales_data, price=mid)
if projected_revenue > best_revenue:
best_revenue = projected_revenue
best_price = mid
if competitor_avg := average([c.price for c in competitor_prices]):
if mid < competitor_avg:
low = mid
else:
high = mid
return {
'optimal_price': best_price,
'projected_revenue': best_revenue,
'competitor_avg': competitor_avg
}✅ Use When:
- Large datasets requiring aggregation (>1K records)
- Multi-step dependent workflows (3+ tool calls)
- Data transformation before reasoning
- Parallel operations across many items
- Intermediate data shouldn't influence reasoning
❌ Skip When:
- Simple single-tool calls
- Claude needs to see all intermediate results
- Quick lookups returning small responses (<1KB)
- Non-idempotent operations requiring human review
// Safe tools for programmatic execution
const safeTools = [
"read_database", // Read-only, idempotent
"fetch_api_data", // GET requests only
"calculate_metrics", // Pure computation
"analyze_logs" // Read-only analysis
];
// Unsafe tools - require human approval
const unsafeTools = [
"delete_records", // Destructive
"send_email", // Side effects
"charge_payment", // Financial
"deploy_code" // Production changes
];
// Only mark safe tools for code execution
safeTools.forEach(tool => {
tool.allowed_callers = ["code_execution_20250825"];
});JSON schemas define structure but cannot express usage patterns:
// Schema shows WHAT but not WHEN or HOW
interface CreateTicketInput {
title: string; // Always required
description: string; // Always required
priority?: "low" | "medium" | "high" | "critical"; // When?
escalation?: { // When to include?
level: number; // What values are valid?
reason: string;
};
reporter?: { // Format?
email?: string; // Pattern?
slack?: string; // Slack ID or @mention?
};
due_date?: string; // Format? YYYY-MM-DD? ISO 8601?
}Common Errors Without Examples:
- Missing required nested fields
- Wrong date formats (MM/DD/YYYY vs YYYY-MM-DD)
- Incorrect ID conventions (USR-123 vs 123 vs usr_123)
- Inappropriate parameter combinations
- Misunderstanding optional field usage
Provide 1-5 examples showing minimal, partial, and full patterns:
// tool-examples.ts
const createTicketExamples = [
// Example 1: Minimal (required fields only)
{
title: "Login button unresponsive",
description: "Users cannot click the login button on mobile Safari"
},
// Example 2: Partial (priority + reporter contact)
{
title: "Database connection pool exhausted",
description: "Production API timing out due to connection pool limits",
priority: "high",
reporter: {
slack: "U01ABC123XY" // Shows Slack ID format
}
},
// Example 3: Full (critical bug with escalation)
{
title: "Payment processing failures",
description: "Stripe webhook timeouts causing failed charges",
priority: "critical",
escalation: {
level: 2, // Shows level values (1-3)
reason: "Revenue impact: $50K/hour in failed transactions"
},
reporter: {
email: "alice@company.com",
slack: "U01ABC123XY"
},
due_date: "2025-01-15" // Shows YYYY-MM-DD format
},
// Example 4: Feature request (no escalation)
{
title: "Add dark mode to dashboard",
description: "Users requesting dark theme for night shift monitoring",
priority: "low",
reporter: {
email: "bob@company.com"
}
},
// Example 5: Medium priority with specific due date
{
title: "Upgrade PostgreSQL to v16",
description: "Plan migration from v14 to v16 for performance improvements",
priority: "medium",
due_date: "2025-02-28" // End of Q1
}
];
// Attach to tool configuration
const createTicketTool = {
name: "create_support_ticket",
description: "Create support tickets for bugs, features, and infrastructure",
input_schema: createTicketSchema,
examples: createTicketExamples // Key parameter
};Format Conventions:
- Date format:
2025-01-15(YYYY-MM-DD) - Slack IDs:
U01ABC123XY(not @mentions) - Email: Standard format
Parameter Correlations:
- Critical bugs → include escalation
- Feature requests → no escalation
- High priority → often includes reporter contact
Nested Structure Usage:
- Escalation: Only for critical/high priority
- Reporter: Email or Slack or both
- Due date: Optional for bugs, common for features
Realistic Data:
- Actual city names, plausible prices
- Real-world scenarios
- Varied examples (bugs, features, infrastructure)
Anthropic testing showed parameter handling improved from 72% → 90% on complex tools.
1. Use Realistic Data:
// ❌ BAD: Generic placeholders
{
title: "Title here",
description: "Description here",
priority: "medium"
}
// ✅ GOOD: Realistic scenario
{
title: "API rate limiting returning 429 errors",
description: "GitHub API calls exceeding 5000/hour limit during CI builds",
priority: "high"
}2. Demonstrate Format Conventions:
// Show actual formats Claude should use
const examples = [
{ user_id: "USR-00123" }, // ID convention
{ created_at: "2025-01-15" }, // Date format
{ amount: 1299 }, // Cents not dollars
{ phone: "+1-555-0123" } // Phone format
];3. Show Parameter Correlations:
// Critical bugs have escalation; features don't
const bugExample = {
type: "bug",
severity: "critical",
escalation: { level: 2, reason: "Production outage" }
};
const featureExample = {
type: "feature",
priority: "low"
// No escalation for features
};4. Clarify Optional Field Patterns:
// When to include optional nested structures
const minimalOrder = {
items: [{ sku: "ABC-123", quantity: 2 }]
// No shipping, no billing (digital goods)
};
const physicalOrder = {
items: [{ sku: "XYZ-789", quantity: 1 }],
shipping: {
address: "123 Main St",
city: "San Francisco",
zip: "94102"
},
billing: { // Include when different from shipping
address: "456 Oak Ave",
city: "Oakland",
zip: "94601"
}
};5. Keep to 1-5 Examples Per Tool:
- Too few (1): Insufficient pattern coverage
- Optimal (3-5): Minimal, partial, full patterns
- Too many (10+): Cognitive overload, token waste
✅ Use When:
- Complex nested structures
- Many optional parameters with conditional patterns
- Domain-specific conventions (IDs, dates, formats)
- Disambiguating similar tools
- APIs with implicit parameter correlations
❌ Skip When:
- Simple single-parameter tools
- Standard formats Claude already understands
- Validation better handled by schema constraints
- Parameters are self-explanatory
// Layer 1: Always-loaded core tools
const corTools = [
toolSearchTool, // Discovery
logError, // Monitoring
getUserContext // Session management
];
// Layer 2: Frequently-used domain tools
const frequentTools = [
slackSendMessage, // Used in 80% of sessions
getDatabaseRecord, // Common queries
searchDocuments // Knowledge retrieval
];
// Layer 3: Deferred specialized tools (discovered on-demand)
const deferredTools = [
githubCreatePR, // 20% of sessions
runDatabaseMigration, // 5% of sessions
trainMLModel, // 2% of sessions
analyzeLegalDoc // 1% of sessions
];
// Configuration
const toolConfig = {
core: corTools.map(t => ({ ...t, defer_loading: false })),
frequent: frequentTools.map(t => ({ ...t, defer_loading: false })),
deferred: deferredTools.map(t => ({ ...t, defer_loading: true }))
};class ToolOrchestrator {
async execute(task: string): Promise<any> {
// Try programmatic execution first
try {
const code = await this.generateOrchestrationCode(task);
const result = await this.executeProgrammatically(code);
// Success - 37% token savings, 80% latency reduction
return result;
} catch (error) {
// Fallback to traditional sequential tool calling
console.warn('Programmatic execution failed, using sequential fallback');
return await this.executeSequentially(task);
}
}
private async executeProgrammatically(code: string): Promise<any> {
const sandbox = new CodeSandbox();
return await sandbox.run(code, {
allowedTools: this.getSafeTools(),
timeout: 30000, // 30 second limit
maxIterations: 100
});
}
private getSafeTools(): string[] {
// Only idempotent, read-only operations
return [
'get_*', // All read operations
'fetch_*', // API fetches
'calculate_*', // Pure computations
'analyze_*' // Read-only analysis
];
}
}interface ToolConfiguration {
name: string;
description: string;
input_schema: JSONSchema;
examples: ToolExample[];
validation: ValidationRules;
}
const createOrderTool: ToolConfiguration = {
name: "create_order",
description: "Create customer orders with items, shipping, and billing",
// Schema: Structural validation
input_schema: {
type: "object",
properties: {
items: { type: "array", minItems: 1 },
shipping: { type: "object" },
billing: { type: "object" }
},
required: ["items"]
},
// Examples: Behavioral guidance
examples: [
// Digital order (no shipping)
{ items: [{ sku: "DIG-001", qty: 1 }] },
// Physical order (shipping required)
{
items: [{ sku: "PHY-123", qty: 2 }],
shipping: { address: "123 Main St", city: "SF", zip: "94102" }
},
// Physical order with separate billing
{
items: [{ sku: "PHY-456", qty: 1 }],
shipping: { address: "123 Main St", city: "SF", zip: "94102" },
billing: { address: "456 Oak Ave", city: "Oakland", zip: "94601" }
}
],
// Validation: Runtime checks
validation: {
rules: [
{
condition: "item.sku.startsWith('PHY-')",
requires: "shipping is provided",
error: "Physical items require shipping address"
},
{
condition: "billing is provided",
requires: "billing.address !== shipping.address",
error: "Billing address must differ from shipping when specified"
}
]
}
};import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
const response = await client.beta.messages.create({
betas: ["advanced-tool-use-2025-11-20"], // Enable beta features
model: "claude-sonnet-4-5-20250929",
max_tokens: 4096,
tools: toolConfiguration,
messages: [
{
role: "user",
content: "Which employees exceeded their quarterly budget?"
}
]
});// tool-registry.ts
class ToolRegistry {
private tools: Map<string, ToolConfig> = new Map();
register(tool: ToolConfig): void {
this.tools.set(tool.name, tool);
}
getConfiguration(): ToolConfig[] {
const config: ToolConfig[] = [];
// Always load: search tool + core utilities
config.push(...this.getCoreTools());
// Conditionally load: frequent tools
config.push(...this.getFrequentTools());
// Defer: specialized tools (85% token reduction)
config.push(...this.getDeferredTools());
return config;
}
private getCoreTools(): ToolConfig[] {
return Array.from(this.tools.values()).filter(t =>
t.category === "core" || t.usage_frequency > 0.5
).map(t => ({
...t,
defer_loading: false
}));
}
private getDeferredTools(): ToolConfig[] {
return Array.from(this.tools.values()).filter(t =>
t.category === "specialized" && t.usage_frequency < 0.2
).map(t => ({
...t,
defer_loading: true
}));
}
}// programmatic-executor.ts
class ProgrammaticExecutor {
async handleToolUse(response: Message): Promise<ExecutionResult> {
for (const content of response.content) {
if (content.type === "code_execution") {
return await this.executeCode(content);
}
}
throw new Error("No code execution block found");
}
private async executeCode(codeBlock: CodeBlock): Promise<ExecutionResult> {
const sandbox = this.createSandbox();
try {
// Parse code for tool dependencies
const toolCalls = this.extractToolCalls(codeBlock.code);
// Execute with paused tool resolution
const result = await sandbox.execute(codeBlock.code, {
onToolCall: async (name, params) => {
// Verify tool is allowed for programmatic execution
if (!this.isToolAllowed(name)) {
throw new Error(`Tool ${name} not allowed for code execution`);
}
// Execute tool (result stays in sandbox, not in Claude's context)
return await this.invokeTool(name, params);
}
});
return {
success: true,
output: result, // Only final processed result
tokensConsumed: this.estimateTokens(result) // Minimal
};
} catch (error) {
return {
success: false,
error: error.message,
fallback: "sequential_execution_recommended"
};
}
}
private isToolAllowed(toolName: string): boolean {
const tool = this.toolRegistry.get(toolName);
return tool?.allowed_callers?.includes("code_execution_20250825") ?? false;
}
}// tool-examples.ts
function createToolWithExamples(
baseConfig: ToolConfig,
examples: ToolExample[]
): EnrichedToolConfig {
return {
...baseConfig,
examples: examples.map(ex => ({
input: ex,
explanation: generateExplanation(ex, baseConfig.input_schema)
}))
};
}
// Example: Support ticket creation
const ticketTool = createToolWithExamples(
{
name: "create_ticket",
description: "Create support tickets for bugs and features",
input_schema: ticketSchema
},
[
// Minimal
{
title: "Login fails on Safari",
description: "Users report authentication errors"
},
// Partial
{
title: "API rate limit exceeded",
description: "CI builds hitting GitHub rate limits",
priority: "high",
reporter: { slack: "U01ABC123" }
},
// Full
{
title: "Payment processor down",
description: "Stripe webhooks timing out",
priority: "critical",
escalation: { level: 2, reason: "Revenue impact" },
reporter: { email: "alice@co.com", slack: "U01ABC123" },
due_date: "2025-01-15"
}
]
);class TokenBudgetManager {
private readonly maxContextTokens = 200000; // Claude Sonnet 4.5
private readonly reserveForResponse = 4096;
calculateToolLoadingStrategy(): LoadingStrategy {
const availableForTools = this.maxContextTokens - this.reserveForResponse;
// Measure current tool definitions
const currentToolTokens = this.estimateToolTokens();
if (currentToolTokens < 10000) {
return { strategy: "load_all", defer_loading: [] };
} else if (currentToolTokens < 50000) {
return {
strategy: "selective_defer",
defer_loading: this.identifyLowFrequencyTools()
};
} else {
return {
strategy: "aggressive_defer",
defer_loading: this.identifyAllButCoreTools()
};
}
}
private estimateToolTokens(): number {
return this.tools.reduce((total, tool) =>
total + this.estimateSingleToolTokens(tool), 0
);
}
private estimateSingleToolTokens(tool: ToolConfig): number {
// Rough estimate: 1 token ≈ 4 characters
const schemaSize = JSON.stringify(tool.input_schema).length / 4;
const descriptionSize = tool.description.length / 4;
return Math.ceil(schemaSize + descriptionSize + 50); // +50 for overhead
}
}// Identify parallelizable tool calls in workflows
class ParallelExecutionOptimizer {
analyzeWorkflow(steps: WorkflowStep[]): ExecutionPlan {
const graph = this.buildDependencyGraph(steps);
const parallelBatches = this.identifyParallelBatches(graph);
return {
batches: parallelBatches,
estimatedLatency: this.estimateLatency(parallelBatches),
tokenSavings: this.estimateTokenSavings(parallelBatches)
};
}
private buildDependencyGraph(steps: WorkflowStep[]): DependencyGraph {
const graph = new Map<string, Set<string>>();
for (const step of steps) {
const dependencies = new Set<string>();
// Check if step parameters reference previous step outputs
for (const param of Object.values(step.parameters)) {
const refs = this.extractReferences(param);
refs.forEach(ref => dependencies.add(ref));
}
graph.set(step.id, dependencies);
}
return graph;
}
private identifyParallelBatches(graph: DependencyGraph): ExecutionBatch[] {
const batches: ExecutionBatch[] = [];
const executed = new Set<string>();
while (executed.size < graph.size) {
// Find all steps with satisfied dependencies
const ready = Array.from(graph.entries())
.filter(([id, deps]) =>
!executed.has(id) &&
Array.from(deps).every(dep => executed.has(dep))
)
.map(([id]) => id);
if (ready.length === 0) break; // Circular dependency
batches.push({
steps: ready,
parallelizable: true,
estimatedTime: Math.max(...ready.map(id => this.estimateStepTime(id)))
});
ready.forEach(id => executed.add(id));
}
return batches;
}
}// Cache tool results for programmatic execution
class ToolResultCache {
private cache = new Map<string, CachedResult>();
private readonly ttl = 300000; // 5 minutes
async getCached(toolName: string, params: object): Promise<any | null> {
const key = this.generateKey(toolName, params);
const cached = this.cache.get(key);
if (!cached) return null;
if (Date.now() - cached.timestamp > this.ttl) {
this.cache.delete(key);
return null;
}
return cached.result;
}
async setCached(toolName: string, params: object, result: any): Promise<void> {
const key = this.generateKey(toolName, params);
this.cache.set(key, {
result,
timestamp: Date.now()
});
}
private generateKey(toolName: string, params: object): string {
// Stable serialization for consistent keys
const sortedParams = this.sortObjectKeys(params);
return `${toolName}:${JSON.stringify(sortedParams)}`;
}
}class ExecutionErrorHandler {
async handleExecutionError(error: ExecutionError): Promise<RecoveryAction> {
// Categorize error type
if (error.type === "timeout") {
return {
action: "fallback_sequential",
reason: "Code execution exceeded 30s timeout",
recommendation: "Break workflow into smaller batches"
};
}
if (error.type === "tool_not_allowed") {
return {
action: "request_permission",
reason: `Tool ${error.toolName} not in allowed_callers list`,
recommendation: "Add tool to allowed_callers or use sequential execution"
};
}
if (error.type === "sandbox_crash") {
return {
action: "fallback_sequential",
reason: "Code execution sandbox crashed",
recommendation: "Review code for infinite loops or memory issues"
};
}
// Default: fallback to sequential
return {
action: "fallback_sequential",
reason: "Unknown execution error",
recommendation: "Use traditional tool calling"
};
}
}// Monitor tool discovery patterns
class ToolDiscoveryMonitor {
private metrics = {
searches: 0,
hits: 0,
misses: 0,
avgSearchLatency: 0
};
recordSearch(query: string, results: Tool[], latency: number): void {
this.metrics.searches++;
this.metrics.avgSearchLatency =
(this.metrics.avgSearchLatency * (this.metrics.searches - 1) + latency)
/ this.metrics.searches;
if (results.length > 0) {
this.metrics.hits++;
} else {
this.metrics.misses++;
this.logMissedSearch(query);
}
}
private logMissedSearch(query: string): void {
// Alert on common missed searches
console.warn(`Tool search miss: "${query}"`);
// Suggest tools to add or improve descriptions
const suggestions = this.suggestToolImprovements(query);
if (suggestions.length > 0) {
console.info(`Consider adding tools: ${suggestions.join(', ')}`);
}
}
getMetrics(): DiscoveryMetrics {
return {
...this.metrics,
hitRate: this.metrics.hits / this.metrics.searches,
missRate: this.metrics.misses / this.metrics.searches
};
}
}interface AdvancedToolUseMetrics {
// Token efficiency
avgTokensPerRequest: number;
tokenSavingsVsTraditional: number;
// Latency
avgLatencyMs: number;
latencySavingsVsTraditional: number;
// Accuracy
toolSelectionAccuracy: number;
parameterAccuracy: number;
// Discovery
toolSearchHitRate: number;
avgToolsDiscovered: number;
// Programmatic execution
programmaticExecutionRate: number;
avgResultSizeReduction: number;
}
class MetricsDashboard {
async generateReport(): Promise<AdvancedToolUseMetrics> {
const traditional = await this.getTraditionalBaseline();
const advanced = await this.getAdvancedMetrics();
return {
avgTokensPerRequest: advanced.tokens,
tokenSavingsVsTraditional:
((traditional.tokens - advanced.tokens) / traditional.tokens) * 100,
avgLatencyMs: advanced.latency,
latencySavingsVsTraditional:
((traditional.latency - advanced.latency) / traditional.latency) * 100,
toolSelectionAccuracy: advanced.accuracy.toolSelection,
parameterAccuracy: advanced.accuracy.parameters,
toolSearchHitRate: advanced.discovery.hitRate,
avgToolsDiscovered: advanced.discovery.avgDiscovered,
programmaticExecutionRate: advanced.programmatic.executionRate,
avgResultSizeReduction: advanced.programmatic.resultSizeReduction
};
}
}// Phase 1: Start with Tool Search Tool (easiest win)
const phase1Config = {
features: ["tool_search_tool"],
defer_loading: identifyLowFrequencyTools(), // 85% token reduction
rollout: "all_users"
};
// Phase 2: Add Programmatic Tool Calling (complex workflows)
const phase2Config = {
features: ["tool_search_tool", "programmatic_tool_calling"],
programmatic_allowed: getSafeTools(), // Read-only operations first
rollout: "power_users"
};
// Phase 3: Tool Use Examples (accuracy improvements)
const phase3Config = {
features: ["tool_search_tool", "programmatic_tool_calling", "tool_use_examples"],
examples: addExamplesToComplexTools(),
rollout: "all_users"
};// Help Claude write better orchestration code
const toolWithDocumentedOutput = {
name: "get_expenses",
description: "Fetch expense records for a user",
input_schema: { /* ... */ },
output_format: {
description: "Array of expense objects",
example: [
{
id: "EXP-001",
user_id: "USR-123",
amount: 4599, // Cents
category: "travel",
date: "2025-01-15",
approved: true
}
],
fields: {
id: "Unique expense ID (EXP-XXXXX)",
amount: "Amount in cents (integer)",
date: "ISO 8601 date string (YYYY-MM-DD)"
}
}
};// ❌ BAD: Ambiguous names
const badNames = [
"fetch", // Fetch what?
"process", // Process what?
"analyze" // Analyze what?
];
// ✅ GOOD: Descriptive names for discovery
const goodNames = [
"fetch_customer_expense_records",
"process_invoice_payment",
"analyze_sales_trend_quarterly"
];// ❌ BAD: Dangerous tools in code execution
const dangerousConfig = {
name: "delete_all_records",
allowed_callers: ["code_execution_20250825"] // NEVER DO THIS
};
// ✅ GOOD: Only safe, idempotent tools
const safeConfig = {
name: "fetch_records",
allowed_callers: ["code_execution_20250825"] // Read-only, safe
};// ❌ BAD: Deferring high-frequency tools
const badDeferral = {
name: "send_slack_message", // Used in 80% of sessions
defer_loading: true // Adds discovery latency every time
};
// ✅ GOOD: Keep frequent tools always loaded
const goodConfig = {
name: "send_slack_message",
defer_loading: false // Always available
};// ❌ BAD: Too many examples (10+)
const tooManyExamples = [
example1, example2, example3, example4, example5,
example6, example7, example8, example9, example10,
example11, example12 // Cognitive overload, token waste
];
// ✅ GOOD: 3-5 focused examples
const focusedExamples = [
minimalExample, // Required fields only
partialExample, // Common optional fields
fullExample // Complete usage
];Advanced tool use transforms Claude from a simple function-calling agent into an intelligent orchestrator capable of enterprise-scale deployments. The three beta features solve fundamental bottlenecks:
- Tool Search Tool - Dynamic discovery reduces context overhead by 85% (72K → 3.5K tokens)
- Programmatic Tool Calling - Code orchestration reduces token usage by 37% and eliminates 19+ inference passes
- Tool Use Examples - Realistic invocations improve parameter accuracy from 72% → 90%
Phase 1: Enable Beta Features
- Add
betas: ["advanced-tool-use-2025-11-20"]to API calls - Use Claude Sonnet 4.5 (
claude-sonnet-4-5-20250929) - Test with existing tool configurations
Phase 2: Configure Tool Discovery
- Identify frequently-used tools (>50% of sessions)
- Mark 3-5 core tools
defer_loading: false - Mark specialized tools
defer_loading: true - Add system prompt guidance for discovery
- Monitor discovery hit rate (target: >80%)
Phase 3: Enable Programmatic Execution
- Identify safe, idempotent tools (read operations, calculations)
- Add
allowed_callers: ["code_execution_20250825"]to safe tools - Document tool return formats clearly
- Implement fallback to sequential execution
- Monitor execution success rate (target: >90%)
Phase 4: Add Tool Use Examples
- Identify tools with complex parameters (5+ optional fields)
- Create 3-5 realistic examples per complex tool
- Show minimal, partial, and full invocation patterns
- Demonstrate format conventions and parameter correlations
- Validate accuracy improvements (target: >10% gain)
Phase 5: Production Monitoring
- Track token usage (baseline vs advanced)
- Measure latency improvements
- Monitor tool selection accuracy
- Alert on discovery misses
- Review programmatic execution errors
Target Performance:
- Token reduction: 35-85% depending on feature mix
- Latency improvement: 50-80% for multi-step workflows
- Accuracy gains: 10-20% on complex tasks
- Discovery hit rate: >80%
- Programmatic execution success: >90%
Advanced tool use is a beta feature rapidly evolving. Stay updated with Anthropic's engineering blog for new capabilities, performance improvements, and production case studies.
For complex agent systems processing large datasets across hundreds of tools, these features are not optional—they're foundational for production deployments that scale beyond prototypes into enterprise production systems.
Related Playbooks:
- Multi-Agent Rate Limits - Coordinate tool calls across concurrent agents
- MCP Server Reliability - Build self-healing tool backends
- Cost Attribution System - Track token usage per feature
- Progressive Enhancement - Roll out advanced features safely