Summary
Enable models to call other models as tools, creating a sub-agent flow where an orchestrating model can delegate tasks to specialized models. The tool definition should automatically derive its description from the target model's description field in the config.
Motivation
Many AI workflows benefit from model composition:
- A general reasoning model that routes to specialized models (code, math, vision)
- An orchestrator that delegates to fine-tuned classifiers or domain experts
- Chain-of-thought verification where one model checks another's work
- Multi-modal pipelines where a text model calls a vision model
Currently, users must implement this routing logic in application code. Adding native support for "model as tool" would make these patterns trivial to configure.
Proposed Design
Config Schema
```yaml
models:
  # Specialized models
  - name: code-expert
    description: "Expert at writing and debugging Python code. Best for coding tasks."
    provider: ollama
    model: deepseek-coder:6.7b

  - name: math-expert
    description: "Specialized in mathematical reasoning and calculations."
    provider: universal
    model: qwen2.5-math:7b

  - name: vision-analyzer
    description: "Analyzes images and describes visual content."
    provider: universal
    model: llava:7b

  # Orchestrator model with sub-agent tools
  - name: orchestrator
    description: "General-purpose assistant that can delegate to specialists."
    provider: ollama
    model: llama3.2:latest
    tools:
      # Inline tool that calls another model
      - type: model_call
        model: code-expert  # References model by name
        # description auto-pulled from code-expert.description
      - type: model_call
        model: math-expert
        override_description: "Use for complex math"  # Optional override
      - type: model_call
        model: vision-analyzer
        input_mapping:  # Optional: map tool params to model input
          image_url: "content[0].image_url"
```

Auto-Generated Tool Schema
When `type: model_call` is specified, LlamaFarm should automatically generate a tool definition:
```json
{
  "type": "function",
  "function": {
    "name": "call_code_expert",
    "description": "Expert at writing and debugging Python code. Best for coding tasks.",
    "parameters": {
      "type": "object",
      "properties": {
        "prompt": {
          "type": "string",
          "description": "The prompt/question to send to the code expert model"
        },
        "context": {
          "type": "string",
          "description": "Optional additional context for the model"
        }
      },
      "required": ["prompt"]
    }
  }
}
```

Execution Flow
1. Orchestrator model decides to call the `call_code_expert` tool
2. LlamaFarm intercepts the tool call
3. LlamaFarm routes the call to the `code-expert` model with the provided prompt
4. The sub-agent's response is returned as a tool result to the orchestrator
5. Orchestrator incorporates the result into its response
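The flow above could be sketched roughly as follows. This is purely illustrative; the helper names (`run_model`, `MODEL_CALL_TOOLS`, `handle_tool_call`) are hypothetical, not existing LlamaFarm APIs:

```python
# Hypothetical sketch of intercepting a model_call tool and routing it
# to the target model. Names here are assumptions for illustration only.

MODEL_CALL_TOOLS = {
    # tool name -> target model name (derived from the config)
    "call_code_expert": "code-expert",
}

def run_model(model_name: str, prompt: str) -> str:
    """Stand-in for dispatching a prompt to a configured model."""
    return f"[{model_name}] response to: {prompt}"

def handle_tool_call(tool_name: str, arguments: dict) -> str:
    """Intercept a model_call tool and route it to the target model."""
    if tool_name in MODEL_CALL_TOOLS:
        target = MODEL_CALL_TOOLS[tool_name]
        # The sub-agent's answer comes back as an ordinary tool result,
        # which the orchestrator then folds into its own response.
        return run_model(target, arguments["prompt"])
    raise KeyError(f"Unknown tool: {tool_name}")

result = handle_tool_call("call_code_expert", {"prompt": "Sort a list in Python"})
```

The key point is that the orchestrator never needs to know the target model's provider; routing is resolved from the config at tool-call time.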
Advanced Options
```yaml
tools:
  - type: model_call
    model: code-expert
    # Execution options
    max_tokens: 2000              # Limit sub-agent response length
    timeout_seconds: 30           # Timeout for sub-agent call
    # Tool behavior
    tool_name: "write_code"       # Custom tool name (default: call_{model_name})
    streaming: false              # Whether to stream sub-agent response
    # Context passing
    include_conversation: false   # Pass conversation history to sub-agent
    system_prompt_override: "You are a code assistant..."
    # Recursion control
    allow_nested_calls: false     # Prevent sub-agent from calling other models
    max_depth: 2                  # Max recursion depth if allowed
```

Implementation Recommendations
Phase 1: Basic Model-as-Tool
- Add `model_call` tool type to the config schema
- Implement tool definition generation with description auto-pull
- Add basic tool execution routing in the server
- Support a simple prompt/response flow
Phase 2: Enhanced Routing
- Input/output mapping for complex payloads
- Support for passing images/files to vision models
- Conversation context passing options
- Timeout and error handling
Phase 3: Advanced Orchestration
- Recursive call support with depth limits
- Streaming responses from sub-agents
- Parallel sub-agent calls
- Cost/latency tracking per sub-agent
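The recursion control in Phase 3 could take a shape like this minimal depth guard. This is a sketch under the assumption that sub-agent calls carry a depth counter; the names (`call_with_depth_guard`, `MAX_DEPTH`) are hypothetical:

```python
# Illustrative depth guard for nested model_call invocations,
# corresponding to the proposed max_depth setting. Not LlamaFarm code.

MAX_DEPTH = 2

def call_with_depth_guard(model_name: str, prompt: str, depth: int = 0) -> str:
    """Refuse nested model_call invocations beyond the configured depth."""
    if depth >= MAX_DEPTH:
        raise RuntimeError(
            f"model_call depth limit ({MAX_DEPTH}) exceeded at {model_name}"
        )
    # A real implementation would invoke the provider here; this stub
    # simulates a sub-agent that always delegates one level deeper.
    if depth + 1 < MAX_DEPTH:
        return call_with_depth_guard("code-expert", prompt, depth + 1)
    return f"answered at depth {depth}"

out = call_with_depth_guard("orchestrator", "hello")  # → "answered at depth 1"
```

Threading the depth through each sub-call (rather than a global counter) keeps parallel sub-agent calls independent of each other.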
Code Changes Required
1. Config Schema (`config/datamodel.py`)
```python
from typing import Literal, Optional, Union

from pydantic import BaseModel, Field


class ModelCallTool(BaseModel):
    type: Literal["model_call"] = Field("model_call")
    model: str = Field(..., description="Name of the model to call")
    tool_name: Optional[str] = Field(None, description="Custom tool name")
    override_description: Optional[str] = Field(None)
    max_tokens: Optional[int] = Field(None)
    timeout_seconds: Optional[int] = Field(30)
    include_conversation: Optional[bool] = Field(False)
    allow_nested_calls: Optional[bool] = Field(False)


# Update Tool union type
Tool = Union[FunctionTool, ModelCallTool]
```

2. Tool Definition Generation
```python
def generate_model_call_tool_definition(
    tool_config: ModelCallTool,
    target_model: Model,
) -> ToolDefinition:
    # Prefer an explicit override, then fall back to the target model's
    # own description from the config.
    description = tool_config.override_description or target_model.description
    # Sanitize the model name so e.g. "code-expert" yields a valid
    # tool name ("call_code_expert"), matching the schema example above.
    default_name = f"call_{tool_config.model.replace('-', '_')}"
    return ToolDefinition(
        name=tool_config.tool_name or default_name,
        description=description or f"Call the {tool_config.model} model",
        parameters={
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": f"The prompt to send to {tool_config.model}",
                }
            },
            "required": ["prompt"],
        },
    )
```

3. Tool Execution Handler
In `server/agents/` or an appropriate location:
```python
async def handle_model_call_tool(
    tool_name: str,
    arguments: dict,
    config: LlamaFarmConfig,
    tool_config: ModelCallTool,
) -> str:
    target_model = get_model_by_name(config, tool_config.model)

    # Build request for target model
    messages = [{"role": "user", "content": arguments["prompt"]}]

    # Call target model
    response = await call_model(
        model=target_model,
        messages=messages,
        max_tokens=tool_config.max_tokens,
        timeout=tool_config.timeout_seconds,
    )
    return response.content
```

Related Files
- `config/datamodel.py` - Schema definitions (Tool, Model classes)
- `server/agents/base/types.py` - ToolDefinition, ToolCallRequest
- `server/services/ml_model_service.py` - Model calling logic
- Existing tools implementation for reference pattern
Example Use Case: Research Assistant
```yaml
models:
  - name: web-searcher
    description: "Searches the web and returns relevant information"
    provider: openai
    model: gpt-4-turbo
    mcp_servers: ["brave-search"]

  - name: code-writer
    description: "Writes clean, tested Python code"
    provider: ollama
    model: deepseek-coder:33b

  - name: fact-checker
    description: "Verifies claims and provides citations"
    provider: anthropic
    model: claude-3-sonnet

  - name: research-assistant
    description: "Comprehensive research assistant with specialist delegation"
    provider: anthropic
    model: claude-3-opus
    tools:
      - type: model_call
        model: web-searcher
      - type: model_call
        model: code-writer
      - type: model_call
        model: fact-checker
```

Questions for Discussion
- Should sub-agent responses be cached to reduce redundant calls?
- How do we handle authentication/API keys for different providers in sub-calls?
- Should there be a global "allow_model_calls" flag for security?
- How should streaming work when the orchestrator is also streaming?
- Should we support "tool chaining" where a sub-agent's tool call triggers another?
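On the first question, one possible shape for response caching is memoizing on model name plus prompt. This is only a sketch of the idea, not a proposed implementation; all names here are hypothetical:

```python
# Illustrative memoization of sub-agent responses, keyed on
# (model name, prompt). Cache invalidation, TTLs, and conversation
# context are deliberately ignored here.
import hashlib

_cache: dict[str, str] = {}
calls = 0  # counts actual (non-cached) sub-agent invocations

def call_sub_agent(model_name: str, prompt: str) -> str:
    """Simulated sub-agent call; a real one would hit the provider."""
    global calls
    calls += 1
    return f"{model_name}: {prompt.upper()}"

def cached_call(model_name: str, prompt: str) -> str:
    """Serve repeated identical sub-agent calls from the cache."""
    key = hashlib.sha256(f"{model_name}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_sub_agent(model_name, prompt)
    return _cache[key]

a = cached_call("math-expert", "2+2")
b = cached_call("math-expert", "2+2")  # served from cache, no second call
```

Whether this is safe depends on the answer to the conversation-context question: if `include_conversation` is enabled, the history would need to be part of the cache key as well.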
Labels: enhancement, feature-request, tools, agents, orchestration