
Sub-Agent Tool: Allow Models to Call Other Models via Inline Tools #709

@rgthelen

Description


Summary

Enable models to call other models as tools, creating a sub-agent flow where an orchestrating model can delegate tasks to specialized models. The tool definition should automatically derive its description from the target model's description field in the config.

Motivation

Many AI workflows benefit from model composition:

  • A general reasoning model that routes to specialized models (code, math, vision)
  • An orchestrator that delegates to fine-tuned classifiers or domain experts
  • Chain-of-thought verification where one model checks another's work
  • Multi-modal pipelines where a text model calls a vision model

Currently, users must implement this routing logic in application code. Adding native support for "model as tool" would make these patterns trivial to configure.

Proposed Design

Config Schema

models:
  # Specialized models
  - name: code-expert
    description: "Expert at writing and debugging Python code. Best for coding tasks."
    provider: ollama
    model: deepseek-coder:6.7b

  - name: math-expert
    description: "Specialized in mathematical reasoning and calculations."
    provider: universal
    model: qwen2.5-math:7b

  - name: vision-analyzer
    description: "Analyzes images and describes visual content."
    provider: universal
    model: llava:7b

  # Orchestrator model with sub-agent tools
  - name: orchestrator
    description: "General-purpose assistant that can delegate to specialists."
    provider: ollama
    model: llama3.2:latest
    tools:
      # Inline tool that calls another model
      - type: model_call
        model: code-expert           # References model by name
        # description auto-pulled from code-expert.description
        
      - type: model_call
        model: math-expert
        override_description: "Use for complex math"  # Optional override
        
      - type: model_call
        model: vision-analyzer
        input_mapping:              # Optional: map tool params to model input
          image_url: "content[0].image_url"
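One detail worth validating at config load time is that every `model_call` entry references a model defined in the same config, so broken references fail fast instead of at tool-call time. A minimal sketch (`validate_model_refs` is a hypothetical helper, not an existing LlamaFarm function; it operates on the parsed config as plain dicts):

```python
# Illustrative validation pass (assumed helper, not a LlamaFarm API):
# ensure every model_call tool references a model defined in this config.
def validate_model_refs(config: dict) -> list[str]:
    defined = {m["name"] for m in config.get("models", [])}
    errors = []
    for m in config.get("models", []):
        for tool in m.get("tools", []):
            if tool.get("type") == "model_call" and tool["model"] not in defined:
                errors.append(f"{m['name']}: unknown model '{tool['model']}'")
    return errors

cfg = {"models": [
    {"name": "code-expert"},
    {"name": "orchestrator",
     "tools": [{"type": "model_call", "model": "code-expert"},
               {"type": "model_call", "model": "missing-model"}]},
]}
errors = validate_model_refs(cfg)
```

Running this against the sample config would report only the dangling `missing-model` reference.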

Auto-Generated Tool Schema

When type: model_call is specified, LlamaFarm should automatically generate a tool definition:

{
  "type": "function",
  "function": {
    "name": "call_code_expert",
    "description": "Expert at writing and debugging Python code. Best for coding tasks.",
    "parameters": {
      "type": "object",
      "properties": {
        "prompt": {
          "type": "string",
          "description": "The prompt/question to send to the code expert model"
        },
        "context": {
          "type": "string",
          "description": "Optional additional context for the model"
        }
      },
      "required": ["prompt"]
    }
  }
}

Execution Flow

  1. Orchestrator model decides to call call_code_expert tool
  2. LlamaFarm intercepts the tool call
  3. Routes to code-expert model with the provided prompt
  4. Returns response as tool result to orchestrator
  5. Orchestrator incorporates result into its response
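The steps above can be sketched as a small async loop. `call_model` and the `MODEL_CALL_TOOLS` registry below are stand-ins for LlamaFarm's real routing machinery, shown only to make the intercept-and-route shape concrete:

```python
import asyncio

# Stub registry mapping generated tool names to target model names (assumption).
MODEL_CALL_TOOLS = {"call_code_expert": "code-expert"}

async def call_model(model_name: str, messages: list) -> str:
    # Placeholder for the real provider call (step 3/4 boundary).
    return f"[{model_name}] reply to: {messages[-1]['content']}"

async def run_turn(tool_calls: list) -> list:
    """Steps 2-4: intercept each model_call tool and route it."""
    results = []
    for call in tool_calls:
        target = MODEL_CALL_TOOLS[call["name"]]                 # step 2: intercept
        messages = [{"role": "user", "content": call["arguments"]["prompt"]}]
        content = await call_model(target, messages)            # step 3: route
        results.append({"role": "tool", "name": call["name"],
                        "content": content})                    # step 4: tool result
    return results

tool_results = asyncio.run(run_turn(
    [{"name": "call_code_expert", "arguments": {"prompt": "sort a list"}}]
))
```

Step 5 would then append these `role: tool` messages back onto the orchestrator's conversation before its next completion.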

Advanced Options

tools:
  - type: model_call
    model: code-expert
    
    # Execution options
    max_tokens: 2000          # Limit sub-agent response length
    timeout_seconds: 30       # Timeout for sub-agent call
    
    # Tool behavior
    tool_name: "write_code"   # Custom tool name (default: call_{model_name})
    streaming: false          # Whether to stream sub-agent response
    
    # Context passing
    include_conversation: false  # Pass conversation history to sub-agent
    system_prompt_override: "You are a code assistant..."
    
    # Recursion control
    allow_nested_calls: false    # Prevent sub-agent from calling other models
    max_depth: 2                 # Max recursion depth if allowed
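The recursion-control options could be enforced with a simple depth check before each sub-agent dispatch. `check_depth` and `RecursionLimitError` are illustrative names under the assumptions of this proposal, not existing APIs:

```python
class RecursionLimitError(RuntimeError):
    """Raised when a sub-agent call would exceed the configured limits."""

def check_depth(depth: int, allow_nested_calls: bool, max_depth: int) -> None:
    """Guard to run before dispatching a sub-agent call at the given depth.

    depth 0 is the orchestrator's own call; depth >= 1 means a sub-agent
    is itself trying to call another model.
    """
    if depth > 0 and not allow_nested_calls:
        raise RecursionLimitError("nested model calls are disabled")
    if depth >= max_depth:
        raise RecursionLimitError(f"max_depth={max_depth} exceeded")
```

With the defaults above (`allow_nested_calls: false`, `max_depth: 2`), the orchestrator may call specialists, but a specialist attempting its own `model_call` is rejected.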

Implementation Recommendations

Phase 1: Basic Model-as-Tool

  1. Add model_call tool type to config schema
  2. Implement tool definition generation with description auto-pull
  3. Add basic tool execution routing in the server
  4. Support simple prompt/response flow

Phase 2: Enhanced Routing

  1. Input/output mapping for complex payloads
  2. Support for passing images/files to vision models
  3. Conversation context passing options
  4. Timeout and error handling

Phase 3: Advanced Orchestration

  1. Recursive call support with depth limits
  2. Streaming responses from sub-agents
  3. Parallel sub-agent calls
  4. Cost/latency tracking per sub-agent

Code Changes Required

1. Config Schema (config/datamodel.py)

from typing import Literal, Optional, Union

from pydantic import BaseModel, Field

class ModelCallTool(BaseModel):
    type: Literal["model_call"] = Field("model_call")
    model: str = Field(..., description="Name of the model to call")
    tool_name: Optional[str] = Field(None, description="Custom tool name")
    override_description: Optional[str] = Field(None)
    max_tokens: Optional[int] = Field(None)
    timeout_seconds: Optional[int] = Field(30)
    include_conversation: Optional[bool] = Field(False)
    allow_nested_calls: Optional[bool] = Field(False)

# Update Tool union type
Tool = Union[FunctionTool, ModelCallTool]

2. Tool Definition Generation

def generate_model_call_tool_definition(
    tool_config: ModelCallTool,
    target_model: Model
) -> ToolDefinition:
    description = tool_config.override_description or target_model.description
    # Sanitize the model name so the default tool name is a valid identifier
    # (e.g. "code-expert" -> "call_code_expert", matching the schema above).
    default_name = f"call_{tool_config.model.replace('-', '_')}"
    return ToolDefinition(
        name=tool_config.tool_name or default_name,
        description=description or f"Call the {tool_config.model} model",
        parameters={
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": f"The prompt to send to {tool_config.model}"
                }
            },
            "required": ["prompt"]
        }
    )
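Applied to the code-expert entry above, the generator would produce the schema shown earlier. A standalone sketch with a minimal stand-in for `ToolDefinition` (the real class lives in LlamaFarm's codebase; the name-sanitization step is an assumption this proposal makes so `code-expert` yields the valid tool name `call_code_expert`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolDefinition:
    # Minimal stand-in for LlamaFarm's real ToolDefinition type.
    name: str
    description: str
    parameters: dict

def generate(model_name: str, model_description: Optional[str],
             tool_name: Optional[str] = None,
             override_description: Optional[str] = None) -> ToolDefinition:
    description = override_description or model_description
    return ToolDefinition(
        # Hyphens sanitized so the tool name stays a valid identifier.
        name=tool_name or f"call_{model_name.replace('-', '_')}",
        description=description or f"Call the {model_name} model",
        parameters={
            "type": "object",
            "properties": {"prompt": {
                "type": "string",
                "description": f"The prompt to send to {model_name}"}},
            "required": ["prompt"],
        },
    )

tool = generate(
    "code-expert",
    "Expert at writing and debugging Python code. Best for coding tasks.")
```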

3. Tool Execution Handler

In server/agents/ or appropriate location:

async def handle_model_call_tool(
    tool_name: str,
    arguments: dict,
    config: LlamaFarmConfig,
    tool_config: ModelCallTool
) -> str:
    target_model = get_model_by_name(config, tool_config.model)

    # Build request for target model, honoring the optional "context"
    # parameter from the auto-generated tool schema
    prompt = arguments["prompt"]
    if arguments.get("context"):
        prompt = f"{arguments['context']}\n\n{prompt}"
    messages = [{"role": "user", "content": prompt}]

    # Call target model with the configured limits
    response = await call_model(
        model=target_model,
        messages=messages,
        max_tokens=tool_config.max_tokens,
        timeout=tool_config.timeout_seconds
    )

    return response.content

Related Files

  • config/datamodel.py - Schema definitions (Tool, Model classes)
  • server/agents/base/types.py - ToolDefinition, ToolCallRequest
  • server/services/ml_model_service.py - Model calling logic
  • Existing tools implementation for reference pattern

Example Use Case: Research Assistant

models:
  - name: web-searcher
    description: "Searches the web and returns relevant information"
    provider: openai
    model: gpt-4-turbo
    mcp_servers: ["brave-search"]

  - name: code-writer
    description: "Writes clean, tested Python code"
    provider: ollama
    model: deepseek-coder:33b

  - name: fact-checker
    description: "Verifies claims and provides citations"
    provider: anthropic
    model: claude-3-sonnet

  - name: research-assistant
    description: "Comprehensive research assistant with specialist delegation"
    provider: anthropic
    model: claude-3-opus
    tools:
      - type: model_call
        model: web-searcher
      - type: model_call
        model: code-writer
      - type: model_call
        model: fact-checker

Questions for Discussion

  1. Should sub-agent responses be cached to reduce redundant calls?
  2. How do we handle authentication/API keys for different providers in sub-calls?
  3. Should there be a global "allow_model_calls" flag for security?
  4. How should streaming work when the orchestrator is also streaming?
  5. Should we support "tool chaining" where a sub-agent's tool call triggers another?

Labels: enhancement, feature-request, tools, agents, orchestration
