
Sub-Agent Tool: Allow Models to Call Other Models via Inline Tools #709

@rgthelen

Description


Summary

Enable models to call other models as tools, creating a sub-agent flow where an orchestrating model can delegate tasks to specialized models. The tool definition should automatically derive its description from the target model's description field in the config.

Motivation

Many AI workflows benefit from model composition:

  • A general reasoning model that routes to specialized models (code, math, vision)
  • An orchestrator that delegates to fine-tuned classifiers or domain experts
  • Chain-of-thought verification where one model checks another's work
  • Multi-modal pipelines where a text model calls a vision model

Currently, users must implement this routing logic in application code. Adding native support for "model as tool" would make these patterns trivial to configure.

Proposed Design

Config Schema

models:
  # Specialized models
  - name: code-expert
    description: "Expert at writing and debugging Python code. Best for coding tasks."
    provider: ollama
    model: deepseek-coder:6.7b

  - name: math-expert
    description: "Specialized in mathematical reasoning and calculations."
    provider: universal
    model: qwen2.5-math:7b

  - name: vision-analyzer
    description: "Analyzes images and describes visual content."
    provider: universal
    model: llava:7b

  # Orchestrator model with sub-agent tools
  - name: orchestrator
    description: "General-purpose assistant that can delegate to specialists."
    provider: ollama
    model: llama3.2:latest
    tools:
      # Inline tool that calls another model
      - type: model_call
        model: code-expert           # References model by name
        # description auto-pulled from code-expert.description
        
      - type: model_call
        model: math-expert
        override_description: "Use for complex math"  # Optional override
        
      - type: model_call
        model: vision-analyzer
        input_mapping:              # Optional: map tool params to model input
          image_url: "content[0].image_url"
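One detail worth validating at config load time is that every `model_call` entry references a model defined in the same config, so broken references fail fast instead of at tool-call time. A minimal sketch (`validate_model_refs` is a hypothetical helper, not an existing LlamaFarm function; it operates on the parsed config as plain dicts):

```python
# Illustrative validation pass (assumed helper, not a LlamaFarm API):
# ensure every model_call tool references a model defined in this config.
def validate_model_refs(config: dict) -> list[str]:
    defined = {m["name"] for m in config.get("models", [])}
    errors = []
    for m in config.get("models", []):
        for tool in m.get("tools", []):
            if tool.get("type") == "model_call" and tool["model"] not in defined:
                errors.append(f"{m['name']}: unknown model '{tool['model']}'")
    return errors

cfg = {"models": [
    {"name": "code-expert"},
    {"name": "orchestrator",
     "tools": [{"type": "model_call", "model": "code-expert"},
               {"type": "model_call", "model": "missing-model"}]},
]}
errors = validate_model_refs(cfg)
```

Running this against the sample config would report only the dangling `missing-model` reference.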

Auto-Generated Tool Schema

When type: model_call is specified, LlamaFarm should automatically generate a tool definition:

{
  "type": "function",
  "function": {
    "name": "call_code_expert",
    "description": "Expert at writing and debugging Python code. Best for coding tasks.",
    "parameters": {
      "type": "object",
      "properties": {
        "prompt": {
          "type": "string",
          "description": "The prompt/question to send to the code expert model"
        },
        "context": {
          "type": "string",
          "description": "Optional additional context for the model"
        }
      },
      "required": ["prompt"]
    }
  }
}

Execution Flow

  1. Orchestrator model decides to call call_code_expert tool
  2. LlamaFarm intercepts the tool call
  3. Routes to code-expert model with the provided prompt
  4. Returns response as tool result to orchestrator
  5. Orchestrator incorporates result into its response
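The steps above can be sketched as a small async loop. `call_model` and the `MODEL_CALL_TOOLS` registry below are stand-ins for LlamaFarm's real routing machinery, shown only to make the intercept-and-route shape concrete:

```python
import asyncio

# Stub registry mapping generated tool names to target model names (assumption).
MODEL_CALL_TOOLS = {"call_code_expert": "code-expert"}

async def call_model(model_name: str, messages: list) -> str:
    # Placeholder for the real provider call (step 3/4 boundary).
    return f"[{model_name}] reply to: {messages[-1]['content']}"

async def run_turn(tool_calls: list) -> list:
    """Steps 2-4: intercept each model_call tool and route it."""
    results = []
    for call in tool_calls:
        target = MODEL_CALL_TOOLS[call["name"]]                 # step 2: intercept
        messages = [{"role": "user", "content": call["arguments"]["prompt"]}]
        content = await call_model(target, messages)            # step 3: route
        results.append({"role": "tool", "name": call["name"],
                        "content": content})                    # step 4: tool result
    return results

tool_results = asyncio.run(run_turn(
    [{"name": "call_code_expert", "arguments": {"prompt": "sort a list"}}]
))
```

Step 5 would then append these `role: tool` messages back onto the orchestrator's conversation before its next completion.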

Advanced Options

tools:
  - type: model_call
    model: code-expert
    
    # Execution options
    max_tokens: 2000          # Limit sub-agent response length
    timeout_seconds: 30       # Timeout for sub-agent call
    
    # Tool behavior
    tool_name: "write_code"   # Custom tool name (default: call_{model_name})
    streaming: false          # Whether to stream sub-agent response
    
    # Context passing
    include_conversation: false  # Pass conversation history to sub-agent
    system_prompt_override: "You are a code assistant..."
    
    # Recursion control
    allow_nested_calls: false    # Prevent sub-agent from calling other models
    max_depth: 2                 # Max recursion depth if allowed
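The recursion-control options could be enforced with a simple depth check before each sub-agent dispatch. `check_depth` and `RecursionLimitError` are illustrative names under the assumptions of this proposal, not existing APIs:

```python
class RecursionLimitError(RuntimeError):
    """Raised when a sub-agent call would exceed the configured limits."""

def check_depth(depth: int, allow_nested_calls: bool, max_depth: int) -> None:
    """Guard to run before dispatching a sub-agent call at the given depth.

    depth 0 is the orchestrator's own call; depth >= 1 means a sub-agent
    is itself trying to call another model.
    """
    if depth > 0 and not allow_nested_calls:
        raise RecursionLimitError("nested model calls are disabled")
    if depth >= max_depth:
        raise RecursionLimitError(f"max_depth={max_depth} exceeded")
```

With the defaults above (`allow_nested_calls: false`, `max_depth: 2`), the orchestrator may call specialists, but a specialist attempting its own `model_call` is rejected.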

Implementation Recommendations

Phase 1: Basic Model-as-Tool

  1. Add model_call tool type to config schema
  2. Implement tool definition generation with description auto-pull
  3. Add basic tool execution routing in the server
  4. Support simple prompt/response flow

Phase 2: Enhanced Routing

  1. Input/output mapping for complex payloads
  2. Support for passing images/files to vision models
  3. Conversation context passing options
  4. Timeout and error handling

Phase 3: Advanced Orchestration

  1. Recursive call support with depth limits
  2. Streaming responses from sub-agents
  3. Parallel sub-agent calls
  4. Cost/latency tracking per sub-agent

Code Changes Required

1. Config Schema (config/datamodel.py)

from typing import Literal, Optional, Union

from pydantic import BaseModel, Field

class ModelCallTool(BaseModel):
    type: Literal["model_call"] = Field("model_call")
    model: str = Field(..., description="Name of the model to call")
    tool_name: Optional[str] = Field(None, description="Custom tool name")
    override_description: Optional[str] = Field(None)
    max_tokens: Optional[int] = Field(None)
    timeout_seconds: Optional[int] = Field(30)
    include_conversation: Optional[bool] = Field(False)
    allow_nested_calls: Optional[bool] = Field(False)

# Update Tool union type
Tool = Union[FunctionTool, ModelCallTool]

2. Tool Definition Generation

def generate_model_call_tool_definition(
    tool_config: ModelCallTool,
    target_model: Model
) -> ToolDefinition:
    description = tool_config.override_description or target_model.description
    # Sanitize the model name so the default tool name is a valid identifier
    # (e.g. "code-expert" -> "call_code_expert", matching the schema above).
    default_name = f"call_{tool_config.model.replace('-', '_')}"
    return ToolDefinition(
        name=tool_config.tool_name or default_name,
        description=description or f"Call the {tool_config.model} model",
        parameters={
            "type": "object",
            "properties": {
                "prompt": {
                    "type": "string",
                    "description": f"The prompt to send to {tool_config.model}"
                }
            },
            "required": ["prompt"]
        }
    )
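Applied to the code-expert entry above, the generator would produce the schema shown earlier. A standalone sketch with a minimal stand-in for `ToolDefinition` (the real class lives in LlamaFarm's codebase; the name-sanitization step is an assumption this proposal makes so `code-expert` yields the valid tool name `call_code_expert`):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolDefinition:
    # Minimal stand-in for LlamaFarm's real ToolDefinition type.
    name: str
    description: str
    parameters: dict

def generate(model_name: str, model_description: Optional[str],
             tool_name: Optional[str] = None,
             override_description: Optional[str] = None) -> ToolDefinition:
    description = override_description or model_description
    return ToolDefinition(
        # Hyphens sanitized so the tool name stays a valid identifier.
        name=tool_name or f"call_{model_name.replace('-', '_')}",
        description=description or f"Call the {model_name} model",
        parameters={
            "type": "object",
            "properties": {"prompt": {
                "type": "string",
                "description": f"The prompt to send to {model_name}"}},
            "required": ["prompt"],
        },
    )

tool = generate(
    "code-expert",
    "Expert at writing and debugging Python code. Best for coding tasks.")
```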

3. Tool Execution Handler

In server/agents/ or appropriate location:

async def handle_model_call_tool(
    tool_name: str,
    arguments: dict,
    config: LlamaFarmConfig,
    tool_config: ModelCallTool
) -> str:
    target_model = get_model_by_name(config, tool_config.model)

    # Build request for target model, honoring the optional "context"
    # parameter from the auto-generated tool schema
    prompt = arguments["prompt"]
    if arguments.get("context"):
        prompt = f"{arguments['context']}\n\n{prompt}"
    messages = [{"role": "user", "content": prompt}]

    # Call target model with the configured limits
    response = await call_model(
        model=target_model,
        messages=messages,
        max_tokens=tool_config.max_tokens,
        timeout=tool_config.timeout_seconds
    )

    return response.content

Related Files

  • config/datamodel.py - Schema definitions (Tool, Model classes)
  • server/agents/base/types.py - ToolDefinition, ToolCallRequest
  • server/services/ml_model_service.py - Model calling logic
  • Existing tools implementation for reference pattern

Example Use Case: Research Assistant

models:
  - name: web-searcher
    description: "Searches the web and returns relevant information"
    provider: openai
    model: gpt-4-turbo
    mcp_servers: ["brave-search"]

  - name: code-writer
    description: "Writes clean, tested Python code"
    provider: ollama
    model: deepseek-coder:33b

  - name: fact-checker
    description: "Verifies claims and provides citations"
    provider: anthropic
    model: claude-3-sonnet

  - name: research-assistant
    description: "Comprehensive research assistant with specialist delegation"
    provider: anthropic
    model: claude-3-opus
    tools:
      - type: model_call
        model: web-searcher
      - type: model_call
        model: code-writer
      - type: model_call
        model: fact-checker

Questions for Discussion

  1. Should sub-agent responses be cached to reduce redundant calls?
  2. How do we handle authentication/API keys for different providers in sub-calls?
  3. Should there be a global "allow_model_calls" flag for security?
  4. How should streaming work when the orchestrator is also streaming?
  5. Should we support "tool chaining" where a sub-agent's tool call triggers another?

Labels: enhancement, feature-request, tools, agents, orchestration
