Commit 3e78a6c (parent 4e53971)

feat(progressive_discovery): dynamic tool discovery agent

Add agent and utilities for dynamic tool selection to minimize LLM context size.

- Introduce progressive_discovery agent for discovering and filtering tools
- Add search tool and filter service modules
- Provide config and usage examples
- Include new tests for progressive discovery workflow

File tree: 9 files changed, +623 −0 lines
Lines changed: 51 additions & 0 deletions

# Progressive Tool Discovery

Example agent demonstrating dynamic tool discovery for SGR Agent Core.

## Problem

When using multiple MCP servers (Jira, Confluence, GitHub, GDrive), each adds dozens of tools. With ~60 tools the LLM context becomes bloated — local models can't handle it, and paid APIs waste tokens on irrelevant tool descriptions.

## Solution

The agent starts with a minimal set of **system tools** (reasoning, planning, clarification, final answer) and dynamically discovers additional tools via `SearchToolsTool`.

```
User query → Agent reasons → needs web search → calls SearchToolsTool("search the web")
           → WebSearchTool discovered and added to active toolkit → Agent uses WebSearchTool
```

### How it works

1. **Init**: Toolkit is split into system tools (`isSystemTool=True`) and discoverable tools
2. **Runtime**: Only system tools + already discovered tools are sent to the LLM
3. **Discovery**: Agent calls `SearchToolsTool` with a natural language query
4. **Matching**: `ToolFilterService` uses BM25 ranking + regex keyword overlap to find relevant tools
5. **Activation**: Matched tools are added to the active toolkit for subsequent calls

### Key components

| Component | Description |
| --- | --- |
| `ProgressiveDiscoveryAgent` | Agent subclass that manages the system/discovered tool split |
| `SearchToolsTool` | Meta-tool for discovering new tools by capability description |
| `ToolFilterService` | Stateless BM25 + regex matching service |

## Usage

```bash
cp config.yaml.example config.yaml
# Edit config.yaml with your API key and MCP servers
sgr --config-file config.yaml
```

## Architecture

```
ProgressiveDiscoveryAgent
├── self.toolkit = [ReasoningTool, SearchToolsTool, ...]        (system tools)
├── context.custom_context["all_tools"] = [WebSearchTool, ...]  (discoverable)
└── context.custom_context["discovered_tools"] = []             (accumulates at runtime)
```

`_get_active_tools()` returns `system_tools + discovered_tools` — used by both `_prepare_tools()` and `_prepare_context()`.
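The runtime flow above can be reduced to a small pure-Python sketch. All names here (`search_tools`, `active_tools`, the plain keyword-overlap matcher) are illustrative stand-ins, not the real SGR Agent Core API:

```python
# Toy model of progressive discovery: system tools are always active,
# other tools only join the toolkit after a successful search.

SYSTEM_TOOLS = ["ReasoningTool", "SearchToolsTool", "FinalAnswerTool"]

# Discoverable tools with short capability descriptions.
ALL_TOOLS = {
    "WebSearchTool": "search the web for pages",
    "JiraCreateIssueTool": "create an issue in jira",
}

discovered: list[str] = []


def search_tools(query: str) -> list[str]:
    """Keyword-overlap match (stand-in for the BM25 + regex service)."""
    query_words = set(query.lower().split())
    matched = [
        name
        for name, desc in ALL_TOOLS.items()
        if query_words & set(desc.split()) and name not in discovered
    ]
    discovered.extend(matched)
    return matched


def active_tools() -> list[str]:
    """What would be sent to the LLM on the next iteration."""
    return SYSTEM_TOOLS + discovered


# Agent needs web search, discovers WebSearchTool; Jira tools stay out of context.
search_tools("search the web")
print(active_tools())
```

Only the matched tool enters the active set; repeating the same query discovers nothing new, which mirrors the dedup in `SearchToolsTool`.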

examples/progressive_discovery/__init__.py

Whitespace-only changes.
Lines changed: 53 additions & 0 deletions

```yaml
# Progressive Discovery Agent Configuration
#
# This agent starts with minimal system tools and dynamically discovers
# additional tools as needed via SearchToolsTool (BM25 + regex matching).
#
# Useful when you have many MCP servers with dozens of tools — keeps
# the LLM context small and focused.

llm:
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key: "sk-..."
  temperature: 0.1
  max_tokens: 16000

execution:
  max_iterations: 15
  max_clarifications: 2

prompts:
  system_prompt_path: "examples/progressive_discovery/prompts/system_prompt.txt"

# MCP servers provide additional tools that will be discoverable
# (not loaded into context until agent searches for them)
#
# mcp:
#   servers:
#     - name: "jira"
#       command: "npx"
#       args: ["-y", "@anthropic/jira-mcp-server"]
#       env:
#         JIRA_URL: "https://your-org.atlassian.net"
#         JIRA_TOKEN: "your-token"
#
#     - name: "github"
#       command: "npx"
#       args: ["-y", "@anthropic/github-mcp-server"]
#       env:
#         GITHUB_TOKEN: "your-token"

agents:
  progressive_discovery:
    base_class: "examples.progressive_discovery.progressive_discovery_agent.ProgressiveDiscoveryAgent"
    tools:
      - "sgr_agent_core.tools.reasoning_tool.ReasoningTool"
      - "sgr_agent_core.tools.clarification_tool.ClarificationTool"
      - "sgr_agent_core.tools.generate_plan_tool.GeneratePlanTool"
      - "sgr_agent_core.tools.adapt_plan_tool.AdaptPlanTool"
      - "sgr_agent_core.tools.create_report_tool.CreateReportTool"
      - "sgr_agent_core.tools.final_answer_tool.FinalAnswerTool"
      # Add any non-system tools here — they will be discoverable, not loaded by default
      # - "sgr_agent_core.tools.web_search_tool.WebSearchTool"
      # - "sgr_agent_core.tools.extract_page_content_tool.ExtractPageContentTool"
```
Lines changed: 80 additions & 0 deletions

```python
from __future__ import annotations

from typing import Type

from openai import AsyncOpenAI, pydantic_function_tool

from sgr_agent_core.agent_definition import AgentConfig
from sgr_agent_core.agents.sgr_tool_calling_agent import SGRToolCallingAgent
from sgr_agent_core.base_tool import BaseTool
from sgr_agent_core.services.prompt_loader import PromptLoader

from .tools.search_tools_tool import SearchToolsTool


class ProgressiveDiscoveryAgent(SGRToolCallingAgent):
    """Agent that starts with minimal system tools and dynamically discovers
    additional tools via SearchToolsTool.

    On init, splits the toolkit into:
    - system tools (isSystemTool=True) -> self.toolkit (always available)
    - non-system tools -> stored in context.custom_context["all_tools"]

    SearchToolsTool is automatically added if not already present.
    Discovered tools accumulate in context.custom_context["discovered_tools"].
    """

    name: str = "progressive_discovery_agent"

    def __init__(
        self,
        task_messages: list,
        openai_client: AsyncOpenAI,
        agent_config: AgentConfig,
        toolkit: list[Type[BaseTool]],
        def_name: str | None = None,
        **kwargs: dict,
    ):
        system_tools = [t for t in toolkit if getattr(t, "isSystemTool", False)]
        non_system_tools = [t for t in toolkit if not getattr(t, "isSystemTool", False)]

        if SearchToolsTool not in system_tools:
            system_tools.append(SearchToolsTool)

        super().__init__(
            task_messages=task_messages,
            openai_client=openai_client,
            agent_config=agent_config,
            toolkit=system_tools,
            def_name=def_name,
            **kwargs,
        )

        if self._context.custom_context is None:
            self._context.custom_context = {}
        self._context.custom_context["all_tools"] = non_system_tools
        self._context.custom_context["discovered_tools"] = []

    def _get_active_tools(self) -> list[Type[BaseTool]]:
        """Return system tools + discovered tools."""
        discovered = []
        if isinstance(self._context.custom_context, dict):
            discovered = self._context.custom_context.get("discovered_tools", [])
        return list(self.toolkit) + list(discovered)

    async def _prepare_tools(self) -> list[dict]:
        """Override to return only active tools (system + discovered)."""
        active_tools = self._get_active_tools()
        if self._context.iteration >= self.config.execution.max_iterations:
            raise RuntimeError("Max iterations reached")
        return [pydantic_function_tool(tool, name=tool.tool_name) for tool in active_tools]

    async def _prepare_context(self) -> list[dict]:
        """Override to pass only active tools to system prompt."""
        active_tools = self._get_active_tools()
        return [
            {"role": "system", "content": PromptLoader.get_system_prompt(active_tools, self.config.prompts)},
            *self.task_messages,
            {"role": "user", "content": PromptLoader.get_initial_user_request(self.task_messages, self.config.prompts)},
            *self.conversation,
        ]
```
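The toolkit split done in `__init__` hinges entirely on the optional `isSystemTool` class attribute. A minimal sketch with stand-in classes (the real `BaseTool` subclasses carry more state):

```python
# Tools without isSystemTool (or with it False) become discoverable;
# getattr with a default keeps the check safe for classes that omit it.

class ReasoningTool:
    isSystemTool = True


class WebSearchTool:
    pass  # no isSystemTool attribute -> discoverable


toolkit = [ReasoningTool, WebSearchTool]
system_tools = [t for t in toolkit if getattr(t, "isSystemTool", False)]
non_system_tools = [t for t in toolkit if not getattr(t, "isSystemTool", False)]

print([t.__name__ for t in system_tools])      # ['ReasoningTool']
print([t.__name__ for t in non_system_tools])  # ['WebSearchTool']
```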

examples/progressive_discovery/services/__init__.py

Whitespace-only changes.
Lines changed: 84 additions & 0 deletions

```python
from __future__ import annotations

import re
from typing import TYPE_CHECKING

from rank_bm25 import BM25Okapi

if TYPE_CHECKING:
    from sgr_agent_core.base_tool import BaseTool


class ToolFilterService:
    """Stateless service for filtering tools by relevance to a query.

    Uses BM25 ranking + regex keyword overlap to find tools matching a query.
    """

    @classmethod
    def filter_tools(
        cls,
        query: str,
        tools: list[type[BaseTool]],
        bm25_threshold: float = 0.1,
    ) -> list[type[BaseTool]]:
        """Filter tools by relevance to query using BM25 + regex.

        Args:
            query: Natural language description of needed capability.
            tools: Full list of available tool classes.
            bm25_threshold: Minimum BM25 score to consider a tool relevant.

        Returns:
            List of tool classes matching the query.
        """
        if not query or not query.strip() or not tools:
            return list(tools)

        query_lower = query.strip().lower()

        tool_documents = []
        for tool in tools:
            tool_name = (tool.tool_name or tool.__name__).lower()
            tool_description = (tool.description or "").lower()
            tool_documents.append(f"{tool_name} {tool_description}")

        tokenized_docs = [doc.split() for doc in tool_documents]
        bm25 = BM25Okapi(tokenized_docs)

        query_tokens = query_lower.split()
        scores = bm25.get_scores(query_tokens)

        query_words = set(re.findall(r"\b\w+\b", query_lower))

        filtered = []
        for i, tool in enumerate(tools):
            bm25_score = scores[i]

            tool_name = (tool.tool_name or tool.__name__).lower()
            tool_description = (tool.description or "").lower()
            tool_words = set(re.findall(r"\b\w+\b", f"{tool_name} {tool_description}"))
            has_regex_match = bool(query_words & tool_words)

            if bm25_score > bm25_threshold or has_regex_match:
                filtered.append(tool)

        return filtered

    @classmethod
    def get_tool_summaries(cls, tools: list[type[BaseTool]]) -> str:
        """Format tool list for LLM output.

        Args:
            tools: List of tool classes to summarize.

        Returns:
            Formatted string with tool names and descriptions.
        """
        lines = []
        for i, tool in enumerate(tools, start=1):
            name = tool.tool_name or tool.__name__
            desc = tool.description or ""
            lines.append(f"{i}. {name}: {desc}")
        return "\n".join(lines)
```
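The regex branch of `filter_tools` can be exercised in isolation with the standard library alone (`keyword_overlap` is an illustrative helper, not part of the service):

```python
import re


def keyword_overlap(query: str, doc: str) -> bool:
    """True when the query shares at least one word with a tool's name+description."""
    query_words = set(re.findall(r"\b\w+\b", query.lower()))
    doc_words = set(re.findall(r"\b\w+\b", doc.lower()))
    return bool(query_words & doc_words)


print(keyword_overlap("search the web", "websearchtool searches web pages"))  # True
print(keyword_overlap("search the web", "create jira issue"))                 # False
```

Note the matching is deliberately permissive: a single shared word (even a common one like "the") passes the regex branch, and the BM25 threshold of 0.1 is low. False positives only cost a few extra tool descriptions in context, whereas a false negative would leave the agent without a needed capability.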

examples/progressive_discovery/tools/__init__.py

Whitespace-only changes.
Lines changed: 53 additions & 0 deletions

```python
from __future__ import annotations

from typing import TYPE_CHECKING

from pydantic import Field

from sgr_agent_core.base_tool import BaseTool

from ..services.tool_filter_service import ToolFilterService

if TYPE_CHECKING:
    from sgr_agent_core.agent_definition import AgentConfig
    from sgr_agent_core.models import AgentContext


class SearchToolsTool(BaseTool):
    """Search for available tools by capability description.

    Use this tool when you need a capability that is not in your current
    toolkit. Describe what you need in natural language and matching
    tools will be added to your active toolkit for subsequent use.
    """

    isSystemTool = True

    query: str = Field(description="Natural language description of the capability you need (e.g. 'search the web')")

    async def __call__(self, context: AgentContext, config: AgentConfig, **kwargs) -> str:
        custom = context.custom_context
        if not isinstance(custom, dict):
            return "Error: custom_context is not initialized as dict"

        all_tools = custom.get("all_tools", [])
        if not all_tools:
            return "No additional tools available for discovery."

        discovered = custom.setdefault("discovered_tools", [])

        matched = ToolFilterService.filter_tools(self.query, all_tools)

        already_discovered_names = {t.tool_name for t in discovered}
        new_tools = [t for t in matched if t.tool_name not in already_discovered_names]

        if not new_tools:
            return f"No new tools found for query '{self.query}'. Already discovered: {already_discovered_names}"

        discovered.extend(new_tools)

        summary = ToolFilterService.get_tool_summaries(new_tools)
        return (
            f"Found {len(new_tools)} new tool(s) for '{self.query}':\n{summary}\n\n"
            "These tools are now available in your toolkit. You can use them in subsequent steps."
        )
```
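The discovered-tools bookkeeping in `__call__` (dedup by `tool_name`, then extend the shared list in place) can be sketched with plain dicts standing in for tool classes (`discover` is a hypothetical helper, not part of the module):

```python
# Mutating the shared list in place matters: it is the same object the agent's
# _get_active_tools() reads from custom_context["discovered_tools"].

def discover(matched: list[dict], discovered: list[dict]) -> list[dict]:
    """Append matched tools not already discovered; return only the new ones."""
    seen = {t["tool_name"] for t in discovered}
    new_tools = [t for t in matched if t["tool_name"] not in seen]
    discovered.extend(new_tools)
    return new_tools


discovered_tools: list[dict] = []
web = {"tool_name": "web_search"}

print(discover([web], discovered_tools))  # first call: the new tool is returned
print(discover([web], discovered_tools))  # repeated query: nothing new
```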
