Commit ed63e2b

Tools filtering logic (#166)
* isSystemTool added to tools
* tools filtering: add rank-bm25 dependency
* feat(progressive_discovery): dynamic tool discovery agent

  Add agent and utilities for dynamic tool selection to minimize LLM context size.
  - Introduce progressive_discovery agent for discovering and filtering tools
  - Add search tool and filter service modules
  - Provide config and usage examples
  - Include new tests for progressive discovery workflow
* fix(logging): remove redundant string concatenations in warning messages and error formatting for clarity
* refactor(config.yaml.example): update tool and mcp server definitions for clarity and discoverability
* test(progressive_discovery): remove phase comments and add tests ensuring system tools are never filtered and always included in active tools
* refactor(SystemBaseTool): introduce SystemBaseTool base class for system tools and update all core system tools to inherit it to unify system tool identification and improve filtering logic
* fix(base_tool): ensure tool_name and description are only set if not explicitly defined in subclass to prevent unintended overrides
* refactor(tool_filter_service): optimize tool word extraction by using precomputed documents to improve efficiency
* refactor(progressive_discovery): replace dict-based custom_context with ProgressiveDiscoveryContext model for type safety and clarity
* refactor(config.yaml.example): remove unused prompts section to simplify example configuration
* chore(config): remove commented-out tool definitions from progressive_discovery example config to clean up and simplify the file
* refactor(progressive_discovery): unify context usage by replacing custom_context with direct context fields and update related code and tests accordingly

---------

Co-authored-by: Nikita Matsko <nikmd1306@gmail.com>
1 parent 32d99ad commit ed63e2b

22 files changed, 734 additions and 20 deletions
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Progressive Tool Discovery
+
+Example agent demonstrating dynamic tool discovery for SGR Agent Core.
+
+## Problem
+
+When using multiple MCP servers (Jira, Confluence, GitHub, GDrive), each adds dozens of tools. With ~60 tools the LLM context becomes bloated: local models can't handle it, and paid APIs waste tokens on irrelevant tool descriptions.
+
+## Solution
+
+The agent starts with a minimal set of **system tools** (reasoning, planning, clarification, final answer) and dynamically discovers additional tools via `SearchToolsTool`.
+
+```
+User query → Agent reasons → needs web search → calls SearchToolsTool("search the web")
+    → WebSearchTool discovered and added to active toolkit → Agent uses WebSearchTool
+```
+
+### How it works
+
+1. **Init**: Toolkit is split into system tools (subclasses of `SystemBaseTool`) and discoverable tools
+2. **Runtime**: Only system tools + already discovered tools are sent to LLM
+3. **Discovery**: Agent calls `SearchToolsTool` with a natural language query
+4. **Matching**: `ToolFilterService` uses BM25 ranking + regex keyword overlap to find relevant tools
+5. **Activation**: Matched tools are added to the active toolkit for subsequent calls
+
+### Key components
+
+| Component                   | Description                                                   |
+| --------------------------- | ------------------------------------------------------------- |
+| `ProgressiveDiscoveryAgent` | Agent subclass that manages system/discovered tool split      |
+| `SearchToolsTool`           | Meta-tool for discovering new tools by capability description |
+| `ToolFilterService`         | Stateless BM25 + regex matching service                       |
+
+## Usage
+
+```bash
+cp config.yaml.example config.yaml
+# Edit config.yaml with your API key and MCP servers
+sgr --config-file config.yaml
+```
+
+## Architecture
+
+```
+ProgressiveDiscoveryAgent
+├── self.toolkit = [ReasoningTool, SearchToolsTool, ...]  (system tools)
+├── context.all_tools = [WebSearchTool, ...]              (discoverable)
+└── context.discovered_tools = []                         (accumulates at runtime)
+```
+
+`context` is a `ProgressiveDiscoveryContext(AgentContext)`, which extends the base context with discovery-specific fields.
+
+`_get_active_tools()` returns `system_tools + discovered_tools` and is used by both `_prepare_tools()` and `_prepare_context()`.
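
The discovery loop described in this README can be sketched without the framework. The `Tool` and `Agent` classes below and the naive word-overlap matcher are hypothetical stand-ins, not the SGR Agent Core API and not the BM25-based `ToolFilterService`; the sketch only illustrates how the active toolkit grows after a search call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for a tool class: a name plus a description."""
    name: str
    description: str
    is_system: bool = False


@dataclass
class Agent:
    """Hypothetical agent holding the three lists from the README diagram."""
    toolkit: list[Tool]                      # system tools, always sent to the LLM
    all_tools: list[Tool]                    # discoverable pool, hidden by default
    discovered_tools: list[Tool] = field(default_factory=list)

    def active_tools(self) -> list[Tool]:
        # Mirrors the README: system tools + discovered tools.
        return self.toolkit + self.discovered_tools

    def search_tools(self, query: str) -> list[Tool]:
        # Naive matcher (assumption): any shared lowercase word counts as a hit.
        words = set(query.lower().split())
        hits = [
            t for t in self.all_tools
            if words & set(f"{t.name} {t.description}".lower().split())
            and t not in self.discovered_tools
        ]
        self.discovered_tools.extend(hits)
        return hits


agent = Agent(
    toolkit=[Tool("reasoning_tool", "plan the next step", is_system=True)],
    all_tools=[
        Tool("web_search_tool", "search the web for pages"),
        Tool("jira_create_issue", "create a jira ticket"),
    ],
)

before = [t.name for t in agent.active_tools()]
agent.search_tools("search web")
after = [t.name for t in agent.active_tools()]
print(before)  # ['reasoning_tool']
print(after)   # ['reasoning_tool', 'web_search_tool']
```

Only the tool whose description shares a word with the query becomes active; the Jira tool stays hidden until a later query asks for it.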

examples/progressive_discovery/__init__.py

Whitespace-only changes.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# Progressive Discovery Agent Configuration
+#
+# This agent starts with minimal system tools and dynamically discovers
+# additional tools as needed via SearchToolsTool (BM25 + regex matching).
+#
+# Useful when you have many MCP servers with dozens of tools: keeps
+# the LLM context small and focused.
+
+llm:
+  model: "gpt-4o"
+  base_url: "https://api.openai.com/v1"
+  api_key: "sk-..."
+  temperature: 0.1
+  max_tokens: 16000
+
+execution:
+  max_iterations: 15
+  max_clarifications: 2
+
+# MCP servers provide additional tools that will be discoverable
+# (not loaded into context until agent searches for them)
+#
+# mcp:
+#   mcpServers:
+#     jira:
+#       url: "https://your-jira-mcp-server.com/mcp"
+#     github:
+#       url: "https://your-github-mcp-server.com/mcp"
+
+agents:
+  progressive_discovery:
+    base_class: "examples.progressive_discovery.progressive_discovery_agent.ProgressiveDiscoveryAgent"
+    tools:
+      - "reasoning_tool"
+      - "clarification_tool"
+      - "generate_plan_tool"
+      - "adapt_plan_tool"
+      - "create_report_tool"
+      - "final_answer_tool"
+      # Non-system tools: discoverable via SearchToolsTool
+      - "web_search_tool"
+      - "extract_page_content_tool"
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+from pydantic import Field
+
+from sgr_agent_core.base_tool import BaseTool
+from sgr_agent_core.models import AgentContext
+
+
+class ProgressiveDiscoveryContext(AgentContext):
+    """Extended agent context for progressive discovery.
+
+    Inherits all standard AgentContext fields (iteration, state,
+    searches, etc.) and adds tool lists used by the discovery mechanism.
+    """
+
+    all_tools: list[type[BaseTool]] = Field(
+        default_factory=list, description="Full list of non-system tools available for discovery"
+    )
+    discovered_tools: list[type[BaseTool]] = Field(
+        default_factory=list, description="Tools discovered so far via SearchToolsTool"
+    )
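
Both fields use `default_factory=list`, so each context starts with its own empty list. That matters here because `SearchToolsTool` extends `discovered_tools` in place. The sketch below shows the same per-instance behaviour with a standard-library dataclass; `DiscoveryState` is a hypothetical stand-in with string tool names, not the pydantic model from this commit.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryState:
    """Hypothetical stand-in for ProgressiveDiscoveryContext (stdlib only)."""
    all_tools: list[str] = field(default_factory=list)
    discovered_tools: list[str] = field(default_factory=list)


first = DiscoveryState(all_tools=["web_search_tool"])
second = DiscoveryState(all_tools=["web_search_tool"])

# Mutating one instance in place must not leak into the other.
first.discovered_tools.extend(["web_search_tool"])

print(first.discovered_tools)   # ['web_search_tool']
print(second.discovered_tools)  # []
```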
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+from __future__ import annotations
+
+from typing import Type
+
+from openai import AsyncOpenAI, pydantic_function_tool
+
+from sgr_agent_core.agent_definition import AgentConfig
+from sgr_agent_core.agents.sgr_tool_calling_agent import SGRToolCallingAgent
+from sgr_agent_core.base_tool import BaseTool, SystemBaseTool
+from sgr_agent_core.services.prompt_loader import PromptLoader
+
+from .models import ProgressiveDiscoveryContext
+from .tools.search_tools_tool import SearchToolsTool
+
+
+class ProgressiveDiscoveryAgent(SGRToolCallingAgent):
+    """Agent that starts with minimal system tools and dynamically discovers
+    additional tools via SearchToolsTool.
+
+    On init, splits the toolkit into:
+    - system tools (subclasses of SystemBaseTool) -> self.toolkit (always available)
+    - non-system tools -> stored in context.all_tools
+
+    SearchToolsTool is automatically added if not already present.
+    Discovered tools accumulate in context.discovered_tools.
+    """
+
+    name: str = "progressive_discovery_agent"
+
+    def __init__(
+        self,
+        task_messages: list,
+        openai_client: AsyncOpenAI,
+        agent_config: AgentConfig,
+        toolkit: list[Type[BaseTool]],
+        def_name: str | None = None,
+        **kwargs: dict,
+    ):
+        system_tools = [t for t in toolkit if issubclass(t, SystemBaseTool)]
+        non_system_tools = [t for t in toolkit if not issubclass(t, SystemBaseTool)]
+
+        if SearchToolsTool not in system_tools:
+            system_tools.append(SearchToolsTool)
+
+        super().__init__(
+            task_messages=task_messages,
+            openai_client=openai_client,
+            agent_config=agent_config,
+            toolkit=system_tools,
+            def_name=def_name,
+            **kwargs,
+        )
+
+        self._context = ProgressiveDiscoveryContext(
+            all_tools=non_system_tools,
+        )
+
+    def _get_active_tools(self) -> list[Type[BaseTool]]:
+        """Return system tools + discovered tools."""
+        return list(self.toolkit) + list(self._context.discovered_tools)
+
+    async def _prepare_tools(self) -> list[dict]:
+        """Override to return only active tools (system + discovered)."""
+        active_tools = self._get_active_tools()
+        if self._context.iteration >= self.config.execution.max_iterations:
+            raise RuntimeError("Max iterations reached")
+        return [pydantic_function_tool(tool, name=tool.tool_name) for tool in active_tools]
+
+    async def _prepare_context(self) -> list[dict]:
+        """Override to pass only active tools to system prompt."""
+        active_tools = self._get_active_tools()
+        return [
+            {"role": "system", "content": PromptLoader.get_system_prompt(active_tools, self.config.prompts)},
+            *self.task_messages,
+            {"role": "user", "content": PromptLoader.get_initial_user_request(self.task_messages, self.config.prompts)},
+            *self.conversation,
+        ]
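
The toolkit split in `__init__` relies only on `issubclass`, so it can be illustrated in isolation. In the sketch below every class is a hypothetical stand-in defined locally (the real ones live in `sgr_agent_core`), and the `tool_name` values are assumed for illustration.

```python
class BaseTool:
    """Hypothetical stand-in for sgr_agent_core.base_tool.BaseTool."""
    tool_name = "base_tool"


class SystemBaseTool(BaseTool):
    """Hypothetical stand-in marking tools that are always active."""


class ReasoningTool(SystemBaseTool):
    tool_name = "reasoning_tool"


class SearchToolsTool(SystemBaseTool):
    tool_name = "search_tools_tool"  # assumed name


class WebSearchTool(BaseTool):
    tool_name = "web_search_tool"


def split_toolkit(toolkit):
    """Partition tool classes the way ProgressiveDiscoveryAgent.__init__ does."""
    system = [t for t in toolkit if issubclass(t, SystemBaseTool)]
    discoverable = [t for t in toolkit if not issubclass(t, SystemBaseTool)]
    if SearchToolsTool not in system:
        system.append(SearchToolsTool)  # the meta-tool is always present
    return system, discoverable


system, discoverable = split_toolkit([ReasoningTool, WebSearchTool])
print([t.tool_name for t in system])        # ['reasoning_tool', 'search_tools_tool']
print([t.tool_name for t in discoverable])  # ['web_search_tool']
```

Because the check is class-based, a tool becomes "system" by inheriting from the marker base class rather than by being listed in configuration.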

examples/progressive_discovery/services/__init__.py

Whitespace-only changes.
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+from __future__ import annotations
+
+import re
+from typing import TYPE_CHECKING
+
+from rank_bm25 import BM25Okapi
+
+if TYPE_CHECKING:
+    from sgr_agent_core.base_tool import BaseTool
+
+
+class ToolFilterService:
+    """Stateless service for filtering tools by relevance to a query.
+
+    Uses BM25 ranking + regex keyword overlap to find tools matching a
+    query.
+    """
+
+    @classmethod
+    def filter_tools(
+        cls,
+        query: str,
+        tools: list[type[BaseTool]],
+        bm25_threshold: float = 0.1,
+    ) -> list[type[BaseTool]]:
+        """Filter tools by relevance to query using BM25 + regex.
+
+        Args:
+            query: Natural language description of needed capability.
+            tools: Full list of available tool classes.
+            bm25_threshold: Minimum BM25 score to consider a tool relevant.
+
+        Returns:
+            List of tool classes matching the query.
+        """
+        if not query or not query.strip() or not tools:
+            return list(tools)
+
+        query_lower = query.strip().lower()
+
+        tool_documents = []
+        for tool in tools:
+            tool_name = (tool.tool_name or tool.__name__).lower()
+            tool_description = (tool.description or "").lower()
+            tool_documents.append(f"{tool_name} {tool_description}")
+
+        tokenized_docs = [doc.split() for doc in tool_documents]
+        bm25 = BM25Okapi(tokenized_docs)
+
+        query_tokens = query_lower.split()
+        scores = bm25.get_scores(query_tokens)
+
+        query_words = set(re.findall(r"\b\w+\b", query_lower))
+
+        filtered = []
+        for i, tool in enumerate(tools):
+            bm25_score = scores[i]
+
+            tool_words = set(re.findall(r"\b\w+\b", tool_documents[i]))
+            has_regex_match = bool(query_words & tool_words)
+
+            if bm25_score > bm25_threshold or has_regex_match:
+                filtered.append(tool)
+
+        return filtered
+
+    @classmethod
+    def get_tool_summaries(cls, tools: list[type[BaseTool]]) -> str:
+        """Format tool list for LLM output.
+
+        Args:
+            tools: List of tool classes to summarize.
+
+        Returns:
+            Formatted string with tool names and descriptions.
+        """
+        lines = []
+        for i, tool in enumerate(tools, start=1):
+            name = tool.tool_name or tool.__name__
+            desc = tool.description or ""
+            lines.append(f"{i}. {name}: {desc}")
+        return "\n".join(lines)
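
Because the final condition is `bm25_score > bm25_threshold or has_regex_match`, a single shared word is enough to select a tool. The sketch below reproduces only the regex arm with the same `\b\w+\b` pattern to show what that implies for common words. The tool documents are made-up strings, and the stopword list and `overlap_without_stopwords` helper are hypothetical additions, not part of this commit.

```python
import re

WORD = re.compile(r"\b\w+\b")
# Hypothetical stopword list, not part of the commit.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "in"}


def overlap(query: str, document: str) -> set[str]:
    """Shared words, computed like the regex arm of filter_tools."""
    return set(WORD.findall(query.lower())) & set(WORD.findall(document.lower()))


def overlap_without_stopwords(query: str, document: str) -> set[str]:
    """Same check after dropping common function words (assumed variant)."""
    return overlap(query, document) - STOPWORDS


query = "search the web"
report_doc = "create_report_tool create the final report"   # made-up description
search_doc = "web_search_tool search the web for pages"     # made-up description

print(overlap(query, report_doc))                    # {'the'}
print(overlap_without_stopwords(query, report_doc))  # set()
print(sorted(overlap_without_stopwords(query, search_doc)))  # ['search', 'web']
```

Two things follow from the pattern itself: a function word such as "the" can match an unrelated tool, and an underscore-joined name like `web_search_tool` is a single token, so it never matches the bare word "search" on its own.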

examples/progressive_discovery/tools/__init__.py

Whitespace-only changes.
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from pydantic import Field
+
+from sgr_agent_core.base_tool import SystemBaseTool
+
+from ..models import ProgressiveDiscoveryContext
+from ..services.tool_filter_service import ToolFilterService
+
+if TYPE_CHECKING:
+    from sgr_agent_core.agent_definition import AgentConfig
+    from sgr_agent_core.models import AgentContext
+
+
+class SearchToolsTool(SystemBaseTool):
+    """Search for available tools by capability description.
+
+    Use this tool when you need a capability that is not in your current
+    toolkit. Describe what you need in natural language and matching
+    tools will be added to your active toolkit for subsequent use.
+    """
+
+    query: str = Field(description="Natural language description of the capability you need (e.g. 'search the web')")
+
+    async def __call__(self, context: AgentContext, config: AgentConfig, **kwargs) -> str:
+        if not isinstance(context, ProgressiveDiscoveryContext):
+            return "Error: context is not initialized as ProgressiveDiscoveryContext"
+
+        if not context.all_tools:
+            return "No additional tools available for discovery."
+
+        matched = ToolFilterService.filter_tools(self.query, context.all_tools)
+
+        already_discovered_names = {t.tool_name for t in context.discovered_tools}
+        new_tools = [t for t in matched if t.tool_name not in already_discovered_names]
+
+        if not new_tools:
+            return f"No new tools found for query '{self.query}'. Already discovered: {already_discovered_names}"
+
+        context.discovered_tools.extend(new_tools)
+
+        summary = ToolFilterService.get_tool_summaries(new_tools)
+        return (
+            f"Found {len(new_tools)} new tool(s) for '{self.query}':\n{summary}\n\n"
+            "These tools are now available in your toolkit. You can use them in subsequent steps."
+        )
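
The de-duplication step in `__call__` is what keeps repeated searches from adding the same tool twice. A minimal sketch of that step follows, assuming tools are plain name strings; the commit compares `tool_name` attributes of tool classes, and `add_new_tools` is a hypothetical helper name.

```python
def add_new_tools(matched: list[str], discovered: list[str]) -> list[str]:
    """Append tools not yet discovered and return only the new ones.

    Tools are plain name strings here (assumption); the commit compares
    `tool_name` attributes of tool classes instead.
    """
    already = set(discovered)
    new_tools = [name for name in matched if name not in already]
    discovered.extend(new_tools)  # in place, like context.discovered_tools
    return new_tools


discovered: list[str] = []
first = add_new_tools(["web_search_tool", "extract_page_content_tool"], discovered)
second = add_new_tools(["web_search_tool"], discovered)

print(first)       # ['web_search_tool', 'extract_page_content_tool']
print(second)      # []
print(discovered)  # ['web_search_tool', 'extract_page_content_tool']
```

An empty result on the second call corresponds to the "No new tools found" message returned to the model.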

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,8 @@ dependencies = [
     "uvicorn>=0.35.0",
     "fastmcp>=2.12.4",
     "jambo>=0.1.3.post2",
+    # Tools filtering
+    "rank-bm25>=0.2.2",
 ]
 
 [project.urls]
