Description
In current agentic applications that use the WSO2 AI Gateway, all available tools are exposed to the Large Language Model (LLM) by default. This over-exposure introduces three inefficiencies: degraded application performance from processing overhead, a higher risk of the LLM hallucinating or selecting irrelevant tools, and inflated token costs.
The proposed Semantic Tool Filtering Policy addresses these challenges by limiting the toolset to only the most contextually relevant options for each request. This is particularly important for organizations whose set of backend tools is too large to fit comfortably within the LLM's context window.
Problem Statement
The "all-tools-at-once" approach has become a technical liability due to:
- Context Window Limitations: Tool metadata competes with the user's request for limited context space; without filtering, the meaning of the request can be crowded out by metadata.
- Inference Precision: A large toolset widens the "search space" and raises the probability of the model picking the wrong tool.
- System Latency: Large payloads require more time for the AI Gateway to process and for the LLM to parse, creating a sluggish user experience.
Proposed Solution
We propose a semantic tool filtering policy within the WSO2 AI Gateway. This policy executes the following steps:
- Vectorization: Both the user’s query and the metadata (names and descriptions) of all available tools are converted into high-dimensional vectors using an embedding model.
- Semantic Ranking: The system calculates the cosine similarity between the query vector and each tool vector.
- Filtering: Only the tools that meet a relevance criterion (e.g., the top 5 most similar tools, or those above a similarity threshold) are injected into the LLM prompt.
- Caching Mechanism: To reduce the embedding overhead, each API keeps a cache of its tool embeddings.
  - Tool descriptions are hashed to check whether their embeddings are already in memory.
  - If a new tool appears, its embedding is computed and appended to the list so it can be reused for subsequent requests.
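The steps above could be sketched roughly as follows. This is an illustration only, not the gateway's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ToolEmbeddingCache`, `filter_tools`, `top_k`, and `min_score` are assumed names that do not come from the WSO2 AI Gateway.

```python
import hashlib
import math
import re
from collections import Counter

# Sparse vector: token -> weight. A real embedding model would return
# a dense, fixed-size vector instead (assumption for illustration).
Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: word counts."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolEmbeddingCache:
    """Per-API cache keyed by a hash of each tool's name and description."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self._vectors: dict[str, Vector] = {}
        self.misses = 0  # number of times an embedding had to be computed

    def get(self, tool: dict) -> Vector:
        text = f"{tool['name']}: {tool['description']}"
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._vectors:  # new or changed tool: embed and store
            self.misses += 1
            self._vectors[key] = self.embed(text)
        return self._vectors[key]


def filter_tools(query, tools, cache, top_k=5, min_score=0.0):
    """Rank tools by cosine similarity to the query and keep the best ones."""
    query_vec = cache.embed(query)
    scored = [(cosine_similarity(query_vec, cache.get(t)), t) for t in tools]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]


tools = [
    {"name": "get_weather", "description": "Return the weather forecast for a city"},
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "create_event", "description": "Add an event to the calendar"},
]
cache = ToolEmbeddingCache()
selected = filter_tools(
    "What is the weather forecast in Colombo", tools, cache, top_k=2, min_score=0.1
)
print([t["name"] for t in selected])  # ['get_weather', 'create_event']
```

In this sketch the query is embedded on every request, while tool embeddings are computed once and reused until a tool's name or description changes, which alters its hash and triggers a fresh embedding.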
Alternatives
No response
Version
0.1.0