Closed
Feature Request
Add domain filtering options to the WebSearchTool in the OpenAI Agents SDK.
Problem Statement
Currently, the WebSearchTool in the OpenAI Agents SDK provides basic web search functionality without domain filtering capabilities:
```python
from agents import Agent, WebSearchTool

agent = Agent(
    name="Research Assistant",
    tools=[WebSearchTool()],
    instructions="You help with research tasks",
)
```

This limitation prevents developers from:
- Focusing searches on specific trusted sources
- Excluding known unreliable or irrelevant domains
- Building agents that require domain-specific information (e.g., academic research, official documentation)
Proposed Solution
Add optional domain filtering parameters to the WebSearchTool constructor:
- `include_domains`: list of domains to limit search results to
- `exclude_domains`: list of domains to exclude from search results
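To pin down what the two parameters could mean, here is a minimal client-side sketch that post-filters result URLs. The helpers `host_matches` and `filter_results` are hypothetical and not part of the Agents SDK. The sketch assumes that a subdomain matches its parent domain, that a missing or empty include list means "no restriction", and that an exclusion always wins over an inclusion.

```python
from __future__ import annotations

from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_results(
    urls: list[str],
    include_domains: list[str] | None = None,
    exclude_domains: list[str] | None = None,
) -> list[str]:
    """Keep the URLs permitted by the include/exclude lists (hypothetical helper).

    Assumed semantics: a missing or empty include list allows every domain,
    and an exclude match always wins over an include match.
    """
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if include_domains and not any(host_matches(host, d) for d in include_domains):
            continue  # not on the allow-list
        if exclude_domains and any(host_matches(host, d) for d in exclude_domains):
            continue  # explicitly blocked
        kept.append(url)
    return kept


results = [
    "https://arxiv.org/abs/example",
    "https://export.arxiv.org/abs/example",
    "https://medium.com/some-post",
    "https://notarxiv.org/page",
]
print(filter_results(results, include_domains=["arxiv.org"], exclude_domains=["medium.com"]))
# -> ['https://arxiv.org/abs/example', 'https://export.arxiv.org/abs/example']
```

Matching on the parsed hostname with a leading dot, rather than a plain substring or suffix check, keeps `notarxiv.org` from slipping through an `arxiv.org` allow-list. Since WebSearchTool is hosted, real filtering would happen server-side; this only illustrates the matching rules.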
Example Implementation
Current Implementation:
```python
import asyncio
from agents import Agent, Runner, WebSearchTool

agent = Agent(
    name="Research Assistant",
    tools=[WebSearchTool()],
    instructions="Help users find reliable research papers",
)

async def main():
    # All web results are included, including potentially unreliable sources
    result = await Runner.run(agent, "Find recent AI research papers")
    print(result.final_output)

asyncio.run(main())
```

Proposed Enhancement:
```python
from agents import Agent, WebSearchTool

# NOTE: include_domains / exclude_domains are the parameters proposed in this
# issue; WebSearchTool does not accept them today.

# Academic research agent with domain filtering
academic_agent = Agent(
    name="Academic Research Assistant",
    tools=[
        WebSearchTool(
            include_domains=["arxiv.org", "nature.com", "science.org", "ieee.org"],
            exclude_domains=["medium.com", "reddit.com", "quora.com"],
        )
    ],
    instructions="Find peer-reviewed research papers from academic sources",
)

# Official documentation agent
docs_agent = Agent(
    name="Documentation Assistant",
    tools=[
        WebSearchTool(
            include_domains=["docs.python.org", "docs.openai.com", "github.com"],
            exclude_domains=["stackoverflow.com", "w3schools.com"],
        )
    ],
    instructions="Find official documentation and API references",
)

# News aggregation agent
news_agent = Agent(
    name="News Assistant",
    tools=[
        WebSearchTool(
            include_domains=["reuters.com", "apnews.com", "bbc.com"],
            exclude_domains=["tabloid-site.com", "clickbait-news.com"],
        )
    ],
    instructions="Find news from reputable sources",
)
```

Reference: Perplexity AI Implementation
Perplexity's web search API already supports this functionality:
```python
import requests

response = requests.post(
    "https://api.perplexity.ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "query": "machine learning research",
        "include_domains": ["arxiv.org", "ieee.org", "acm.org"],
        "exclude_domains": ["blogspot.com", "wordpress.com"],
    },
)
```

Use Cases
- Academic Research Agents: Restrict to .edu domains and peer-reviewed journals
- Technical Documentation Agents: Focus on official docs, exclude forums/Q&A sites
- Government Information Agents: Include only .gov domains
- Medical Information Agents: Restrict to verified medical sources
- Corporate Intelligence Agents: Focus on official company websites and press releases
Implementation Considerations
- Both parameters should be optional to maintain backward compatibility
- Support for wildcard patterns (e.g., `*.edu`, `*.gov`) would be valuable
- Consider validation of domain format
- Document any limitations on the number of domains
- Since WebSearchTool is a hosted tool, the filtering would need to be implemented at the API level
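The wildcard and validation points above could be handled along these lines. This is a hypothetical sketch, not SDK code: `validate_domain_pattern` and `matches_pattern` are illustrative names, and the rules enforced (at most one leading `*.` label, hostname labels of 1 to 63 letters, digits, or hyphens with no leading or trailing hyphen) are assumptions about what a reasonable API might accept.

```python
import re
from fnmatch import fnmatchcase

# One hostname label: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def validate_domain_pattern(pattern: str) -> str:
    """Normalize a pattern such as 'arxiv.org' or '*.gov'; raise ValueError if malformed."""
    pattern = pattern.strip().lower().rstrip(".")
    labels = pattern.split(".")
    if labels[0] == "*":  # allow a single leading wildcard label only
        labels = labels[1:]
    if not labels or not all(_LABEL.match(label) for label in labels):
        raise ValueError(f"invalid domain pattern: {pattern!r}")
    return pattern


def matches_pattern(host: str, pattern: str) -> bool:
    """Match a host against a pattern already normalized by validate_domain_pattern.

    '*.edu' matches any host ending in '.edu'; a bare domain matches itself
    and its subdomains.
    """
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        return fnmatchcase(host, pattern)
    return host == pattern or host.endswith("." + pattern)


patterns = [validate_domain_pattern(p) for p in ["*.edu", "*.gov", "arxiv.org"]]
print([p for p in patterns if matches_pattern("cs.mit.edu", p)])  # -> ['*.edu']
```

Validating once, when the tool is constructed, means malformed entries such as `bad_domain.com` or `*.*.com` fail early instead of silently matching nothing at search time.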
Alternative API Design
If modifying the hosted tool is complex, the filters could still be configured when the tool is instantiated, using one of several API shapes:
```python
# Option 1: Constructor parameters (proposed above)
tool = WebSearchTool(
    include_domains=["example.com"],
    exclude_domains=["spam.com"],
)

# Option 2: Configuration object
tool = WebSearchTool(
    config={
        "include_domains": ["example.com"],
        "exclude_domains": ["spam.com"],
        "max_results": 10,
    }
)

# Option 3: Builder pattern
tool = (
    WebSearchTool()
    .include_domains(["example.com"])
    .exclude_domains(["spam.com"])
    .build()
)
```

Benefits
- Enables building specialized agents with reliable information sources
- Reduces noise and improves result quality
- Provides feature parity with competing search APIs such as Perplexity's
- Allows compliance with organizational policies on information sources
- Improves agent reliability for production use cases
Related Enhancements
This feature would complement other potential WebSearchTool improvements:
- Date range filtering
- Language filtering
- SafeSearch options
- Custom ranking/sorting options
This enhancement would significantly improve the utility of the WebSearchTool for building production-ready agents that require high-quality, domain-specific information retrieval.