Add domain filtering options to WebSearchTool #1542

@jay-dhamale

Feature Request

Add domain filtering options to the WebSearchTool in the OpenAI Agents SDK.

Problem Statement

Currently, the WebSearchTool in the OpenAI Agents SDK provides basic web search functionality without domain filtering capabilities:

from agents import Agent, WebSearchTool

agent = Agent(
    name="Research Assistant",
    tools=[WebSearchTool()],
    instructions="You help with research tasks"
)

This limitation prevents developers from:

  • Focusing searches on specific trusted sources
  • Excluding known unreliable or irrelevant domains
  • Building agents that require domain-specific information (e.g., academic research, official documentation)

Proposed Solution

Add optional domain filtering parameters to the WebSearchTool constructor:

  1. include_domains: List of domains to limit search results to
  2. exclude_domains: List of domains to exclude from search results
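The semantics of the two lists can be sketched client-side. This is a minimal illustration, not SDK code: `is_allowed` and `_host_matches` are hypothetical helpers, and two behaviors are assumptions rather than anything the SDK defines: an exclusion wins over an inclusion, and a listed domain also covers its subdomains.

```python
from typing import Optional, Sequence
from urllib.parse import urlparse


def _host_matches(host: str, domain: str) -> bool:
    """Return True if host equals domain or is a subdomain of it."""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def is_allowed(
    url: str,
    include_domains: Optional[Sequence[str]] = None,
    exclude_domains: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical filter semantics; not part of the Agents SDK."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # not a URL with a recognizable host
    # Assumption: an exclusion always wins over an inclusion.
    if exclude_domains and any(_host_matches(host, d) for d in exclude_domains):
        return False
    # Assumption: a missing or empty include list means "no allowlist".
    if include_domains:
        return any(_host_matches(host, d) for d in include_domains)
    return True
```

Leaving both arguments at `None` allows every URL with a valid host, which matches the current unfiltered behavior.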

Example Implementation

Current Implementation:

import asyncio

from agents import Agent, WebSearchTool, Runner

agent = Agent(
    name="Research Assistant",
    tools=[WebSearchTool()],
    instructions="Help users find reliable research papers"
)

async def main():
    # No domain filter is applied, so results may come from any site,
    # including potentially unreliable sources
    result = await Runner.run(agent, "Find recent AI research papers")
    print(result.final_output)

asyncio.run(main())

Proposed Enhancement:

from agents import Agent, WebSearchTool, Runner

# Academic research agent with domain filtering
academic_agent = Agent(
    name="Academic Research Assistant",
    tools=[
        WebSearchTool(
            include_domains=["arxiv.org", "nature.com", "science.org", "ieee.org"],
            exclude_domains=["medium.com", "reddit.com", "quora.com"]
        )
    ],
    instructions="Find peer-reviewed research papers from academic sources"
)

# Official documentation agent
docs_agent = Agent(
    name="Documentation Assistant",
    tools=[
        WebSearchTool(
            include_domains=["docs.python.org", "docs.openai.com", "github.com"],
            exclude_domains=["stackoverflow.com", "w3schools.com"]
        )
    ],
    instructions="Find official documentation and API references"
)

# News aggregation agent
news_agent = Agent(
    name="News Assistant",
    tools=[
        WebSearchTool(
            include_domains=["reuters.com", "apnews.com", "bbc.com"],
            exclude_domains=["tabloid-site.com", "clickbait-news.com"]
        )
    ],
    instructions="Find news from reputable sources"
)

Reference: Perplexity AI Implementation

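Perplexity's API already supports domain filtering. The request below illustrates the idea; confirm the endpoint and field names against Perplexity's API reference, which documents domain filtering under the search_domain_filter parameter: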

import requests

response = requests.post(
    "https://api.perplexity.ai/search",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "query": "machine learning research",
        "include_domains": ["arxiv.org", "ieee.org", "acm.org"],
        "exclude_domains": ["blogspot.com", "wordpress.com"]
    }
)

Use Cases

  1. Academic Research Agents: Restrict to .edu domains and peer-reviewed journals
  2. Technical Documentation Agents: Focus on official docs, exclude forums/Q&A sites
  3. Government Information Agents: Include only .gov domains
  4. Medical Information Agents: Restrict to verified medical sources
  5. Corporate Intelligence Agents: Focus on official company websites and press releases

Implementation Considerations

  • Both parameters should be optional to maintain backward compatibility
  • Support for wildcard patterns (e.g., *.edu, *.gov) would be valuable
  • Consider validation of domain format
  • Document any limitations on the number of domains
  • Since WebSearchTool is a hosted tool, the filtering would need to be implemented at the API level
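The validation, wildcard, and limit points above could look roughly like the sketch below. Everything here is illustrative: `normalize_domains`, `matches`, and the `MAX_DOMAINS` value are hypothetical, and the real limit (if any) would be set by the hosted API.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Sequence

# Dot-separated hostname labels, with an optional leading "*." wildcard.
_DOMAIN_RE = re.compile(
    r"^(\*\.)?([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)*"
    r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"
)

MAX_DOMAINS = 20  # hypothetical limit, chosen only for illustration


def normalize_domains(
    domains: Sequence[str], max_domains: int = MAX_DOMAINS
) -> List[str]:
    """Lowercase, validate, and de-duplicate a list of domain patterns."""
    if len(domains) > max_domains:
        raise ValueError(f"at most {max_domains} domains are allowed")
    cleaned: List[str] = []
    for raw in domains:
        domain = raw.strip().lower()
        if not _DOMAIN_RE.match(domain):
            raise ValueError(f"invalid domain pattern: {raw!r}")
        if domain not in cleaned:
            cleaned.append(domain)
    return cleaned


def matches(host: str, pattern: str) -> bool:
    """Shell-style match: "*.edu" covers "mit.edu" and "cs.mit.edu".

    A pattern without a wildcard matches only that exact host.
    """
    return fnmatchcase(host.lower(), pattern)
```

Rejecting full URLs such as `https://arxiv.org` at construction time gives callers an early, local error instead of a silently ignored filter.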

Alternative API Design

If plain constructor parameters prove awkward to add to the hosted tool, the same filters could be supplied at instantiation in other forms:

# Option 1: Constructor parameters (proposed above)
tool = WebSearchTool(
    include_domains=["example.com"],
    exclude_domains=["spam.com"]
)

# Option 2: Configuration object
tool = WebSearchTool(
    config={
        "include_domains": ["example.com"],
        "exclude_domains": ["spam.com"],
        "max_results": 10
    }
)

# Option 3: Builder pattern
tool = (WebSearchTool()
    .include_domains(["example.com"])
    .exclude_domains(["spam.com"])
    .build())
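One caveat for Option 3: if the tool stores the filters as attributes named `include_domains` and `exclude_domains` (as Option 1 implies), methods with the same names cannot live on the same class, because the instance attribute would shadow the method. A builder would therefore likely need to be a separate class. The sketch below assumes that split; `DomainFilterConfig` and `DomainFilterBuilder` are hypothetical names, not SDK types.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DomainFilterConfig:
    """Hypothetical immutable result of the builder; not an SDK type."""

    include_domains: Tuple[str, ...] = ()
    exclude_domains: Tuple[str, ...] = ()


class DomainFilterBuilder:
    """Hypothetical builder, kept separate from the config it produces."""

    def __init__(self) -> None:
        self._include: List[str] = []
        self._exclude: List[str] = []

    def include_domains(self, domains: Iterable[str]) -> "DomainFilterBuilder":
        self._include.extend(domains)
        return self  # returning self is what enables chaining

    def exclude_domains(self, domains: Iterable[str]) -> "DomainFilterBuilder":
        self._exclude.extend(domains)
        return self

    def build(self) -> DomainFilterConfig:
        return DomainFilterConfig(tuple(self._include), tuple(self._exclude))
```

Chaining then reads much like Option 3, for example `DomainFilterBuilder().include_domains(["example.com"]).exclude_domains(["spam.com"]).build()`, and yields the same data that Options 1 and 2 pass directly.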

Benefits

  • Enables building specialized agents with reliable information sources
  • Reduces noise and improves result quality
  • Provides feature parity with competing solutions
  • Allows compliance with organizational policies on information sources
  • Improves agent reliability for production use cases

Related Enhancements

This feature would complement other potential WebSearchTool improvements:

  • Date range filtering
  • Language filtering
  • SafeSearch options
  • Custom ranking/sorting options

This enhancement would significantly improve the utility of the WebSearchTool for building production-ready agents that require high-quality, domain-specific information retrieval.
