Commit a7d47e4

cpsievert and claude authored
feat: add provider-agnostic tool_web_search() and tool_web_fetch() (#248)
* feat: add provider-agnostic tool_web_search() and tool_web_fetch()

  Add built-in web search and URL fetch tools that work across providers:
  - tool_web_search(): Works with OpenAI, Anthropic, and Google
  - tool_web_fetch(): Works with Anthropic and Google

  Each provider automatically translates the tool configuration to its specific API format. Supports options like allowed_domains, blocked_domains, user_location, and max_uses where applicable.

  This is equivalent to tidyverse/ellmer#829 but with a cleaner provider-agnostic API (one function instead of separate functions per provider).

  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: wrap Google builtin tools in GoogleTool()

  GoogleSearch and UrlContext must be passed as keyword arguments to GoogleTool(), not directly to config.tools.

* refactor: add warning helper and use generic examples

  - Add _warn_unsupported() helper to reduce warning code duplication
  - Update examples to use generic domains (wikipedia.org, python.org) instead of Posit/Tidyverse-specific ones

* feat: add content types for web search/fetch requests and results

  Add ContentWebSearchRequest, ContentWebSearchResults, ContentWebFetchRequest, and ContentWebFetchResults to represent built-in web tool interactions.

* feat: handle web search/fetch response types in providers

  - Anthropic: Handle server_tool_use, web_search_tool_result, and web_fetch_tool_result content types
  - OpenAI: Handle web_search_call output type

* feat: add citation streaming support for Claude

  Handle citations_delta chunk type during streaming to accumulate citations on content blocks.

* docs: document Claude web_fetch beta header requirement

  Update tool_web_fetch docstring to explain that Claude requires the anthropic-beta: web-fetch-2025-09-10 header.

* docs: add changelog entry for web search/fetch tools

  Document the new tool_web_search() and tool_web_fetch() functions and associated content types.

* docs: add web search/fetch to API reference

  - Add new "Built-in tools" section with tool_web_search and tool_web_fetch
  - Export new content types from chatlas.types module

* docs: add built-in tools section to tools guide

* docs: restructure MCP tools guide to prioritize 3rd party servers

  - Add Quick start section with MCP Fetch server example
  - Add Finding MCP servers section with awesome-mcp-servers link
  - Reorder to show Stdio before HTTP (more common for 3rd party)
  - Move "Building your own server" section later
  - Rename "Motivating example" to "Advanced example: Code execution"
  - Add callout linking to built-in tool_web_fetch/tool_web_search alternatives
  - Update tools.qmd link to point to quick start section

* fix: use correct uvx command for MCP Fetch server

  The MCP Fetch server is run via `uvx mcp-server-fetch`, not npx.

* fix: use ChatAnthropic for MCP Fetch examples

  OpenAI's function calling API doesn't support the "format": "uri" JSON Schema format used by the MCP Fetch server. Switch examples to use ChatAnthropic which has better MCP compatibility.

* fix: strip unsupported JSON Schema format field for MCP tools

  OpenAI's function calling API doesn't support JSON Schema format hints like "uri" or "date-time". Strip these from MCP tool schemas so they work across all providers.

  Also reverts docs to use ChatOpenAI for MCP Fetch examples since the fix enables cross-provider compatibility.

* fix: ensure all properties in required array for OpenAI compatibility

  OpenAI requires all properties to be listed in the required array. Update sanitize_schema to add all property keys to required.

* fix: properly handle optional params in MCP tool schemas

  OpenAI requires all properties in the required array with strict mode. Optional parameters are indicated using anyOf with null type, matching how pydantic_function_tool generates schemas.

* fix: use strict=False for MCP tools to preserve optional params

  MCP tools use standard JSON Schema conventions where optional params are simply not in the required array. Setting strict=False for OpenAI allows this to work properly, so the LLM won't be forced to provide values for all parameters.

* Cleanup OpenAI web search content handling

* Better error message

* fix: OpenAI web search content -> message param

* Add web search/fetch integration tests from ellmer PR #829

  Port integration tests from tidyverse/ellmer#829:
  - Add assert_tool_web_fetch and assert_tool_web_search helpers to conftest.py
  - Add test_anthropic_web_fetch and test_anthropic_web_search
  - Add test_google_web_fetch and test_google_web_search
  - Add test_openai_web_search (OpenAI doesn't support web_fetch)

  Fix Anthropic provider to handle web search/fetch content in multi-turn conversations by explicitly storing only API-input-accepted fields (the API response includes output-only fields like citations, text, url that aren't accepted when replaying turns).

* Cleanup after Claude

* Remove top-level imports

* Rename web content types to match ellmer naming convention

  - ContentWebSearchRequest → ContentToolRequestSearch
  - ContentWebSearchResults → ContentToolResponseSearch
  - ContentWebFetchRequest → ContentToolRequestFetch
  - ContentWebFetchResults → ContentToolResponseFetch

* Remove MCP compatibility changes (split to separate PR)

  The MCP tool compatibility improvements (strict=False, sanitize_schema with format removal, doc restructure) have been moved to a separate PR (feat/mcp-tool-compat branch).

  This PR now focuses solely on the built-in web search/fetch tools:
  - tool_web_search() and tool_web_fetch() functions
  - Content types for web search/fetch requests and results
  - Provider integration tests

  The callout tip about built-in tools is kept in mcp-tools.qmd to help users discover the simpler alternative to MCP Fetch.

* Add test to make sure we're capturing citations

* Simplify citations logic

* Fix imports

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent f5ebf48 commit a7d47e4

22 files changed, +2893 -23 lines changed

CHANGELOG.md

Lines changed: 5 additions & 2 deletions
@@ -11,11 +11,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1111

1212
### New features
1313

14-
15-
* `ChatOpenAI()`, `ChatAnthropic()`, and `ChatGoogle()` gain a new `reasoning` parameter to easily opt-into, and fully customize, reasoning capabilities. (#202)
14+
* `ChatOpenAI()`, `ChatAnthropic()`, and `ChatGoogle()` gain a new `reasoning` parameter to easily opt-into, and fully customize, reasoning capabilities. (#202)
1615
* A new `ContentThinking` content type was added and captures the "thinking" portion of a reasoning model. (#192)
1716
* Added support for built-in provider tools via a new `ToolBuiltIn` class. This enables provider-specific functionality like OpenAI's image generation to be registered and used as tools. Built-in tools pass raw provider definitions directly to the API rather than wrapping Python functions. (#214)
1817
* `ChatGoogle()` gains basic support for image generation. (#214)
18+
* New `tool_web_search()` and `tool_web_fetch()` functions provide provider-agnostic access to built-in web search and URL fetch tools:
19+
* `tool_web_search()` is supported by OpenAI, Claude (Anthropic), and Google (Gemini).
20+
* `tool_web_fetch()` is supported by Claude (requires beta header) and Google.
21+
* New content types `ContentToolRequestSearch`, `ContentToolResponseSearch`, `ContentToolRequestFetch`, and `ContentToolResponseFetch` capture web tool interactions.
1922
* `ChatOpenAI()` and `ChatAzureOpenAI()` gain a new `service_tier` parameter to request a specific service tier (e.g., `"flex"` for slower/cheaper or `"priority"` for faster/more expensive). (#204)
2023
* `Chat` and `Turn` now have a `_repr_markdown_` method and an overall improved `repr()` experience. (#245)
2124
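The provider support described in the changelog entry above can be pictured as a small lookup table. The following is a minimal sketch only: `SUPPORT` and `check_support` are hypothetical names that do not appear in chatlas, and the table simply restates the changelog bullets.

```python
# Hypothetical sketch: SUPPORT and check_support are illustrative names,
# not part of the chatlas API. The table restates the changelog bullets.
SUPPORT: dict[str, frozenset[str]] = {
    "web_search": frozenset({"openai", "anthropic", "google"}),
    "web_fetch": frozenset({"anthropic", "google"}),
}


def check_support(tool: str, provider: str) -> bool:
    """Return True if `provider` offers the built-in `tool`."""
    try:
        return provider in SUPPORT[tool]
    except KeyError:
        raise ValueError(f"Unknown built-in tool: {tool!r}") from None


print(check_support("web_search", "openai"))
print(check_support("web_fetch", "openai"))
```

A lookup like this makes the one asymmetry easy to see: web fetch has no OpenAI entry, which matches the note in the commit message that OpenAI doesn't support web_fetch.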

CLAUDE.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,14 @@ The project uses `uv` for package management and Make for common tasks:
5151
4. **Content-Based Messaging**: All communication uses structured `Content` objects rather than raw strings
5252
5. **Tool Integration**: Seamless function calling with automatic JSON schema generation from Python type hints
5353

54+
### Typing Best Practices
55+
56+
This project prioritizes strong typing that leverages provider SDK types directly:
57+
58+
- **Use provider SDK types**: Import and use types from `openai.types`, `anthropic.types`, `google.genai.types`, etc. rather than creating custom TypedDicts or dataclasses that mirror them. This ensures compatibility with SDK updates and provides better IDE support.
59+
- **Use `@overload` for provider-specific returns**: When a method returns different types based on a provider argument, use `@overload` with `Literal` types to give callers precise return type information.
60+
- **Explore SDK types interactively**: Use `python -c "from <sdk>.types import <Type>; print(<Type>.__annotations__)"` to inspect available fields and nested types when implementing provider-specific features.
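The `@overload` guidance above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `OpenAISearchDef` and `AnthropicSearchDef` are placeholder dataclasses standing in for real SDK types (which the guideline says to import rather than redefine), and `SearchToolSketch` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Literal, Union, overload


@dataclass
class OpenAISearchDef:
    """Placeholder for a type that would come from openai.types."""
    kind: str = "openai-search"


@dataclass
class AnthropicSearchDef:
    """Placeholder for a type that would come from anthropic.types."""
    kind: str = "anthropic-search"


class SearchToolSketch:
    """Hypothetical tool whose definition type depends on the provider."""

    # Each overload pairs one Literal provider name with one return type,
    # so a type checker can narrow the result at the call site.
    @overload
    def get_definition(self, provider: Literal["openai"]) -> OpenAISearchDef: ...
    @overload
    def get_definition(self, provider: Literal["anthropic"]) -> AnthropicSearchDef: ...

    def get_definition(
        self, provider: Literal["openai", "anthropic"]
    ) -> Union[OpenAISearchDef, AnthropicSearchDef]:
        if provider == "openai":
            return OpenAISearchDef()
        if provider == "anthropic":
            return AnthropicSearchDef()
        raise ValueError(f"Unsupported provider: {provider!r}")


tool = SearchToolSketch()
print(type(tool.get_definition("openai")).__name__)
print(type(tool.get_definition("anthropic")).__name__)
```

With the overloads in place, a checker infers `OpenAISearchDef` for `tool.get_definition("openai")` without a cast; the runtime behavior is unchanged.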
61+
5462
### Testing Structure
5563

5664
- Tests are organized by component (e.g., `test_provider_openai.py`, `test_tools.py`)

chatlas/__init__.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@
3535
from ._provider_snowflake import ChatSnowflake
3636
from ._tokens import token_usage
3737
from ._tools import Tool, ToolBuiltIn, ToolRejectError
38+
from ._tools_builtin import tool_web_fetch, tool_web_search
3839
from ._turn import AssistantTurn, SystemTurn, Turn, UserTurn
3940

4041
try:
@@ -86,6 +87,8 @@
8687
"Tool",
8788
"ToolBuiltIn",
8889
"ToolRejectError",
90+
"tool_web_fetch",
91+
"tool_web_search",
8992
"Turn",
9093
"UserTurn",
9194
"SystemTurn",

chatlas/_content.py

Lines changed: 113 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -132,6 +132,10 @@ def from_tool(cls, tool: "Tool | ToolBuiltIn") -> "ToolInfo":
132132
"json",
133133
"pdf",
134134
"thinking",
135+
"web_search_request",
136+
"web_search_results",
137+
"web_fetch_request",
138+
"web_fetch_results",
135139
]
136140
"""
137141
A discriminated union of all content types.
@@ -622,6 +626,103 @@ def tagify(self):
622626
return HTML(html)
623627

624628

629+
class ContentToolRequestSearch(Content):
630+
"""
631+
A web search request from the model.
632+
633+
This content type represents the model's request to search the web.
634+
It's automatically generated when a built-in web search tool is used.
635+
636+
Parameters
637+
----------
638+
query
639+
The search query.
640+
extra
641+
The raw provider-specific response data.
642+
"""
643+
644+
query: str
645+
extra: Optional[dict[str, Any]] = None
646+
647+
content_type: ContentTypeEnum = "web_search_request"
648+
649+
def __str__(self):
650+
return f"[web search request]: {self.query!r}"
651+
652+
653+
class ContentToolResponseSearch(Content):
654+
"""
655+
Web search results from the model.
656+
657+
This content type represents the results of a web search.
658+
It's automatically generated when a built-in web search tool returns results.
659+
660+
Parameters
661+
----------
662+
urls
663+
The URLs returned by the search.
664+
extra
665+
The raw provider-specific response data.
666+
"""
667+
668+
urls: list[str]
669+
extra: Optional[dict[str, Any]] = None
670+
671+
content_type: ContentTypeEnum = "web_search_results"
672+
673+
def __str__(self):
674+
url_list = "\n".join(f"* {url}" for url in self.urls)
675+
return f"[web search results]:\n{url_list}"
676+
677+
678+
class ContentToolRequestFetch(Content):
679+
"""
680+
A web fetch request from the model.
681+
682+
This content type represents the model's request to fetch a URL.
683+
It's automatically generated when a built-in web fetch tool is used.
684+
685+
Parameters
686+
----------
687+
url
688+
The URL to fetch.
689+
extra
690+
The raw provider-specific response data.
691+
"""
692+
693+
url: str
694+
extra: Optional[dict[str, Any]] = None
695+
696+
content_type: ContentTypeEnum = "web_fetch_request"
697+
698+
def __str__(self):
699+
return f"[web fetch request]: {self.url}"
700+
701+
702+
class ContentToolResponseFetch(Content):
703+
"""
704+
Web fetch results from the model.
705+
706+
This content type represents the results of fetching a URL.
707+
It's automatically generated when a built-in web fetch tool returns results.
708+
709+
Parameters
710+
----------
711+
url
712+
The URL that was fetched.
713+
extra
714+
The raw provider-specific response data.
715+
"""
716+
717+
url: str
718+
extra: Optional[dict[str, Any]] = None
719+
720+
content_type: ContentTypeEnum = "web_fetch_results"
721+
722+
def __str__(self):
723+
return f"[web fetch result]: {self.url}"
724+
725+
625726
ContentUnion = Union[
626727
ContentText,
627728
ContentImageRemote,
@@ -631,6 +732,10 @@ def tagify(self):
631732
ContentJson,
632733
ContentPDF,
633734
ContentThinking,
735+
ContentToolRequestSearch,
736+
ContentToolResponseSearch,
737+
ContentToolRequestFetch,
738+
ContentToolResponseFetch,
634739
]
635740

636741

@@ -661,6 +766,14 @@ def create_content(data: dict[str, Any]) -> ContentUnion:
661766
return ContentPDF.model_validate(data)
662767
elif ct == "thinking":
663768
return ContentThinking.model_validate(data)
769+
elif ct == "web_search_request":
770+
return ContentToolRequestSearch.model_validate(data)
771+
elif ct == "web_search_results":
772+
return ContentToolResponseSearch.model_validate(data)
773+
elif ct == "web_fetch_request":
774+
return ContentToolRequestFetch.model_validate(data)
775+
elif ct == "web_fetch_results":
776+
return ContentToolResponseFetch.model_validate(data)
664777
else:
665778
raise ValueError(f"Unknown content type: {ct}")
666779
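The `content_type` dispatch in `create_content()` can be illustrated with a standard-library stand-in. This sketch assumes plain dataclasses in place of the pydantic models used in the diff, and the names `SearchRequestSketch`, `SearchResultsSketch`, `REGISTRY`, and `create_content_sketch` are hypothetical; only the tag strings and `__str__` formats are taken from the hunk above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SearchRequestSketch:
    """Stand-in for ContentToolRequestSearch (dataclass, not pydantic)."""
    query: str
    extra: Optional[dict[str, Any]] = None
    content_type: str = "web_search_request"

    def __str__(self) -> str:
        return f"[web search request]: {self.query!r}"


@dataclass
class SearchResultsSketch:
    """Stand-in for ContentToolResponseSearch (dataclass, not pydantic)."""
    urls: list[str] = field(default_factory=list)
    extra: Optional[dict[str, Any]] = None
    content_type: str = "web_search_results"

    def __str__(self) -> str:
        url_list = "\n".join(f"* {url}" for url in self.urls)
        return f"[web search results]:\n{url_list}"


# Map each discriminator tag to the class that should be rebuilt from it.
REGISTRY = {
    "web_search_request": SearchRequestSketch,
    "web_search_results": SearchResultsSketch,
}


def create_content_sketch(data: dict[str, Any]):
    """Rebuild a content object from a dict, keyed on its content_type tag."""
    ct = data.get("content_type")
    cls = REGISTRY.get(ct)
    if cls is None:
        raise ValueError(f"Unknown content type: {ct}")
    return cls(**data)


request = create_content_sketch(
    {"content_type": "web_search_request", "query": "python"}
)
print(request)
```

A registry dict is a design choice of this sketch; the diff itself uses an `elif` chain, which behaves the same way for the four new tags.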

chatlas/_provider_anthropic.py

Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,10 @@
2727
ContentText,
2828
ContentThinking,
2929
ContentToolRequest,
30+
ContentToolRequestFetch,
31+
ContentToolRequestSearch,
32+
ContentToolResponseFetch,
33+
ContentToolResponseSearch,
3034
ContentToolResult,
3135
)
3236
from ._logging import log_model_default
@@ -39,6 +43,7 @@
3943
)
4044
from ._tokens import get_price_info
4145
from ._tools import Tool, ToolBuiltIn, basemodel_to_param_schema
46+
from ._tools_builtin import ToolWebFetch, ToolWebSearch
4247
from ._turn import AssistantTurn, SystemTurn, Turn, UserTurn, user_turn
4348
from ._utils import split_http_client_kwargs
4449

@@ -494,6 +499,11 @@ def stream_merge_chunks(self, completion, chunk):
494499
elif chunk.delta.type == "signature_delta":
495500
this_content = cast("ThinkingBlock", this_content)
496501
this_content.signature += chunk.delta.signature
502+
elif chunk.delta.type == "citations_delta":
503+
# https://docs.claude.com/en/docs/build-with-claude/citations#streaming-support
504+
# Accumulate citations on the content block
505+
if hasattr(this_content, "citations"):
506+
this_content.citations.append(chunk.delta.citation) # type: ignore
497507
elif chunk.type == "content_block_stop":
498508
this_content = completion.content[chunk.index]
499509
if this_content.type == "tool_use" and isinstance(this_content.input, str):
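The `citations_delta` branch above appends each streamed citation to the content block it belongs to. A minimal sketch of that accumulation, assuming `SimpleNamespace` objects in place of the Anthropic SDK's chunk and block types; `merge_citation_delta` is a hypothetical helper, and the `None` guard is an extra defensive assumption that is not in the commit.

```python
from types import SimpleNamespace


def merge_citation_delta(block, delta) -> None:
    """Append a streamed citation onto the block it belongs to."""
    if delta.type != "citations_delta":
        return
    # Same guard as the hunk: skip blocks that have no `citations` attribute.
    if not hasattr(block, "citations"):
        return
    # Assumption (not in the commit): tolerate a block whose list is unset.
    if block.citations is None:
        block.citations = []
    block.citations.append(delta.citation)


# Stand-ins for a streamed text block and two citation deltas.
text_block = SimpleNamespace(type="text", text="", citations=[])
for url in ("https://example.com/a", "https://example.com/b"):
    delta = SimpleNamespace(type="citations_delta", citation={"url": url})
    merge_citation_delta(text_block, delta)

print(len(text_block.citations))
```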
@@ -695,11 +705,28 @@ def _as_content_block(content: Content) -> "ContentBlockParam":
695705
"thinking": content.thinking,
696706
"signature": extra.get("signature", ""),
697707
}
708+
elif isinstance(
709+
content,
710+
(
711+
ContentToolRequestSearch,
712+
ContentToolResponseSearch,
713+
ContentToolRequestFetch,
714+
ContentToolResponseFetch,
715+
),
716+
):
717+
# extra contains the full original content block param
718+
return cast("ContentBlockParam", content.extra)
698719

699720
raise ValueError(f"Unknown content type: {type(content)}")
700721

701722
@staticmethod
702723
def _anthropic_tool_schema(tool: "Tool | ToolBuiltIn") -> "ToolUnionParam":
724+
if isinstance(tool, ToolWebSearch):
725+
return tool.get_definition("anthropic")
726+
if isinstance(tool, ToolWebFetch):
727+
# N.B. seems the return type here (BetaWebFetchTool20250910Param) is
728+
# not a member of ToolUnionParam since it's still in beta?
729+
return tool.get_definition("anthropic") # type: ignore
703730
if isinstance(tool, ToolBuiltIn):
704731
return tool.definition # type: ignore
705732

@@ -757,6 +784,76 @@ def _as_turn(self, completion: Message, has_data_model=False) -> AssistantTurn:
757784
extra={"signature": content.signature},
758785
)
759786
)
787+
elif content.type == "server_tool_use":
788+
# Unfortunately, content.model_dump() includes fields like "url"
789+
# that aren't acceptable as API input, so we manually construct
790+
# the extra dict
791+
if isinstance(content.input, str):
792+
input_data = orjson.loads(content.input)
793+
else:
794+
input_data = content.input
795+
796+
extra = {
797+
"type": content.type,
798+
"id": content.id,
799+
"name": content.name,
800+
"input": input_data,
801+
}
802+
# https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-search-tool#response
803+
if content.name == "web_search":
804+
contents.append(
805+
ContentToolRequestSearch(
806+
query=str(input_data.get("query", "")),
807+
extra=extra,
808+
)
809+
)
810+
# https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-fetch-tool#response
811+
elif content.name == "web_fetch":
812+
# N.B. type checker thinks this is unreachable due to
813+
# ToolUnionParam not including BetaWebFetchTool20250910Param
814+
# yet
815+
contents.append(
816+
ContentToolRequestFetch(
817+
url=str(input_data.get("url", "")),
818+
extra=extra,
819+
)
820+
)
821+
else:
822+
raise ValueError(f"Unknown server tool: {content.name}")
823+
elif content.type == "web_search_tool_result":
824+
# https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-search-tool#response
825+
urls: list[str] = []
826+
if isinstance(content.content, list):
827+
urls = [x.url for x in content.content]
828+
contents.append(
829+
ContentToolResponseSearch(
830+
urls=urls,
831+
extra=content.model_dump(),
832+
)
833+
)
834+
elif content.type == "web_fetch_tool_result":
835+
# N.B. type checker thinks this is unreachable due to
836+
# ToolUnionParam not including BetaWebFetchTool20250910Param
837+
# yet. Also, at run-time, the SDK is currently giving nonsense
838+
# of type(content) == TextBlock, but it doesn't even fit that
839+
# shape?!? Anyway, content.content has a dict with the content
840+
# we want.
841+
content_fetch = cast("dict", getattr(content, "content", {}))
842+
if not content_fetch:
843+
raise ValueError(
844+
"web_fetch_tool_result content is empty. Please report this issue."
845+
)
846+
extra = {
847+
"type": "web_fetch_tool_result",
848+
"tool_use_id": content.tool_use_id, # type: ignore
849+
"content": content_fetch,
850+
}
851+
contents.append(
852+
ContentToolResponseFetch(
853+
url=content_fetch.get("url", "failed"),
854+
extra=extra,
855+
)
856+
)
760857

761858
return AssistantTurn(
762859
contents,
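The `server_tool_use` branch above builds `extra` by hand so that only fields the API accepts as input are stored for replay. A minimal sketch of that allow-list idea, assuming a plain dict in place of the SDK content block and the standard-library `json` module in place of `orjson`; `as_replayable` is a hypothetical name.

```python
import json
from typing import Any


def as_replayable(block: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields that can be sent back to the API on a later turn."""
    raw_input = block["input"]
    # Tool input may arrive as a JSON string; decode it once here.
    parsed = json.loads(raw_input) if isinstance(raw_input, str) else raw_input
    return {
        "type": block["type"],
        "id": block["id"],
        "name": block["name"],
        "input": parsed,
    }


# Illustrative payload; the id and the extra "url" field are made up.
response_block = {
    "type": "server_tool_use",
    "id": "srvtoolu_example",
    "name": "web_search",
    "input": '{"query": "python"}',
    "url": "https://example.com",  # output-only field that must not be replayed
}
print(as_replayable(response_block))
```

Listing the accepted keys explicitly means that new output-only fields added by the provider are dropped by default rather than breaking multi-turn conversations.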

chatlas/_provider_google.py

Lines changed: 30 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
from ._provider import ModelInfo, Provider, StandardModelParamNames, StandardModelParams
2323
from ._tokens import get_price_info
2424
from ._tools import Tool, ToolBuiltIn
25+
from ._tools_builtin import ToolWebFetch, ToolWebSearch
2526
from ._turn import AssistantTurn, SystemTurn, Turn, UserTurn, user_turn
2627

2728
if TYPE_CHECKING:
@@ -295,7 +296,11 @@ def _chat_perform_args(
295296
data_model: Optional[type[BaseModel]] = None,
296297
kwargs: Optional["SubmitInputArgs"] = None,
297298
) -> "SubmitInputArgs":
298-
from google.genai.types import FunctionDeclaration, GenerateContentConfig
299+
from google.genai.types import (
300+
FunctionDeclaration,
301+
GenerateContentConfig,
302+
ToolListUnion,
303+
)
299304
from google.genai.types import Tool as GoogleTool
300305

301306
kwargs_full: "SubmitInputArgs" = {
@@ -319,20 +324,30 @@ def _chat_perform_args(
319324
config.response_mime_type = "application/json"
320325

321326
if tools:
322-
config.tools = [
323-
GoogleTool(
324-
function_declarations=[
325-
FunctionDeclaration.from_callable(
326-
client=self._client._api_client,
327-
callable=tool.func,
328-
)
329-
for tool in tools.values()
330-
# TODO: to support built-in tools, we may need a way to make
331-
# tool names (e.g., google_search to google.genai.types.GoogleSearch())
332-
if isinstance(tool, Tool)
333-
]
334-
)
335-
]
327+
google_tools: ToolListUnion = []
328+
for tool in tools.values():
329+
if isinstance(tool, ToolWebSearch):
330+
gtool = GoogleTool(google_search=tool.get_definition("google"))
331+
google_tools.append(gtool)
332+
elif isinstance(tool, ToolWebFetch):
333+
gtool = GoogleTool(url_context=tool.get_definition("google"))
334+
google_tools.append(gtool)
335+
elif isinstance(tool, ToolBuiltIn):
336+
gtool = GoogleTool.model_validate(tool.definition)
337+
google_tools.append(gtool)
338+
else:
339+
gtool = GoogleTool(
340+
function_declarations=[
341+
FunctionDeclaration.from_callable(
342+
client=self._client._api_client,
343+
callable=tool.func,
344+
)
345+
]
346+
)
347+
google_tools.append(gtool)
348+
349+
if google_tools:
350+
config.tools = google_tools
336351

337352
kwargs_full["config"] = config
338353
