
Commit 849aa4c: Toolsets (#2024)
Parent: 4d755d2


49 files changed: +4617 / -1413 lines

docs/agents.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -826,7 +826,7 @@ with capture_run_messages() as messages:  # (2)!
         result = agent.run_sync('Please get me the volume of a box with size 6.')
     except UnexpectedModelBehavior as e:
         print('An error occurred:', e)
-        #> An error occurred: Tool exceeded max retries count of 1
+        #> An error occurred: Tool 'calc_volume' exceeded max retries count of 1
        print('cause:', repr(e.__cause__))
        #> cause: ModelRetry('Please try again.')
        print('messages:', messages)
````
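The improved message names the failing tool, and the `cause` shown in the doc snippet relies on ordinary Python exception chaining (`raise ... from ...` populating `__cause__`). A minimal stand-alone sketch of that chaining pattern, using stand-in exception classes rather than the library's own:

```python
class ModelRetry(Exception):
    """Stand-in for pydantic_ai's ModelRetry exception."""

class UnexpectedModelBehavior(Exception):
    """Stand-in for pydantic_ai's UnexpectedModelBehavior exception."""

def run_tool(tool_name: str, max_retries: int = 1) -> None:
    # Simulate a tool whose retry budget is exhausted: the final error
    # names the tool and keeps the triggering ModelRetry as __cause__.
    try:
        raise ModelRetry('Please try again.')
    except ModelRetry as retry:
        raise UnexpectedModelBehavior(
            f'Tool {tool_name!r} exceeded max retries count of {max_retries}'
        ) from retry

try:
    run_tool('calc_volume')
except UnexpectedModelBehavior as e:
    print('An error occurred:', e)
    #> An error occurred: Tool 'calc_volume' exceeded max retries count of 1
    print('cause:', repr(e.__cause__))
    #> cause: ModelRetry('Please try again.')
```

Because the chain uses `from`, callers can distinguish a retryable cause (`isinstance(e.__cause__, ModelRetry)`) from other failures.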
docs/api/ext.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -0,0 +1,5 @@
+# `pydantic_ai.ext`
+
+::: pydantic_ai.ext.langchain
+
+::: pydantic_ai.ext.aci
````

docs/api/output.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -10,3 +10,4 @@
 - PromptedOutput
 - TextOutput
 - StructuredDict
+- DeferredToolCalls
````

docs/api/toolsets.md

Lines changed: 14 additions & 0 deletions

````diff
@@ -0,0 +1,14 @@
+# `pydantic_ai.toolsets`
+
+::: pydantic_ai.toolsets
+    options:
+        members:
+            - AbstractToolset
+            - CombinedToolset
+            - DeferredToolset
+            - FilteredToolset
+            - FunctionToolset
+            - PrefixedToolset
+            - RenamedToolset
+            - PreparedToolset
+            - WrapperToolset
````
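The member list suggests a small wrapper hierarchy: concrete toolsets (e.g. `FunctionToolset`) hold tools, while wrappers such as `FilteredToolset` and `PrefixedToolset` transform another toolset. A rough sketch of that composition pattern in plain Python — the class and method names here are illustrative stand-ins, not the library's actual API:

```python
from typing import Callable

class FunctionToolset:
    """Holds plain functions keyed by tool name."""
    def __init__(self, tools: dict[str, Callable]):
        self._tools = tools

    def list_tools(self) -> dict[str, Callable]:
        return dict(self._tools)

class FilteredToolset:
    """Wraps another toolset, keeping only tools that pass a predicate."""
    def __init__(self, inner, predicate: Callable[[str], bool]):
        self._inner, self._predicate = inner, predicate

    def list_tools(self) -> dict[str, Callable]:
        return {n: f for n, f in self._inner.list_tools().items() if self._predicate(n)}

class PrefixedToolset:
    """Wraps another toolset, exposing its tools under prefixed names."""
    def __init__(self, inner, prefix: str):
        self._inner, self._prefix = inner, prefix

    def list_tools(self) -> dict[str, Callable]:
        return {f'{self._prefix}_{n}': f for n, f in self._inner.list_tools().items()}

base = FunctionToolset({'add': lambda a, b: a + b, 'sub': lambda a, b: a - b})
# Wrappers compose: filter first, then prefix the surviving names.
wrapped = PrefixedToolset(FilteredToolset(base, lambda n: n == 'add'), 'math')
print(sorted(wrapped.list_tools()))
#> ['math_add']
```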

docs/mcp/client.md

Lines changed: 61 additions & 98 deletions

````diff
@@ -16,42 +16,54 @@ pip/uv-add "pydantic-ai-slim[mcp]"
 
 ## Usage
 
-PydanticAI comes with two ways to connect to MCP servers:
+PydanticAI comes with three ways to connect to MCP servers:
 
-- [`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] which connects to an MCP server using the [HTTP SSE](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#http-with-sse) transport
 - [`MCPServerStreamableHTTP`][pydantic_ai.mcp.MCPServerStreamableHTTP] which connects to an MCP server using the [Streamable HTTP](https://modelcontextprotocol.io/introduction#streamable-http) transport
+- [`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] which connects to an MCP server using the [HTTP SSE](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#http-with-sse) transport
 - [`MCPServerStdio`][pydantic_ai.mcp.MCPServerStdio] which runs the server as a subprocess and connects to it using the [stdio](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#stdio) transport
 
-Examples of both are shown below; [mcp-run-python](run-python.md) is used as the MCP server in both examples.
+Examples of all three are shown below; [mcp-run-python](run-python.md) is used as the MCP server in all examples.
 
-### SSE Client
+Each MCP server instance is a [toolset](../toolsets.md) and can be registered with an [`Agent`][pydantic_ai.Agent] using the `toolsets` argument.
 
-[`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] connects over HTTP using the [HTTP + Server Sent Events transport](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#http-with-sse) to a server.
+You can use the [`async with agent`][pydantic_ai.Agent.__aenter__] context manager to open and close connections to all registered servers (and in the case of stdio servers, start and stop the subprocesses) around the context where they'll be used in agent runs. You can also use [`async with server`][pydantic_ai.mcp.MCPServer.__aenter__] to manage the connection or subprocess of a specific server, for example if you'd like to use it with multiple agents. If you don't explicitly enter one of these context managers to set up the server, this will be done automatically when it's needed (e.g. to list the available tools or call a specific tool), but it's more efficient to do so around the entire context where you expect the servers to be used.
+
+### Streamable HTTP Client
+
+[`MCPServerStreamableHTTP`][pydantic_ai.mcp.MCPServerStreamableHTTP] connects over HTTP using the
+[Streamable HTTP](https://modelcontextprotocol.io/introduction#streamable-http) transport to a server.
 
 !!! note
-    [`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] requires an MCP server to be running and accepting HTTP connections before calling [`agent.run_mcp_servers()`][pydantic_ai.Agent.run_mcp_servers]. Running the server is not managed by PydanticAI.
+    [`MCPServerStreamableHTTP`][pydantic_ai.mcp.MCPServerStreamableHTTP] requires an MCP server to be
+    running and accepting HTTP connections before running the agent. Running the server is not
+    managed by Pydantic AI.
 
-    The name "HTTP" is used since this implementation will be adapted in future to use the new
-    [Streamable HTTP](https://github.com/modelcontextprotocol/specification/pull/206) currently in development.
+Before creating the Streamable HTTP client, we need to run a server that supports the Streamable HTTP transport.
 
-Before creating the SSE client, we need to run the server (docs [here](run-python.md)):
+```python {title="streamable_http_server.py" py="3.10" dunder_name="not_main"}
+from mcp.server.fastmcp import FastMCP
 
-```bash {title="terminal (run sse server)"}
-deno run \
-  -N -R=node_modules -W=node_modules --node-modules-dir=auto \
-  jsr:@pydantic/mcp-run-python sse
+app = FastMCP()
+
+@app.tool()
+def add(a: int, b: int) -> int:
+    return a + b
+
+if __name__ == '__main__':
+    app.run(transport='streamable-http')
 ```
 
-```python {title="mcp_sse_client.py" py="3.10"}
-from pydantic_ai import Agent
-from pydantic_ai.mcp import MCPServerSSE
+Then we can create the client:
 
-server = MCPServerSSE(url='http://localhost:3001/sse')  # (1)!
-agent = Agent('openai:gpt-4o', mcp_servers=[server])  # (2)!
+```python {title="mcp_streamable_http_client.py" py="3.10"}
+from pydantic_ai import Agent
+from pydantic_ai.mcp import MCPServerStreamableHTTP
 
+server = MCPServerStreamableHTTP('http://localhost:8000/mcp')  # (1)!
+agent = Agent('openai:gpt-4o', toolsets=[server])  # (2)!
 
 async def main():
-    async with agent.run_mcp_servers():  # (3)!
+    async with agent:  # (3)!
         result = await agent.run('How many days between 2000-01-01 and 2025-03-18?')
         print(result.output)
         #> There are 9,208 days between January 1, 2000, and March 18, 2025.
````
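The new lifecycle paragraph above says `async with agent` opens connections to all registered servers and closes them afterwards. That kind of management can be pictured as an async exit stack that enters each server's context on `__aenter__` and unwinds them all on `__aexit__`. A stand-alone sketch with hypothetical `FakeServer`/`FakeAgent` classes (not Pydantic AI's implementation):

```python
import asyncio
from contextlib import AsyncExitStack

class FakeServer:
    """Stand-in for an MCP server connection or subprocess."""
    def __init__(self, name: str, log: list):
        self.name, self.log = name, log

    async def __aenter__(self):
        self.log.append(f'connect {self.name}')
        return self

    async def __aexit__(self, *exc):
        self.log.append(f'disconnect {self.name}')

class FakeAgent:
    """Enters every registered toolset's context when entered itself."""
    def __init__(self, toolsets):
        self.toolsets = toolsets
        self._stack = AsyncExitStack()

    async def __aenter__(self):
        for server in self.toolsets:
            await self._stack.enter_async_context(server)
        return self

    async def __aexit__(self, *exc):
        # AsyncExitStack unwinds the contexts in reverse (LIFO) order.
        await self._stack.aclose()

async def main(log: list):
    agent = FakeAgent([FakeServer('sse', log), FakeServer('stdio', log)])
    async with agent:
        log.append('run')

log = []
asyncio.run(main(log))
print(log)
#> ['connect sse', 'connect stdio', 'run', 'disconnect stdio', 'disconnect sse']
```

The point of entering the context around the whole run is to avoid per-call connect/disconnect overhead, which is what the doc change recommends.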
````diff
@@ -85,43 +97,34 @@ Will display as follows:
 
 ![Logfire run python code](../img/logfire-run-python-code.png)
 
-### Streamable HTTP Client
+### SSE Client
 
-[`MCPServerStreamableHTTP`][pydantic_ai.mcp.MCPServerStreamableHTTP] connects over HTTP using the
-[Streamable HTTP](https://modelcontextprotocol.io/introduction#streamable-http) transport to a server.
+[`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] connects over HTTP using the [HTTP + Server Sent Events transport](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#http-with-sse) to a server.
 
 !!! note
-    [`MCPServerStreamableHTTP`][pydantic_ai.mcp.MCPServerStreamableHTTP] requires an MCP server to be
-    running and accepting HTTP connections before calling
-    [`agent.run_mcp_servers()`][pydantic_ai.Agent.run_mcp_servers]. Running the server is not
-    managed by PydanticAI.
-
-Before creating the Streamable HTTP client, we need to run a server that supports the Streamable HTTP transport.
-
-```python {title="streamable_http_server.py" py="3.10" dunder_name="not_main"}
-from mcp.server.fastmcp import FastMCP
+    [`MCPServerSSE`][pydantic_ai.mcp.MCPServerSSE] requires an MCP server to be running and accepting HTTP connections before running the agent. Running the server is not managed by Pydantic AI.
 
-app = FastMCP()
+    The name "HTTP" is used since this implementation will be adapted in future to use the new
+    [Streamable HTTP](https://github.com/modelcontextprotocol/specification/pull/206) currently in development.
 
-@app.tool()
-def add(a: int, b: int) -> int:
-    return a + b
+Before creating the SSE client, we need to run the server (docs [here](run-python.md)):
 
-if __name__ == '__main__':
-    app.run(transport='streamable-http')
+```bash {title="terminal (run sse server)"}
+deno run \
+  -N -R=node_modules -W=node_modules --node-modules-dir=auto \
+  jsr:@pydantic/mcp-run-python sse
 ```
 
-Then we can create the client:
-
-```python {title="mcp_streamable_http_client.py" py="3.10"}
+```python {title="mcp_sse_client.py" py="3.10"}
 from pydantic_ai import Agent
-from pydantic_ai.mcp import MCPServerStreamableHTTP
+from pydantic_ai.mcp import MCPServerSSE
+
+server = MCPServerSSE(url='http://localhost:3001/sse')  # (1)!
+agent = Agent('openai:gpt-4o', toolsets=[server])  # (2)!
 
-server = MCPServerStreamableHTTP('http://localhost:8000/mcp')  # (1)!
-agent = Agent('openai:gpt-4o', mcp_servers=[server])  # (2)!
 
 async def main():
-    async with agent.run_mcp_servers():  # (3)!
+    async with agent:  # (3)!
         result = await agent.run('How many days between 2000-01-01 and 2025-03-18?')
         print(result.output)
         #> There are 9,208 days between January 1, 2000, and March 18, 2025.
````
````diff
@@ -137,9 +140,6 @@ _(This example is complete, it can be run "as is" with Python 3.10+ — you'll n
 
 The other transport offered by MCP is the [stdio transport](https://spec.modelcontextprotocol.io/specification/2024-11-05/basic/transports/#stdio) where the server is run as a subprocess and communicates with the client over `stdin` and `stdout`. In this case, you'd use the [`MCPServerStdio`][pydantic_ai.mcp.MCPServerStdio] class.
 
-!!! note
-    When using [`MCPServerStdio`][pydantic_ai.mcp.MCPServerStdio] servers, the [`agent.run_mcp_servers()`][pydantic_ai.Agent.run_mcp_servers] context manager is responsible for starting and stopping the server.
-
 ```python {title="mcp_stdio_client.py" py="3.10"}
 from pydantic_ai import Agent
 from pydantic_ai.mcp import MCPServerStdio
````
````diff
@@ -156,11 +156,11 @@ server = MCPServerStdio(  # (1)!
         'stdio',
     ]
 )
-agent = Agent('openai:gpt-4o', mcp_servers=[server])
+agent = Agent('openai:gpt-4o', toolsets=[server])
 
 
 async def main():
-    async with agent.run_mcp_servers():
+    async with agent:
         result = await agent.run('How many days between 2000-01-01 and 2025-03-18?')
         print(result.output)
         #> There are 9,208 days between January 1, 2000, and March 18, 2025.
````
````diff
@@ -188,23 +188,23 @@ from pydantic_ai.tools import RunContext
 async def process_tool_call(
     ctx: RunContext[int],
     call_tool: CallToolFunc,
-    tool_name: str,
-    args: dict[str, Any],
+    name: str,
+    tool_args: dict[str, Any],
 ) -> ToolResult:
     """A tool call processor that passes along the deps."""
-    return await call_tool(tool_name, args, metadata={'deps': ctx.deps})
+    return await call_tool(name, tool_args, {'deps': ctx.deps})
 
 
 server = MCPServerStdio('python', ['mcp_server.py'], process_tool_call=process_tool_call)
 agent = Agent(
     model=TestModel(call_tools=['echo_deps']),
     deps_type=int,
-    mcp_servers=[server]
+    toolsets=[server]
 )
 
 
 async def main():
-    async with agent.run_mcp_servers():
+    async with agent:
         result = await agent.run('Echo with deps set to 42', deps=42)
         print(result.output)
         #> {"echo_deps":{"echo":"This is an echo message","deps":42}}
````
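Note the renamed parameters in the diff above: the processor now receives `name` and `tool_args`, and the metadata dict is passed positionally. The wrapping idea itself is plain function composition — intercept the call, attach extra context, then delegate. A sketch with a fake `call_tool` and plain `deps` value (hypothetical stand-ins for the library's `CallToolFunc` and `RunContext`):

```python
import asyncio
from typing import Any

async def call_tool(name: str, tool_args: dict, metadata: Any = None) -> dict:
    """Stand-in for the underlying MCP tool call."""
    return {'tool': name, 'args': tool_args, 'metadata': metadata}

async def process_tool_call(deps: int, name: str, tool_args: dict) -> dict:
    """Pass the run's deps along to the server as call metadata."""
    return await call_tool(name, tool_args, {'deps': deps})

result = asyncio.run(process_tool_call(42, 'echo_deps', {'echo': 'hi'}))
print(result)
#> {'tool': 'echo_deps', 'args': {'echo': 'hi'}, 'metadata': {'deps': 42}}
```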
````diff
@@ -214,15 +214,7 @@ async def main():
 
 When connecting to multiple MCP servers that might provide tools with the same name, you can use the `tool_prefix` parameter to avoid naming conflicts. This parameter adds a prefix to all tool names from a specific server.
 
-### How It Works
-
-- If `tool_prefix` is set, all tools from that server will be prefixed with `{tool_prefix}_`
-- When listing tools, the prefixed names are shown to the model
-- When calling tools, the prefix is automatically removed before sending the request to the server
-
-This allows you to use multiple servers that might have overlapping tool names without conflicts.
-
-### Example with HTTP Server
+This allows you to use multiple servers that might have overlapping tool names without conflicts:
 
 ```python {title="mcp_tool_prefix_http_client.py" py="3.10"}
 from pydantic_ai import Agent
````
````diff
@@ -242,41 +234,9 @@ calculator_server = MCPServerSSE(
 # Both servers might have a tool named 'get_data', but they'll be exposed as:
 # - 'weather_get_data'
 # - 'calc_get_data'
-agent = Agent('openai:gpt-4o', mcp_servers=[weather_server, calculator_server])
-```
-
-### Example with Stdio Server
-
-```python {title="mcp_tool_prefix_stdio_client.py" py="3.10"}
-from pydantic_ai import Agent
-from pydantic_ai.mcp import MCPServerStdio
-
-python_server = MCPServerStdio(
-    'deno',
-    args=[
-        'run',
-        '-N',
-        'jsr:@pydantic/mcp-run-python',
-        'stdio',
-    ],
-    tool_prefix='py'  # Tools will be prefixed with 'py_'
-)
-
-js_server = MCPServerStdio(
-    'node',
-    args=[
-        'run',
-        'mcp-js-server.js',
-        'stdio',
-    ],
-    tool_prefix='js'  # Tools will be prefixed with 'js_'
-)
-
-agent = Agent('openai:gpt-4o', mcp_servers=[python_server, js_server])
+agent = Agent('openai:gpt-4o', toolsets=[weather_server, calculator_server])
 ```
 
-When the model interacts with these servers, it will see the prefixed tool names, but the prefixes will be automatically handled when making tool calls.
-
 ## MCP Sampling
 
 !!! info "What is MCP Sampling?"
````
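The "How It Works" bullets removed by this commit described the `tool_prefix` mechanics: prefixed names are what the model sees when listing tools, and the prefix is stripped again before the call reaches the server. That round trip can be sketched in a few lines (illustrative helper functions only, not the library's code):

```python
def prefixed_names(tool_names: list, prefix: str) -> list:
    # What the model sees when listing tools from a prefixed server.
    return [f'{prefix}_{name}' for name in tool_names]

def strip_prefix(prefixed: str, prefix: str) -> str:
    # What the server receives when the model calls a prefixed tool.
    assert prefixed.startswith(prefix + '_')
    return prefixed[len(prefix) + 1:]

names = prefixed_names(['get_data'], 'weather')
print(names)
#> ['weather_get_data']
print(strip_prefix('weather_get_data', 'weather'))
#> get_data
```

Because each server gets a distinct prefix, two servers can both expose a `get_data` tool without the model ever seeing a name collision.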
````diff
@@ -312,6 +272,8 @@ Pydantic AI supports sampling as both a client and server. See the [server](./se
 
 Sampling is automatically supported by Pydantic AI agents when they act as a client.
 
+To be able to use sampling, an MCP server instance needs to have a [`sampling_model`][pydantic_ai.mcp.MCPServerStdio.sampling_model] set. This can be done either directly on the server using the constructor keyword argument or the property, or by using [`agent.set_mcp_sampling_model()`][pydantic_ai.Agent.set_mcp_sampling_model] to set the agent's model or one specified as an argument as the sampling model on all MCP servers registered with that agent.
+
 Let's say we have an MCP server that wants to use sampling (in this case to generate an SVG as per the tool arguments).
 
 ??? example "Sampling MCP Server"
````
````diff
@@ -359,11 +321,12 @@ from pydantic_ai import Agent
 from pydantic_ai.mcp import MCPServerStdio
 
 server = MCPServerStdio(command='python', args=['generate_svg.py'])
-agent = Agent('openai:gpt-4o', mcp_servers=[server])
+agent = Agent('openai:gpt-4o', toolsets=[server])
 
 
 async def main():
-    async with agent.run_mcp_servers():
+    async with agent:
+        agent.set_mcp_sampling_model()
         result = await agent.run('Create an image of a robot in a punk style.')
         print(result.output)
         #> Image file written to robot_punk.svg.
````

docs/models/huggingface.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -69,7 +69,7 @@ agent = Agent(model)
 ## Custom Hugging Face client
 
 [`HuggingFaceProvider`][pydantic_ai.providers.huggingface.HuggingFaceProvider] also accepts a custom
-[`AsyncInferenceClient`][huggingface_hub.AsyncInferenceClient] client via the `hf_client` parameter, so you can customise
+[`AsyncInferenceClient`](https://huggingface.co/docs/huggingface_hub/v0.29.3/en/package_reference/inference_client#huggingface_hub.AsyncInferenceClient) client via the `hf_client` parameter, so you can customise
 the `headers`, `bill_to` (billing to an HF organization you're a member of), `base_url` etc. as defined in the
 [Hugging Face Hub python library docs](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client).
````

docs/output.md

Lines changed: 4 additions & 2 deletions

````diff
@@ -199,8 +199,8 @@ async def hand_off_to_sql_agent(ctx: RunContext, query: str) -> list[Row]:
         return output
     except UnexpectedModelBehavior as e:
         # Bubble up potentially retryable errors to the router agent
-        if (cause := e.__cause__) and hasattr(cause, 'tool_retry'):
-            raise ModelRetry(f'SQL agent failed: {cause.tool_retry.content}') from e
+        if (cause := e.__cause__) and isinstance(cause, ModelRetry):
+            raise ModelRetry(f'SQL agent failed: {cause.message}') from e
         else:
             raise
````

````diff
@@ -276,6 +276,8 @@ In the default Tool Output mode, the output JSON schema of each output type (or
 
 If you'd like to change the name of the output tool, pass a custom description to aid the model, or turn on or off strict mode, you can wrap the type(s) in the [`ToolOutput`][pydantic_ai.output.ToolOutput] marker class and provide the appropriate arguments. Note that by default, the description is taken from the docstring specified on a Pydantic model or output function, so specifying it using the marker class is typically not necessary.
 
+To dynamically modify or filter the available output tools during an agent run, you can define an agent-wide `prepare_output_tools` function that will be called ahead of each step of a run. This function should be of type [`ToolsPrepareFunc`][pydantic_ai.tools.ToolsPrepareFunc], which takes the [`RunContext`][pydantic_ai.tools.RunContext] and a list of [`ToolDefinition`][pydantic_ai.tools.ToolDefinition], and returns a new list of tool definitions (or `None` to disable all tools for that step). This is analogous to the [`prepare_tools` function](tools.md#prepare-tools) for non-output tools.
+
 ```python {title="tool_output.py"}
 from pydantic import BaseModel
````

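The new `prepare_output_tools` paragraph describes a specific function shape: take the run context plus the current tool definitions, return a new list or `None`. That contract can be sketched without the library, using hypothetical `ToolDef` and `Ctx` dataclasses standing in for `ToolDefinition` and `RunContext`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolDef:
    """Stand-in for pydantic_ai's ToolDefinition."""
    name: str

@dataclass
class Ctx:
    """Stand-in for RunContext; only carries the current step."""
    run_step: int

def prepare_output_tools(ctx: Ctx, tool_defs: list) -> Optional[list]:
    # Returning None disables all output tools for this step;
    # otherwise return the (possibly filtered) definitions.
    if ctx.run_step < 1:
        return None
    return [t for t in tool_defs if t.name == 'final_result']

defs = [ToolDef('final_result'), ToolDef('scratch_result')]
print(prepare_output_tools(Ctx(run_step=0), defs))
#> None
print([t.name for t in prepare_output_tools(Ctx(run_step=2), defs)])
#> ['final_result']
```

The step-based condition here is an arbitrary example; any logic over the context and definitions fits the same shape.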
docs/testing.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -10,7 +10,7 @@ Unless you're really sure you know better, you'll probably want to follow roughl
 * If you find yourself typing out long assertions, use [inline-snapshot](https://15r10nk.github.io/inline-snapshot/latest/)
 * Similarly, [dirty-equals](https://dirty-equals.helpmanual.io/latest/) can be useful for comparing large data structures
 * Use [`TestModel`][pydantic_ai.models.test.TestModel] or [`FunctionModel`][pydantic_ai.models.function.FunctionModel] in place of your actual model to avoid the usage, latency and variability of real LLM calls
-* Use [`Agent.override`][pydantic_ai.agent.Agent.override] to replace your model inside your application logic
+* Use [`Agent.override`][pydantic_ai.agent.Agent.override] to replace an agent's model, dependencies, or toolsets inside your application logic
 * Set [`ALLOW_MODEL_REQUESTS=False`][pydantic_ai.models.ALLOW_MODEL_REQUESTS] globally to block any requests from being made to non-test models accidentally
 
 ### Unit testing with `TestModel`
````
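The override-style swap in the bullet above is a scoped, reversible replacement: the original value comes back when the `with` block exits, even on error. The pattern is an ordinary context manager; a stand-alone sketch with a hypothetical `FakeAgent` class (not the real `Agent.override`, which also covers dependencies and toolsets):

```python
from contextlib import contextmanager

class FakeAgent:
    def __init__(self, model: str):
        self.model = model

    @contextmanager
    def override(self, model: str):
        # Swap the model for the duration of the block, then restore
        # the previous value even if the body raises.
        previous = self.model
        self.model = model
        try:
            yield self
        finally:
            self.model = previous

agent = FakeAgent('openai:gpt-4o')
with agent.override(model='test'):
    print(agent.model)
    #> test
print(agent.model)
#> openai:gpt-4o
```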
