Commit 6dadb70

Merge branch 'main' into dmontagu/add-route-support-to-provider-inference
2 parents 48ad9c9 + 24e47bf commit 6dadb70

32 files changed: +1667 -200 lines changed

docs/durable_execution/temporal.md

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ As workflows and activities run in separate processes, any values passed between
172172

173173
To account for these limitations, tool functions and the [event stream handler](#streaming) running inside activities receive a limited version of the agent's [`RunContext`][pydantic_ai.tools.RunContext], and it's your responsibility to make sure that the [dependencies](../dependencies.md) object provided to [`TemporalAgent.run()`][pydantic_ai.durable_exec.temporal.TemporalAgent.run] can be serialized using Pydantic.
174174

175-
Specifically, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step` and `partial_output` fields are available by default, and trying to access `model`, `usage`, `prompt`, `messages`, or `tracer` will raise an error.
175+
Specifically, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step`, `usage`, and `partial_output` fields are available by default, and trying to access `model`, `prompt`, `messages`, or `tracer` will raise an error.
176176
If you need one or more of these attributes to be available inside activities, you can create a [`TemporalRunContext`][pydantic_ai.durable_exec.temporal.TemporalRunContext] subclass with custom `serialize_run_context` and `deserialize_run_context` class methods and pass it to [`TemporalAgent`][pydantic_ai.durable_exec.temporal.TemporalAgent] as `run_context_type`.
177177

178178
### Streaming

docs/logfire.md

Lines changed: 16 additions & 35 deletions
@@ -106,49 +106,30 @@ We can also query data with SQL in Logfire to monitor the performance of an appl
106106

107107
### Monitoring HTTP Requests
108108

109-
!!! tip "\"F**k you, show me the prompt.\""
110-
As per Hamel Husain's influential 2024 blog post ["Fuck You, Show Me The Prompt."](https://hamel.dev/blog/posts/prompt/)
111-
(bear with the capitalization, the point is valid), it's often useful to be able to view the raw HTTP requests and responses made to model providers.
109+
As per Hamel Husain's influential 2024 blog post ["Fuck You, Show Me The Prompt."](https://hamel.dev/blog/posts/prompt/)
110+
(bear with the capitalization, the point is valid), it's often useful to be able to view the raw HTTP requests and responses made to model providers.
112111

113-
To observe raw HTTP requests made to model providers, you can use Logfire's [HTTPX instrumentation](https://logfire.pydantic.dev/docs/integrations/http-clients/httpx/) since all provider SDKs use the [HTTPX](https://www.python-httpx.org/) library internally.
112+
To observe raw HTTP requests made to model providers, you can use Logfire's [HTTPX instrumentation](https://logfire.pydantic.dev/docs/integrations/http-clients/httpx/) since all provider SDKs (except for [Bedrock](models/bedrock.md)) use the [HTTPX](https://www.python-httpx.org/) library internally:
114113

115-
=== "With HTTP instrumentation"
116114

117-
```py {title="with_logfire_instrument_httpx.py" hl_lines="7"}
118-
import logfire
119-
120-
from pydantic_ai import Agent
121-
122-
logfire.configure()
123-
logfire.instrument_pydantic_ai()
124-
logfire.instrument_httpx(capture_all=True) # (1)!
125-
agent = Agent('openai:gpt-5')
126-
result = agent.run_sync('What is the capital of France?')
127-
print(result.output)
128-
#> The capital of France is Paris.
129-
```
130-
131-
1. See the [`logfire.instrument_httpx` docs][logfire.Logfire.instrument_httpx] more details, `capture_all=True` means both headers and body are captured for both the request and response.
132-
133-
![Logfire with HTTPX instrumentation](img/logfire-with-httpx.png)
134-
135-
=== "Without HTTP instrumentation"
115+
```py {title="with_logfire_instrument_httpx.py" hl_lines="7"}
116+
import logfire
136117

137-
```py {title="without_logfire_instrument_httpx.py"}
138-
import logfire
118+
from pydantic_ai import Agent
139119

140-
from pydantic_ai import Agent
120+
logfire.configure()
121+
logfire.instrument_pydantic_ai()
122+
logfire.instrument_httpx(capture_all=True) # (1)!
141123

142-
logfire.configure()
143-
logfire.instrument_pydantic_ai()
124+
agent = Agent('openai:gpt-5')
125+
result = agent.run_sync('What is the capital of France?')
126+
print(result.output)
127+
#> The capital of France is Paris.
128+
```
144129

145-
agent = Agent('openai:gpt-5')
146-
result = agent.run_sync('What is the capital of France?')
147-
print(result.output)
148-
#> The capital of France is Paris.
149-
```
130+
1. See the [`logfire.instrument_httpx` docs][logfire.Logfire.instrument_httpx] more details, `capture_all=True` means both headers and body are captured for both the request and response.
150131

151-
![Logfire without HTTPX instrumentation](img/logfire-without-httpx.png)
132+
![Logfire with HTTPX instrumentation](img/logfire-with-httpx.png)
152133

153134
## Using OpenTelemetry
154135

docs/mcp/client.md

Lines changed: 84 additions & 0 deletions
@@ -338,10 +338,94 @@ calculator_server = MCPServerSSE(
338338
agent = Agent('openai:gpt-5', toolsets=[weather_server, calculator_server])
339339
```
340340

341+
## Server Instructions
342+
343+
MCP servers can provide instructions during initialization that give context about how to best interact with the server's tools. These instructions are accessible via the [`instructions`][pydantic_ai.mcp.MCPServer.instructions] property after the server connection is established.
344+
345+
```python {title="mcp_server_instructions.py"}
346+
from pydantic_ai import Agent
347+
from pydantic_ai.mcp import MCPServerStreamableHTTP
348+
349+
server = MCPServerStreamableHTTP('http://localhost:8000/mcp')
350+
agent = Agent('openai:gpt-5', toolsets=[server])
351+
352+
@agent.instructions
353+
async def mcp_server_instructions():
354+
return server.instructions # (1)!
355+
356+
async def main():
357+
result = await agent.run('What is 7 plus 5?')
358+
print(result.output)
359+
#> The answer is 12.
360+
```
361+
362+
1. The server connection is guaranteed to be established by this point, so `server.instructions` is available.
363+
341364
## Tool metadata
342365

343366
MCP tools can include metadata that provides additional information about the tool's characteristics, which can be useful when [filtering tools][pydantic_ai.toolsets.FilteredToolset]. The `meta`, `annotations`, and `output_schema` fields can be found on the `metadata` dict on the [`ToolDefinition`][pydantic_ai.tools.ToolDefinition] object that's passed to filter functions.
344367

368+
## Resources
369+
370+
MCP servers can provide [resources](https://modelcontextprotocol.io/docs/concepts/resources) - files, data, or content that can be accessed by the client. Resources in MCP are application-driven, with host applications determining how to incorporate context manually, based on their needs. This means they will _not_ be exposed to the LLM automatically (unless a tool returns a `ResourceLink` or `EmbeddedResource`).
371+
372+
Pydantic AI provides methods to discover and read resources from MCP servers:
373+
374+
- [`list_resources()`][pydantic_ai.mcp.MCPServer.list_resources] - List all available resources on the server
375+
- [`list_resource_templates()`][pydantic_ai.mcp.MCPServer.list_resource_templates] - List resource templates with parameter placeholders
376+
- [`read_resource(uri)`][pydantic_ai.mcp.MCPServer.read_resource] - Read the contents of a specific resource by URI
377+
378+
Resources are automatically converted: text content is returned as `str`, and binary content is returned as [`BinaryContent`][pydantic_ai.messages.BinaryContent].
379+
380+
Before consuming resources, we need to run a server that exposes some:
381+
382+
```python {title="mcp_resource_server.py"}
383+
from mcp.server.fastmcp import FastMCP
384+
385+
mcp = FastMCP('Pydantic AI MCP Server')
386+
log_level = 'unset'
387+
388+
389+
@mcp.resource('resource://user_name.txt', mime_type='text/plain')
390+
async def user_name_resource() -> str:
391+
return 'Alice'
392+
393+
394+
if __name__ == '__main__':
395+
mcp.run()
396+
```
397+
398+
Then we can create the client:
399+
400+
```python {title="mcp_resources.py", requires="mcp_resource_server.py"}
401+
import asyncio
402+
403+
from pydantic_ai.mcp import MCPServerStdio
404+
405+
406+
async def main():
407+
server = MCPServerStdio('python', args=['-m', 'mcp_resource_server'])
408+
409+
async with server:
410+
# List all available resources
411+
resources = await server.list_resources()
412+
for resource in resources:
413+
print(f' - {resource.name}: {resource.uri} ({resource.mime_type})')
414+
#> - user_name_resource: resource://user_name.txt (text/plain)
415+
416+
# Read a text resource
417+
user_name = await server.read_resource('resource://user_name.txt')
418+
print(f'Text content: {user_name}')
419+
#> Text content: Alice
420+
421+
422+
if __name__ == '__main__':
423+
asyncio.run(main())
424+
```
425+
426+
_(This example is complete, it can be run "as is")_
427+
428+
345429
## Custom TLS / SSL configuration
346430

347431
In some environments you need to tweak how HTTPS connections are established –

docs/models/anthropic.md

Lines changed: 5 additions & 4 deletions
@@ -83,8 +83,8 @@ agent = Agent(model)
8383
Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides three ways to use prompt caching:
8484

8585
1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
86-
2. **Cache System Instructions**: Enable the [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] [model setting](../agents.md#model-run-settings) to cache your system prompt
87-
3. **Cache Tool Definitions**: Enable the [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] [model setting](../agents.md#model-run-settings) to cache your tool definitions
86+
2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
87+
3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
8888

8989
You can combine all three strategies for maximum savings:
9090

@@ -96,8 +96,9 @@ agent = Agent(
9696
'anthropic:claude-sonnet-4-5',
9797
system_prompt='Detailed instructions...',
9898
model_settings=AnthropicModelSettings(
99+
# Use True for default 5m TTL, or specify '5m' / '1h' directly
99100
anthropic_cache_instructions=True,
100-
anthropic_cache_tool_definitions=True,
101+
anthropic_cache_tool_definitions='1h', # Longer cache for tool definitions
101102
),
102103
)
103104

@@ -134,7 +135,7 @@ agent = Agent(
134135
'anthropic:claude-sonnet-4-5',
135136
system_prompt='Instructions...',
136137
model_settings=AnthropicModelSettings(
137-
anthropic_cache_instructions=True
138+
anthropic_cache_instructions=True # Default 5m TTL
138139
),
139140
)
140141

docs/models/google.md

Lines changed: 3 additions & 3 deletions
@@ -214,22 +214,22 @@ from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
214214
settings = GoogleModelSettings(
215215
temperature=0.2,
216216
max_tokens=1024,
217-
google_thinking_config={'thinking_budget': 2048},
217+
google_thinking_config={'thinking_level': 'low'},
218218
google_safety_settings=[
219219
{
220220
'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
221221
'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
222222
}
223223
]
224224
)
225-
model = GoogleModel('gemini-2.5-flash')
225+
model = GoogleModel('gemini-2.5-pro')
226226
agent = Agent(model, model_settings=settings)
227227
...
228228
```
229229

230230
### Disable thinking
231231

232-
You can disable thinking by setting the `thinking_budget` to `0` on the `google_thinking_config`:
232+
On models older than Gemini 2.5 Pro, you can disable thinking by setting the `thinking_budget` to `0` on the `google_thinking_config`:
233233

234234
```python
235235
from pydantic_ai import Agent

pydantic_ai_slim/pydantic_ai/_agent_graph.py

Lines changed: 6 additions & 0 deletions
@@ -216,6 +216,12 @@ async def run( # noqa: C901
216216
ctx.state.message_history = messages
217217
ctx.deps.new_message_index = len(messages)
218218

219+
# Validate that message history starts with a user message
220+
if messages and isinstance(messages[0], _messages.ModelResponse):
221+
raise exceptions.UserError(
222+
'Message history cannot start with a `ModelResponse`. Conversations must begin with a user message.'
223+
)
224+
219225
if self.deferred_tool_results is not None:
220226
return await self._handle_deferred_tool_results(self.deferred_tool_results, messages, ctx)
221227
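
Note on this hunk: the new guard can be tried in isolation with the sketch below. `ModelRequest`, `ModelResponse`, `UserError` and `validate_history` here are hypothetical stand-ins, not the library's own classes; only the condition and the error message are taken from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the library's message and exception types.
@dataclass
class ModelRequest:
    content: str


@dataclass
class ModelResponse:
    content: str


class UserError(RuntimeError):
    pass


def validate_history(messages: list[ModelRequest | ModelResponse]) -> None:
    # Same shape as the check in the hunk: an empty history is allowed,
    # but a non-empty one must not open with a model response.
    if messages and isinstance(messages[0], ModelResponse):
        raise UserError(
            'Message history cannot start with a `ModelResponse`. '
            'Conversations must begin with a user message.'
        )


validate_history([])  # accepted: nothing to validate
validate_history([ModelRequest('hi'), ModelResponse('hello')])  # accepted

try:
    validate_history([ModelResponse('hello')])
except UserError as exc:
    print(f'rejected: {exc}')
```

The check only inspects the first message, so a history that is otherwise malformed later on would still pass this particular guard.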

pydantic_ai_slim/pydantic_ai/_mcp.py

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
from __future__ import annotations
2+
13
import base64
24
from collections.abc import Sequence
35
from typing import Literal

pydantic_ai_slim/pydantic_ai/_output.py

Lines changed: 4 additions & 30 deletions
@@ -470,7 +470,7 @@ def __init__(
470470
allows_image: bool,
471471
):
472472
super().__init__(
473-
processor=PromptedOutputProcessor(processor),
473+
processor=processor,
474474
allows_deferred_tools=allows_deferred_tools,
475475
allows_image=allows_image,
476476
)
@@ -494,13 +494,6 @@ def build_instructions(cls, template: str, object_def: OutputObjectDefinition) -
494494

495495
return template.format(schema=json.dumps(schema))
496496

497-
def instructions(self, default_template: str) -> str: # pragma: no cover
498-
"""Get instructions to tell model to output JSON matching the schema."""
499-
template = self.template or default_template
500-
object_def = self.object_def
501-
assert object_def is not None
502-
return self.build_instructions(template, object_def)
503-
504497

505498
@dataclass(init=False)
506499
class ToolOutputSchema(OutputSchema[OutputDataT]):
@@ -542,28 +535,6 @@ class BaseObjectOutputProcessor(BaseOutputProcessor[OutputDataT]):
542535
object_def: OutputObjectDefinition
543536

544537

545-
@dataclass(init=False)
546-
class PromptedOutputProcessor(BaseObjectOutputProcessor[OutputDataT]):
547-
wrapped: BaseObjectOutputProcessor[OutputDataT]
548-
549-
def __init__(self, wrapped: BaseObjectOutputProcessor[OutputDataT]):
550-
self.wrapped = wrapped
551-
super().__init__(object_def=wrapped.object_def)
552-
553-
async def process(
554-
self,
555-
data: str,
556-
run_context: RunContext[AgentDepsT],
557-
allow_partial: bool = False,
558-
wrap_validation_errors: bool = True,
559-
) -> OutputDataT:
560-
text = _utils.strip_markdown_fences(data)
561-
562-
return await self.wrapped.process(
563-
text, run_context, allow_partial=allow_partial, wrap_validation_errors=wrap_validation_errors
564-
)
565-
566-
567538
@dataclass(init=False)
568539
class ObjectOutputProcessor(BaseObjectOutputProcessor[OutputDataT]):
569540
outer_typed_dict_key: str | None = None
@@ -653,6 +624,9 @@ async def process(
653624
Returns:
654625
Either the validated output data (left) or a retry message (right).
655626
"""
627+
if isinstance(data, str):
628+
data = _utils.strip_markdown_fences(data)
629+
656630
try:
657631
output = self.validate(data, allow_partial)
658632
except ValidationError as e:

pydantic_ai_slim/pydantic_ai/_utils.py

Lines changed: 4 additions & 2 deletions
@@ -467,12 +467,14 @@ def validate_empty_kwargs(_kwargs: dict[str, Any]) -> None:
467467
raise exceptions.UserError(f'Unknown keyword arguments: {unknown_kwargs}')
468468

469469

470+
_MARKDOWN_FENCES_PATTERN = re.compile(r'```(?:\w+)?\n(\{.*\})', flags=re.DOTALL)
471+
472+
470473
def strip_markdown_fences(text: str) -> str:
471474
if text.startswith('{'):
472475
return text
473476

474-
regex = r'```(?:\w+)?\n(\{.*\})\n```'
475-
match = re.search(regex, text, re.DOTALL)
477+
match = re.search(_MARKDOWN_FENCES_PATTERN, text)
476478
if match:
477479
return match.group(1)
478480
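
Note on this hunk: besides precompiling the regex, the new pattern no longer requires a closing fence after the JSON object. The standalone sketch below shows the effect. The pattern is equivalent to the one in the diff but is assembled from a `FENCE` constant (an addition made here so the example avoids literal fence markers), and the final `return text` fallback is an assumption, since the tail of the function lies outside the diff context.

```python
import re

FENCE = '`' * 3  # three backticks, built indirectly for readability here

# Equivalent to the pattern in the hunk: opening fence, optional language tag,
# newline, then a greedy match from the first '{' to the last '}'.
_MARKDOWN_FENCES_PATTERN = re.compile(FENCE + r'(?:\w+)?\n(\{.*\})', flags=re.DOTALL)


def strip_markdown_fences(text: str) -> str:
    if text.startswith('{'):
        return text

    match = re.search(_MARKDOWN_FENCES_PATTERN, text)
    if match:
        return match.group(1)

    # Assumption: the unchanged remainder of the function returns the input as-is.
    return text


fenced = FENCE + 'json\n{"city": "Paris"}\n' + FENCE
truncated = FENCE + 'json\n{"city": "Paris"}'  # closing fence missing

print(strip_markdown_fences(fenced))
print(strip_markdown_fences(truncated))
```

Under the old pattern the `truncated` input would not have matched, because the regex ended with a newline and a closing fence. This may matter for partial (streamed) output, where the closing fence has not arrived yet.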

pydantic_ai_slim/pydantic_ai/durable_exec/temporal/_run_context.py

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,7 @@
1414
class TemporalRunContext(RunContext[AgentDepsT]):
1515
"""The [`RunContext`][pydantic_ai.tools.RunContext] subclass to use to serialize and deserialize the run context for use inside a Temporal activity.
1616
17-
By default, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step` and `partial_output` attributes will be available.
17+
By default, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step`, `usage`, and `partial_output` attributes will be available.
1818
To make another attribute available, create a `TemporalRunContext` subclass with a custom `serialize_run_context` class method that returns a dictionary that includes the attribute and pass it to [`TemporalAgent`][pydantic_ai.durable_exec.temporal.TemporalAgent].
1919
"""
2020

@@ -51,6 +51,7 @@ def serialize_run_context(cls, ctx: RunContext[Any]) -> dict[str, Any]:
5151
'max_retries': ctx.max_retries,
5252
'run_step': ctx.run_step,
5353
'partial_output': ctx.partial_output,
54+
'usage': ctx.usage,
5455
}
5556

5657
@classmethod
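
Note on this hunk: the sketch below models why adding `'usage'` to the serialized dictionary makes the field readable inside an activity while fields such as `prompt` stay unavailable. Every class and function here (`Usage`, `FakeRunContext`, `LimitedRunContext`, `serialize_run_context`) is a simplified, hypothetical stand-in written for illustration; the real `TemporalRunContext` may differ in how it stores fields and which error it raises.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Usage:  # hypothetical stand-in for the library's usage object
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class FakeRunContext:  # hypothetical stand-in for the workflow-side RunContext
    deps: Any
    run_step: int = 0
    usage: Usage = field(default_factory=Usage)
    prompt: str | None = None


def serialize_run_context(ctx: FakeRunContext) -> dict[str, Any]:
    # Only the keys listed here cross the workflow/activity boundary.
    return {'deps': ctx.deps, 'run_step': ctx.run_step, 'usage': ctx.usage}


class LimitedRunContext:
    """Hypothetical activity-side view built from the serialized dictionary."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails, i.e. the field was not serialized.
        raise AttributeError(f'{name!r} is not available inside an activity')


ctx = FakeRunContext(deps=None, run_step=2, usage=Usage(10, 5), prompt='hi')
limited = LimitedRunContext(**serialize_run_context(ctx))

print(limited.usage.input_tokens)  # serialized, so it is readable

try:
    limited.prompt
except AttributeError as exc:
    print(exc)  # not serialized, so access fails
```

In the real library the serialized values also have to survive Pydantic serialization between processes, which this in-memory sketch does not exercise.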
