pydantic
diff --git a/docs/durable_execution/temporal.md b/docs/durable_execution/temporal.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/models/anthropic.md b/docs/models/anthropic.md
Lines changed: 5 additions & 4 deletions
diff --git a/docs/models/google.md b/docs/models/google.md
Lines changed: 3 additions & 3 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/_json_schema.py b/pydantic_ai_slim/pydantic_ai/_json_schema.py
Lines changed: 3 additions & 4 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/durable_exec/temporal/_run_context.py b/pydantic_ai_slim/pydantic_ai/durable_exec/temporal/_run_context.py
Lines changed: 2 additions & 1 deletion
diff --git a/pydantic_ai_slim/pydantic_ai/messages.py b/pydantic_ai_slim/pydantic_ai/messages.py
Lines changed: 14 additions & 1 deletion
diff --git a/pydantic_ai_slim/pydantic_ai/models/__init__.py b/pydantic_ai_slim/pydantic_ai/models/__init__.py
Lines changed: 10 additions & 6 deletions
diff --git a/pydantic_ai_slim/pydantic_ai/models/anthropic.py b/pydantic_ai_slim/pydantic_ai/models/anthropic.py
Lines changed: 17 additions & 9 deletions
@@ -172,7 +172,7 @@ As workflows and activities run in separate processes, any values passed between
 
 To account for these limitations, tool functions and the [event stream handler](#streaming) running inside activities receive a limited version of the agent's [`RunContext`][pydantic_ai.tools.RunContext], and it's your responsibility to make sure that the [dependencies](../dependencies.md) object provided to [`TemporalAgent.run()`][pydantic_ai.durable_exec.temporal.TemporalAgent.run] can be serialized using Pydantic.
 
-Specifically, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step` and `partial_output` fields are available by default, and trying to access `model`, `usage`, `prompt`, `messages`, or `tracer` will raise an error.
+Specifically, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step`, `usage`, and `partial_output` fields are available by default, and trying to access `model`, `prompt`, `messages`, or `tracer` will raise an error.
 If you need one or more of these attributes to be available inside activities, you can create a [`TemporalRunContext`][pydantic_ai.durable_exec.temporal.TemporalRunContext] subclass with custom `serialize_run_context` and `deserialize_run_context` class methods and pass it to [`TemporalAgent`][pydantic_ai.durable_exec.temporal.TemporalAgent] as `run_context_type`.
 
 ### Streaming
 
@@ -83,8 +83,8 @@ agent = Agent(model)
 Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides three ways to use prompt caching:
 
 1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
-2. **Cache System Instructions**: Enable the [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] [model setting](../agents.md#model-run-settings) to cache your system prompt
-3. **Cache Tool Definitions**: Enable the [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] [model setting](../agents.md#model-run-settings) to cache your tool definitions
+2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
+3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
 
 You can combine all three strategies for maximum savings:
 
@@ -96,8 +96,9 @@ agent = Agent(
     'anthropic:claude-sonnet-4-5',
     system_prompt='Detailed instructions...',
     model_settings=AnthropicModelSettings(
+        # Use True for default 5m TTL, or specify '5m' / '1h' directly
         anthropic_cache_instructions=True,
-        anthropic_cache_tool_definitions=True,
+        anthropic_cache_tool_definitions='1h',  # Longer cache for tool definitions
     ),
 )
 
@@ -134,7 +135,7 @@ agent = Agent(
     'anthropic:claude-sonnet-4-5',
     system_prompt='Instructions...',
     model_settings=AnthropicModelSettings(
-        anthropic_cache_instructions=True
+        anthropic_cache_instructions=True  # Default 5m TTL
     ),
 )
 
 
@@ -214,22 +214,22 @@ from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
 settings = GoogleModelSettings(
     temperature=0.2,
     max_tokens=1024,
-    google_thinking_config={'thinking_budget': 2048},
+    google_thinking_config={'thinking_level': 'low'},
     google_safety_settings=[
         {
             'category': HarmCategory.HARM_CATEGORY_HATE_SPEECH,
             'threshold': HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
         }
     ]
 )
-model = GoogleModel('gemini-2.5-flash')
+model = GoogleModel('gemini-2.5-pro')
 agent = Agent(model, model_settings=settings)
 ...
 ```
 
 ### Disable thinking
 
-You can disable thinking by setting the `thinking_budget` to `0` on the `google_thinking_config`:
+On models older than Gemini 2.5 Pro, you can disable thinking by setting the `thinking_budget` to `0` on the `google_thinking_config`:
 
 ```python
 from pydantic_ai import Agent
 
@@ -27,7 +27,7 @@ def __init__(
         *,
         strict: bool | None = None,
         prefer_inlined_defs: bool = False,
-        simplify_nullable_unions: bool = False,
+        simplify_nullable_unions: bool = False,  # TODO (v2): Remove this, no longer used
         flatten_allof: bool = False,
     ):
         self.schema = schema
@@ -156,10 +156,9 @@ def _handle_union(self, schema: JsonSchema, union_kind: Literal['anyOf', 'oneOf'
 
         handled = [self._handle(member) for member in members]
 
-        # convert nullable unions to nullable types
+        # TODO (v2): Remove this feature, no longer used
         if self.simplify_nullable_unions:
             handled = self._simplify_nullable_union(handled)
-
         if len(handled) == 1:
             # In this case, no need to retain the union
             return handled[0] | schema
@@ -171,7 +170,7 @@ def _handle_union(self, schema: JsonSchema, union_kind: Literal['anyOf', 'oneOf'
 
     @staticmethod
     def _simplify_nullable_union(cases: list[JsonSchema]) -> list[JsonSchema]:
-        # TODO: Should we move this to relevant subclasses? Or is it worth keeping here to make reuse easier?
+        # TODO (v2): Remove this method, no longer used
         if len(cases) == 2 and {'type': 'null'} in cases:
             # Find the non-null schema
             non_null_schema = next(
 
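The removed comment described this helper as converting nullable unions to nullable types, and the change marks it for removal in v2. Because the hunk above is truncated, the following is only a standalone approximation of that conversion, assuming an OpenAPI-style `nullable: True` flag; `simplify_nullable_union` here is a hypothetical free function, not the library's code.

```python
from copy import deepcopy
from typing import Any

JsonSchema = dict[str, Any]


def simplify_nullable_union(cases: list[JsonSchema]) -> list[JsonSchema]:
    """Approximate sketch: collapse [X, {'type': 'null'}] into [X with nullable=True].

    Assumes the target dialect understands an OpenAPI-style `nullable` flag.
    """
    null_schema: JsonSchema = {'type': 'null'}
    if len(cases) == 2 and null_schema in cases:
        # Pick the member that is not the bare null type
        non_null = next((c for c in cases if c != null_schema), None)
        if non_null is None:
            # Both members are null, so a single one is enough
            return [cases[0]]
        simplified = deepcopy(non_null)  # avoid mutating the caller's schema
        simplified['nullable'] = True
        return [simplified]
    # Any other union shape is returned unchanged
    return cases
```

For example, `[{'type': 'string'}, {'type': 'null'}]` becomes `[{'type': 'string', 'nullable': True}]`, which `_handle_union` would then unwrap because only one member remains.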
@@ -14,7 +14,7 @@
 class TemporalRunContext(RunContext[AgentDepsT]):
     """The [`RunContext`][pydantic_ai.tools.RunContext] subclass to use to serialize and deserialize the run context for use inside a Temporal activity.
 
-    By default, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step` and `partial_output` attributes will be available.
+    By default, only the `deps`, `run_id`, `retries`, `tool_call_id`, `tool_name`, `tool_call_approved`, `retry`, `max_retries`, `run_step`, `usage`, and `partial_output` attributes will be available.
     To make another attribute available, create a `TemporalRunContext` subclass with a custom `serialize_run_context` class method that returns a dictionary that includes the attribute and pass it to [`TemporalAgent`][pydantic_ai.durable_exec.temporal.TemporalAgent].
     """
 
@@ -51,6 +51,7 @@ def serialize_run_context(cls, ctx: RunContext[Any]) -> dict[str, Any]:
             'max_retries': ctx.max_retries,
             'run_step': ctx.run_step,
             'partial_output': ctx.partial_output,
+            'usage': ctx.usage,
         }
 
     @classmethod
 
@@ -627,6 +627,13 @@ class CachePoint:
     kind: Literal['cache-point'] = 'cache-point'
     """Type identifier, this is available on all parts as a discriminator."""
 
+    ttl: Literal['5m', '1h'] = '5m'
+    """The cache time-to-live, either "5m" (5 minutes) or "1h" (1 hour).
+
+    Supported by:
+
+    * Anthropic. See https://docs.claude.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration for more information."""
+
 
 MultiModalContent = ImageUrl | AudioUrl | DocumentUrl | VideoUrl | BinaryContent
 UserContent: TypeAlias = str | MultiModalContent | CachePoint
@@ -970,6 +977,9 @@ class ModelRequest:
     run_id: str | None = None
     """The unique identifier of the agent run in which this message originated."""
 
+    metadata: dict[str, Any] | None = None
+    """Additional data that can be accessed programmatically by the application but is not sent to the LLM."""
+
     @classmethod
     def user_text_prompt(cls, user_prompt: str, *, instructions: str | None = None) -> ModelRequest:
         """Create a `ModelRequest` with a single user prompt as text."""
@@ -1060,7 +1070,7 @@ class FilePart:
 
     def has_content(self) -> bool:
         """Return `True` if the file content is non-empty."""
-        return bool(self.content)  # pragma: no cover
+        return bool(self.content.data)
 
     __repr__ = _utils.dataclasses_no_defaults_repr
 
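The `has_content` fix checks the wrapped bytes rather than the wrapper object. Assuming `content` is a dataclass such as `BinaryContent` that defines neither `__bool__` nor `__len__`, `bool(self.content)` is always `True`. The minimal sketch below shows the difference with a hypothetical `StandInBinaryContent`.

```python
from dataclasses import dataclass


@dataclass
class StandInBinaryContent:
    """Hypothetical stand-in for a binary payload wrapper (no __bool__ or __len__)."""

    data: bytes
    media_type: str


empty = StandInBinaryContent(data=b'', media_type='image/png')

# A plain dataclass instance is always truthy, so the old check could never be False
old_check = bool(empty)
# Checking the wrapped bytes reflects whether any file content is actually present
new_check = bool(empty.data)
```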
@@ -1214,6 +1224,9 @@ class ModelResponse:
     run_id: str | None = None
     """The unique identifier of the agent run in which this message originated."""
 
+    metadata: dict[str, Any] | None = None
+    """Additional data that can be accessed programmatically by the application but is not sent to the LLM."""
+
     @property
     def text(self) -> str | None:
         """Get the text in the response."""
 
@@ -145,24 +145,28 @@
         'cohere:command-r7b-12-2024',
         'deepseek:deepseek-chat',
         'deepseek:deepseek-reasoner',
+        'google-gla:gemini-flash-latest',
+        'google-gla:gemini-flash-lite-latest',
         'google-gla:gemini-2.0-flash',
         'google-gla:gemini-2.0-flash-lite',
         'google-gla:gemini-2.5-flash',
+        'google-gla:gemini-2.5-flash-preview-09-2025',
+        'google-gla:gemini-2.5-flash-image',
         'google-gla:gemini-2.5-flash-lite',
         'google-gla:gemini-2.5-flash-lite-preview-09-2025',
-        'google-gla:gemini-2.5-flash-preview-09-2025',
         'google-gla:gemini-2.5-pro',
-        'google-gla:gemini-flash-latest',
-        'google-gla:gemini-flash-lite-latest',
+        'google-gla:gemini-3-pro-preview',
+        'google-vertex:gemini-flash-latest',
+        'google-vertex:gemini-flash-lite-latest',
         'google-vertex:gemini-2.0-flash',
         'google-vertex:gemini-2.0-flash-lite',
         'google-vertex:gemini-2.5-flash',
+        'google-vertex:gemini-2.5-flash-preview-09-2025',
+        'google-vertex:gemini-2.5-flash-image',
         'google-vertex:gemini-2.5-flash-lite',
         'google-vertex:gemini-2.5-flash-lite-preview-09-2025',
-        'google-vertex:gemini-2.5-flash-preview-09-2025',
         'google-vertex:gemini-2.5-pro',
-        'google-vertex:gemini-flash-latest',
-        'google-vertex:gemini-flash-lite-latest',
+        'google-vertex:gemini-3-pro-preview',
         'grok:grok-2-image-1212',
         'grok:grok-2-vision-1212',
         'grok:grok-3',
 
@@ -151,19 +151,21 @@ class AnthropicModelSettings(ModelSettings, total=False):
     See [the Anthropic docs](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking) for more information.
     """
 
-    anthropic_cache_tool_definitions: bool
+    anthropic_cache_tool_definitions: bool | Literal['5m', '1h']
     """Whether to add `cache_control` to the last tool definition.
 
     When enabled, the last tool in the `tools` array will have `cache_control` set,
     allowing Anthropic to cache tool definitions and reduce costs.
+    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
     See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
     """
 
-    anthropic_cache_instructions: bool
+    anthropic_cache_instructions: bool | Literal['5m', '1h']
     """Whether to add `cache_control` to the last system prompt block.
 
     When enabled, the last system prompt will have `cache_control` set,
     allowing Anthropic to cache system instructions and reduce costs.
+    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
     See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
     """
 
@@ -476,9 +478,11 @@ def _get_tools(
         ]
 
         # Add cache_control to the last tool if enabled
-        if tools and model_settings.get('anthropic_cache_tool_definitions'):
+        if tools and (cache_tool_defs := model_settings.get('anthropic_cache_tool_definitions')):
+            # If True, use '5m'; otherwise use the specified ttl value
+            ttl: Literal['5m', '1h'] = '5m' if cache_tool_defs is True else cache_tool_defs
             last_tool = tools[-1]
-            last_tool['cache_control'] = BetaCacheControlEphemeralParam(type='ephemeral')
+            last_tool['cache_control'] = BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl)
 
         return tools
 
@@ -580,7 +584,7 @@ async def _map_message(  # noqa: C901
                     elif isinstance(request_part, UserPromptPart):
                         async for content in self._map_user_prompt(request_part):
                             if isinstance(content, CachePoint):
-                                self._add_cache_control_to_last_param(user_content_params)
+                                self._add_cache_control_to_last_param(user_content_params, ttl=content.ttl)
                             else:
                                 user_content_params.append(content)
                     elif isinstance(request_part, ToolReturnPart):
@@ -744,18 +748,22 @@ async def _map_message(  # noqa: C901
         system_prompt = '\n\n'.join(system_prompt_parts)
 
         # If anthropic_cache_instructions is enabled, return system prompt as a list with cache_control
-        if system_prompt and model_settings.get('anthropic_cache_instructions'):
+        if system_prompt and (cache_instructions := model_settings.get('anthropic_cache_instructions')):
+            # If True, use '5m'; otherwise use the specified ttl value
+            ttl: Literal['5m', '1h'] = '5m' if cache_instructions is True else cache_instructions
             system_prompt_blocks = [
                 BetaTextBlockParam(
-                    type='text', text=system_prompt, cache_control=BetaCacheControlEphemeralParam(type='ephemeral')
+                    type='text',
+                    text=system_prompt,
+                    cache_control=BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl),
                 )
             ]
             return system_prompt_blocks, anthropic_messages
 
         return system_prompt, anthropic_messages
 
     @staticmethod
-    def _add_cache_control_to_last_param(params: list[BetaContentBlockParam]) -> None:
+    def _add_cache_control_to_last_param(params: list[BetaContentBlockParam], ttl: Literal['5m', '1h'] = '5m') -> None:
         """Add cache control to the last content block param.
 
         See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
@@ -776,7 +784,7 @@ def _add_cache_control_to_last_param(params: list[BetaContentBlockParam]) -> Non
             raise UserError(f'Cache control not supported for param type: {last_param["type"]}')
 
         # Add cache_control to the last param
-        last_param['cache_control'] = BetaCacheControlEphemeralParam(type='ephemeral')
+        last_param['cache_control'] = BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl)
 
     @staticmethod
     async def _map_user_prompt(