
Commit 3db21d4

Support image generation and output with Google and OpenAI (#2970)
1 parent 322e092 commit 3db21d4

78 files changed: +9035, -991 lines


docs/builtin-tools.md

Lines changed: 195 additions & 20 deletions
Large diffs are not rendered by default.

docs/output.md

Lines changed: 41 additions & 3 deletions
@@ -1,10 +1,10 @@
-"Output" refers to the final value returned from [running an agent](agents.md#running-agents). This can be either plain text, [structured data](#structured-output), or the result of a [function](#output-functions) called with arguments provided by the model.
+"Output" refers to the final value returned from [running an agent](agents.md#running-agents). This can be either plain text, [structured data](#structured-output), an [image](#image-output), or the result of a [function](#output-functions) called with arguments provided by the model.

The output is wrapped in [`AgentRunResult`][pydantic_ai.agent.AgentRunResult] or [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult] so that you can access other data, like [usage][pydantic_ai.usage.RunUsage] of the run and [message history](message-history.md#accessing-messages-from-results).

Both `AgentRunResult` and `StreamedRunResult` are generic in the data they wrap, so typing information about the data returned by the agent is preserved.

-A run ends when the model responds with one of the structured output types, or, if no output type is specified or `str` is one of the allowed options, when a plain text response is received. A run can also be cancelled if usage limits are exceeded, see [Usage Limits](agents.md#usage-limits).
+A run ends when the model responds with one of the output types, or, if no output type is specified or `str` is one of the allowed options, when a plain text response is received. A run can also be cancelled if usage limits are exceeded, see [Usage Limits](agents.md#usage-limits).

Here's an example using a Pydantic model as the `output_type`, forcing the model to respond with data matching our specification:

@@ -29,7 +29,7 @@ print(result.usage())

_(This example is complete, it can be run "as is")_

-## Output data {#structured-output}
+## Structured output data {#structured-output}

The [`Agent`][pydantic_ai.Agent] class constructor takes an `output_type` argument that takes one or more types or [output functions](#output-functions). It supports simple scalar types, list and dict types (including `TypedDict`s and [`StructuredDict`s](#structured-dict)), dataclasses and Pydantic models, as well as type unions -- generally everything supported as type hints in a Pydantic model. You can also pass a list of multiple choices.

@@ -470,6 +470,44 @@ print(result.output)

_(This example is complete, it can be run "as is")_

+## Image output
+
+Some models can generate images as part of their response, for example those that support the [Image Generation built-in tool](builtin-tools.md#image-generation-tool) and OpenAI models using the [Code Execution built-in tool](builtin-tools.md#code-execution-tool) when told to generate a chart.
+
+To use the generated image as the output of the agent run, you can set `output_type` to [`BinaryImage`][pydantic_ai.messages.BinaryImage]. If no image-generating built-in tool is explicitly specified, the [`ImageGenerationTool`][pydantic_ai.builtin_tools.ImageGenerationTool] will be enabled automatically.
+
+```py {title="image_output.py"}
+from pydantic_ai import Agent, BinaryImage
+
+agent = Agent('openai-responses:gpt-5', output_type=BinaryImage)
+
+result = agent.run_sync('Generate an image of an axolotl.')
+assert isinstance(result.output, BinaryImage)
+```
+
+_(This example is complete, it can be run "as is")_
+
+If an agent does not need to always generate an image, you can use a union of `BinaryImage` and `str`. If the model generates both, the image will take precedence as output and the text will be available on [`ModelResponse.text`][pydantic_ai.messages.ModelResponse.text]:
+
+```py {title="image_output_union.py"}
+from pydantic_ai import Agent, BinaryImage
+
+agent = Agent('openai-responses:gpt-5', output_type=BinaryImage | str)
+
+result = agent.run_sync('Tell me a two-sentence story about an axolotl, no image please.')
+print(result.output)
+"""
+Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
+"""
+
+result = agent.run_sync('Tell me a two-sentence story about an axolotl with an illustration.')
+assert isinstance(result.output, BinaryImage)
+print(result.response.text)
+"""
+Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
+"""
+```
+
## Streamed Results

There are two main challenges with streamed results:
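
The new examples stop at asserting that `result.output` is a `BinaryImage`. As a minimal sketch of how the generated image might be consumed downstream, assuming `BinaryImage` inherits the `data` (bytes) and `media_type` attributes of its `BinaryContent` base class (this diff does not show that), you could write it to disk:

```python
from pydantic_ai import Agent, BinaryImage

agent = Agent('openai-responses:gpt-5', output_type=BinaryImage)
result = agent.run_sync('Generate an image of an axolotl.')

image = result.output
# Assumed attributes: `media_type` (e.g. 'image/png') and `data` (raw bytes),
# as on BinaryContent. Derive a file extension from the media type.
extension = image.media_type.split('/')[-1]
with open(f'axolotl.{extension}', 'wb') as f:
    f.write(image.data)
```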

docs/thinking.md

Lines changed: 6 additions & 8 deletions
@@ -14,12 +14,12 @@ You can customize the tags using the [`thinking_tags`][pydantic_ai.profiles.Mode
### OpenAI Responses

The [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] can generate native thinking parts.
-To enable this functionality, you need to set the `openai_reasoning_effort` and `openai_reasoning_summary` fields in the
-[`OpenAIResponsesModelSettings`][pydantic_ai.models.openai.OpenAIResponsesModelSettings].
+To enable this functionality, you need to set the
+[`OpenAIResponsesModelSettings.openai_reasoning_effort`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_reasoning_effort] and [`OpenAIResponsesModelSettings.openai_reasoning_summary`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_reasoning_summary] [model settings](agents.md#model-run-settings).

By default, the unique IDs of reasoning, text, and function call parts from the message history are sent to the model, which can result in errors like `"Item 'rs_123' of type 'reasoning' was provided without its required following item."`
if the message history you're sending does not match exactly what was received from the Responses API in a previous response, for example if you're using a [history processor](message-history.md#processing-message-history).
-To disable this, you can set the `openai_send_reasoning_ids` field on [`OpenAIResponsesModelSettings`][pydantic_ai.models.openai.OpenAIResponsesModelSettings] to `False`.
+To disable this, you can disable the [`OpenAIResponsesModelSettings.openai_send_reasoning_ids`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_send_reasoning_ids] [model setting](agents.md#model-run-settings).

```python {title="openai_thinking_part.py"}
from pydantic_ai import Agent
@@ -36,7 +36,7 @@ agent = Agent(model, model_settings=settings)

## Anthropic

-To enable thinking, use the `anthropic_thinking` field in the [`AnthropicModelSettings`][pydantic_ai.models.anthropic.AnthropicModelSettings].
+To enable thinking, use the [`AnthropicModelSettings.anthropic_thinking`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_thinking] [model setting](agents.md#model-run-settings).

```python {title="anthropic_thinking_part.py"}
from pydantic_ai import Agent
@@ -52,8 +52,7 @@ agent = Agent(model, model_settings=settings)

## Google

-To enable thinking, use the `google_thinking_config` field in the
-[`GoogleModelSettings`][pydantic_ai.models.google.GoogleModelSettings].
+To enable thinking, use the [`GoogleModelSettings.google_thinking_config`][pydantic_ai.models.google.GoogleModelSettings.google_thinking_config] [model setting](agents.md#model-run-settings).

```python {title="google_thinking_part.py"}
from pydantic_ai import Agent
@@ -75,8 +74,7 @@ Groq supports different formats to receive thinking parts:
- `"hidden"`: The thinking part is not included in the text content.
- `"parsed"`: The thinking part has its own structured part in the response which is converted into a [`ThinkingPart`][pydantic_ai.messages.ThinkingPart] object.

-To enable thinking, use the `groq_reasoning_format` field in the
-[`GroqModelSettings`][pydantic_ai.models.groq.GroqModelSettings]:
+To enable thinking, use the [`GroqModelSettings.groq_reasoning_format`][pydantic_ai.models.groq.GroqModelSettings.groq_reasoning_format] [model setting](agents.md#model-run-settings):

```python {title="groq_thinking_part.py"}
from pydantic_ai import Agent
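
The hunks above truncate each example code block after its first import. A minimal sketch of the pattern the OpenAI Responses paragraphs describe, with the model name and the specific effort/summary values chosen here only for illustration:

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

# Both settings are documented above; 'low' and 'detailed' are illustrative values.
settings = OpenAIResponsesModelSettings(
    openai_reasoning_effort='low',
    openai_reasoning_summary='detailed',
)
model = OpenAIResponsesModel('gpt-5')  # assumed model name; any reasoning model works
agent = Agent(model, model_settings=settings)
```

The same shape applies to the Anthropic, Google, and Groq sections: build the provider-specific settings object and pass it as `model_settings`.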

pydantic_ai_slim/pydantic_ai/__init__.py

Lines changed: 14 additions & 1 deletion
@@ -9,7 +9,14 @@
    UserPromptNode,
    capture_run_messages,
)
-from .builtin_tools import CodeExecutionTool, UrlContextTool, WebSearchTool, WebSearchUserLocation
+from .builtin_tools import (
+    CodeExecutionTool,
+    ImageGenerationTool,
+    MemoryTool,
+    UrlContextTool,
+    WebSearchTool,
+    WebSearchUserLocation,
+)
from .exceptions import (
    AgentRunError,
    ApprovalRequired,
@@ -30,11 +37,13 @@
    BaseToolCallPart,
    BaseToolReturnPart,
    BinaryContent,
+    BinaryImage,
    BuiltinToolCallPart,
    BuiltinToolReturnPart,
    DocumentFormat,
    DocumentMediaType,
    DocumentUrl,
+    FilePart,
    FileUrl,
    FinalResultEvent,
    FinishReason,
@@ -131,6 +140,7 @@
    'DocumentMediaType',
    'DocumentUrl',
    'FileUrl',
+    'FilePart',
    'FinalResultEvent',
    'FinishReason',
    'FunctionToolCallEvent',
@@ -139,6 +149,7 @@
    'ImageFormat',
    'ImageMediaType',
    'ImageUrl',
+    'BinaryImage',
    'ModelMessage',
    'ModelMessagesTypeAdapter',
    'ModelRequest',
@@ -197,6 +208,8 @@
    'WebSearchUserLocation',
    'UrlContextTool',
    'CodeExecutionTool',
+    'ImageGenerationTool',
+    'MemoryTool',
    # output
    'ToolOutput',
    'NativeOutput',
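
Taken together, these export additions mean the new names can be imported straight from the package root. A quick sketch under the assumption that `Agent` accepts a `builtin_tools` argument (not shown in this diff) and with an illustrative model choice:

```python
from pydantic_ai import Agent, BinaryImage, FilePart, ImageGenerationTool, MemoryTool

# FilePart and MemoryTool are imported only to show they now resolve from the root package.
_ = (FilePart, MemoryTool)

# Explicitly enable the image generation built-in tool and ask for an image back.
agent = Agent(
    'openai-responses:gpt-5',  # illustrative model choice
    builtin_tools=[ImageGenerationTool()],
    output_type=BinaryImage,
)
```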
