Skip to content

Commit 680554c

Browse files
authored
Image, file output types for functions (openai#1898)
To allow the new output types for image/file, you can now return one of the three new types (or lists of those types, or even a typed dict version). If you use those, we'll convert to the correct tool call output type. Resolves openai#1850
1 parent cfddc7c commit 680554c

File tree

12 files changed

+398
-30
lines changed

12 files changed

+398
-30
lines changed

docs/tools.md

Lines changed: 18 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -173,6 +173,14 @@ for tool in agent.tools:
173173
}
174174
```
175175

176+
### Returning images or files from function tools
177+
178+
In addition to returning text outputs, you can return one or many images or files as the output of a function tool. To do so, you can return any of:
179+
180+
- Images: [`ToolOutputImage`][agents.tool.ToolOutputImage] (or the TypedDict version, [`ToolOutputImageDict`][agents.tool.ToolOutputImageDict])
181+
- Files: [`ToolOutputFileContent`][agents.tool.ToolOutputFileContent] (or the TypedDict version, [`ToolOutputFileContentDict`][agents.tool.ToolOutputFileContentDict])
182+
- Text: either a string or stringable objects, or [`ToolOutputText`][agents.tool.ToolOutputText] (or the TypedDict version, [`ToolOutputTextDict`][agents.tool.ToolOutputTextDict])
183+
176184
### Custom function tools
177185

178186
Sometimes, you don't want to use a Python function as a tool. You can directly create a [`FunctionTool`][agents.tool.FunctionTool] if you prefer. You'll need to provide:
@@ -288,9 +296,9 @@ async def run_my_agent() -> str:
288296

289297
In certain cases, you might want to modify the output of the tool-agents before returning it to the central agent. This may be useful if you want to:
290298

291-
- Extract a specific piece of information (e.g., a JSON payload) from the sub-agent's chat history.
292-
- Convert or reformat the agent’s final answer (e.g., transform Markdown into plain text or CSV).
293-
- Validate the output or provide a fallback value when the agent’s response is missing or malformed.
299+
- Extract a specific piece of information (e.g., a JSON payload) from the sub-agent's chat history.
300+
- Convert or reformat the agent’s final answer (e.g., transform Markdown into plain text or CSV).
301+
- Validate the output or provide a fallback value when the agent’s response is missing or malformed.
294302

295303
You can do this by supplying the `custom_output_extractor` argument to the `as_tool` method:
296304

@@ -370,16 +378,16 @@ asyncio.run(main())
370378

371379
The `is_enabled` parameter accepts:
372380

373-
- **Boolean values**: `True` (always enabled) or `False` (always disabled)
374-
- **Callable functions**: Functions that take `(context, agent)` and return a boolean
375-
- **Async functions**: Async functions for complex conditional logic
381+
- **Boolean values**: `True` (always enabled) or `False` (always disabled)
382+
- **Callable functions**: Functions that take `(context, agent)` and return a boolean
383+
- **Async functions**: Async functions for complex conditional logic
376384

377385
Disabled tools are completely hidden from the LLM at runtime, making this useful for:
378386

379-
- Feature gating based on user permissions
380-
- Environment-specific tool availability (dev vs prod)
381-
- A/B testing different tool configurations
382-
- Dynamic tool filtering based on runtime state
387+
- Feature gating based on user permissions
388+
- Environment-specific tool availability (dev vs prod)
389+
- A/B testing different tool configurations
390+
- Dynamic tool filtering based on runtime state
383391

384392
## Handling errors in function tools
385393

examples/basic/dynamic_system_prompt.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ def custom_instructions(
2828
instructions=custom_instructions,
2929
)
3030

31+
3132
async def main():
3233
context = CustomContext(style=random.choice(["haiku", "pirate", "robot"]))
3334
print(f"Using style: {context.style}\n")
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
import asyncio
2+
3+
from agents import Agent, Runner, ToolOutputImage, ToolOutputImageDict, function_tool
4+
5+
return_typed_dict = True
6+
7+
8+
@function_tool
9+
def fetch_random_image() -> ToolOutputImage | ToolOutputImageDict:
10+
"""Fetch a random image."""
11+
12+
print("Image tool called")
13+
if return_typed_dict:
14+
return {
15+
"type": "image",
16+
"image_url": "https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
17+
"detail": "auto",
18+
}
19+
20+
return ToolOutputImage(
21+
image_url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
22+
detail="auto",
23+
)
24+
25+
26+
async def main():
27+
agent = Agent(
28+
name="Assistant",
29+
instructions="You are a helpful assistant.",
30+
tools=[fetch_random_image],
31+
)
32+
33+
result = await Runner.run(
34+
agent,
35+
input="Fetch an image using the random_image tool, then describe it",
36+
)
37+
print(result.final_output)
38+
"""The image shows the iconic Golden Gate Bridge, a large suspension bridge painted in a
39+
bright reddish-orange color..."""
40+
41+
42+
if __name__ == "__main__":
43+
asyncio.run(main())

examples/basic/tools.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ def get_weather(city: Annotated[str, "The city to get the weather for"]) -> Weat
1818
print("[debug] get_weather called")
1919
return Weather(city=city, temperature_range="14-20C", conditions="Sunny with wind.")
2020

21+
2122
agent = Agent(
2223
name="Hello world",
2324
instructions="You are a helpful agent.",

src/agents/__init__.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,12 @@
8181
MCPToolApprovalFunctionResult,
8282
MCPToolApprovalRequest,
8383
Tool,
84+
ToolOutputFileContent,
85+
ToolOutputFileContentDict,
86+
ToolOutputImage,
87+
ToolOutputImageDict,
88+
ToolOutputText,
89+
ToolOutputTextDict,
8490
WebSearchTool,
8591
default_tool_error_function,
8692
function_tool,
@@ -273,6 +279,12 @@ def enable_verbose_stdout_logging():
273279
"MCPToolApprovalFunction",
274280
"MCPToolApprovalRequest",
275281
"MCPToolApprovalFunctionResult",
282+
"ToolOutputText",
283+
"ToolOutputTextDict",
284+
"ToolOutputImage",
285+
"ToolOutputImageDict",
286+
"ToolOutputFileContent",
287+
"ToolOutputFileContentDict",
276288
"function_tool",
277289
"Usage",
278290
"add_trace_processor",

src/agents/_run_impl.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -832,7 +832,7 @@ async def run_single_tool(
832832
output=result,
833833
run_item=ToolCallOutputItem(
834834
output=result,
835-
raw_item=ItemHelpers.tool_call_output_item(tool_run.tool_call, str(result)),
835+
raw_item=ItemHelpers.tool_call_output_item(tool_run.tool_call, result),
836836
agent=agent,
837837
),
838838
)

src/agents/extensions/memory/__init__.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -58,8 +58,6 @@ def __getattr__(name: str) -> Any:
5858

5959
return AdvancedSQLiteSession
6060
except ModuleNotFoundError as e:
61-
raise ImportError(
62-
f"Failed to import AdvancedSQLiteSession: {e}"
63-
) from e
61+
raise ImportError(f"Failed to import AdvancedSQLiteSession: {e}") from e
6462

6563
raise AttributeError(f"module {__name__} has no attribute {name}")

src/agents/items.py

Lines changed: 100 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,12 @@
2121
from openai.types.responses.response_code_interpreter_tool_call import (
2222
ResponseCodeInterpreterToolCall,
2323
)
24+
from openai.types.responses.response_function_call_output_item_list_param import (
25+
ResponseFunctionCallOutputItemListParam,
26+
ResponseFunctionCallOutputItemParam,
27+
)
28+
from openai.types.responses.response_input_file_content_param import ResponseInputFileContentParam
29+
from openai.types.responses.response_input_image_content_param import ResponseInputImageContentParam
2430
from openai.types.responses.response_input_item_param import (
2531
ComputerCallOutput,
2632
FunctionCallOutput,
@@ -36,9 +42,17 @@
3642
)
3743
from openai.types.responses.response_reasoning_item import ResponseReasoningItem
3844
from pydantic import BaseModel
39-
from typing_extensions import TypeAlias
45+
from typing_extensions import TypeAlias, assert_never
4046

4147
from .exceptions import AgentsException, ModelBehaviorError
48+
from .logger import logger
49+
from .tool import (
50+
ToolOutputFileContent,
51+
ToolOutputImage,
52+
ToolOutputText,
53+
ValidToolOutputPydanticModels,
54+
ValidToolOutputPydanticModelsTypeAdapter,
55+
)
4256
from .usage import Usage
4357

4458
if TYPE_CHECKING:
@@ -298,11 +312,93 @@ def text_message_output(cls, message: MessageOutputItem) -> str:
298312

299313
@classmethod
300314
def tool_call_output_item(
301-
cls, tool_call: ResponseFunctionToolCall, output: str
315+
cls, tool_call: ResponseFunctionToolCall, output: Any
302316
) -> FunctionCallOutput:
303-
"""Creates a tool call output item from a tool call and its output."""
317+
"""Creates a tool call output item from a tool call and its output.
318+
319+
Accepts either plain values (stringified) or structured outputs using
320+
input_text/input_image/input_file shapes. Structured outputs may be
321+
provided as Pydantic models or dicts, or an iterable of such items.
322+
"""
323+
324+
converted_output = cls._convert_tool_output(output)
325+
304326
return {
305327
"call_id": tool_call.call_id,
306-
"output": output,
328+
"output": converted_output,
307329
"type": "function_call_output",
308330
}
331+
332+
@classmethod
333+
def _convert_tool_output(cls, output: Any) -> str | ResponseFunctionCallOutputItemListParam:
334+
"""Converts a tool return value into an output acceptable by the Responses API."""
335+
336+
# If the output is either a single or list of the known structured output types, convert to
337+
# ResponseFunctionCallOutputItemListParam. Else, just stringify.
338+
if isinstance(output, (list, tuple)):
339+
maybe_converted_output_list = [
340+
cls._maybe_get_output_as_structured_function_output(item) for item in output
341+
]
342+
if all(maybe_converted_output_list):
343+
return [
344+
cls._convert_single_tool_output_pydantic_model(item)
345+
for item in maybe_converted_output_list
346+
if item is not None
347+
]
348+
else:
349+
return str(output)
350+
else:
351+
maybe_converted_output = cls._maybe_get_output_as_structured_function_output(output)
352+
if maybe_converted_output:
353+
return [cls._convert_single_tool_output_pydantic_model(maybe_converted_output)]
354+
else:
355+
return str(output)
356+
357+
@classmethod
358+
def _maybe_get_output_as_structured_function_output(
359+
cls, output: Any
360+
) -> ValidToolOutputPydanticModels | None:
361+
if isinstance(output, (ToolOutputText, ToolOutputImage, ToolOutputFileContent)):
362+
return output
363+
elif isinstance(output, dict):
364+
try:
365+
return ValidToolOutputPydanticModelsTypeAdapter.validate_python(output)
366+
except pydantic.ValidationError:
367+
logger.debug("dict was not a valid tool output pydantic model")
368+
return None
369+
370+
return None
371+
372+
@classmethod
373+
def _convert_single_tool_output_pydantic_model(
374+
cls, output: ValidToolOutputPydanticModels
375+
) -> ResponseFunctionCallOutputItemParam:
376+
if isinstance(output, ToolOutputText):
377+
return {"type": "input_text", "text": output.text}
378+
elif isinstance(output, ToolOutputImage):
379+
# Forward all provided optional fields so the Responses API receives
380+
# the correct identifiers and settings for the image resource.
381+
result: ResponseInputImageContentParam = {"type": "input_image"}
382+
if output.image_url is not None:
383+
result["image_url"] = output.image_url
384+
if output.file_id is not None:
385+
result["file_id"] = output.file_id
386+
if output.detail is not None:
387+
result["detail"] = output.detail
388+
return result
389+
elif isinstance(output, ToolOutputFileContent):
390+
# Forward all provided optional fields so the Responses API receives
391+
# the correct identifiers and metadata for the file resource.
392+
result_file: ResponseInputFileContentParam = {"type": "input_file"}
393+
if output.file_data is not None:
394+
result_file["file_data"] = output.file_data
395+
if output.file_url is not None:
396+
result_file["file_url"] = output.file_url
397+
if output.file_id is not None:
398+
result_file["file_id"] = output.file_id
399+
if output.filename is not None:
400+
result_file["filename"] = output.filename
401+
return result_file
402+
else:
403+
assert_never(output)
404+
raise ValueError(f"Unexpected tool output type: {output}")

src/agents/tool.py

Lines changed: 72 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -15,14 +15,13 @@
1515
from openai.types.responses.tool_param import CodeInterpreter, ImageGeneration, Mcp
1616
from openai.types.responses.web_search_tool import Filters as WebSearchToolFilters
1717
from openai.types.responses.web_search_tool_param import UserLocation
18-
from pydantic import ValidationError
18+
from pydantic import BaseModel, TypeAdapter, ValidationError
1919
from typing_extensions import Concatenate, NotRequired, ParamSpec, TypedDict
2020

2121
from . import _debug
2222
from .computer import AsyncComputer, Computer
2323
from .exceptions import ModelBehaviorError
2424
from .function_schema import DocstringStyle, function_schema
25-
from .items import RunItem
2625
from .logger import logger
2726
from .run_context import RunContextWrapper
2827
from .strict_schema import ensure_strict_json_schema
@@ -34,6 +33,8 @@
3433

3534
if TYPE_CHECKING:
3635
from .agent import Agent, AgentBase
36+
from .items import RunItem
37+
3738

3839
ToolParams = ParamSpec("ToolParams")
3940

@@ -48,6 +49,72 @@
4849
]
4950

5051

52+
class ToolOutputText(BaseModel):
53+
"""Represents a tool output that should be sent to the model as text."""
54+
55+
type: Literal["text"] = "text"
56+
text: str
57+
58+
59+
class ToolOutputTextDict(TypedDict, total=False):
60+
"""TypedDict variant for text tool outputs."""
61+
62+
type: Literal["text"]
63+
text: str
64+
65+
66+
class ToolOutputImage(BaseModel):
67+
"""Represents a tool output that should be sent to the model as an image.
68+
69+
You can provide either an `image_url` (URL or data URL) or a `file_id` for previously uploaded
70+
content. The optional `detail` can control vision detail.
71+
"""
72+
73+
type: Literal["image"] = "image"
74+
image_url: str | None = None
75+
file_id: str | None = None
76+
detail: Literal["low", "high", "auto"] | None = None
77+
78+
79+
class ToolOutputImageDict(TypedDict, total=False):
80+
"""TypedDict variant for image tool outputs."""
81+
82+
type: Literal["image"]
83+
image_url: NotRequired[str]
84+
file_id: NotRequired[str]
85+
detail: NotRequired[Literal["low", "high", "auto"]]
86+
87+
88+
class ToolOutputFileContent(BaseModel):
89+
"""Represents a tool output that should be sent to the model as a file.
90+
91+
Provide one of `file_data` (base64), `file_url`, or `file_id`. You may also
92+
provide an optional `filename` when using `file_data` to hint file name.
93+
"""
94+
95+
type: Literal["file"] = "file"
96+
file_data: str | None = None
97+
file_url: str | None = None
98+
file_id: str | None = None
99+
filename: str | None = None
100+
101+
102+
class ToolOutputFileContentDict(TypedDict, total=False):
103+
"""TypedDict variant for file content tool outputs."""
104+
105+
type: Literal["file"]
106+
file_data: NotRequired[str]
107+
file_url: NotRequired[str]
108+
file_id: NotRequired[str]
109+
filename: NotRequired[str]
110+
111+
112+
ValidToolOutputPydanticModels = Union[ToolOutputText, ToolOutputImage, ToolOutputFileContent]
113+
ValidToolOutputPydanticModelsTypeAdapter: TypeAdapter[ValidToolOutputPydanticModels] = TypeAdapter(
114+
ValidToolOutputPydanticModels
115+
)
116+
117+
51118
@dataclass
52119
class FunctionToolResult:
53120
tool: FunctionTool
@@ -81,7 +148,9 @@ class FunctionTool:
81148
1. The tool run context.
82149
2. The arguments from the LLM, as a JSON string.
83150
84-
You must return a string representation of the tool output, or something we can call `str()` on.
151+
You must return a one of the structured tool output types (e.g. ToolOutputText, ToolOutputImage,
152+
ToolOutputFileContent) or a string representation of the tool output, or a list of them,
153+
or something we can call `str()` on.
85154
In case of errors, you can either raise an Exception (which will cause the run to fail) or
86155
return a string error message (which will be sent back to the LLM).
87156
"""

0 commit comments

Comments
 (0)