22 changes: 22 additions & 0 deletions docs/input.md
@@ -102,6 +102,28 @@ print(result.output)
#> The document discusses...
```

## Uploaded files
> **Collaborator:** I just merged #3492 which (among other things) added an Uploaded Files section as well :)
>
> Can you merge main and update that example to use the `UploadedFile` object? Keeping the section above the "user-side ..." section makes sense to me.


Use [`UploadedFile`][pydantic_ai.UploadedFile] when you've already uploaded content to the model provider.
> **Collaborator:** Related to the above, let's include examples of how to do that for all providers


- [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel] and [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] accept an `openai.types.FileObject` or a file ID string returned by the OpenAI Files API.
> **Collaborator:** This makes it sound like the model class constructors accept those types directly in an argument or something. Let's format it more clearly.
>
> I think it'd also be nice to have subclasses of `UploadedFile`, so that we can hint a type other than `Any`

- [`GoogleModel`][pydantic_ai.models.google.GoogleModel] accepts a `google.genai.types.File` or a file URI string from the Gemini Files API.
- Other models currently raise `NotImplementedError` when they receive an `UploadedFile`.
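When no explicit `identifier` is given, `UploadedFile` derives a short stable one from the provider reference (see `_uploaded_file_identifier_source` in the `messages.py` changes in this PR). A minimal standalone sketch of that derivation; the helper names here are illustrative, not part of the public API:

```python
import hashlib
from typing import Any


def identifier_source(file: Any) -> str:
    # Reduce a provider file reference to a string: plain IDs/URIs pass
    # through, objects are probed for common attributes (id, uri, name).
    if isinstance(file, str):
        return file
    for attr in ('id', 'uri', 'name'):
        value = getattr(file, attr, None)
        if isinstance(value, str):
            return value
    return repr(file)


def short_identifier(file: Any) -> str:
    # Hash the source string down to a 6-character identifier, matching
    # the truncated-sha1 scheme used for other multi-modal content.
    return hashlib.sha1(identifier_source(file).encode()).hexdigest()[:6]


class FakeFileObject:
    """Stands in for a provider object such as openai.types.FileObject."""

    id = 'file-abc123'


print(identifier_source(FakeFileObject()))  # file-abc123
print(len(short_identifier('file-abc123')))  # 6
```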

> **Contributor Author:** Does anthropic provide a client-side SDK for this? In the link I only see it being done with http requests.

> **Collaborator:** @tarruda It's not super discoverable, but all of those code samples have a "Shell" dropdown that also has a "Python" option. So yes there's an SDK for uploading files, and their objects for passing file URLs and binary data also have a `file_id` field that maps to the ID returned by the file upload SDK.


> **Collaborator:** Please don't skip linting

```py {title="uploaded_file_input.py" test="skip" lint="skip"}
from pydantic_ai import Agent, UploadedFile

agent = Agent(model='openai:gpt-5')
result = agent.run_sync(
    [
        'Give me a short description of this image',
        UploadedFile(file='file-abc123'),  # file-abc123 is a file ID returned by the provider
    ]
)
print(result.output)
#> The image is a simple design of a classic yellow smiley face...
```

> **Collaborator:** Let's update the example to be more "real"
>
> **Contributor Author:** Can you elaborate?
>
> **Collaborator:** I just meant that we can actually show the code for uploading a file using the provider SDK, and then passing in the return object/ID here instead of a fake ID
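Per the review thread, a more realistic flow uploads the file through the provider SDK first and then passes the returned reference. A hedged sketch (assumes the `openai` and `pydantic-ai` packages, an `OPENAI_API_KEY` in the environment, and a local `report.pdf`; `file_ref` is an illustrative helper, not library API):

```python
def file_ref(file) -> str:
    """Return the provider file ID: the string itself, or the `id`
    attribute of an uploaded-file object such as openai.types.FileObject."""
    if isinstance(file, str):
        return file
    return file.id


def main() -> None:
    from openai import OpenAI
    from pydantic_ai import Agent, UploadedFile

    client = OpenAI()
    # Upload via the OpenAI Files API; the returned FileObject carries an ID.
    uploaded = client.files.create(file=open('report.pdf', 'rb'), purpose='user_data')

    agent = Agent(model='openai:gpt-5')
    result = agent.run_sync(
        [
            'Give me a short description of this document',
            UploadedFile(file=file_ref(uploaded)),
        ]
    )
    print(result.output)


if __name__ == '__main__':
    main()
```

Since `UploadedFile` also accepts the `FileObject` itself, extracting the ID with `file_ref` is shown only to make the mapping explicit.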

## User-side download vs. direct file URL

As a general rule, when you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI downloads the file content and then sends it as part of the API request.
2 changes: 2 additions & 0 deletions pydantic_ai_slim/pydantic_ai/__init__.py
@@ -80,6 +80,7 @@
    ToolCallPartDelta,
    ToolReturn,
    ToolReturnPart,
    UploadedFile,
    UserContent,
    UserPromptPart,
    VideoFormat,
@@ -182,6 +183,7 @@
    'ToolCallPartDelta',
    'ToolReturn',
    'ToolReturnPart',
    'UploadedFile',
    'UserContent',
    'UserPromptPart',
    'VideoFormat',
10 changes: 9 additions & 1 deletion pydantic_ai_slim/pydantic_ai/_otel_messages.py
@@ -43,12 +43,20 @@ class BinaryDataPart(TypedDict):
    content: NotRequired[str]


> **Collaborator:** Is this really a thing in OTel? We should use only things from their genai conventions

class UploadedFilePart(TypedDict):
    type: Literal['uploaded-file']
    identifier: NotRequired[str]
    file: NotRequired[str]


class ThinkingPart(TypedDict):
    type: Literal['thinking']
    content: NotRequired[str]


MessagePart: TypeAlias = 'TextPart | ToolCallPart | ToolCallResponsePart | MediaUrlPart | BinaryDataPart | ThinkingPart'
MessagePart: TypeAlias = (
    'TextPart | ToolCallPart | ToolCallResponsePart | MediaUrlPart | BinaryDataPart | UploadedFilePart | ThinkingPart'
)


Role = Literal['system', 'user', 'assistant']
75 changes: 72 additions & 3 deletions pydantic_ai_slim/pydantic_ai/messages.py
@@ -108,6 +108,16 @@ def _multi_modal_content_identifier(identifier: str | bytes) -> str:
    return hashlib.sha1(identifier).hexdigest()[:6]


def _uploaded_file_identifier_source(file: Any) -> str:
    if isinstance(file, str):
        return file
    for attr in ('id', 'uri', 'name'):
        value = getattr(file, attr, None)
        if isinstance(value, str):
            return value
    return repr(file)


@dataclass(init=False, repr=False)
class FileUrl(ABC):
"""Abstract base class for any URL-based file."""
Expand Down Expand Up @@ -633,6 +643,59 @@ def __init__(
raise ValueError('`BinaryImage` must be have a media type that starts with "image/"') # pragma: no cover


@dataclass(init=False, repr=False)
class UploadedFile:
    """File uploaded to the LLM provider.

    Supported by [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel],
    [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel], and
    [`GoogleModel`][pydantic_ai.models.google.GoogleModel].

    - For OpenAI-compatible models, provide an `openai.types.FileObject` or a file ID string returned by the Files API.
    - For Gemini, provide a `google.genai.types.File` or the file URI string returned by the Files API.

    Other models raise `NotImplementedError` when they receive this part.
    """

    file: Any
    """A provider-specific file object, e.g. a file ID or a file URL."""

    _: KW_ONLY

    _identifier: Annotated[str | None, pydantic.Field(alias='identifier', default=None, exclude=True)] = field(
        compare=False, default=None
    )
    """Optional identifier for the uploaded file."""

    kind: Literal['uploaded-file'] = 'uploaded-file'
    """Type identifier, this is available on all parts as a discriminator."""

    def __init__(
        self,
        file: Any,
        *,
        identifier: str | None = None,
        kind: Literal['uploaded-file'] = 'uploaded-file',
        # Required for inline-snapshot which expects all dataclass `__init__` methods to take all field names as kwargs.
        _identifier: str | None = None,
    ):
        self.file = file
        self._identifier = identifier or _identifier
        self.kind = kind

    @pydantic.computed_field
    @property
    def identifier(self) -> str:
        """Identifier for the uploaded file, usually derived from the provider's reference."""
        identifier = self._identifier
        if identifier is not None:
            return identifier

        return _multi_modal_content_identifier(_uploaded_file_identifier_source(self.file))

    __repr__ = _utils.dataclasses_no_defaults_repr


@dataclass
class CachePoint:
"""A cache point marker for prompt caching.
Expand All @@ -656,7 +719,7 @@ class CachePoint:
* Anthropic. See https://docs.claude.com/en/docs/build-with-claude/prompt-caching#1-hour-cache-duration for more information."""


MultiModalContent = ImageUrl | AudioUrl | DocumentUrl | VideoUrl | BinaryContent
MultiModalContent = ImageUrl | AudioUrl | DocumentUrl | VideoUrl | BinaryContent | UploadedFile
UserContent: TypeAlias = str | MultiModalContent | CachePoint


@@ -774,11 +837,17 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
                if settings.include_content and settings.include_binary_content:
                    converted_part['content'] = base64.b64encode(part.data).decode()
                parts.append(converted_part)
            elif isinstance(part, UploadedFile):
                uploaded_part: _otel_messages.UploadedFilePart = {
                    'type': 'uploaded-file',
                    'identifier': part.identifier,
                }
                if settings.include_content:
                    uploaded_part['file'] = _uploaded_file_identifier_source(part.file)
                parts.append(uploaded_part)
            elif isinstance(part, CachePoint):
                # CachePoint is a marker, not actual content - skip it for otel
                pass
            else:
                parts.append({'type': part.kind})  # pragma: no cover
        return parts

    __repr__ = _utils.dataclasses_no_defaults_repr
3 changes: 3 additions & 0 deletions pydantic_ai_slim/pydantic_ai/models/bedrock.py
@@ -34,6 +34,7 @@
    ThinkingPart,
    ToolCallPart,
    ToolReturnPart,
    UploadedFile,
    UserPromptPart,
    VideoUrl,
    _utils,
@@ -676,6 +677,8 @@ async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int])
            content.append({'video': video})
        elif isinstance(item, AudioUrl):  # pragma: no cover
            raise NotImplementedError('Audio is not supported yet.')
        elif isinstance(item, UploadedFile):
            raise NotImplementedError('Uploaded files are not supported yet.')
        elif isinstance(item, CachePoint):
            # Bedrock support has not been implemented yet: https://github.com/pydantic/pydantic-ai/issues/3418
            pass
3 changes: 3 additions & 0 deletions pydantic_ai_slim/pydantic_ai/models/gemini.py
@@ -35,6 +35,7 @@
    ThinkingPart,
    ToolCallPart,
    ToolReturnPart,
    UploadedFile,
    UserPromptPart,
    VideoUrl,
)
@@ -392,6 +393,8 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[_GeminiPartUnion]
                else:  # pragma: lax no cover
                    file_data = _GeminiFileDataPart(file_data={'file_uri': item.url, 'mime_type': item.media_type})
                content.append(file_data)
            elif isinstance(item, UploadedFile):
                raise NotImplementedError('Uploaded files are not supported for GeminiModel.')
            elif isinstance(item, CachePoint):
                # Gemini doesn't support prompt caching via CachePoint
                pass
29 changes: 29 additions & 0 deletions pydantic_ai_slim/pydantic_ai/models/google.py
@@ -34,6 +34,7 @@
    ThinkingPart,
    ToolCallPart,
    ToolReturnPart,
    UploadedFile,
    UserPromptPart,
    VideoUrl,
)
@@ -62,6 +63,7 @@
    CountTokensConfigDict,
    ExecutableCode,
    ExecutableCodeDict,
    File,
    FileDataDict,
    FinishReason as GoogleFinishReason,
    FunctionCallDict,
@@ -628,13 +630,40 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[PartDict]:
                else:
                    file_data_dict: FileDataDict = {'file_uri': item.url, 'mime_type': item.media_type}
                    content.append({'file_data': file_data_dict})  # pragma: lax no cover
            elif isinstance(item, UploadedFile):
                content.append({'file_data': self._map_uploaded_file(item)})
            elif isinstance(item, CachePoint):
                # Google Gemini doesn't support prompt caching via CachePoint
                pass
            else:
                assert_never(item)
        return content

    @staticmethod
    def _map_uploaded_file(item: UploadedFile) -> FileDataDict:
        """Convert an UploadedFile into the structure expected by Gemini."""
        file = item.file
        if isinstance(file, File):
            file_uri = file.uri
            mime_type = file.mime_type
            display_name = getattr(file, 'display_name', None)
        elif isinstance(file, str):
            file_uri = file
            mime_type = None
            display_name = None
        else:
            raise UserError('UploadedFile.file must be a genai.types.File or file URI string')

        if not file_uri:
            raise UserError('UploadedFile.file must include a file URI')

        file_data: FileDataDict = {'file_uri': file_uri}
        if mime_type:
            file_data['mime_type'] = mime_type
        if display_name:
            file_data['display_name'] = display_name
        return file_data

> **Collaborator:** (on the `display_name` line) Why getattr instead of a regular attr read?

    def _map_response_schema(self, o: OutputObjectDefinition) -> dict[str, Any]:
        response_schema = o.json_schema.copy()
        if o.name:
3 changes: 3 additions & 0 deletions pydantic_ai_slim/pydantic_ai/models/huggingface.py
@@ -34,6 +34,7 @@
    ThinkingPart,
    ToolCallPart,
    ToolReturnPart,
    UploadedFile,
    UserPromptPart,
    VideoUrl,
)
@@ -448,6 +449,8 @@ async def _map_user_prompt(part: UserPromptPart) -> ChatCompletionInputMessage:
            raise NotImplementedError('DocumentUrl is not supported for Hugging Face')
        elif isinstance(item, VideoUrl):
            raise NotImplementedError('VideoUrl is not supported for Hugging Face')
        elif isinstance(item, UploadedFile):
            raise NotImplementedError('Uploaded files are not supported for Hugging Face')
        elif isinstance(item, CachePoint):
            # Hugging Face doesn't support prompt caching via CachePoint
            pass
27 changes: 24 additions & 3 deletions pydantic_ai_slim/pydantic_ai/models/openai.py
@@ -44,6 +44,7 @@
    ThinkingPart,
    ToolCallPart,
    ToolReturnPart,
    UploadedFile,
    UserPromptPart,
    VideoUrl,
)
@@ -56,7 +57,7 @@

try:
    from openai import NOT_GIVEN, APIConnectionError, APIStatusError, AsyncOpenAI, AsyncStream
    from openai.types import AllModels, chat, responses
    from openai.types import AllModels, FileObject, chat, responses
    from openai.types.chat import (
        ChatCompletionChunk,
        ChatCompletionContentPartImageParam,
@@ -977,6 +978,9 @@ async def _map_user_prompt(self, part: UserPromptPart) -> chat.ChatCompletionUse
                        type='file',
                    )
                )
            elif isinstance(item, UploadedFile):
                file_id = _map_uploaded_file(item, self._provider)
                content.append(File(file=FileFile(file_id=file_id), type='file'))
            elif isinstance(item, VideoUrl):  # pragma: no cover
                raise NotImplementedError('VideoUrl is not supported for OpenAI')
            elif isinstance(item, CachePoint):
@@ -1733,8 +1737,7 @@ def _map_json_schema(self, o: OutputObjectDefinition) -> responses.ResponseForma
            response_format_param['strict'] = o.strict
        return response_format_param

    @staticmethod
    async def _map_user_prompt(part: UserPromptPart) -> responses.EasyInputMessageParam:  # noqa: C901
    async def _map_user_prompt(self, part: UserPromptPart) -> responses.EasyInputMessageParam:  # noqa: C901
        content: str | list[responses.ResponseInputContentParam]
        if isinstance(part.content, str):
            content = part.content
@@ -1807,6 +1810,9 @@ async def _map_user_prompt(part: UserPromptPart) -> responses.EasyInputMessagePa
                        filename=f'filename.{downloaded_item["data_type"]}',
                    )
                )
            elif isinstance(item, UploadedFile):
                file_id = _map_uploaded_file(item, self._provider)
                content.append(responses.ResponseInputFileParam(file_id=file_id, type='input_file'))
            elif isinstance(item, VideoUrl):  # pragma: no cover
                raise NotImplementedError('VideoUrl is not supported for OpenAI.')
            elif isinstance(item, CachePoint):
@@ -2324,6 +2330,21 @@ def _map_usage(
    )


def _map_uploaded_file(uploaded_file: UploadedFile, _provider: Provider[Any]) -> str:
    """Map an UploadedFile to a file ID understood by OpenAI-compatible APIs."""
    file = uploaded_file.file
    if isinstance(file, str):
        return file
    if isinstance(file, FileObject):
        return file.id

    file_id = getattr(file, 'id', None)
    if isinstance(file_id, str):
        return file_id

    raise UserError('UploadedFile.file must be a file ID string or an object with an `id` attribute')

> **Collaborator:** (on the signature) Doesn't look like we need the provider?
>
> **Collaborator:** (on the `getattr` fallback) I don't think we need to support arbitrary objects with an id; rather just the types allowed on the future OpenAIUploadedFile: str and FileObject


def _map_provider_details(
    choice: chat_completion_chunk.Choice | chat_completion.Choice,
) -> dict[str, Any]: