Add basic support for UploadedFile UserContent #2611
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -102,6 +102,28 @@ print(result.output) | |
| #> The document discusses... | ||
| ``` | ||
|
|
||
| ## Uploaded files | ||
|
|
||
| Use [`UploadedFile`][pydantic_ai.UploadedFile] when you've already uploaded content to the model provider. | ||
|
Collaborator
Related to the above, let's include examples of how to do that for all providers |
||
|
|
||
| - [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel] and [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] accept an `openai.types.FileObject` or a file ID string returned by the OpenAI Files API. | ||
|
Collaborator
This makes it sound like the model class constructors accept those types directly in an argument or something. Let's format it more clearly. I think it'd also be nice to have subclasses of |
||
| - [`GoogleModel`][pydantic_ai.models.google.GoogleModel] accepts a `google.genai.types.File` or a file URI string from the Gemini Files API. | ||
| - Other models currently raise `NotImplementedError` when they receive an `UploadedFile`. | ||
|
Collaborator
Let's support Anthropic as well: https://platform.claude.com/docs/en/build-with-claude/files
Contributor
Author
Does Anthropic provide a client-side SDK for this? In the link I only see it being done with HTTP requests.
Collaborator
@tarruda It's not super discoverable, but all of those code samples have a "Shell" dropdown that also has a "Python" option. So yes, there's an SDK for uploading files, and their objects for passing file URLs and binary data also have a |
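Following the Anthropic thread above, here is a minimal sketch of the content block an Anthropic mapping could emit for an already-uploaded file. It assumes, based on the linked Files API docs, that uploaded files are referenced from `document` or `image` blocks through a `{"type": "file", "file_id": ...}` source; verify against the current docs. The helper name `anthropic_block_for_file_id` is hypothetical and is not part of Pydantic AI or the Anthropic SDK.

```python
from typing import Any


def anthropic_block_for_file_id(file_id: str, kind: str = 'document') -> dict[str, Any]:
    """Hypothetical helper: build an Anthropic content block referencing an uploaded file.

    Assumed shape (from the Files API docs): {"type": kind, "source": {"type": "file", "file_id": ...}}.
    """
    if not file_id:
        raise ValueError('file_id must be a non-empty string')
    if kind not in ('document', 'image'):
        raise ValueError(f'unsupported block kind: {kind!r}')
    return {'type': kind, 'source': {'type': 'file', 'file_id': file_id}}


# Placeholder ID; in practice it would come from the SDK's upload response.
content = [
    {'type': 'text', 'text': 'Summarize this document.'},
    anthropic_block_for_file_id('file_placeholder'),
]
```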
||
|
|
||
| ```py {title="uploaded_file_input.py" test="skip" lint="skip"} | ||
|
Collaborator
Please don't skip linting |
||
| from pydantic_ai import Agent, UploadedFile | ||
|
|
||
| agent = Agent(model='openai:gpt-5') | ||
| result = agent.run_sync( | ||
| [ | ||
| 'Give me a short description of this image', | ||
| UploadedFile(file='file-abc123'), # file-abc123 is a file ID returned by the provider | ||
|
Collaborator
Let's update the example to be more "real"
Contributor
Author
Can you elaborate?
Collaborator
I just meant that we can actually show the code for uploading a file using the provider SDK, and then pass the returned object/ID here instead of a fake ID |
||
| ] | ||
| ) | ||
| print(result.output) | ||
| #> The image is a simple design of a classic yellow smiley face... | ||
| ``` | ||
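The review asks for the example to upload the file with the provider SDK first and then pass the returned object or ID, rather than a hard-coded `'file-abc123'`. The sketch below shows that flow offline: `InMemoryFilesAPI` and `StoredFile` are hypothetical stand-ins for a provider Files API client and its file object (with the real OpenAI SDK the upload step would be a call along the lines of `client.files.create(file=..., purpose=...)`), and the resulting ID is what would be wrapped in `UploadedFile(file=...)`.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical stand-in for a provider file object (e.g. openai.types.FileObject)."""

    id: str
    filename: str
    size: int


class InMemoryFilesAPI:
    """Hypothetical offline substitute for a provider Files API client."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._contents: dict[str, bytes] = {}

    def create(self, filename: str, content: bytes) -> StoredFile:
        # A real provider stores the bytes server-side and returns its own ID.
        file_id = f'file-{next(self._ids):06d}'
        self._contents[file_id] = content
        return StoredFile(id=file_id, filename=filename, size=len(content))


files_api = InMemoryFilesAPI()
uploaded = files_api.create('smiley.png', b'\x89PNG placeholder bytes')

# The returned object (or its `.id`) is what would be wrapped in
# UploadedFile(file=...) in the Pydantic AI example, instead of a fake ID.
prompt = ['Give me a short description of this image', uploaded.id]
```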
|
|
||
| ## User-side download vs. direct file URL | ||
|
|
||
| As a general rule, when you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI downloads the file content and then sends it as part of the API request. | ||
|
|
||
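To make the contrast with `UploadedFile` concrete: when a URL is downloaded user-side, the request carries the file bytes themselves, whereas an uploaded file is sent by reference. A minimal sketch, assuming the bytes are inlined as a base64 `data:` URI (one common encoding; the exact per-provider encoding Pydantic AI uses may differ); `inline_as_data_uri` is a hypothetical helper.

```python
import base64


def inline_as_data_uri(data: bytes, media_type: str) -> str:
    """Hypothetical helper: embed downloaded bytes directly in the request."""
    encoded = base64.b64encode(data).decode('ascii')
    return f'data:{media_type};base64,{encoded}'


downloaded = b'\x00' * 3000  # stand-in for bytes fetched from a DocumentUrl
inline_part = inline_as_data_uri(downloaded, 'application/pdf')

# An uploaded file is referenced by ID instead (placeholder value).
reference_part = {'file_id': 'file-abc123'}
```

Base64 encodes every 3 bytes as 4 characters, so the 3,000-byte payload becomes 4,000 characters before any JSON overhead, while the by-reference part stays the same size however large the file is.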
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,12 +43,20 @@ class BinaryDataPart(TypedDict): | |
| content: NotRequired[str] | ||
|
|
||
|
|
||
| class UploadedFilePart(TypedDict): | ||
|
Collaborator
Is this really a thing in OTel? We should only use things from their GenAI conventions |
||
| type: Literal['uploaded-file'] | ||
| identifier: NotRequired[str] | ||
| file: NotRequired[str] | ||
|
|
||
|
|
||
| class ThinkingPart(TypedDict): | ||
| type: Literal['thinking'] | ||
| content: NotRequired[str] | ||
|
|
||
|
|
||
| MessagePart: TypeAlias = 'TextPart | ToolCallPart | ToolCallResponsePart | MediaUrlPart | BinaryDataPart | ThinkingPart' | ||
| MessagePart: TypeAlias = ( | ||
| 'TextPart | ToolCallPart | ToolCallResponsePart | MediaUrlPart | BinaryDataPart | UploadedFilePart | ThinkingPart' | ||
| ) | ||
|
|
||
|
|
||
| Role = Literal['system', 'user', 'assistant'] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,6 +34,7 @@ | |
| ThinkingPart, | ||
| ToolCallPart, | ||
| ToolReturnPart, | ||
| UploadedFile, | ||
| UserPromptPart, | ||
| VideoUrl, | ||
| ) | ||
|
|
@@ -62,6 +63,7 @@ | |
| CountTokensConfigDict, | ||
| ExecutableCode, | ||
| ExecutableCodeDict, | ||
| File, | ||
| FileDataDict, | ||
| FinishReason as GoogleFinishReason, | ||
| FunctionCallDict, | ||
|
|
@@ -628,13 +630,40 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[PartDict]: | |
| else: | ||
| file_data_dict: FileDataDict = {'file_uri': item.url, 'mime_type': item.media_type} | ||
| content.append({'file_data': file_data_dict}) # pragma: lax no cover | ||
| elif isinstance(item, UploadedFile): | ||
| content.append({'file_data': self._map_uploaded_file(item)}) | ||
| elif isinstance(item, CachePoint): | ||
| # Google Gemini doesn't support prompt caching via CachePoint | ||
| pass | ||
| else: | ||
| assert_never(item) | ||
| return content | ||
|
|
||
| @staticmethod | ||
| def _map_uploaded_file(item: UploadedFile) -> FileDataDict: | ||
| """Convert an UploadedFile into the structure expected by Gemini.""" | ||
| file = item.file | ||
| if isinstance(file, File): | ||
| file_uri = file.uri | ||
| mime_type = file.mime_type | ||
| display_name = getattr(file, 'display_name', None) | ||
|
Collaborator
Why `getattr` instead of a regular attribute read? |
||
| elif isinstance(file, str): | ||
| file_uri = file | ||
| mime_type = None | ||
| display_name = None | ||
| else: | ||
| raise UserError('UploadedFile.file must be a genai.types.File or file URI string') | ||
|
|
||
| if not file_uri: | ||
| raise UserError('UploadedFile.file must include a file URI') | ||
|
|
||
| file_data: FileDataDict = {'file_uri': file_uri} | ||
| if mime_type: | ||
| file_data['mime_type'] = mime_type | ||
| if display_name: | ||
| file_data['display_name'] = display_name | ||
| return file_data | ||
|
|
||
| def _map_response_schema(self, o: OutputObjectDefinition) -> dict[str, Any]: | ||
| response_schema = o.json_schema.copy() | ||
| if o.name: | ||
|
|
||
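A stdlib-only sketch of the Gemini mapping above, relevant to the `getattr` question: once the value has been narrowed with `isinstance`, plain attribute reads are enough. `GeminiFileStub` is a hypothetical stand-in for `google.genai.types.File`, the local `FileDataDict` mirrors only the keys the diff writes, and `TypeError`/`ValueError` replace Pydantic AI's `UserError` so the sketch runs without dependencies.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict, Union


class FileDataDict(TypedDict, total=False):
    """Local stand-in for google.genai.types.FileDataDict (only the keys used in the diff)."""

    file_uri: str
    mime_type: str
    display_name: str


@dataclass
class GeminiFileStub:
    """Hypothetical stand-in for google.genai.types.File."""

    uri: Optional[str] = None
    mime_type: Optional[str] = None
    display_name: Optional[str] = None


def map_gemini_uploaded_file(file: Union[GeminiFileStub, str]) -> FileDataDict:
    if isinstance(file, GeminiFileStub):
        # The type is already known here, so direct attribute access suffices.
        file_uri, mime_type, display_name = file.uri, file.mime_type, file.display_name
    elif isinstance(file, str):
        file_uri, mime_type, display_name = file, None, None
    else:
        raise TypeError('expected a File object or a file URI string')
    if not file_uri:
        raise ValueError('uploaded file must include a file URI')
    file_data: FileDataDict = {'file_uri': file_uri}
    # Optional keys are only set when the source object provides them.
    if mime_type:
        file_data['mime_type'] = mime_type
    if display_name:
        file_data['display_name'] = display_name
    return file_data
```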
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -44,6 +44,7 @@ | |
| ThinkingPart, | ||
| ToolCallPart, | ||
| ToolReturnPart, | ||
| UploadedFile, | ||
| UserPromptPart, | ||
| VideoUrl, | ||
| ) | ||
|
|
@@ -56,7 +57,7 @@ | |
|
|
||
| try: | ||
| from openai import NOT_GIVEN, APIConnectionError, APIStatusError, AsyncOpenAI, AsyncStream | ||
| from openai.types import AllModels, chat, responses | ||
| from openai.types import AllModels, FileObject, chat, responses | ||
| from openai.types.chat import ( | ||
| ChatCompletionChunk, | ||
| ChatCompletionContentPartImageParam, | ||
|
|
@@ -977,6 +978,9 @@ async def _map_user_prompt(self, part: UserPromptPart) -> chat.ChatCompletionUse | |
| type='file', | ||
| ) | ||
| ) | ||
| elif isinstance(item, UploadedFile): | ||
| file_id = _map_uploaded_file(item, self._provider) | ||
| content.append(File(file=FileFile(file_id=file_id), type='file')) | ||
| elif isinstance(item, VideoUrl): # pragma: no cover | ||
| raise NotImplementedError('VideoUrl is not supported for OpenAI') | ||
| elif isinstance(item, CachePoint): | ||
|
|
@@ -1733,8 +1737,7 @@ def _map_json_schema(self, o: OutputObjectDefinition) -> responses.ResponseForma | |
| response_format_param['strict'] = o.strict | ||
| return response_format_param | ||
|
|
||
| @staticmethod | ||
| async def _map_user_prompt(part: UserPromptPart) -> responses.EasyInputMessageParam: # noqa: C901 | ||
| async def _map_user_prompt(self, part: UserPromptPart) -> responses.EasyInputMessageParam: # noqa: C901 | ||
| content: str | list[responses.ResponseInputContentParam] | ||
| if isinstance(part.content, str): | ||
| content = part.content | ||
|
|
@@ -1807,6 +1810,9 @@ async def _map_user_prompt(part: UserPromptPart) -> responses.EasyInputMessagePa | |
| filename=f'filename.{downloaded_item["data_type"]}', | ||
| ) | ||
| ) | ||
| elif isinstance(item, UploadedFile): | ||
| file_id = _map_uploaded_file(item, self._provider) | ||
| content.append(responses.ResponseInputFileParam(file_id=file_id, type='input_file')) | ||
| elif isinstance(item, VideoUrl): # pragma: no cover | ||
| raise NotImplementedError('VideoUrl is not supported for OpenAI.') | ||
| elif isinstance(item, CachePoint): | ||
|
|
@@ -2324,6 +2330,21 @@ def _map_usage( | |
| ) | ||
|
|
||
|
|
||
| def _map_uploaded_file(uploaded_file: UploadedFile, _provider: Provider[Any]) -> str: | ||
|
Collaborator
Doesn't look like we need the provider? |
||
| """Map an UploadedFile to a file ID understood by OpenAI-compatible APIs.""" | ||
| file = uploaded_file.file | ||
| if isinstance(file, str): | ||
| return file | ||
| if isinstance(file, FileObject): | ||
| return file.id | ||
|
|
||
| file_id = getattr(file, 'id', None) | ||
|
Collaborator
I don't think we need to support arbitrary objects with an id; rather just the types allowed on the future |
||
| if isinstance(file_id, str): | ||
| return file_id | ||
|
|
||
| raise UserError('UploadedFile.file must be a file ID string or an object with an `id` attribute') | ||
|
|
||
|
|
||
| def _map_provider_details( | ||
| choice: chat_completion_chunk.Choice | chat_completion.Choice, | ||
| ) -> dict[str, Any]: | ||
|
|
||
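Applying the two review comments on the OpenAI `_map_uploaded_file` (drop the unused provider argument, and accept only known types instead of any object with an `id`), the mapping could look roughly like this sketch. `FileObjectStub` is a hypothetical stand-in for `openai.types.FileObject`, and the dict payloads mirror the `File(file=FileFile(file_id=...), type='file')` and `ResponseInputFileParam(file_id=..., type='input_file')` shapes used in the diff.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class FileObjectStub:
    """Hypothetical stand-in for openai.types.FileObject (only `id` is used)."""

    id: str


def map_openai_uploaded_file(file: Union[str, FileObjectStub]) -> str:
    """Return the file ID; no provider argument and no duck-typed getattr fallback."""
    if isinstance(file, str):
        return file
    if isinstance(file, FileObjectStub):
        return file.id
    raise TypeError(f'unsupported uploaded file type: {type(file).__name__}')


def chat_file_part(file: Union[str, FileObjectStub]) -> dict[str, Any]:
    # Same shape as File(file=FileFile(file_id=...), type='file') in the diff.
    return {'type': 'file', 'file': {'file_id': map_openai_uploaded_file(file)}}


def responses_file_part(file: Union[str, FileObjectStub]) -> dict[str, Any]:
    # Same shape as ResponseInputFileParam(file_id=..., type='input_file') in the diff.
    return {'type': 'input_file', 'file_id': map_openai_uploaded_file(file)}
```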
I just merged #3492 which (among other things) added an "Uploaded Files" section as well :) Can you merge main and update that example to use the `UploadedFile` object? Keeping the section above the "user-side ..." section makes sense to me.