Add support for aspect ratio in gemini image generation #3412
base: main
Branch updated from e7b6dec to 9a3d8b0
docs/builtin-tools.md (outdated)
_(This example is complete, it can be run "as is")_

To control the aspect ratio when using Gemini image models, include the `ImageGenerationTool` explicitly:
Move this example under "Configuration Options" please
docs/builtin-tools.md (outdated)
|----------|-----------|-------|
| OpenAI Responses | ✅ | Full feature support. Only supported by models newer than `gpt-5`. Metadata about the generated image, like the [`revised_prompt`](https://platform.openai.com/docs/guides/tools-image-generation#revised-prompt) sent to the underlying image model, is available on the [`BuiltinToolReturnPart`][pydantic_ai.messages.BuiltinToolReturnPart] that's available via [`ModelResponse.builtin_tool_calls`][pydantic_ai.messages.ModelResponse.builtin_tool_calls]. |
| Google | ✅ | No parameter support. Only supported by [image generation models](https://ai.google.dev/gemini-api/docs/image-generation) like `gemini-2.5-flash-image`. These models do not support [structured output](output.md) or [function tools](tools.md). These models will always generate images, even if this built-in tool is not explicitly specified. |
| Google | ✅ | Supports the `aspect_ratio` parameter when explicitly provided. Only supported by [image generation models](https://ai.google.dev/gemini-api/docs/image-generation) like `gemini-2.5-flash-image`. These models do not support [structured output](output.md) or [function tools](tools.md) and will always generate images, even if this built-in tool is not explicitly specified. |
I'd say "Limited parameter support" like we do further up for web search.
Supported by:
* Google image-generation models (Gemini) when the tool is explicitly enabled.
I think we can drop "when the tool is explicitly enabled." as that's implied by this being on that builtin tool class.
I wonder if we could support some of these values for OpenAI as well by mapping to one of the size options. Then we'd only need to raise an error from OpenAI if another value is used, or if size and aspect_ratio are used at the same time.
)
if tool.aspect_ratio:
    if image_config and image_config.get('aspect_ratio') != tool.aspect_ratio:
        raise UserError(
We really only support a single instance anyway, so we can drop this and just always set `image_config`.
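A minimal sketch of the simplification suggested in this comment, assuming only one image generation tool instance can be present. `ImageGenerationToolStub`, `build_image_config` and `build_generate_config` are hypothetical names for illustration, not Pydantic AI code, and the config is a plain dict standing in for google-genai's `GenerateContentConfigDict`:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ImageGenerationToolStub:
    """Hypothetical stand-in for `ImageGenerationTool`, reduced to the one field used here."""

    aspect_ratio: Optional[str] = None


def build_image_config(tool: ImageGenerationToolStub) -> Optional[dict[str, Any]]:
    # Only one image generation tool instance is supported, so there is no
    # previously collected image_config to reconcile with: derive it directly.
    if tool.aspect_ratio:
        return {'aspect_ratio': tool.aspect_ratio}
    return None


def build_generate_config(tool: ImageGenerationToolStub, modalities: list[str]) -> dict[str, Any]:
    # Plain dict standing in for GenerateContentConfigDict (an assumption).
    config: dict[str, Any] = {'response_modalities': modalities}
    image_config = build_image_config(tool)
    if image_config is not None:
        config['image_config'] = image_config
    return config
```

With `ImageGenerationToolStub(aspect_ratio='16:9')` and `['TEXT', 'IMAGE']`, this returns `{'response_modalities': ['TEXT', 'IMAGE'], 'image_config': {'aspect_ratio': '16:9'}}`; without an aspect ratio, no `image_config` key is set, so there is nothing to compare and no `UserError` path is needed.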
    response_schema=response_schema,
    response_modalities=modalities,
)
config: GenerateContentConfigDict = {
Why did we have to change how this is built?
DouweM left a comment
@mwildehahn Thanks for doing the OpenAI Responses support, that's a nice quality of life detail. Just 2 points on docs and then we'll merge!
_(This example is complete, it can be run "as is")_

OpenAI Responses models also respect the `aspect_ratio` parameter. Because the OpenAI API only exposes discrete image sizes, PydanticAI maps `'1:1'` -> `1024x1024`, `'2:3'` -> `1024x1536`, and `'3:2'` -> `1536x1024`. Providing any other aspect ratio
Suggested change:
- PydanticAI maps `'1:1'` -> `1024x1024`, `'2:3'` -> `1024x1536`, and `'3:2'` -> `1536x1024`. Providing any other aspect ratio
+ Pydantic AI maps `'1:1'` -> `1024x1024`, `'2:3'` -> `1024x1536`, and `'3:2'` -> `1536x1024`. Providing any other aspect ratio
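The aspect-ratio-to-size mapping described in this docs change can be sketched as a small lookup. This is an illustration, not the code merged in this PR: `resolve_openai_size` is a hypothetical helper, `UserError` is a local stand-in for Pydantic AI's exception, and rejecting `size` and `aspect_ratio` together follows the reviewer's earlier proposal rather than confirmed behavior:

```python
from typing import Optional

# Discrete OpenAI image sizes for the three supported aspect ratios.
_ASPECT_RATIO_TO_SIZE = {
    '1:1': '1024x1024',
    '2:3': '1024x1536',
    '3:2': '1536x1024',
}


class UserError(ValueError):
    """Local stand-in for Pydantic AI's UserError (assumption for this sketch)."""


def resolve_openai_size(size: Optional[str], aspect_ratio: Optional[str]) -> Optional[str]:
    # No aspect ratio requested: pass through whatever size was given (possibly None).
    if aspect_ratio is None:
        return size
    # Assumed rule from the review: the two parameters are mutually exclusive.
    if size is not None:
        raise UserError('`size` and `aspect_ratio` cannot both be set for OpenAI image generation.')
    try:
        return _ASPECT_RATIO_TO_SIZE[aspect_ratio]
    except KeyError:
        raise UserError(f'Unsupported aspect ratio for OpenAI: {aspect_ratio!r}') from None
```

For example, `resolve_openai_size(None, '2:3')` returns `'1024x1536'`, while `'16:9'` raises `UserError` because OpenAI exposes no matching size.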
| `partial_images` | ✅ | ❌ |
| `quality` | ✅ | ❌ |
| `size` | ✅ | ❌ |
| `aspect_ratio` | ✅ (1:1, 2:3, 3:2) | ✅ |
Suggested change:
- | `aspect_ratio` | ✅ (1:1, 2:3, 3:2) | ✅ |
+ | `aspect_ratio` | ✅ (`1:1`, `2:3`, `3:2`) | ✅ |
Ideally we'd use a different emoji for partial support; any ideas? 😄
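One way to think about the "partial support" question is a three-state support level per parameter and provider. The sketch below encodes only the rows visible in this diff (first column OpenAI Responses, second Google); `Support`, `PARAMETER_SUPPORT`, `unsupported_parameters` and the provider keys are hypothetical names for illustration, not part of Pydantic AI:

```python
from enum import Enum


class Support(Enum):
    FULL = 'full'
    PARTIAL = 'partial'  # e.g. OpenAI accepts only some aspect ratios
    NONE = 'none'


# Rows from the parameter table in this diff.
PARAMETER_SUPPORT: dict[str, dict[str, Support]] = {
    'partial_images': {'openai-responses': Support.FULL, 'google': Support.NONE},
    'quality': {'openai-responses': Support.FULL, 'google': Support.NONE},
    'size': {'openai-responses': Support.FULL, 'google': Support.NONE},
    'aspect_ratio': {'openai-responses': Support.PARTIAL, 'google': Support.FULL},
}


def unsupported_parameters(provider: str, params: dict[str, object]) -> list[str]:
    """Names of parameters that are set but not supported at all by the provider.

    Parameters missing from the table are treated as unsupported.
    """
    return [
        name
        for name, value in params.items()
        if value is not None
        and PARAMETER_SUPPORT.get(name, {}).get(provider, Support.NONE) is Support.NONE
    ]
```

`PARTIAL` entries pass this check on purpose; validating the actual value (such as an unsupported OpenAI aspect ratio) would be left to a provider-specific mapping.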
Fix for #3119