Fix multimodal DataContent support in OpenAI assistants client #670
## Problem
The OpenAI assistants client was not handling `DataContent` for multimodal inputs (text + images), causing failures when attempting to process base64-encoded images. The client only supported `UriContent` for images, not `DataContent`, which is commonly used for uploaded or embedded image data.

## Root Cause
In `_assistants_client.py`, the content processing logic only checked for `UriContent`:
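The original snippet is not reproduced in this description; below is a minimal sketch of that old branch, assuming `agent_framework` exports `UriContent` with `uri` and `media_type` attributes (the `_image_parts` helper name is hypothetical):

```python
from typing import Any

from agent_framework import UriContent  # assumed export; DataContent was not imported here


def _image_parts(contents: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical helper illustrating the old, UriContent-only branch."""
    parts: list[dict[str, Any]] = []
    for item in contents:
        # Only UriContent images were mapped; DataContent items fell through untouched.
        if isinstance(item, UriContent) and (item.media_type or "").startswith("image/"):
            parts.append({"type": "image_url", "image_url": {"url": item.uri}})
    return parts
```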
But it was missing support for `DataContent`, which also represents image data (as base64 data URIs).

## Solution
Updated the content handling to process both `DataContent` and `UriContent` for images, following the same pattern already implemented in the foundry client:
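A minimal sketch of the updated branch, under the same assumptions as the previous snippet (`DataContent` likewise carrying its payload as a `data:` URI):

```python
from typing import Any

from agent_framework import DataContent, UriContent  # DataContent newly imported


def _image_parts(contents: list[Any]) -> list[dict[str, Any]]:
    """Hypothetical helper illustrating the fixed branch."""
    parts: list[dict[str, Any]] = []
    for item in contents:
        # Both content types expose a URI (a web URL or a base64 data: URI),
        # so they can share the same image_url mapping.
        if isinstance(item, (DataContent, UriContent)) and (item.media_type or "").startswith("image/"):
            parts.append({"type": "image_url", "image_url": {"url": item.uri}})
    return parts
```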
## Changes

- Added the `DataContent` import to the OpenAI assistants client
- Updated the content handling to process both `DataContent` and `UriContent` for images
- Added `test_openai_assistants_client_create_run_options_with_data_content` to verify the fix (a simplified sketch of the scenario it covers follows this list)
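The real test lives in the repository; the sketch below only illustrates the scenario it exercises, assuming `agent_framework` exports `ChatMessage`, `Role`, `TextContent`, and `DataContent`, and that `DataContent` accepts a `uri` plus `media_type` (all assumptions, not verified signatures):

```python
import base64

from agent_framework import ChatMessage, DataContent, Role, TextContent  # assumed exports


def test_data_content_maps_to_image_url() -> None:
    """Simplified stand-in for the real test: build a multimodal message
    with base64 image data and state the part the fixed client should emit."""
    payload = base64.b64encode(b"fake-png-bytes").decode()
    message = ChatMessage(
        role=Role.USER,
        contents=[
            TextContent(text="What is in this image?"),
            DataContent(uri=f"data:image/png;base64,{payload}", media_type="image/png"),
        ],
    )
    # After the fix, the DataContent item should be converted into this part:
    expected = {"type": "image_url", "image_url": {"url": message.contents[1].uri}}
    assert expected["image_url"]["url"].startswith("data:image/png;base64,")
```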
## Testing

- `DataContent` with base64 image data is correctly converted to the `image_url` format
- Both `UriContent` (web URLs) and `DataContent` (base64) now work for multimodal inputs

## Example Usage
After this fix, both of these patterns now work:
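The original example block is not preserved here; this is a hedged reconstruction under the same assumed `agent_framework` exports as above (file name and URL are illustrative):

```python
import base64

from agent_framework import ChatMessage, DataContent, Role, TextContent, UriContent

# Pattern 1: image referenced by a web URL (worked before the fix)
url_message = ChatMessage(
    role=Role.USER,
    contents=[
        TextContent(text="Describe this image."),
        UriContent(uri="https://example.com/photo.jpg", media_type="image/jpeg"),
    ],
)

# Pattern 2: image embedded as a base64 data URI (enabled by this fix)
with open("photo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()
data_message = ChatMessage(
    role=Role.USER,
    contents=[
        TextContent(text="Describe this image."),
        DataContent(uri=f"data:image/jpeg;base64,{encoded}", media_type="image/jpeg"),
    ],
)
```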
This resolves the multimodal input issue described in the problem statement, enabling agents created from clients to properly handle both text and image inputs.