Commit 0e7556e

ltwlf and moonbox3 authored
Python: Add file handling support to BinaryContent for OpenAI Responses API (#12258)
## Summary

Enhances `BinaryContent` to support file handling for the OpenAI Responses API, enabling file uploads through the responses agent while maintaining a provider-agnostic design.

## Changes

### BinaryContent Enhancements

- **Add `can_read` property**: Indicates whether content has readable data available
- **Add `from_file()` class method**: Creates BinaryContent instances from file paths with automatic base64 encoding
- **Fix Unicode handling**: Prevents decode errors when processing binary files (PDFs, images, etc.)

### OpenAI Responses Agent Integration

- **Add BinaryContent support**: Pattern matching case for file handling in `responses_agent_thread_actions.py`
- **Correct OpenAI API format**: Uses proper `filename` and `file_data` structure with data URI format
- **UUID-based filenames**: Generates appropriate filenames with mime-type extensions
- **Provider-specific mapping**: File format conversion happens only in OpenAI agent code

### Testing

- **Test coverage**: New tests for `can_read`, `from_file()`, and binary data handling
- **Unicode error prevention**: Specific test for binary PDF-like content
- **Base64 encoding verification**: Ensures proper data format for API compatibility

## Design Principles

- **Provider-agnostic**: BinaryContent remains completely generic with no OpenAI-specific dependencies
- **Clean separation**: OpenAI format mapping isolated to OpenAI agent files
- **No FileContent class**: Enhances existing BinaryContent instead of introducing new types
- **Follows existing patterns**: Similar approach to ImageContent and TextContent handling

## Usage Example

```python
response = await self.agent.get_response(
    messages=ChatMessageContent(
        content="Analyse PDF",
        role=AuthorRole("user"),
        items=[BinaryContent.from_file(file_path="c:/test.pdf")],
    )
)
```

Co-authored-by: Evan Mattson <[email protected]>
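
The "Fix Unicode handling" item can be illustrated without Semantic Kernel at all. The following is a minimal stdlib-only sketch, not the code from this commit: it assumes a few made-up bytes that resemble the start of a PDF, shows that decoding them as UTF-8 fails, and shows that base64 encoding round-trips them unchanged.

```python
import base64

# Hypothetical sample bytes resembling a PDF header; the second line is a
# binary marker that is not valid UTF-8.
pdf_like = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# Treating binary data as text raises UnicodeDecodeError.
try:
    pdf_like.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Base64 maps arbitrary bytes to ASCII, so it is safe for any file type.
encoded = base64.b64encode(pdf_like).decode("ascii")
round_trip = base64.b64decode(encoded)
```

This is why reading the file as bytes and encoding it, rather than decoding it as text, works for PDFs and images as well as plain text.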
1 parent 32d7fa1 commit 0e7556e

5 files changed: +378 -0 lines changed

python/samples/concepts/README.md

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@
 - [OpenAI Responses Agent Declarative File Search](./agents/openai_responses/openai_responses_agent_declarative_file_search.py)
 - [OpenAI Responses Agent Declarative Function Calling From File](./agents/openai_responses/openai_responses_agent_declarative_function_calling_from_file.py)
 - [OpenAI Responses Agent Declarative Web Search](./agents/openai_responses/openai_responses_agent_declarative_web_search.py)
+- [OpenAI Responses Binary Content Upload](./agents/openai_responses/responses_agent_binary_content_upload.py)
 - [OpenAI Responses Message Callback Streaming](./agents/openai_responses/responses_agent_message_callback_streaming.py)
 - [OpenAI Responses Message Callback](./agents/openai_responses/responses_agent_message_callback.py)
 - [OpenAI Responses File Search Streaming](./agents/openai_responses/responses_agent_file_search_streaming.py)
python/samples/concepts/agents/openai_responses/responses_agent_binary_content_upload.py

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+# Copyright (c) Microsoft. All rights reserved.
+import asyncio
+import os
+import tempfile
+
+from semantic_kernel.agents import OpenAIResponsesAgent
+from semantic_kernel.connectors.ai.open_ai import OpenAISettings
+from semantic_kernel.contents.binary_content import BinaryContent
+from semantic_kernel.contents.chat_message_content import ChatMessageContent
+from semantic_kernel.contents.text_content import TextContent
+from semantic_kernel.contents.utils.author_role import AuthorRole
+
+"""
+The following sample demonstrates how to upload PDF and text files using BinaryContent
+with an OpenAI Responses Agent. This shows how to create BinaryContent objects from files
+and compose multi-modal messages that combine text and binary content.
+
+The sample demonstrates:
+1. Creating BinaryContent from a PDF file
+2. Creating BinaryContent from a text file
+3. Composing multi-modal messages with mixed content types (text + binary)
+4. Sending complex messages directly to the agent via the messages parameter
+5. Having the agent process and respond to questions about the uploaded files
+
+This approach differs from simple string-based interactions by showing how to combine
+multiple content types within a single message, which is useful for rich media interactions.
+
+Note: This sample uses the existing employees.pdf file from the resources directory.
+"""
+
+# Sample follow-up questions to demonstrate continued conversation
+USER_INPUTS = [
+    "What specific types of files did I just upload?",
+    "Can you tell me about the content in the PDF file?",
+    "What does the text file contain?",
+    "Can you provide a summary of both documents?",
+]
+
+
+def create_sample_text_content() -> str:
+    """Create sample text content for demonstration purposes.
+
+    Returns:
+        str: A sample company policy document in text format.
+    """
+    return """Company Policy Document - Remote Work Guidelines
+
+This document outlines our company's remote work policies and procedures.
+
+Remote Work Eligibility:
+- Full-time employees with at least 6 months tenure
+- Manager's approval required
+- Home office setup must meet security requirements
+
+Work Schedule:
+- Core hours: 10 AM - 3 PM local time
+- Flexible start/end times outside core hours
+- Maximum 3 remote days per week for hybrid roles
+
+Communication Requirements:
+- Daily check-ins with team lead
+- Weekly video conference participation
+- Response time: within 4 hours during business hours
+
+Equipment and Security:
+- Company-provided laptop and VPN access
+- Secure Wi-Fi connection required
+- No public Wi-Fi for work activities
+
+For questions about remote work policies, contact HR at [email protected]
+"""
+
+
+async def main():
+    # 1. Initialize the OpenAI client
+    client = OpenAIResponsesAgent.create_client()
+
+    # 2. Prepare file paths and create sample content
+    pdf_file_path = os.path.join(
+        os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))),
+        "resources",
+        "file_search",
+        "employees.pdf",
+    )
+
+    # Create a temporary text file for demonstration purposes
+    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as text_file:
+        text_content = create_sample_text_content()
+        text_file.write(text_content)
+        text_file_path = text_file.name
+
+    try:
+        # 3. Create BinaryContent objects from files using different methods
+        print("Creating BinaryContent from files...")
+
+        # Method 1: Create BinaryContent from an existing PDF file
+        pdf_binary_content = BinaryContent.from_file(file_path=pdf_file_path, mime_type="application/pdf")
+        print(f"Created PDF BinaryContent: {pdf_binary_content.mime_type}, can_read: {pdf_binary_content.can_read}")
+
+        # Method 2: Create BinaryContent from the temporary text file
+        text_binary_content = BinaryContent.from_file(file_path=text_file_path, mime_type="text/plain")
+        print(f"Created text BinaryContent: {text_binary_content.mime_type}, can_read: {text_binary_content.can_read}")
+
+        # Method 3: Create BinaryContent directly from in-memory data
+        # This approach allows creating BinaryContent without file I/O operations
+        alternative_text_content = BinaryContent(
+            data=text_content.encode("utf-8"), mime_type="text/plain", data_format="base64"
+        )
+        print(f"Alternative text BinaryContent: {alternative_text_content.mime_type}")
+
+        # 4. Initialize the OpenAI Responses Agent with file analysis capabilities
+        # Configure the AI model for responses
+        settings = OpenAISettings()
+        responses_model = settings.responses_model_id or "gpt-4o"
+
+        agent = OpenAIResponsesAgent(
+            ai_model_id=responses_model,
+            client=client,
+            instructions=(
+                "You are a helpful assistant that can analyze uploaded files. "
+                "When users upload files, examine their content and provide helpful insights. "
+                "You can identify file types, summarize content, and answer questions about the files."
+            ),
+            name="FileAnalyzer",
+        )
+
+        # 5. Demonstrate multi-modal message composition
+        # This showcases combining text and binary content in a single message
+
+        # Compose a message containing both text instructions and file attachments
+        # This pattern is ideal for scenarios requiring rich, mixed-content interactions
+        initial_message = ChatMessageContent(
+            role=AuthorRole.USER,
+            items=[
+                TextContent(text="I'm uploading a PDF document and a text file for you to analyze."),
+                pdf_binary_content,
+                text_binary_content,
+            ],
+        )
+
+        # 6. Conduct a conversation with the agent about the uploaded files
+        thread = None
+
+        # Send the initial multi-modal message containing file uploads
+        print("\n# User: 'I'm uploading a PDF document and a text file for you to analyze.'")
+        first_chunk = True
+        async for response in agent.invoke_stream(messages=initial_message, thread=thread):
+            thread = response.thread
+            if first_chunk:
+                print(f"# {response.name}: ", end="", flush=True)
+                first_chunk = False
+            print(response.content, end="", flush=True)
+        print()  # New line after response
+
+        # Continue the conversation with text-based follow-up questions
+        for user_input in USER_INPUTS:
+            print(f"\n# User: '{user_input}'")
+
+            # Process follow-up questions using standard text input
+            first_chunk = True
+            async for response in agent.invoke_stream(messages=user_input, thread=thread):
+                thread = response.thread
+                if first_chunk:
+                    print(f"# {response.name}: ", end="", flush=True)
+                    first_chunk = False
+                print(response.content, end="", flush=True)
+            print()  # New line after response
+
+    finally:
+        # 7. Clean up temporary resources
+        if os.path.exists(text_file_path):
+            os.unlink(text_file_path)
+
+    print("\n" + "=" * 60)
+    print("Sample completed!")
+    print("\nKey points about BinaryContent:")
+    print("1. Use BinaryContent.from_file() to create from existing files")
+    print("2. Use BinaryContent(data=...) to create from bytes/string data")
+    print("3. Specify appropriate mime_type for proper handling")
+    print("4. BinaryContent can be included in chat messages alongside text")
+    print("5. The OpenAI Responses API will process supported file types")
+    print("\nSupported file types include:")
+    print("- PDF documents (application/pdf)")
+    print("- Text files (text/plain)")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
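
The sample's two streaming loops share one pattern: print the agent name once, on the first chunk, then append every chunk as it arrives. The sketch below isolates that pattern using only the standard library. `fake_stream` and `render` are hypothetical names introduced here for illustration; `fake_stream` is a stand-in for `agent.invoke_stream()` and yields plain strings rather than response objects.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_stream(chunks: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for agent.invoke_stream(); yields text chunks."""
    for chunk in chunks:
        await asyncio.sleep(0)  # give control back to the event loop
        yield chunk


async def render(name: str, chunks: list[str]) -> str:
    """Assemble the output line: the name prefix once, then every chunk."""
    parts: list[str] = []
    first_chunk = True
    async for chunk in fake_stream(chunks):
        if first_chunk:
            parts.append(f"# {name}: ")
            first_chunk = False
        parts.append(chunk)
    return "".join(parts)


line = asyncio.run(render("FileAnalyzer", ["Two ", "files ", "received."]))
```

Resetting `first_chunk` before each loop, as the sample does, is what makes the prefix appear once per turn rather than once per conversation.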

python/semantic_kernel/agents/open_ai/responses_agent_thread_actions.py

Lines changed: 47 additions & 0 deletions
@@ -2,6 +2,7 @@

 import asyncio
 import logging
+import uuid
 from collections.abc import AsyncIterable, Sequence
 from functools import reduce
 from typing import TYPE_CHECKING, Any, ClassVar, Literal, TypeVar, cast
@@ -31,6 +32,7 @@
 from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
 from semantic_kernel.connectors.ai.open_ai.exceptions.content_filter_ai_exception import ContentFilterAIException
 from semantic_kernel.contents.annotation_content import AnnotationContent
+from semantic_kernel.contents.binary_content import BinaryContent
 from semantic_kernel.contents.chat_history import ChatHistory
 from semantic_kernel.contents.chat_message_content import CMC_ITEM_TYPES, ChatMessageContent
 from semantic_kernel.contents.function_call_content import FunctionCallContent
@@ -721,6 +723,51 @@ def _prepare_chat_history_for_request(
                             "call_id": content.call_id,
                         }
                         response_inputs.append(rfrc_dict)
+                    case BinaryContent() if content.can_read:
+                        # Generate filename with appropriate extension based on mime type
+                        extension = ""
+                        if content.mime_type == "application/pdf":
+                            extension = ".pdf"
+                        elif content.mime_type.startswith("text/"):
+                            extension = ".txt"
+                        elif content.mime_type.startswith("image/"):
+                            # For image content, warn that ImageContent class should be used instead
+                            logger.warning(
+                                f"Using BinaryContent for image type '{content.mime_type}'. "
+                                "Use ImageContent for handling of images."
+                            )
+                            extension = f".{content.mime_type.split('/')[-1]}"
+                        elif content.mime_type.startswith("audio/"):
+                            # For audio content, warn that AudioContent class should be used instead
+                            logger.warning(
+                                f"Using BinaryContent for audio type '{content.mime_type}'. "
+                                "Use AudioContent for handling of audio."
+                            )
+                            extension = f".{content.mime_type.split('/')[-1]}"
+                        else:
+                            # For other binary types, use generic extension based on MIME subtype
+                            # or fallback to .bin when the MIME type has no subtype
+                            mime_subtype = (
+                                content.mime_type.split("/")[-1]
+                                if "/" in content.mime_type
+                                else "bin"
+                            )
+                            extension = f".{mime_subtype}"
+                            logger.warning(
+                                f"Using binary content with mime type '{content.mime_type}' "
+                                f"which may not be supported by the OpenAI Responses API"
+                            )
+
+                        filename = f"{uuid.uuid4()}{extension}"
+
+                        # Format according to OpenAI Responses API specification
+                        file_data_uri = f"data:{content.mime_type};base64,{content.data_string}"
+                        contents.append({
+                            "type": "input_file",
+                            "filename": filename,
+                            "file_data": file_data_uri,
+                        })
+                        response_inputs.append({"role": original_role, "content": contents})

         return response_inputs

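
The new `case` branch does two things: it picks a file extension from the MIME type, and it wraps the base64 payload in a data URI inside an `input_file` item. Below is a condensed, stdlib-only sketch of that mapping under a few assumptions: `extension_for` and `to_input_file` are hypothetical helper names (the commit inlines this logic), the image, audio and "other" branches are collapsed into one because they derive the extension the same way, the warnings are omitted, and a MIME type without a slash falls back to `.bin` as the diff's comment describes.

```python
import base64
import uuid


def extension_for(mime_type: str) -> str:
    """Pick a filename extension from a MIME type (hypothetical helper)."""
    if mime_type == "application/pdf":
        return ".pdf"
    if mime_type.startswith("text/"):
        return ".txt"
    if "/" in mime_type:
        # image/png -> .png, audio/wav -> .wav, and so on
        return f".{mime_type.split('/')[-1]}"
    return ".bin"


def to_input_file(data: bytes, mime_type: str) -> dict[str, str]:
    """Build an item shaped like the dict appended to `contents` in the diff."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "input_file",
        "filename": f"{uuid.uuid4()}{extension_for(mime_type)}",
        "file_data": f"data:{mime_type};base64,{encoded}",
    }


item = to_input_file(b"hello", "text/plain")
```

The filename is random because `BinaryContent` carries no required name of its own; only the extension is derived from the content.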

python/semantic_kernel/contents/binary_content.py

Lines changed: 47 additions & 0 deletions
@@ -165,6 +165,53 @@ def mime_type(self, value: str):
         if self._data_uri:
             self._data_uri.mime_type = value

+    @property
+    def can_read(self) -> bool:
+        """Get whether the content can be read.
+
+        Returns True if the content has data available for reading.
+        """
+        return self._data_uri is not None
+
+    @classmethod
+    def from_file(
+        cls: type[_T],
+        file_path: str | Path,
+        mime_type: str | None = None,
+    ) -> _T:
+        """Create BinaryContent from a file.
+
+        Args:
+            file_path: Path to the file to read
+            mime_type: MIME type of the file content
+
+        Returns:
+            BinaryContent instance with file data
+
+        Raises:
+            FileNotFoundError: If the file doesn't exist
+            ContentInitializationError: If the path is not a file
+        """
+        from semantic_kernel.exceptions.content_exceptions import ContentInitializationError
+
+        path = Path(file_path)
+
+        if not path.exists():
+            raise FileNotFoundError(f"File not found: {file_path}")
+
+        if not path.is_file():
+            raise ContentInitializationError(f"Path is not a file: {file_path}")
+
+        # Read file as binary data to handle all file types properly
+        data = path.read_bytes()
+
+        return cls(
+            data=data,
+            mime_type=mime_type,
+            uri=str(path),
+            data_format="base64",
+        )
+
     def __str__(self) -> str:
         """Return the string representation of the content."""
         return self.data_uri if self._data_uri else str(self.uri)
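
The validation order in `from_file()` (existence first, then "is it a regular file", then a binary read) can be tried out without Semantic Kernel. The following is a rough stdlib-only sketch, not the library's implementation: `read_as_base64` is a hypothetical function, and `ValueError` stands in for `ContentInitializationError`, which lives in the library.

```python
import base64
import tempfile
from pathlib import Path
from typing import Union


def read_as_base64(file_path: Union[str, Path]) -> str:
    """Validate a path, then return its bytes as base64 (hypothetical helper)."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
    if not path.is_file():
        # Stand-in for ContentInitializationError in the real method.
        raise ValueError(f"Path is not a file: {file_path}")
    # Read as bytes so non-text files never go through a text decoder.
    return base64.b64encode(path.read_bytes()).decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"\x00\xff binary")
    encoded = read_as_base64(sample)

    # A directory exists but is not a file.
    try:
        read_as_base64(tmp)
        directory_error = None
    except ValueError as exc:
        directory_error = str(exc)

    # A path that does not exist at all.
    try:
        read_as_base64(Path(tmp) / "missing.pdf")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Checking `exists()` before `is_file()` is what lets a missing path and a directory produce two different errors, matching the two entries in the method's `Raises` section.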
