Commit 0abe4f8

eavanvalkenburg authored and alliscode committed
Python: ADR for simplified get response (microsoft#3098)
* ADR for simplified get response
* updated some language, added agent option and code comparison
* small update in sample
* added workflows and expanded some points
* changed decision and number
* updated with stream=False default
1 parent 808f413 commit 0abe4f8

1 file changed: +258 additions, 0 deletions

---
status: Accepted
contact: eavanvalkenburg
date: 2026-01-06
deciders: markwallace-microsoft, dmytrostruk, taochenosu, alliscode, moonbox3, sphenry
consulted: sergeymenshykh, rbarreto, dmytrostruk, westey-m
informed:
---

# Simplify Python Get Response API into a single method

## Context and Problem Statement

Currently, chat clients must implement two separate methods to get responses: one for streaming and one for non-streaming. This adds complexity to client implementations and increases the maintenance burden. The split likely exists because the .NET version cannot express proper typing for a single method. In Python this is possible, and it is, for instance, how the OpenAI Python client works. A single method would also make the Python version simpler to work with, because there is only one method to learn about instead of two.

## Implications of this change

### Current Architecture Overview

The current design has **two separate methods** at each layer:

| Layer | Non-streaming | Streaming |
|-------|---------------|-----------|
| **Protocol** | `get_response()` → `ChatResponse` | `get_streaming_response()` → `AsyncIterable[ChatResponseUpdate]` |
| **BaseChatClient** | `get_response()` (public) | `get_streaming_response()` (public) |
| **Implementation** | `_inner_get_response()` (private) | `_inner_get_streaming_response()` (private) |

### Key Usage Areas Identified

#### 1. **ChatAgent** (_agents.py)
- `run()` → calls `self.chat_client.get_response()`
- `run_stream()` → calls `self.chat_client.get_streaming_response()`

These are parallel methods on the agent, so consolidating the client methods would **not break** the agent API. You could keep `agent.run()` and `agent.run_stream()` unchanged while internally calling `get_response(stream=True/False)`.

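A minimal sketch of that delegation, using stand-in classes (`FakeChatClient` and `FakeAgent` are illustrative, not the framework's actual types):

```python
import asyncio
from collections.abc import AsyncIterable


class FakeChatClient:
    """Stand-in for a chat client with the consolidated method."""

    def get_response(self, messages: str, *, stream: bool = False):
        if stream:
            async def _updates() -> AsyncIterable[str]:
                for word in f"echo: {messages}".split():
                    yield word
            return _updates()

        async def _full() -> str:
            return f"echo: {messages}"
        return _full()


class FakeAgent:
    """Stand-in for ChatAgent: public API unchanged, one client method inside."""

    def __init__(self, chat_client: FakeChatClient) -> None:
        self.chat_client = chat_client

    async def run(self, messages: str) -> str:
        # run() keeps its signature; internally uses stream=False (the default)
        return await self.chat_client.get_response(messages)

    def run_stream(self, messages: str) -> AsyncIterable[str]:
        # run_stream() keeps its signature; internally uses stream=True
        return self.chat_client.get_response(messages, stream=True)


async def main() -> None:
    agent = FakeAgent(FakeChatClient())
    print(await agent.run("hello"))
    async for chunk in agent.run_stream("hello"):
        print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```
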
#### 2. **Function Invocation Decorator** (_tools.py)
This is **the most impacted area**. Currently:
- `_handle_function_calls_response()` decorates `get_response`
- `_handle_function_calls_streaming_response()` decorates `get_streaming_response`
- The `use_function_invocation` class decorator wraps **both methods separately**

**Impact**: The decorator logic is almost identical (~200 lines each) with small differences:
- Non-streaming collects the response and returns it
- Streaming yields updates and returns an async iterable

With a unified method, you'd need **one decorator** that:
- Checks the `stream` parameter
- Uses `@overload` to determine the return type
- Handles both paths with conditional logic
- Could be applied to just the method, instead of the whole class

This would **reduce code duplication** but add complexity to a single function.

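A sketch of what such a unified decorator could look like. The names (`handle_function_calls`, `EchoClient`) are illustrative, and the real wrapper would additionally buffer function-call content, invoke the tools, and loop until the model stops requesting calls:

```python
import asyncio
import functools


def handle_function_calls(func):
    """One decorator for both paths, branching on the `stream` parameter."""

    @functools.wraps(func)
    def wrapper(self, messages, *, stream=False, **kwargs):
        if stream:
            async def _streaming():
                # streaming branch: pass updates through as they arrive
                async for update in func(self, messages, stream=True, **kwargs):
                    yield update
            return _streaming()

        async def _non_streaming():
            # non-streaming branch: await the complete response
            return await func(self, messages, stream=False, **kwargs)
        return _non_streaming()

    return wrapper


class EchoClient:
    @handle_function_calls
    def get_response(self, messages, *, stream=False):
        if stream:
            async def _updates():
                for word in messages.split():
                    yield word
            return _updates()

        async def _full():
            return messages.upper()
        return _full()


async def main() -> None:
    client = EchoClient()
    print(await client.get_response("hi there"))
    async for update in client.get_response("hi there", stream=True):
        print(update)


if __name__ == "__main__":
    asyncio.run(main())
```
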
#### 3. **Observability/Instrumentation** (observability.py)
Same pattern as function invocation:
- `_trace_get_response()` wraps `get_response`
- `_trace_get_streaming_response()` wraps `get_streaming_response`
- The `use_instrumentation` decorator applies both

**Impact**: Would need consolidation into a single tracing wrapper.

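A sketch of such a single tracing wrapper, using a stand-in event list instead of a real OpenTelemetry tracer. The subtlety it illustrates: on the streaming branch, the span must stay open until the iterator is exhausted, not just until the method returns:

```python
import asyncio
import functools

SPAN_EVENTS: list[str] = []  # stand-in for a real tracer/exporter


def trace_get_response(func):
    """Single tracing wrapper covering both branches of `stream`."""

    @functools.wraps(func)
    def wrapper(self, messages, *, stream=False, **kwargs):
        if stream:
            async def _traced_stream():
                SPAN_EVENTS.append("span.start")
                try:
                    async for update in func(self, messages, stream=True, **kwargs):
                        yield update
                finally:
                    # the span closes when iteration ends, which may be
                    # long after the wrapper itself has returned
                    SPAN_EVENTS.append("span.end")
            return _traced_stream()

        async def _traced_call():
            SPAN_EVENTS.append("span.start")
            try:
                return await func(self, messages, stream=False, **kwargs)
            finally:
                SPAN_EVENTS.append("span.end")
        return _traced_call()

    return wrapper


class EchoClient:
    @trace_get_response
    def get_response(self, messages, *, stream=False):
        if stream:
            async def _updates():
                yield messages
            return _updates()

        async def _full():
            return messages
        return _full()


async def main() -> None:
    client = EchoClient()
    await client.get_response("hi")
    async for _ in client.get_response("hi", stream=True):
        pass
    print(SPAN_EVENTS)


if __name__ == "__main__":
    asyncio.run(main())
```
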
#### 4. **Chat Middleware** (_middleware.py)
The `use_chat_middleware` decorator also wraps both methods separately, with similar logic.

#### 5. **AG-UI Client** (_client.py)
Wraps both methods to unwrap server function calls:

```python
original_get_streaming_response = chat_client.get_streaming_response
original_get_response = chat_client.get_response
```

#### 6. **Provider Implementations** (all subpackages)
All subclasses implement both `_inner_*` methods, except:
- the OpenAI Assistants client (and similar clients, such as Foundry Agents V1), which implements `_inner_get_response` by calling `_inner_get_streaming_response`

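That fallback pattern, serving the non-streaming path by draining the client's own stream, can be sketched as follows (class and method bodies here are illustrative, not the actual provider code):

```python
import asyncio


class StreamingOnlyClient:
    """Sketch of a backend that only streams natively."""

    async def _inner_get_streaming_response(self, messages: str):
        # Native path: yield updates one at a time
        for word in messages.split():
            yield word + " "

    async def _inner_get_response(self, messages: str) -> str:
        # Non-streaming path: collect all updates into one full response
        chunks = [c async for c in self._inner_get_streaming_response(messages)]
        return "".join(chunks).rstrip()


async def main() -> None:
    client = StreamingOnlyClient()
    print(await client._inner_get_response("hello single method"))


if __name__ == "__main__":
    asyncio.run(main())
```
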

### Implications of Consolidation

| Aspect | Impact |
|--------|--------|
| **Type Safety** | Overloads work well: `@overload` with `Literal[True]` → `AsyncIterable`, `Literal[False]` → `ChatResponse`. Runtime return type is based on the `stream` param. |
| **Breaking Change** | **Major breaking change** for anyone implementing custom chat clients. They'd need to update from 2 methods to 1 (or 2 inner methods to 1). |
| **Decorator Complexity** | All 3 decorator systems (function invocation, middleware, observability) would need refactoring to handle both paths in one wrapper. |
| **Code Reduction** | Significant reduction in _tools.py (~200 lines of near-duplicate code) and in the other decorators. |
| **Samples/Tests** | Many samples call `get_streaming_response()` directly and would need updates. |
| **Protocol Simplification** | `ChatClientProtocol` goes from 2 methods + 1 property to 1 method + 1 property. |

### Recommendation

The consolidation makes sense architecturally, but consider:

1. **The overload pattern with `stream: bool`** works well in Python typing (note that only the `Literal[False]` overload carries the default, so a call that omits `stream` resolves to the non-streaming return type):

```python
@overload
async def get_response(self, messages, *, stream: Literal[True], ...) -> AsyncIterable[ChatResponseUpdate]: ...
@overload
async def get_response(self, messages, *, stream: Literal[False] = False, ...) -> ChatResponse: ...
```

2. **The decorator complexity** is the biggest concern. The current approach of separate decorators for separate methods is cleaner than conditional logic inside one wrapper.

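For completeness, a runnable variation of the snippet above: if the implementation is declared as a plain `def` that returns either an awaitable or an async iterable, both call styles work without extra ceremony, i.e. `await client.get_response(...)` and `async for ... in client.get_response(..., stream=True)`. The types below are simplified stand-ins, not the framework's actual `ChatResponse`/`ChatResponseUpdate`:

```python
import asyncio
from collections.abc import AsyncIterable, Awaitable
from dataclasses import dataclass
from typing import Literal, overload


@dataclass
class ChatResponse:
    text: str


@dataclass
class ChatResponseUpdate:
    text: str


class SketchChatClient:
    """Simplified stand-in showing the typing pattern end to end."""

    @overload
    def get_response(self, messages: str, *, stream: Literal[True]) -> AsyncIterable[ChatResponseUpdate]: ...

    @overload
    def get_response(self, messages: str, *, stream: Literal[False] = ...) -> Awaitable[ChatResponse]: ...

    def get_response(self, messages: str, *, stream: bool = False):
        if stream:
            async def _updates():
                for word in messages.split():
                    yield ChatResponseUpdate(text=word + " ")
            return _updates()

        async def _full():
            return ChatResponse(text=messages)
        return _full()


async def main() -> None:
    client = SketchChatClient()
    response = await client.get_response("hello world")
    print(response.text)
    async for update in client.get_response("hello world", stream=True):
        print(update.text, end="")
    print()


if __name__ == "__main__":
    asyncio.run(main())
```
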
## Decision Drivers

- Reduce the code needed to implement a chat client and simplify the public API for chat clients
- Reduce code duplication in decorators and middleware
- Maintain type safety and clarity in method signatures

## Considered Options

1. Status quo: keep separate methods for streaming and non-streaming
2. Consolidate into a single `get_response` method with a `stream` parameter
3. Option 2, plus merging `agent.run` and `agent.run_stream` into a single method with a `stream` parameter as well

## Option 1: Status Quo
- Good: Clear separation of streaming vs non-streaming logic
- Good: Aligned with the .NET design, although it is already `run` for Python and `RunAsync` for .NET
- Bad: Code duplication in decorators and middleware
- Bad: More complex client implementations

## Option 2: Consolidate into Single Method
- Good: Simplified public API for chat clients
- Good: Reduced code duplication in decorators
- Good: Smaller API footprint for users to get familiar with
- Good: People using OpenAI directly already expect this pattern
- Bad: Increased complexity in decorators and middleware
- Bad: Less alignment with the .NET design (`get_response(stream=True)` vs `GetStreamingResponseAsync`)

## Option 3: Consolidate + Merge Agent and Workflow Methods
- Good: Further simplifies agent and workflow implementations
- Good: Single method for all chat interactions
- Good: Smaller API footprint for users to get familiar with
- Good: People using OpenAI directly already expect this pattern
- Good: Workflows internally already use a single method (`_run_workflow_with_tracing`), so this would eliminate public API duplication as well, with hardly any code changes
- Bad: More breaking changes for agent users
- Bad: Increased complexity in the agent implementation
- Bad: More extensive misalignment with the .NET design (`run(stream=True)` vs `RunStreamingAsync`, in addition to the `get_response` change)


## Misc

Smaller questions to consider:
- Should the default be `stream=False` or `stream=True`? (Current behavior is equivalent to `False`.)
  - Defaulting to `False` makes it simpler for new users, as non-streaming is easier to handle.
  - Defaulting to `False` aligns with existing behavior.
  - Streaming tends to deliver the first tokens sooner, so defaulting to `True` could improve perceived performance for common use cases.
- Should this differ between ChatClient, Agent, and Workflows? (e.g., Agent and Workflow default to streaming, ChatClient to non-streaming)


## Decision Outcome

Chosen option: **Option 3: Consolidate + Merge Agent and Workflow Methods**

Since this is the most Pythonic option and it reduces the API surface and code duplication the most, we will go with it.
We will keep the default of `stream=False` for all methods, to maintain backward compatibility and simplicity for new users.


# Appendix

## Code Samples for Consolidated Method

### Python - Option 3: Direct ChatClient + Agent with Single Method

```python
# Copyright (c) Microsoft. All rights reserved.

import asyncio
from random import randint
from typing import Annotated

from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient
from pydantic import Field


def get_weather(
    location: Annotated[str, Field(description="The location to get the weather for.")],
) -> str:
    """Get the weather for a given location."""
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."


async def main() -> None:
    # Example 1: Direct ChatClient usage with single method
    client = OpenAIChatClient()
    message = "What's the weather in Amsterdam and in Paris?"

    # Non-streaming usage
    print(f"User: {message}")
    response = await client.get_response(message, tools=get_weather)
    print(f"Assistant: {response.text}")

    # Streaming usage - same method, different parameter
    print(f"\nUser: {message}")
    print("Assistant: ", end="")
    async for chunk in client.get_response(message, tools=get_weather, stream=True):
        if chunk.text:
            print(chunk.text, end="")
    print("")

    # Example 2: Agent usage with single method
    agent = ChatAgent(
        chat_client=client,
        tools=get_weather,
        name="WeatherAgent",
        instructions="You are a weather assistant.",
    )
    thread = agent.get_new_thread()

    # Non-streaming agent
    print(f"\nUser: {message}")
    result = await agent.run(message, thread=thread)  # default would be stream=False
    print(f"{agent.name}: {result.text}")

    # Streaming agent - same method, different parameter
    print(f"\nUser: {message}")
    print(f"{agent.name}: ", end="")
    async for update in agent.run(message, thread=thread, stream=True):
        if update.text:
            print(update.text, end="")
    print("")


if __name__ == "__main__":
    asyncio.run(main())
```

### .NET - Current pattern for comparison

```csharp
// Copyright (c) Microsoft. All rights reserved.

using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Agents.AI;
using OpenAI.Chat;

var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");
var deploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME") ?? "gpt-4o-mini";

AIAgent agent = new AzureOpenAIClient(
        new Uri(endpoint),
        new AzureCliCredential())
    .GetChatClient(deploymentName)
    .CreateAIAgent(
        instructions: "You are good at telling jokes about pirates.",
        name: "PirateJoker");

// Non-streaming: Returns a string directly
Console.WriteLine("=== Non-streaming ===");
string result = await agent.RunAsync("Tell me a joke about a pirate.");
Console.WriteLine(result);

// Streaming: Returns IAsyncEnumerable<AgentUpdate>
Console.WriteLine("\n=== Streaming ===");
await foreach (AgentUpdate update in agent.RunStreamingAsync("Tell me a joke about a pirate."))
{
    Console.Write(update);
}
Console.WriteLine();
```
