Skip to content

Commit b16f403

Browse files
ADR for simplified get response
1 parent db283cd commit b16f403

File tree

1 file changed

+140
-0
lines changed

1 file changed

+140
-0
lines changed
Lines changed: 140 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,140 @@
1+
---
2+
status: Proposed
3+
contact: eavanvalkenburg
4+
date: 2026-01-06
5+
deciders: markwallace-microsoft, dmytrostruk, taochenosu, alliscode, moonbox3
6+
consulted: sergeymenshykh, rbarreto, dmytrostruk, westey-m
7+
informed:
8+
---
9+
10+
# Simplify Python Get Response API into a single method
11+
12+
## Context and Problem Statement
13+
14+
Currently chat clients must implement two separate methods to get responses, one for streaming and one for non-streaming. This adds complexity to the client implementations and increases the maintenance burden. This was likely done because the dotnet version cannot do proper typing with a single method, in Python this is possible and this for instance is also how the OpenAI python client works.
15+
16+
## Implications of this change
17+
18+
### Current Architecture Overview
19+
20+
The current design has **two separate methods** at each layer:
21+
22+
| Layer | Non-streaming | Streaming |
23+
|-------|---------------|-----------|
24+
| **Protocol** | `get_response()``ChatResponse` | `get_streaming_response()``AsyncIterable[ChatResponseUpdate]` |
25+
| **BaseChatClient** | `get_response()` (public) | `get_streaming_response()` (public) |
26+
| **Implementation** | `_inner_get_response()` (private) | `_inner_get_streaming_response()` (private) |
27+
28+
### Key Usage Areas Identified
29+
30+
#### 1. **ChatAgent** (_agents.py and _agents.py)
31+
- `run()` → calls `self.chat_client.get_response()`
32+
- `run_stream()` → calls `self.chat_client.get_streaming_response()`
33+
34+
These are parallel methods on the agent, so consolidating the client methods would **not break** the agent API. You could keep `agent.run()` and `agent.run_stream()` unchanged while internally calling `get_response(stream=True/False)`.
35+
36+
#### 2. **Function Invocation Decorator** (_tools.py)
37+
This is **the most impacted area**. Currently:
38+
- `_handle_function_calls_response()` decorates `get_response`
39+
- `_handle_function_calls_streaming_response()` decorates `get_streaming_response`
40+
- The `use_function_invocation` class decorator wraps **both methods separately**
41+
42+
**Impact**: The decorator logic is almost identical (~200 lines each) with small differences:
43+
- Non-streaming collects response, returns it
44+
- Streaming yields updates, returns async iterable
45+
46+
With a unified method, you'd need **one decorator** that:
47+
- Checks the `stream` parameter
48+
- Uses `@overload` to determine return type
49+
- Handles both paths with conditional logic
50+
- The new decorator could be applied just on the method, instead of the whole class.
51+
52+
This would **reduce code duplication** but add complexity to a single function.
53+
54+
#### 3. **Observability/Instrumentation** (observability.py)
55+
Same pattern as function invocation:
56+
- `_trace_get_response()` wraps `get_response`
57+
- `_trace_get_streaming_response()` wraps `get_streaming_response`
58+
- `use_instrumentation` decorator applies both
59+
60+
**Impact**: Would need consolidation into a single tracing wrapper.
61+
62+
#### 4. **Chat Middleware** (_middleware.py)
63+
The `use_chat_middleware` decorator also wraps both methods separately with similar logic.
64+
65+
#### 5. **AG-UI Client** (_client.py)
66+
Wraps both methods to unwrap server function calls:
67+
```python
68+
original_get_streaming_response = chat_client.get_streaming_response
69+
original_get_response = chat_client.get_response
70+
```
71+
72+
#### 6. **Provider Implementations** (all subpackages)
73+
All subclasses implement both `_inner_*` methods, except:
74+
- OpenAI Assistants Client - it implements `_inner_get_response` by calling `_inner_get_streaming_response`
75+
76+
### Implications of Consolidation
77+
78+
| Aspect | Impact |
79+
|--------|--------|
80+
| **Type Safety** | Overloads work well: `@overload` with `Literal[True]``AsyncIterable`, `Literal[False]``ChatResponse`. Runtime return type based on `stream` param. |
81+
| **Breaking Change** | **Major breaking change** for anyone implementing custom chat clients. They'd need to update from 2 methods to 1 (or 2 inner methods to 1). |
82+
| **Decorator Complexity** | All 3 decorator systems (function invocation, middleware, observability) would need refactoring to handle both paths in one wrapper. |
83+
| **Code Reduction** | Significant reduction in _tools.py (~200 lines of near-duplicate code) and other decorators. |
84+
| **Samples/Tests** | Many samples call `get_streaming_response()` directly - would need updates. |
85+
| **Protocol Simplification** | `ChatClientProtocol` goes from 2 methods + 1 property to 1 method + 1 property. |
86+
87+
### Suggested Migration Path
88+
89+
1. **Keep public `get_streaming_response` as an alias** for backward compatibility:
90+
```python
91+
def get_streaming_response(self, messages, **kwargs):
92+
return self.get_response(messages, stream=True, **kwargs)
93+
```
94+
95+
2. **Consolidate inner methods first**: Have `_inner_get_response(stream=True/False)` in `BaseChatClient`, allowing subclasses to gradually migrate.
96+
97+
3. **Update decorators** to handle the unified method with conditional streaming logic.
98+
99+
4. **Deprecation warnings** on direct `get_streaming_response()` calls.
100+
101+
### Recommendation
102+
103+
The consolidation makes sense architecturally, but consider:
104+
105+
1. **The overload pattern with `stream: bool`** works well in Python typing:
106+
```python
107+
@overload
108+
async def get_response(self, messages, *, stream: Literal[True], ...) -> AsyncIterable[ChatResponseUpdate]: ...
109+
@overload
110+
async def get_response(self, messages, *, stream: Literal[False] = False, ...) -> ChatResponse: ...
111+
```
112+
113+
2. **The decorator complexity** is the biggest concern. The current approach of separate decorators for separate methods is cleaner than conditional logic inside one wrapper.
114+
115+
3. **Backward compatibility** should be maintained with deprecation, not immediate removal.
116+
117+
## Decision Drivers
118+
119+
- Reduce code needed to implement a Chat Client, simplify the public API for chat clients
120+
- Reduce code duplication in decorators and middleware
121+
- Maintain type safety and clarity in method signatures
122+
123+
## Considered Options
124+
125+
1. Status quo: Keep separate methods for streaming and non-streaming
126+
1. Consolidate into a single `get_response` method with a `stream` parameter
127+
128+
## Option 1: Status Quo
129+
- Good: Clear separation of streaming vs non-streaming logic
130+
- Good: Aligned with dotnet design
131+
- Bad: Code duplication in decorators and middleware
132+
- Bad: More complex client implementations
133+
134+
## Option 2: Consolidate into Single Method
135+
- Good: Simplified public API for chat clients
136+
- Good: Reduced code duplication in decorators
137+
- Bad: Increased complexity in decorators and middleware
138+
139+
## Decision Outcome
140+
TBD

0 commit comments

Comments
 (0)