|
| 1 | +--- |
| 2 | +status: Proposed |
| 3 | +contact: eavanvalkenburg |
| 4 | +date: 2026-01-06 |
| 5 | +deciders: markwallace-microsoft, dmytrostruk, taochenosu, alliscode, moonbox3 |
| 6 | +consulted: sergeymenshykh, rbarreto, dmytrostruk, westey-m |
| 7 | +informed: |
| 8 | +--- |
| 9 | + |
| 10 | +# Simplify Python Get Response API into a single method |
| 11 | + |
| 12 | +## Context and Problem Statement |
| 13 | + |
| 14 | +Currently chat clients must implement two separate methods to get responses, one for streaming and one for non-streaming. This adds complexity to the client implementations and increases the maintenance burden. This was likely done because the dotnet version cannot do proper typing with a single method, in Python this is possible and this for instance is also how the OpenAI python client works. |
| 15 | + |
| 16 | +## Implications of this change |
| 17 | + |
| 18 | +### Current Architecture Overview |
| 19 | + |
| 20 | +The current design has **two separate methods** at each layer: |
| 21 | + |
| 22 | +| Layer | Non-streaming | Streaming | |
| 23 | +|-------|---------------|-----------| |
| 24 | +| **Protocol** | `get_response()` → `ChatResponse` | `get_streaming_response()` → `AsyncIterable[ChatResponseUpdate]` | |
| 25 | +| **BaseChatClient** | `get_response()` (public) | `get_streaming_response()` (public) | |
| 26 | +| **Implementation** | `_inner_get_response()` (private) | `_inner_get_streaming_response()` (private) | |
| 27 | + |
| 28 | +### Key Usage Areas Identified |
| 29 | + |
| 30 | +#### 1. **ChatAgent** (_agents.py and _agents.py) |
| 31 | +- `run()` → calls `self.chat_client.get_response()` |
| 32 | +- `run_stream()` → calls `self.chat_client.get_streaming_response()` |
| 33 | + |
| 34 | +These are parallel methods on the agent, so consolidating the client methods would **not break** the agent API. You could keep `agent.run()` and `agent.run_stream()` unchanged while internally calling `get_response(stream=True/False)`. |
| 35 | + |
| 36 | +#### 2. **Function Invocation Decorator** (_tools.py) |
| 37 | +This is **the most impacted area**. Currently: |
| 38 | +- `_handle_function_calls_response()` decorates `get_response` |
| 39 | +- `_handle_function_calls_streaming_response()` decorates `get_streaming_response` |
| 40 | +- The `use_function_invocation` class decorator wraps **both methods separately** |
| 41 | + |
| 42 | +**Impact**: The decorator logic is almost identical (~200 lines each) with small differences: |
| 43 | +- Non-streaming collects response, returns it |
| 44 | +- Streaming yields updates, returns async iterable |
| 45 | + |
| 46 | +With a unified method, you'd need **one decorator** that: |
| 47 | +- Checks the `stream` parameter |
| 48 | +- Uses `@overload` to determine return type |
| 49 | +- Handles both paths with conditional logic |
| 50 | +- The new decorator could be applied just on the method, instead of the whole class. |
| 51 | + |
| 52 | +This would **reduce code duplication** but add complexity to a single function. |
| 53 | + |
| 54 | +#### 3. **Observability/Instrumentation** (observability.py) |
| 55 | +Same pattern as function invocation: |
| 56 | +- `_trace_get_response()` wraps `get_response` |
| 57 | +- `_trace_get_streaming_response()` wraps `get_streaming_response` |
| 58 | +- `use_instrumentation` decorator applies both |
| 59 | + |
| 60 | +**Impact**: Would need consolidation into a single tracing wrapper. |
| 61 | + |
| 62 | +#### 4. **Chat Middleware** (_middleware.py) |
| 63 | +The `use_chat_middleware` decorator also wraps both methods separately with similar logic. |
| 64 | + |
| 65 | +#### 5. **AG-UI Client** (_client.py) |
| 66 | +Wraps both methods to unwrap server function calls: |
| 67 | +```python |
| 68 | +original_get_streaming_response = chat_client.get_streaming_response |
| 69 | +original_get_response = chat_client.get_response |
| 70 | +``` |
| 71 | + |
| 72 | +#### 6. **Provider Implementations** (all subpackages) |
| 73 | +All subclasses implement both `_inner_*` methods, except: |
| 74 | +- OpenAI Assistants Client - it implements `_inner_get_response` by calling `_inner_get_streaming_response` |
| 75 | + |
| 76 | +### Implications of Consolidation |
| 77 | + |
| 78 | +| Aspect | Impact | |
| 79 | +|--------|--------| |
| 80 | +| **Type Safety** | Overloads work well: `@overload` with `Literal[True]` → `AsyncIterable`, `Literal[False]` → `ChatResponse`. Runtime return type based on `stream` param. | |
| 81 | +| **Breaking Change** | **Major breaking change** for anyone implementing custom chat clients. They'd need to update from 2 methods to 1 (or 2 inner methods to 1). | |
| 82 | +| **Decorator Complexity** | All 3 decorator systems (function invocation, middleware, observability) would need refactoring to handle both paths in one wrapper. | |
| 83 | +| **Code Reduction** | Significant reduction in _tools.py (~200 lines of near-duplicate code) and other decorators. | |
| 84 | +| **Samples/Tests** | Many samples call `get_streaming_response()` directly - would need updates. | |
| 85 | +| **Protocol Simplification** | `ChatClientProtocol` goes from 2 methods + 1 property to 1 method + 1 property. | |
| 86 | + |
| 87 | +### Suggested Migration Path |
| 88 | + |
| 89 | +1. **Keep public `get_streaming_response` as an alias** for backward compatibility: |
| 90 | + ```python |
| 91 | + def get_streaming_response(self, messages, **kwargs): |
| 92 | + return self.get_response(messages, stream=True, **kwargs) |
| 93 | + ``` |
| 94 | + |
| 95 | +2. **Consolidate inner methods first**: Have `_inner_get_response(stream=True/False)` in `BaseChatClient`, allowing subclasses to gradually migrate. |
| 96 | + |
| 97 | +3. **Update decorators** to handle the unified method with conditional streaming logic. |
| 98 | + |
| 99 | +4. **Deprecation warnings** on direct `get_streaming_response()` calls. |
| 100 | + |
| 101 | +### Recommendation |
| 102 | + |
| 103 | +The consolidation makes sense architecturally, but consider: |
| 104 | + |
| 105 | +1. **The overload pattern with `stream: bool`** works well in Python typing: |
| 106 | + ```python |
| 107 | + @overload |
| 108 | + async def get_response(self, messages, *, stream: Literal[True], ...) -> AsyncIterable[ChatResponseUpdate]: ... |
| 109 | + @overload |
| 110 | + async def get_response(self, messages, *, stream: Literal[False] = False, ...) -> ChatResponse: ... |
| 111 | + ``` |
| 112 | + |
| 113 | +2. **The decorator complexity** is the biggest concern. The current approach of separate decorators for separate methods is cleaner than conditional logic inside one wrapper. |
| 114 | + |
| 115 | +3. **Backward compatibility** should be maintained with deprecation, not immediate removal. |
| 116 | + |
| 117 | +## Decision Drivers |
| 118 | + |
| 119 | +- Reduce code needed to implement a Chat Client, simplify the public API for chat clients |
| 120 | +- Reduce code duplication in decorators and middleware |
| 121 | +- Maintain type safety and clarity in method signatures |
| 122 | + |
| 123 | +## Considered Options |
| 124 | + |
| 125 | +1. Status quo: Keep separate methods for streaming and non-streaming |
| 126 | +1. Consolidate into a single `get_response` method with a `stream` parameter |
| 127 | + |
| 128 | +## Option 1: Status Quo |
| 129 | +- Good: Clear separation of streaming vs non-streaming logic |
| 130 | +- Good: Aligned with dotnet design |
| 131 | +- Bad: Code duplication in decorators and middleware |
| 132 | +- Bad: More complex client implementations |
| 133 | + |
| 134 | +## Option 2: Consolidate into Single Method |
| 135 | +- Good: Simplified public API for chat clients |
| 136 | +- Good: Reduced code duplication in decorators |
| 137 | +- Bad: Increased complexity in decorators and middleware |
| 138 | + |
| 139 | +## Decision Outcome |
| 140 | +TBD |
0 commit comments