Please read this first
Describe the bug
Chat Completions streaming can emit duplicate output_index values for multiple function calls that never enter the real-time streaming-start path.
In src/agents/models/chatcmpl_stream_handler.py, the fallback branch for non-streamed function calls recomputes fallback_starting_index for each call. It offsets text/refusal/reasoning items and already-started streaming function calls, but it does not offset prior fallback function calls emitted in the same loop. As a result, two fallback function calls can both emit response.output_item.added, response.function_call_arguments.delta, and response.output_item.done with the same output_index.
As a result, stream consumers that reconcile items by output_index cannot reliably tell fallback tool calls apart.
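The indexing problem can be illustrated with a small stand-alone model. This is a simplified sketch, not the code in chatcmpl_stream_handler.py: the function name and its parameters are hypothetical, and only the variable name fallback_starting_index is taken from the description above.

```python
def assign_fallback_indexes_buggy(
    num_prior_items: int,
    num_streamed_calls: int,
    fallback_calls: list[str],
) -> list[int]:
    """Hypothetical model of the fallback loop described in this issue."""
    indexes = []
    for _name in fallback_calls:
        # Recomputed on every iteration: it accounts for earlier output items
        # and already-started streamed calls, but not for fallback calls
        # emitted earlier in this same loop.
        fallback_starting_index = num_prior_items + num_streamed_calls
        indexes.append(fallback_starting_index)
    return indexes


buggy_indexes = assign_fallback_indexes_buggy(0, 0, ["first_tool", "second_tool"])
print(buggy_indexes)  # [0, 0]
```

With no prior items and two fallback calls, both calls land on index 0, which matches the duplicate values reported in this issue.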
Debug information
- Agents SDK version: main at 3854c124cb8e3e51fb660f5714405ee39ee86c5e
- Python version: Python 3.12
Repro steps
Minimal reproducer:
from collections.abc import AsyncIterator

import pytest
from openai.types.chat.chat_completion_chunk import (
    ChatCompletionChunk,
    Choice,
    ChoiceDelta,
    ChoiceDeltaToolCall,
    ChoiceDeltaToolCallFunction,
)
from openai.types.completion_usage import CompletionUsage
from openai.types.responses import Response

from agents.model_settings import ModelSettings
from agents.models.interface import ModelTracing
from agents.models.openai_chatcompletions import OpenAIChatCompletionsModel
from agents.models.openai_provider import OpenAIProvider


@pytest.mark.asyncio
async def test_fallback_tool_call_indexes(monkeypatch):
    # Two chunks, each carrying one complete tool call (tool call index 0, then 1).
    chunks = [
        ChatCompletionChunk(
            id="chunk-id",
            created=1,
            model="fake",
            object="chat.completion.chunk",
            choices=[
                Choice(
                    index=0,
                    delta=ChoiceDelta(
                        tool_calls=[
                            ChoiceDeltaToolCall(
                                index=0,
                                function=ChoiceDeltaToolCallFunction(
                                    name="first_tool",
                                    arguments='{"a": 1}',
                                ),
                                type="function",
                            )
                        ]
                    ),
                )
            ],
        ),
        ChatCompletionChunk(
            id="chunk-id",
            created=1,
            model="fake",
            object="chat.completion.chunk",
            choices=[
                Choice(
                    index=0,
                    delta=ChoiceDelta(
                        tool_calls=[
                            ChoiceDeltaToolCall(
                                index=1,
                                function=ChoiceDeltaToolCallFunction(
                                    name="second_tool",
                                    arguments='{"b": 2}',
                                ),
                                type="function",
                            )
                        ]
                    ),
                )
            ],
            usage=CompletionUsage(completion_tokens=1, prompt_tokens=1, total_tokens=2),
        ),
    ]

    async def fake_stream() -> AsyncIterator[ChatCompletionChunk]:
        for chunk in chunks:
            yield chunk

    # Replace the network call with a stub that returns the fake chunk stream.
    async def patched_fetch_response(self, *args, **kwargs):
        response = Response(
            id="resp-id",
            created_at=0,
            model="fake-model",
            object="response",
            output=[],
            tool_choice="none",
            tools=[],
            parallel_tool_calls=False,
        )
        return response, fake_stream()

    monkeypatch.setattr(OpenAIChatCompletionsModel, "_fetch_response", patched_fetch_response)

    model = OpenAIProvider(use_responses=False).get_model("gpt-4")
    events = []
    async for event in model.stream_response(
        system_instructions=None,
        input="",
        model_settings=ModelSettings(),
        tools=[],
        output_schema=None,
        handoffs=[],
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    ):
        events.append(event)

    # Collect the output_index of every added/done output item event.
    added_indexes = [
        event.output_index for event in events if event.type == "response.output_item.added"
    ]
    done_indexes = [
        event.output_index for event in events if event.type == "response.output_item.done"
    ]
    print("added_indexes=", added_indexes)
    print("done_indexes=", done_indexes)
Current output on main:
added_indexes= [0, 0]
done_indexes= [0, 0]
Expected behavior
Each fallback function call should receive a unique output_index.
Expected output:
added_indexes= [0, 1]
done_indexes= [0, 1]
The fallback finalization loop should maintain a fallback-emitted count and increment the index after each non-streamed function call.
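A minimal sketch of that counting approach, written as a stand-alone model rather than a patch to chatcmpl_stream_handler.py. The function name, its parameters, and the fallback_emitted counter are hypothetical; the actual handler tracks its state differently.

```python
def assign_fallback_indexes(
    num_prior_items: int,
    num_streamed_calls: int,
    fallback_calls: list[str],
) -> list[int]:
    """Hypothetical model: give each non-streamed function call its own index."""
    indexes = []
    fallback_emitted = 0  # fallback calls already emitted in this loop
    for _name in fallback_calls:
        # Offset by earlier output items, streamed calls, and the fallback
        # calls that were emitted before this one.
        indexes.append(num_prior_items + num_streamed_calls + fallback_emitted)
        fallback_emitted += 1
    return indexes


fixed_indexes = assign_fallback_indexes(0, 0, ["first_tool", "second_tool"])
print(fixed_indexes)  # [0, 1]
```

The same index would then be used for a given call's response.output_item.added, response.function_call_arguments.delta, and response.output_item.done events, so consumers can match the three by output_index.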