Chat Completions streaming fallback tool calls can reuse output_index #3104

@Aphroq

Description

Describe the bug

Chat Completions streaming can emit duplicate output_index values for multiple function calls that never enter the real-time streaming-start path.

In src/agents/models/chatcmpl_stream_handler.py, the fallback branch for non-streamed function calls recomputes fallback_starting_index for each call. It offsets text/refusal/reasoning items and already-started streaming function calls, but it does not offset prior fallback function calls emitted in the same loop. As a result, two fallback function calls can both emit response.output_item.added, response.function_call_arguments.delta, and response.output_item.done with the same output_index.

This makes stream consumers that reconcile items by output_index unable to distinguish fallback tool calls reliably.
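To illustrate why this matters downstream (this is not SDK code, just a minimal stand-in for any consumer that keys items by output_index), a dict-based reconciler silently drops one of the two tool calls when both arrive with the same index:

```python
# Minimal illustration of the failure mode: a consumer that reconciles
# output items by output_index overwrites the first fallback tool call
# with the second one when the indexes collide.
def reconcile(events: list[tuple[str, int, str]]) -> dict[int, str]:
    """Collect added output items keyed by output_index."""
    items: dict[int, str] = {}
    for event_type, output_index, name in events:
        if event_type == "response.output_item.added":
            items[output_index] = name  # a duplicate index clobbers the prior entry
    return items


# With the bug, both fallback calls are emitted with output_index 0,
# so first_tool is lost:
buggy = reconcile(
    [
        ("response.output_item.added", 0, "first_tool"),
        ("response.output_item.added", 0, "second_tool"),
    ]
)
print(buggy)  # {0: 'second_tool'} -- only one of the two calls survives
```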

Debug information

  • Agents SDK version: main at 3854c124cb8e3e51fb660f5714405ee39ee86c5e
  • Python version: Python 3.12

Repro steps

Minimal reproducer:

from collections.abc import AsyncIterator

import pytest
from openai.types.chat.chat_completion_chunk import (
    ChatCompletionChunk,
    Choice,
    ChoiceDelta,
    ChoiceDeltaToolCall,
    ChoiceDeltaToolCallFunction,
)
from openai.types.completion_usage import CompletionUsage
from openai.types.responses import Response

from agents.model_settings import ModelSettings
from agents.models.interface import ModelTracing
from agents.models.openai_chatcompletions import OpenAIChatCompletionsModel
from agents.models.openai_provider import OpenAIProvider


@pytest.mark.asyncio
async def test_fallback_tool_call_indexes(monkeypatch):
    chunks = [
        ChatCompletionChunk(
            id="chunk-id",
            created=1,
            model="fake",
            object="chat.completion.chunk",
            choices=[
                Choice(
                    index=0,
                    delta=ChoiceDelta(
                        tool_calls=[
                            ChoiceDeltaToolCall(
                                index=0,
                                function=ChoiceDeltaToolCallFunction(
                                    name="first_tool",
                                    arguments='{"a": 1}',
                                ),
                                type="function",
                            )
                        ]
                    ),
                )
            ],
        ),
        ChatCompletionChunk(
            id="chunk-id",
            created=1,
            model="fake",
            object="chat.completion.chunk",
            choices=[
                Choice(
                    index=0,
                    delta=ChoiceDelta(
                        tool_calls=[
                            ChoiceDeltaToolCall(
                                index=1,
                                function=ChoiceDeltaToolCallFunction(
                                    name="second_tool",
                                    arguments='{"b": 2}',
                                ),
                                type="function",
                            )
                        ]
                    ),
                )
            ],
            usage=CompletionUsage(completion_tokens=1, prompt_tokens=1, total_tokens=2),
        ),
    ]

    async def fake_stream() -> AsyncIterator[ChatCompletionChunk]:
        for chunk in chunks:
            yield chunk

    async def patched_fetch_response(self, *args, **kwargs):
        response = Response(
            id="resp-id",
            created_at=0,
            model="fake-model",
            object="response",
            output=[],
            tool_choice="none",
            tools=[],
            parallel_tool_calls=False,
        )
        return response, fake_stream()

    monkeypatch.setattr(OpenAIChatCompletionsModel, "_fetch_response", patched_fetch_response)

    model = OpenAIProvider(use_responses=False).get_model("gpt-4")
    events = []
    async for event in model.stream_response(
        system_instructions=None,
        input="",
        model_settings=ModelSettings(),
        tools=[],
        output_schema=None,
        handoffs=[],
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
        conversation_id=None,
        prompt=None,
    ):
        events.append(event)

    added_indexes = [
        event.output_index for event in events if event.type == "response.output_item.added"
    ]
    done_indexes = [
        event.output_index for event in events if event.type == "response.output_item.done"
    ]

    print("added_indexes=", added_indexes)
    print("done_indexes=", done_indexes)

Current output on main:

added_indexes= [0, 0]
done_indexes= [0, 0]

Expected behavior

Each fallback function call should receive a unique output_index.

Expected output:

added_indexes= [0, 1]
done_indexes= [0, 1]

The fallback finalization loop should maintain a fallback-emitted count and increment the index after each non-streamed function call.
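As a rough sketch of that fix (simplified stand-ins, not the actual code in chatcmpl_stream_handler.py; `num_preceding_items` here represents whatever `fallback_starting_index` already accounts for):

```python
# Hypothetical sketch: assign each non-streamed (fallback) function call a
# unique output_index by also counting fallback calls emitted so far,
# instead of recomputing the same starting index for every call.
def assign_fallback_indexes(num_preceding_items: int, fallback_calls: list[str]) -> list[int]:
    indexes: list[int] = []
    fallback_emitted = 0  # fallback calls already emitted in this loop
    for _call in fallback_calls:
        # Offset by prior output items AND by prior fallback calls.
        indexes.append(num_preceding_items + fallback_emitted)
        fallback_emitted += 1
    return indexes


print(assign_fallback_indexes(0, ["first_tool", "second_tool"]))  # [0, 1]
```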
