
Using event_stream_handler causes text to appear before thinking for gpt-oss models on ollama/vllm #3210

@daryllimyt

Initial Checks

Description

Issue

When using gpt-oss family models through OpenAIProvider, text parts appear before thinking parts in the response. This only happens when an event_stream_handler is present and Agent.run() is invoked.

When running with Agent.run_sync(), or with Agent.run() after commenting out event_stream_handler, text correctly appears after thinking.

Example Code

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pydantic-ai-slim[openai,anthropic,bedrock]==1.2.0",
#     "python-dotenv==1.1.1",
# ]
# ///
import asyncio
from collections.abc import AsyncIterable

from dotenv import load_dotenv
from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import AgentStreamEvent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_core import to_json

load_dotenv()


model = OpenAIChatModel(
    model_name="gpt-oss:20b",
    provider=OpenAIProvider(base_url="http://localhost:11434/v1/", api_key="ollama"), # Running on ollama
)


async def noop(context: RunContext[None], events: AsyncIterable[AgentStreamEvent]):
    pass


async def main():
    agent = Agent(
        model=model,
        instructions="You are a helpful assistant.",
        event_stream_handler=noop,
    )

    result = await agent.run("Hi")

    print(to_json(result.new_messages(), indent=2).decode())


if __name__ == "__main__":
    asyncio.run(main())

Outputs

> uv run scripts/repro/async_gpt_oss_vllm_ollama.py
[
  {
    "parts": [
      {
        "content": "Hi",
        "timestamp": "2025-10-21T15:47:00.949069Z",
        "part_kind": "user-prompt"
      }
    ],
    "instructions": "You are a helpful assistant.",
    "kind": "request"
  },
  {
    "parts": [
      {
        "content": "Hello! How can I help you today?",
        "id": null,
        "part_kind": "text"
      },
      {
        "content": "User says \"Hi\". Need respond politely.",
        "id": "reasoning",
        "signature": null,
        "provider_name": "openai",
        "part_kind": "thinking"
      }
    ],
    "usage": {
      "input_tokens": 82,
      "cache_write_tokens": 0,
      "cache_read_tokens": 0,
      "output_tokens": 28,
      "input_audio_tokens": 0,
      "cache_audio_read_tokens": 0,
      "output_audio_tokens": 0,
      "details": {}
    },
    "model_name": "gpt-oss:20b",
    "timestamp": "2025-10-21T15:47:11Z",
    "kind": "response",
    "provider_name": "openai",
    "provider_details": {
      "finish_reason": "stop"
    },
    "provider_response_id": "chatcmpl-758",
    "finish_reason": "stop"
  }
]
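Until the ordering is fixed upstream, a hypothetical post-processing workaround is to reorder the serialized parts so thinking precedes text. This operates on the JSON structure shown above; reorder_parts is an illustrative helper, not part of pydantic-ai:

```python
def reorder_parts(messages: list[dict]) -> list[dict]:
    """Move 'thinking' parts ahead of the other parts in each response message."""
    for message in messages:
        if message.get("kind") == "response":
            parts = message["parts"]
            thinking = [p for p in parts if p.get("part_kind") == "thinking"]
            others = [p for p in parts if p.get("part_kind") != "thinking"]
            # Relative order within each group is preserved.
            message["parts"] = thinking + others
    return messages
```

The same idea could be applied to the ModelResponse objects returned by result.new_messages() by filtering on isinstance checks instead of the "part_kind" key.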

Python, Pydantic AI & LLM client version

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pydantic-ai-slim[openai,anthropic,bedrock]==1.2.0",
#     "python-dotenv==1.1.1",
# ]
# ///

Metadata

Labels: bug