
Conversation

serialx (Contributor) commented Sep 15, 2025

Anthropic's API requires thinking blocks to be the first content in assistant messages when reasoning is enabled and tool calls are present.
ref: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

This change:

  • Stores thinking text in ResponseReasoningItem.content field
  • Stores signature in encrypted_content field for metadata
  • Reconstructs thinking blocks when converting items->messages
  • Handles both streaming and non-streaming cases
  • Only affects Anthropic models with reasoning enabled
  • Adds regression test to prevent future breakage

Fixes compatibility with Claude models when using tools with interleaved thinking.

This fixes #765
This fixes #678
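The storage mapping in the bullets above can be sketched as follows (a hedged illustration, not the actual SDK code; the helper name is hypothetical): the thinking text of an Anthropic thinking block goes into the reasoning item's `content`, and the signature into `encrypted_content`.

```python
def thinking_block_to_reasoning_fields(block: dict) -> dict:
    """Split an Anthropic thinking block into ResponseReasoningItem-style fields.

    Hypothetical helper illustrating the PR's storage scheme: text is kept as a
    `reasoning_text` content entry, the signature as `encrypted_content`.
    """
    return {
        "content": [{"type": "reasoning_text", "text": block.get("thinking", "")}],
        "encrypted_content": block.get("signature"),
    }
```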

Code to reproduce:

import asyncio

from openai.types.shared import Reasoning
from pydantic import BaseModel

from agents import (
    Agent,
    AgentUpdatedStreamEvent,
    ModelSettings,
    RawResponsesStreamEvent,
    RunItemStreamEvent,
    Runner,
    function_tool,
)


class WeatherTemp(BaseModel):
    city: str
    temperature_range: str


class WeatherConditions(BaseModel):
    city: str
    conditions: str


@function_tool
def get_weather_temp(city: str) -> WeatherTemp:
    """Get the current weather temperature information for a specified city."""
    print("[debug] get_weather_temp called")
    return WeatherTemp(city=city, temperature_range="14-20C")


@function_tool
def get_weather_conditions(city: str) -> WeatherConditions:
    """Get the current weather conditions information for a specified city."""
    print("[debug] get_weather_conditions called")
    return WeatherConditions(city=city, conditions="Sunny with wind.")


agent = Agent(
    name="Hello world",
    instructions="You are a helpful agent.",
    tools=[get_weather_temp, get_weather_conditions],
    model="litellm/anthropic/claude-sonnet-4-20250514",
    model_settings=ModelSettings(
        reasoning=Reasoning(effort="medium", summary="auto"),
        # Works both with and without interleaved thinking
        extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},
    ),
)

PROMPT = (
    "What's the temperature in Tokyo? If the temp is below 30C, get the weather "
    "conditions and if it's sunny with wind, get the temperature in Tokyo's nearest "
    "neighbour country's capital"
)


async def agent_run():
    print("=== Runner.run ===")
    result = await Runner.run(
        agent,
        input=PROMPT,
    )
    print(result.final_output)
    print()


async def agent_run_streamed():
    print("=== Runner.run_streamed ===")
    result = Runner.run_streamed(
        agent,
        input=PROMPT,
    )
    async for event in result.stream_events():
        match event:
            case RawResponsesStreamEvent(data=data):
                delta = data.delta if hasattr(data, "delta") else ""
                print(f"Raw response: {data.type} {repr(delta)}")
            case RunItemStreamEvent():
                print("Run item:", event.type)
            case AgentUpdatedStreamEvent():
                print("Agent updated:", event.type)
    print(result.final_output)


async def main():
    await agent_run_streamed()
    await agent_run()


if __name__ == "__main__":
    asyncio.run(main())

Running the above code against the current main HEAD results in this error:

litellm.exceptions.BadRequestError: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking"}}
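For context, the error refers to Anthropic's required ordering: when thinking is enabled, the assistant turn that carries a tool call must begin with a thinking (or redacted_thinking) block. A rough sketch of the expected message shape, with placeholder values based on the linked docs:

```python
# Illustrative only: placeholder ids, text, and signature values.
assistant_message = {
    "role": "assistant",
    "content": [
        {
            # The thinking block must come first...
            "type": "thinking",
            "thinking": "The user wants the temperature first...",
            "signature": "WaUjzkyp...",  # opaque signature returned by the API
        },
        {
            # ...before any tool_use block in the same message.
            "type": "tool_use",
            "id": "toolu_01",
            "name": "get_weather_temp",
            "input": {"city": "Tokyo"},
        },
    ],
}
```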

seratch (Member) commented Sep 15, 2025

@codex review this

chatgpt-codex-connector (bot) left a comment

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines 509 to 528
elif reasoning_item := cls.maybe_reasoning_message(item):
    # Reconstruct thinking blocks from content (text) and encrypted_content (signature)
    content_items = reasoning_item.get("content", [])
    signature = reasoning_item.get("encrypted_content")

    if content_items:
        # Reconstruct thinking blocks from content and signature
        pending_thinking_blocks = []
        for content_item in content_items:
            if (
                isinstance(content_item, dict)
                and content_item.get("type") == "reasoning_text"
            ):
                thinking_block = {
                    "type": "thinking",
                    "thinking": content_item.get("text", ""),
                }
                # Add signature if available
                if signature:
                    thinking_block["signature"] = signature

[P1] Preserve redacted thinking blocks when building chat messages

The reasoning item is only converted into thinking blocks when content contains reasoning_text. When Anthropic redacts the reasoning text, the SDK still returns a signature in encrypted_content but leaves content empty. In that case this branch never runs, so pending_thinking_blocks stays None and the assistant message is emitted with plain text before the tool call, which continues to violate Anthropic’s requirement that the message start with a thinking/redacted_thinking block. Consider generating a placeholder thinking block whenever a signature exists even if no text is supplied so that redacted reasoning still surfaces as the first content element.


serialx (Contributor, Author) replied:

@codex actually, when the reasoning is redacted the form is completely different. See the example in the official docs:

The second block below is an example of redacted thinking content:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

My PR currently doesn't support this case since I was unable to test this. The special test prompt mentioned in the documentation didn't work.

Comment on lines 545 to +557
if state.reasoning_content_index_and_output:
    outputs.append(state.reasoning_content_index_and_output[1])
    reasoning_item = state.reasoning_content_index_and_output[1]
    # Store thinking text in content and signature in encrypted_content
    if state.thinking_text:
        # Add thinking text as a Content object
        if not reasoning_item.content:
            reasoning_item.content = []
        reasoning_item.content.append(
            Content(text=state.thinking_text, type="reasoning_text")
        )
    # Store signature in encrypted_content
    if state.thinking_signature:
        reasoning_item.encrypted_content = state.thinking_signature


[P1] Streaming path drops thinking block when Anthropic redacts reasoning text

The streaming handler only appends a reasoning_text content item when state.thinking_text is non-empty. For streams where Anthropic suppresses the thinking text and provides only a signature in thinking_blocks, thinking_text remains empty while thinking_signature is set. The resulting ResponseReasoningItem carries the signature but no content, so downstream conversion still produces an assistant message that begins with normal text and fails Anthropic’s extended-thinking requirement. A redacted thinking block should be emitted whenever a signature is collected even if no text arrives.


serialx (Contributor, Author) replied:

@codex This is the same as above.


chatgpt-codex-connector (bot) left a comment

Codex Review: Here are some suggestions.


Comment on lines 508 to 528
# 7) reasoning message => extract thinking blocks if present
elif reasoning_item := cls.maybe_reasoning_message(item):
    # Reconstruct thinking blocks from content (text) and encrypted_content (signature)
    content_items = reasoning_item.get("content", [])
    signature = reasoning_item.get("encrypted_content")

    if content_items:
        # Reconstruct thinking blocks from content and signature
        pending_thinking_blocks = []
        for content_item in content_items:
            if (
                isinstance(content_item, dict)
                and content_item.get("type") == "reasoning_text"
            ):
                thinking_block = {
                    "type": "thinking",
                    "thinking": content_item.get("text", ""),
                }
                # Add signature if available
                if signature:
                    thinking_block["signature"] = signature


[P1] Handle redacted thinking blocks when rebuilding assistant messages

This code only reconstructs thinking blocks from reasoning items when the content array contains reasoning_text, and it always emits blocks of type "thinking" with a thinking string. The Anthropic API can also return blocks of type "redacted_thinking" where the plaintext is replaced with a data payload, and those blocks must still be preserved at the start of the assistant message. In the redacted case reasoning_item.content will be empty and the encrypted payload is never captured, so no thinking block is emitted and the next assistant message begins with plain text again, triggering the same 400 error (“Expected thinking or redacted_thinking”) this change is meant to fix. Consider detecting redacted_thinking blocks, storing their data alongside the signature, and emitting {"type": "redacted_thinking", ...} when rebuilding the assistant message.


serialx (Contributor, Author) replied:

@codex My PR currently doesn't support this case since I was unable to test this. The special test prompt mentioned in the documentation didn't work. In practice, I didn't observe any redacted_thinking in the wild. I think this is a very rare condition.


serialx (Contributor, Author) commented Sep 15, 2025

If anyone wants to test this, add the following to your pyproject.toml:

[tool.uv.sources]
openai-agents = { git = "https://github.com/openai/openai-agents-python", branch = "refs/pull/1744/head" }

Move the thinking block insertion from `ensure_assistant_message()` to the `elif func_call := cls.maybe_function_tool_call(item):` branch to be more specific and prevent unintended side effects.
Be more selective/surgical about reasoning message preservation.
When we hand off from a Claude 4 Sonnet thinking agent to a non-thinking agent, we get errors because non-thinking models expect no thinking blocks in the request. This fixes that edge case by only preserving blocks when reasoning effort is not None.
Since this parameter is Anthropic-specific logic, renamed it to use Anthropic terminology.
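The handoff guard described above might look roughly like this (a hedged sketch with a hypothetical helper name, assuming attributes shaped like `ModelSettings.reasoning.effort` from the reproduction script):

```python
from types import SimpleNamespace


def should_preserve_thinking_blocks(model_settings) -> bool:
    """Only re-attach thinking blocks when the target model has reasoning
    enabled, so a non-thinking agent never receives them after a handoff.

    Hypothetical helper illustrating the "effort is not None" guard.
    """
    reasoning = getattr(model_settings, "reasoning", None)
    return reasoning is not None and getattr(reasoning, "effort", None) is not None
```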
@serialx serialx requested a review from seratch September 16, 2025 08:24
seratch (Member) left a comment:

Confirmed this works with the model.

@seratch seratch merged commit 605611c into openai:main Sep 16, 2025
5 checks passed


Successfully merging this pull request may close these issues:

  • Tool calling with LiteLLM + claude thinking models fails
  • Unable to use reasoning models with tool calls using LitellmModel