[Bug]: cache_idx stamps cache_control on every block, exceeding Anthropic's 4-block limit #20854

@lestephen

Description

Bug Description

When using AgentWorkflow with output_cls (structured output) and cache_idx set on the Anthropic LLM, the API returns:

invalid_request_error: A maximum of 4 blocks with cache_control may be provided. Found 29.

Root Cause

Two issues in llama_index/llms/anthropic/utils.py:

1. blocks_to_anthropic_blocks stamps cache_control on every block with no cap

When a message has cache_control in its additional_kwargs (injected by cache_idx), blocks_to_anthropic_blocks creates a global_cache_control and applies it to every TextBlock, ImageBlock, ToolUseBlock, etc. in that message:

# utils.py, blocks_to_anthropic_blocks()
if kwargs.get("cache_control"):
    global_cache_control = CacheControlEphemeralParam(**kwargs["cache_control"])

for block in blocks:
    if isinstance(block, TextBlock):
        if block.text:
            anthropic_blocks.append(_to_anthropic_text_block(block))
            if global_cache_control:
                anthropic_blocks[-1]["cache_control"] = global_cache_control  # every block gets it

This is fine for typical messages with 1-2 blocks, but AgentWorkflow.generate_structured_response() flattens the entire conversation history into many TextBlocks in a single ChatMessage. In my case this produces ~29 blocks in one message, all stamped with cache_control, exceeding Anthropic's limit of 4.
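To make the failure mode concrete, here is a minimal, dependency-free sketch of what the loop above effectively does (plain dicts stand in for the real block types, and `stamp_every_block` is my name for the behavior, not a llama_index function):

```python
# Simplified model of the bug: when a cache_control is present, every
# block in the message gets stamped, with no cap.
ANTHROPIC_CACHE_CONTROL_LIMIT = 4  # documented Anthropic limit

def stamp_every_block(blocks, cache_control):
    # Mirrors the current blocks_to_anthropic_blocks behavior.
    return [{**b, "cache_control": cache_control} for b in blocks]

# generate_structured_response() flattens the history into ~29 TextBlocks
# in a single ChatMessage.
blocks = [{"type": "text", "text": f"turn {i}"} for i in range(29)]
stamped = stamp_every_block(blocks, {"type": "ephemeral"})

count = sum(1 for b in stamped if "cache_control" in b)
print(count)  # 29 — well past the limit of 4, so the API rejects it
```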

2. System prompt cache_control is silently discarded

In messages_to_anthropic_messages, system messages are extracted as plain strings, discarding any cache_control markers that were set:

# utils.py, messages_to_anthropic_messages()
if message.role == MessageRole.SYSTEM:
    system_prompt.extend(
        [block.text for block in message.blocks if isinstance(block, TextBlock)]
    )
# ...
return ..., "\n".join(system_prompt)  # plain string, cache_control lost

So even when cache_idx covers the system message, the cache_control marker is set on the blocks but then silently dropped when the system prompt is joined into a plain string.
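A small sketch of the discard (plain dicts standing in for TextBlocks, not the real llama_index types):

```python
# One system TextBlock that cache_idx has marked for caching.
system_blocks = [
    {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    },
]

# Current behavior: only the text survives the join; the per-block
# cache_control metadata has nowhere to live on a plain string.
joined = "\n".join(b["text"] for b in system_blocks)

print("cache_control" in joined)  # False — the marker is gone
```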

Steps to Reproduce

from llama_index.llms.anthropic import Anthropic
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from pydantic import BaseModel

class MyOutput(BaseModel):
    result: str

def my_tool(query: str) -> str:
    """Look something up."""
    return f"answer to {query}"

llm = Anthropic(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    cache_idx=1,  # enable prompt caching
)

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[FunctionTool.from_defaults(fn=my_tool)],
    llm=llm,
    system_prompt="You are a helpful assistant.",
    output_cls=MyOutput,  # triggers generate_structured_response
)

import asyncio

async def run():
    result = await agent.run(user_msg="Look up foo, then bar, then baz")
    print(result)

asyncio.run(run())

After a few tool call rounds, generate_structured_response() flattens the conversation into many TextBlocks in one message. With cache_idx=1, all blocks get cache_control, and the Anthropic API rejects the request.

Relevant Logs/Tracebacks

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 29.'}}

Suggested Fix

In blocks_to_anthropic_blocks, only apply cache_control to the last block in the message (matching Anthropic's recommended pattern for cache breakpoints), rather than every block:

# After building all anthropic_blocks:
if global_cache_control and anthropic_blocks:
    anthropic_blocks[-1]["cache_control"] = global_cache_control
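A self-contained sketch of the capped behavior (the helper name is mine; the real change would live inside blocks_to_anthropic_blocks):

```python
# Proposed behavior: stamp cache_control only on the final block of the
# message, so one message contributes at most one cache breakpoint.
def stamp_last_block(blocks, cache_control):
    if blocks and cache_control:
        blocks = [*blocks[:-1], {**blocks[-1], "cache_control": cache_control}]
    return blocks

# Same 29-block flattened message as in the bug report.
blocks = [{"type": "text", "text": f"turn {i}"} for i in range(29)]
fixed = stamp_last_block(blocks, {"type": "ephemeral"})

count = sum(1 for b in fixed if "cache_control" in b)
print(count)  # 1 — everything up to the breakpoint is still cached
```

Because Anthropic caches the full prefix up to a breakpoint, stamping only the last block loses nothing: the earlier 28 blocks are covered by that single breakpoint.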

For the system prompt issue, messages_to_anthropic_messages could return the system prompt as a list of content blocks (preserving cache_control) instead of a joined plain string, when cache markers are present.
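A hypothetical sketch of that shape (the helper name is mine, not a llama_index API; the Anthropic Messages API does accept `system` as either a string or a list of text blocks, each of which may carry its own cache_control):

```python
# Return the system prompt as a list of text-block params when any block
# carries cache_control, otherwise keep the existing joined-string behavior.
def extract_system_prompt(text_blocks):
    has_cache = any("cache_control" in b for b in text_blocks)
    if has_cache:
        # Block-list form preserves per-block cache breakpoints.
        return text_blocks
    return "\n".join(b["text"] for b in text_blocks)

cached = [
    {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    },
]
plain = [{"type": "text", "text": "You are a helpful assistant."}]

print(type(extract_system_prompt(cached)))  # list — cache_control survives
print(type(extract_system_prompt(plain)))   # str — unchanged behavior
```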

Environment

  • llama-index-llms-anthropic version: 0.10.10
  • Python 3.12
  • Anthropic API
