Bug Description
When using AgentWorkflow with output_cls (structured output) and cache_idx set on the Anthropic LLM, the API returns:
```
invalid_request_error: A maximum of 4 blocks with cache_control may be provided. Found 29.
```
Root Cause
Two issues in llama_index/llms/anthropic/utils.py:
1. blocks_to_anthropic_blocks stamps cache_control on every block with no cap
When a message has cache_control in its additional_kwargs (injected by cache_idx), blocks_to_anthropic_blocks creates a global_cache_control and applies it to every TextBlock, ImageBlock, ToolUseBlock, etc. in that message:
```python
# utils.py, blocks_to_anthropic_blocks()
if kwargs.get("cache_control"):
    global_cache_control = CacheControlEphemeralParam(**kwargs["cache_control"])

for block in blocks:
    if isinstance(block, TextBlock):
        if block.text:
            anthropic_blocks.append(_to_anthropic_text_block(block))
            if global_cache_control:
                anthropic_blocks[-1]["cache_control"] = global_cache_control  # every block gets it
```

This is fine for typical messages with 1–2 blocks, but `AgentWorkflow.generate_structured_response()` flattens the entire conversation history into many `TextBlock`s in a single `ChatMessage`. In my case this produces ~29 blocks in one message, all stamped with `cache_control`, exceeding Anthropic's limit of 4.
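A minimal, self-contained sketch of the failure mode (standing in for the library code, with plain dicts in place of the Anthropic SDK's block params): every flattened block in the message receives the message-level `cache_control`, so the count of cached blocks grows with the conversation.

```python
def stamp_every_block(texts, cache_control=None):
    """Mimics the current behavior: cache_control is applied per block, with no cap."""
    anthropic_blocks = []
    for text in texts:
        block = {"type": "text", "text": text}
        if cache_control:
            block["cache_control"] = cache_control  # stamped on every block
        anthropic_blocks.append(block)
    return anthropic_blocks


# 29 flattened history blocks in one message, as in the reported failure
blocks = stamp_every_block(
    [f"turn {i}" for i in range(29)],
    cache_control={"type": "ephemeral"},
)
stamped = sum(1 for b in blocks if "cache_control" in b)
print(stamped)  # 29 — well over Anthropic's limit of 4 cache breakpoints
```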
2. System prompt cache_control is silently discarded
In messages_to_anthropic_messages, system messages are extracted as plain strings, discarding any cache_control markers that were set:
```python
# utils.py, messages_to_anthropic_messages()
if message.role == MessageRole.SYSTEM:
    system_prompt.extend(
        [block.text for block in message.blocks if isinstance(block, TextBlock)]
    )
# ...
return ..., "\n".join(system_prompt)  # plain string, cache_control lost
```

So even when `cache_idx` covers the system message, the `cache_control` is set but then thrown away when the system prompt is extracted as a joined string.
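To illustrate the loss concretely, here is a small sketch using hypothetical block dicts in place of the library's `TextBlock` objects: once the texts are joined into one string, there is nowhere left for the `cache_control` marker to live.

```python
# Hypothetical system-message blocks; the first carries a cache marker.
system_blocks = [
    {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    },
    {"type": "text", "text": "Always cite sources."},
]

# The extraction joins only the text fields, dropping everything else.
system_prompt = "\n".join(b["text"] for b in system_blocks)
print(repr(system_prompt))  # plain string — the cache marker is gone
```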
Steps to Reproduce
```python
from llama_index.llms.anthropic import Anthropic
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from pydantic import BaseModel


class MyOutput(BaseModel):
    result: str


def my_tool(query: str) -> str:
    """Look something up."""
    return f"answer to {query}"


llm = Anthropic(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    cache_idx=1,  # enable prompt caching
)

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[FunctionTool.from_defaults(fn=my_tool)],
    llm=llm,
    system_prompt="You are a helpful assistant.",
    output_cls=MyOutput,  # triggers generate_structured_response
)

import asyncio


async def run():
    result = await agent.run(user_msg="Look up foo, then bar, then baz")
    print(result)


asyncio.run(run())
```

After a few tool call rounds, `generate_structured_response()` flattens the conversation into many `TextBlock`s in one message. With `cache_idx=1`, all blocks get `cache_control`, and the Anthropic API rejects the request.
Relevant Logs/Tracebacks
```
anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 29.'}}
```
Suggested Fix
In blocks_to_anthropic_blocks, only apply cache_control to the last block in the message (matching Anthropic's recommended pattern for cache breakpoints), rather than every block:
```python
# After building all anthropic_blocks:
if global_cache_control and anthropic_blocks:
    anthropic_blocks[-1]["cache_control"] = global_cache_control
```

For the system prompt issue, `messages_to_anthropic_messages` could return the system prompt as a list of content blocks (preserving `cache_control`) instead of a joined plain string when cache markers are present.
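A hedged sketch of the proposed behavior (illustrative only, not a drop-in patch; plain dicts stand in for the SDK's block params): a single cache breakpoint is placed on the last block of the message, so the count stays constant no matter how many blocks the message is flattened into.

```python
def stamp_last_block_only(texts, cache_control=None):
    """Proposed behavior: one cache breakpoint per message, on the final block."""
    anthropic_blocks = [{"type": "text", "text": t} for t in texts]
    if cache_control and anthropic_blocks:
        # Mark only the last block, matching Anthropic's recommended
        # pattern of placing a breakpoint at the end of the cached prefix.
        anthropic_blocks[-1]["cache_control"] = cache_control
    return anthropic_blocks


blocks = stamp_last_block_only(
    [f"turn {i}" for i in range(29)], {"type": "ephemeral"}
)
stamped = sum(1 for b in blocks if "cache_control" in b)
print(stamped)  # 1 — within the 4-breakpoint limit

# For the system prompt, the Messages API also accepts a list of text
# blocks instead of a plain string, which would let cache_control survive:
system = [
    {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    },
]
```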
Environment
- llama-index-llms-anthropic version: 0.10.10
- Python 3.12
- Anthropic API