Better handling when max tokens limit results in incomplete tool call arguments

### Initial Checks

- [x] I confirm that I'm using the latest version of Pydantic AI
- [x] I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue

### Description

I encounter this only with Anthropic models.

From what I gather, it's pretty easy for me to hit the output token limit on some of the tool calls I make for file editing. When that happens, the argument that was being written seems to be dropped completely, leaving incomplete tool args (even the closing brace is missing).

There's a retry prompt by default, but it's ultimately not helpful, because I suppose the agent doesn't even know that its tool call was cut off.
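To make the request concrete, here is a minimal sketch of the kind of retry hint I have in mind. It is not Pydantic AI's API: `truncation_retry_message` and `is_complete_json` are hypothetical helpers, and treating a finish reason of `"length"` as the "stopped at the token limit" signal is an assumption.

```python
import json
from typing import Optional


def is_complete_json(raw_args: str) -> bool:
    """Return True if raw_args parses as a JSON document."""
    try:
        json.loads(raw_args)
    except json.JSONDecodeError:
        return False
    return True


def truncation_retry_message(
    tool_name: str, raw_args: str, finish_reason: Optional[str]
) -> Optional[str]:
    """Build a retry hint when a tool call looks cut off by the token limit.

    Hypothetical helper, not part of Pydantic AI. finish_reason == "length"
    is an assumed signal that the model stopped at its output token limit.
    """
    if finish_reason != "length" or is_complete_json(raw_args):
        return None
    return (
        f"Your call to {tool_name!r} was cut off at the output token limit after "
        f"{len(raw_args)} characters of arguments, so it was not executed. "
        "Retry with shorter arguments or split the work across several calls."
    )


# The truncated args from the log in this issue
print(truncation_retry_message("write_long_text", '{"title": "example with retry"', "length"))
```

A message like this would at least tell the model why validation failed, instead of asking it to fix arguments it believes it already sent.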

My example code runs the agent with and without retries to show what seems to be happening to the tool args.

*I saw that 1.0.16 included changes around Anthropic token usage, so I upgraded, but it still errors this way:
```
---With Retry---
[Request] Starting part 0: ToolCallPart(tool_name='write_long_text', tool_call_id='toolu_012yy98oVc8ZuDPeY4uT75pd')
[Error] EOF while parsing an object at line 1 column 30
Output: 30
{"title": "example with retry"
```

### Example Code

```Python
from pydantic_ai import Agent
from pydantic_ai.messages import (
    FinalResultEvent,
    FunctionToolCallEvent,
    FunctionToolResultEvent,
    PartDeltaEvent,
    PartStartEvent,
    TextPartDelta,
    ThinkingPartDelta,
    ToolCallPartDelta,
)
import asyncio

SONNET_4 = "claude-sonnet-4-5-20250929"

def write_long_text(title: str, text: str):
    return f"Wrote {len(text)} chars for {title}"

async def main(agent):
    prompt = (
        "Call the write_long_text tool with a single argument named 'text' containing approximately "
        "300 lines of physics simulation code. Do not return any assistant text; only make the tool call."
    )

    output = []

    try:
        async with agent.iter(user_prompt=prompt) as agent_run:
            async for node in agent_run:
                if Agent.is_model_request_node(node):
                    n = node  # type: ignore[assignment]
                    async with n.stream(agent_run.ctx) as request_stream:
                        async for chunk in request_stream:
                            if isinstance(chunk, PartStartEvent):
                                print(f'[Request] Starting part {chunk.index}: {chunk.part!r}')
                            elif isinstance(chunk, PartDeltaEvent):
                                if isinstance(chunk.delta, TextPartDelta):
                                    print(f'[Request] Part {chunk.index} text delta: {chunk.delta.content_delta!r}')
                                elif isinstance(chunk.delta, ThinkingPartDelta):
                                    print(f'[Request] Part {chunk.index} thinking delta: {chunk.delta.content_delta!r}')
                                elif isinstance(chunk.delta, ToolCallPartDelta):
                                    # print(f'[Request] Part {chunk.index} args delta: {chunk.delta.args_delta}')
                                    output.append(chunk.delta.args_delta)
                            elif isinstance(chunk, FunctionToolCallEvent):
                                print(
                                    f'[Tools] The LLM calls tool={chunk.part.tool_name!r} with args={chunk.part.args} (tool_call_id={chunk.part.tool_call_id!r})'
                                )
                            elif isinstance(chunk, FunctionToolResultEvent):
                                print(f'[Tools] Tool call {chunk.tool_call_id!r} returned => {chunk.result.content}')
                            elif isinstance(chunk, FinalResultEvent):
                                finish_reason = getattr(chunk, "finish_reason", None)
                                if finish_reason is not None:
                                    print(f"[Result] Finish reason: {finish_reason}")
                                print(f'[Result] The model started producing a final result (tool_name={chunk.tool_name})')
    except Exception as e:
        print(f"[Error] {e}")

    str_output = "".join(output)
    print(f"Output: {len(str_output)}")
    print(str_output)

if __name__ == "__main__":
    agent = Agent(
        model=SONNET_4,
        output_type=str,
        tools=[write_long_text],
        instructions=(
            "You can use tools. When the user asks to write long text, call the write_long_text tool "
            "with the full text as the 'text' argument and 'example with retry' as the title."
        ),
    )
    print("---With Retry---")
    asyncio.run(main(agent))

    agent_no_retry = Agent(
        model=SONNET_4,
        output_type=str,
        tools=[write_long_text],
        instructions=(
            "You can use tools. When the user asks to write long text, call the write_long_text tool "
            "with the full text as the 'text' argument and 'example without retry' as the title."
        ),
        retries=0
    )
    print("---Without retry---")
    asyncio.run(main(agent_no_retry))

"""
---With Retry---
[Request] Starting part 0: ToolCallPart(tool_name='write_long_text', tool_call_id='toolu_01Pxsax9GPLwJJzDu5YhESoc')
[Error] Exceeded maximum retries (1) for output validation
Output: 14343
{"title": "example with retry", "text": "import [...i skip here, just important to note that this is a complete args json]"}
---Without retry---
[Request] Starting part 0: ToolCallPart(tool_name='write_long_text', tool_call_id='toolu_015hU11V6Xd9wrs4Su3GiU1X')
[Error] Tool 'write_long_text' exceeded max retries count of 0
Output: 33
{"title": "example without retry"

---^ Observe that the offending long arg gets dropped completely

"""
```
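The dropped argument in the output above can be made explicit with a small diagnostic. This is only a sketch under the assumption that the stream stops between key/value pairs, as in the runs shown here, so that appending a closing brace is enough to parse what arrived. `missing_tool_args` is a hypothetical helper, not something Pydantic AI provides.

```python
import json


def missing_tool_args(raw_args: str, required: list[str]) -> list[str]:
    """List required argument names absent from possibly truncated JSON args.

    Hypothetical diagnostic helper. It only handles the shape seen in this
    issue, where the stream stops between key/value pairs and just the
    closing brace is missing; any other damage is reported as all-missing.
    """
    for candidate in (raw_args, raw_args + "}"):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return [name for name in required if name not in parsed]
    return list(required)


print(missing_tool_args('{"title": "example without retry"', ["title", "text"]))
# prints ['text']
```

Reporting the missing argument by name, together with the finish reason, would make it much clearer that the call was truncated rather than malformed.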

### Python, Pydantic AI & LLM client version

```Text
python 3.12.6
pydantic-ai 1.0.15 -> 1.0.16
```

Better handling when max tokens limit results in incomplete tool call arguments #3118