
fix: Anthropic extended thinking with tool use and minor fixes #2487


Open · wants to merge 1 commit into base: main

Conversation

Kaushal-26

Fixes: #2425 (issue)

- With Anthropic extended thinking enabled, sending `tool_choice: {"type": "any"}` or `tool_choice: {"type": "tool", "name": "..."}` results in an error.
	- Source: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#extended-thinking-with-tool-use
- Fixes the Anthropic models file to explicitly handle the thinking-enabled flag.
- Adds test cases for the output variations `pydantic.BaseModel`, `ToolOutput`, and `StructuredDict`; these failed before the fix.
- Adds a warning for `temperature < 1`: with thinking enabled, Anthropic only allows `temperature == 1` and requires `0.95 <= top_p <= 1`.
	- Source: https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#feature-compatibility
- Minor fix in `.gitignore`: `.venv` variants for Python 3.9, 3.10, etc. are now covered, and `__pycache__` and the Ruff cache are ignored directly instead of relying on the `.gitignore` files inside those folders.
- Fixes CLAUDE.md to use `pydantic_ai_slim/pydantic_ai/agent/` instead of `pydantic_ai_slim/pydantic_ai/agent.py`, as the file doesn't exist.
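The tool-choice constraint described in the first bullet can be sketched as follows. This is a minimal, illustrative sketch of the PR's approach with hypothetical names, not the actual pydantic-ai internals:

```python
from typing import Optional


def resolve_tool_choice(
    has_tools: bool, allow_text_output: bool, thinking_enabled: bool
) -> Optional[dict]:
    """Illustrative sketch: pick a tool_choice value for the Anthropic API."""
    if not has_tools:
        return None
    if not allow_text_output:
        if thinking_enabled:
            # Forced tool use ({"type": "any"} or {"type": "tool", ...})
            # errors when extended thinking is enabled, so fall back to
            # letting the model decide.
            return {'type': 'auto'}
        return {'type': 'any'}
    return {'type': 'auto'}
```

With thinking disabled the forced `{'type': 'any'}` is kept; with thinking enabled the sketch downgrades to `{'type': 'auto'}`, which is the behavior this PR implements (and which the maintainer later pushes back on).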
Contributor

hyperlint-ai bot commented Aug 8, 2025

PR Change Summary

Enhanced Anthropic extended thinking functionality and addressed several issues.

  • Fixed tool choice handling in extended thinking to prevent errors.
  • Updated anthropic models to manage the thinking enabled flag explicitly.
  • Added test cases for output variations in Pydantic.
  • Implemented a warning for temperature settings in accordance with Anthropic guidelines.

Modified Files

  • CLAUDE.md

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note on link checks: we only check the first 30 links in a file and cache the results for several hours (so a just-added page may be flagged). Our recommendation is to add hyperlint-ignore to the PR to ignore the link check for this PR.

@DouweM
Collaborator

DouweM commented Aug 12, 2025

@Kaushal-26 Simply setting tool_choice to auto instead of any when thinking is enabled is not a proper fix, as this may cause the model to never actually call the output tool. The proper solution is to not use tool output mode, and instead use PromptedOutput. We can document that, and/or raise an error telling the user to do that. GPT-5 has a similar issue when built-in tools are used (#2488 (comment)) and I suggested raising an error for the user to fix. Can you please update the PR to do that here as well?
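The error-raising approach suggested here might look like the sketch below. The `UserError` class here is a stand-in for pydantic-ai's exception type, and the function name and wording are assumptions, not the actual implementation:

```python
class UserError(Exception):
    """Stand-in for pydantic-ai's UserError exception (an assumption)."""


def check_anthropic_output_mode(thinking_enabled: bool, forces_tool_use: bool) -> None:
    # Raise up front instead of silently downgrading tool_choice, so the
    # user switches to PromptedOutput themselves.
    if thinking_enabled and forces_tool_use:
        raise UserError(
            'Anthropic does not support forced tool use together with extended '
            'thinking; use PromptedOutput instead of tool output.'
        )
```

The design choice being argued for: failing loudly keeps the output-tool guarantee intact, whereas quietly switching to `auto` may let the model skip the output tool entirely.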

Collaborator

Please revert these changes, which seem specific to your environment.


```python
if not tools:
    tool_choice = None
else:
    if not model_request_parameters.allow_text_output:
        tool_choice = {'type': 'any'}
    if thinking and thinking.get('type') == 'enabled':
        tool_choice = {'type': 'auto'}
```
```python
    if temperature and temperature != 1:
```
Collaborator

I'd rather have the user see an error and adjust this manually, than to change it for them and log warnings. So please remove this from the PR for now and keep it focused on the tool choice issue. If others report the temperature/top_p issue and find it hard to work around, we can consider something like this then.

@@ -2163,3 +2163,80 @@ async def test_anthropic_web_search_tool_stream(allow_model_requests: None, anth

Additional notable stories include Vietnam's plan to ban fossil-fuel motorcycles in the heart of Hanoi starting July 2026, aiming to cut air pollution and move toward cleaner transport, and ongoing restoration efforts for Copenhagen's Old Stock Exchange, which is taking shape 15 months after a fire destroyed more than half of the 400-year-old building.\
""")


async def test_anthropic_extended_thinking_with_tool_use(allow_model_requests: None, anthropic_api_key: str):
Collaborator

With the suggested change to raise an error instead, we can modify this to try just one output_type and then use pytest.raises to verify the error
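The suggested test shape might look like the sketch below. To keep it self-contained, `build_anthropic_request` is a minimal stub of the behavior under test and `UserError` stands in for pydantic-ai's exception type; in the real test the agent fixture and recorded cassette would be used instead:

```python
import pytest


class UserError(Exception):
    """Stand-in for pydantic-ai's UserError exception (an assumption)."""


def build_anthropic_request(thinking_enabled: bool, forces_tool_use: bool) -> dict:
    # Minimal stub of the behavior under test: forced tool use plus
    # extended thinking should raise rather than be silently downgraded.
    if thinking_enabled and forces_tool_use:
        raise UserError('Anthropic tool_choice forces tool use, which is not supported with extended thinking.')
    return {'tool_choice': {'type': 'any'} if forces_tool_use else {'type': 'auto'}}


def test_extended_thinking_with_forced_tool_use_raises():
    # One output_type, one assertion: the error is raised, as suggested.
    with pytest.raises(UserError, match='forces tool use'):
        build_anthropic_request(thinking_enabled=True, forces_tool_use=True)
```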

@Kaushal-26
Author

> @Kaushal-26 Simply setting tool_choice to auto instead of any when thinking is enabled is not a proper fix, as this may cause the model to never actually call the output tool. The proper solution is to not use tool output mode, and instead use PromptedOutput. We can document that, and/or raise an error telling the user to do that. GPT-5 has a similar issue when built-in tools are used (#2488 (comment)) and I suggested raising an error for the user to fix. Can you please update the PR to do that here as well?

Okay, I agree with you and am happy to make the requested changes.

But let's talk about this: will the model actually use the tool in `auto` mode? Anthropic says:

> Our testing has shown that this should not reduce performance. If you would like to keep chain-of-thought (particularly with Opus) while still requesting that the model use a specific tool, you can use {"type": "auto"} for tool_choice (the default) and add explicit instructions in a user message. For example: What's the weather like in London? Use the get_weather tool in your response.

from here: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use#forcing-tool-use

I feel we should allow both. Let me know your point of view.
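The pattern from the quoted Anthropic docs can be sketched as a request shape like the one below. The model name and token budgets are placeholders, not recommendations; the key points are `tool_choice` staying at `auto` and the explicit instruction living in the user message:

```python
# Sketch of the docs' pattern: thinking enabled, tool_choice 'auto',
# and the tool-use instruction placed in the user message itself.
request = {
    'model': 'claude-sonnet-4-0',  # placeholder model name
    'max_tokens': 2048,
    'thinking': {'type': 'enabled', 'budget_tokens': 1024},
    'tool_choice': {'type': 'auto'},
    'messages': [
        {
            'role': 'user',
            'content': "What's the weather like in London? Use the get_weather tool in your response.",
        }
    ],
}
```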

@DouweM
Collaborator

DouweM commented Aug 12, 2025

@Kaushal-26 Ah, good find. That says we should add explicit instructions in a user message though, which we don't currently do, although we do include this in the final_result tool description:

DEFAULT_OUTPUT_TOOL_DESCRIPTION = 'The final response which ends this conversation'

It may help to make that more strongly worded, but I'm not sure if that will give the same good result as having it in the user message.

Comment on lines +329 to +335
```yaml
- content:
    - text: |-
        Validation feedback:
        Plain text responses are not permitted, please include your response in a tool call

        Fix the errors and try again.
      type: text
```
Author

@DouweM I tested a few things and you are right: the tool is not guaranteed to be called.

Even in the test cassette, it only gave a response after a retry.

Development

Successfully merging this pull request may close these issues.

Anthropic thinking returns error: "Thinking may not be enabled when tool_choice forces tool use."