Conversation

@dsfaccini (Contributor) commented Nov 17, 2025

Fixes #3428

work in progress:

  • check CI tests and coverage
  • double check code to clarify

```python
tools, strict_tools_requested = self._get_tools(model_request_parameters, model_settings)
tools, mcp_servers, beta_features = self._add_builtin_tools(tools, model_request_parameters)
output_format = self._build_output_format(model_request_parameters)
structured_output_beta_required = strict_tools_requested or bool(output_format)
```
Collaborator commented:

We could simplify the code below by adding a value to the `beta_headers` list here, right?

```python
    'input_schema': f.parameters_json_schema,
}
if f.strict is not None:
    tool_param['strict'] = f.strict  # type: ignore[assignment]
```
Collaborator commented:

Look at how the OpenAI model uses is_strict_compatible. If the user didn't explicitly say strict=True on their tool, it'll be strict=None, so we check if the schema is strict-compatible, and if so we set strict=True.

So to get the same behavior with Anthropic, we should check if the schema can be transformed successfully (losslessly), and set strict=True unless it's explicitly strict=False

- Resolved conflicts in anthropic.py by keeping our strict tools and native JSON implementation
- Resolved conflicts in test_anthropic.py by merging strict=True with TTL='5m' in assertions
- Kept our test_anthropic_mixed_strict_tool_run test
- Added upstream's new features:
  - count_tokens() method for token counting
  - TTL support for cache control (5m and 1h)
  - New helper methods (_infer_tool_choice, _map_extra_headers)
  - test_anthropic_cache_with_custom_ttl test
@DouweM DouweM changed the title Support native json output and strict tool calls for anthropic Support native JSON output and strict tool calls for Anthropic Nov 20, 2025
```python
all_of = node.pop('allOf', None)

node.pop('description', None)
node.pop('title', None)
```
Collaborator commented:

Why are we doing all of these pops? And why are we modifying the schema in place if the method sounds like it just does a check?

```python
# check compatibility before calling anthropic's transformer
# so we don't auto-enable strict when the SDK would drop constraints
self.is_strict_compatible = False
transformed = transform_schema(schema)
```
Collaborator commented:

If you look at how OpenAiJsonSchemaTransformer uses is_strict_compatible, you'll see that it depends on self.strict when it finds something incompatible with strict mode:

  • If self.strict is True, it will modify the schema to somehow make it work, even if this is lossy
  • If self.strict is None, it will NOT modify the schema and set is_strict_compatible = False
  • If self.strict is False, it won't do anything

I don't think that's the behavior we currently have, as we also transform the schema, not just when self.strict is True

Collaborator commented:

By the way, I think that means that for tool defs, we should respect their own strict property, as it's valid to send a non-strict-compatible schema.

But for native output, I think we should always force strict=True into this validator (by setting it on the output_object?) because Anthropic requires the schema to be strict.

With OpenAI it's different as their "native" json schema output mode also allows strict=False.


### about static typing

- other codebases don't use types in their test files
Collaborator commented:

Let's move the Claude changes to a separate PR so we can be a bit more nitpicky on the details of what we teach the LLM. For example, I don't think it's necessary/helpful to claim this as if it's a fact :)

Collaborator commented:

Please move to a new PR!

```python
):  # pragma: no branch
    # This would result in `tool_choice=required`, which Anthropic does not support with thinking.
    raise UserError(
        'Anthropic does not support thinking and output tools at the same time. Use `output_type=PromptedOutput(...)` instead.'
```
Collaborator commented:

The mode we recommend should be dynamic based on supports_json_schema_output; see Google where we do the same thing
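A sketch of making the recommendation dynamic (names illustrative; modeled on the Google approach mentioned above):

```python
def recommended_output_type(supports_json_schema_output: bool) -> str:
    # Recommend native structured output when the model profile supports it,
    # falling back to prompted output otherwise.
    return 'NativeOutput' if supports_json_schema_output else 'PromptedOutput'


def thinking_error_message(supports_json_schema_output: bool) -> str:
    mode = recommended_output_type(supports_json_schema_output)
    return (
        'Anthropic does not support thinking and output tools at the same time. '
        f'Use `output_type={mode}(...)` instead.'
    )
```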


```python
if (
    model_request_parameters.output_mode == 'native' and model_request_parameters.output_object is not None
):  # pragma: no branch
```
Collaborator commented:

This pragma: no branch is weird, as it means we never get here with an output mode other than native. There should definitely be cases where we get here with tool or prompted


```python
try:
    extra_headers = self._map_extra_headers(beta_features, model_settings)
    betas_list, extra_headers = self._prepare_betas_and_headers(betas, model_settings)
```
Collaborator commented:

Having both betas and betas_list is a bit weird. I'd rather have just betas as a set, and then turn it into a list when we pass it into the method below
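For example (a sketch; the beta flag names below are placeholders, not real Anthropic beta identifiers):

```python
# Keep `betas` as a set while building the request, so duplicates collapse,
# and only materialize a list at the call site.
betas: set[str] = set()
betas.add('token-counting-beta')   # placeholder beta name
betas.add('structured-outputs')    # placeholder beta name
betas.add('structured-outputs')    # adding twice is harmless with a set

betas_param = sorted(betas)        # deterministic list for the API call
```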

```python
tools=tools or OMIT,
tool_choice=tool_choice or OMIT,
mcp_servers=mcp_servers or OMIT,
betas=betas_list or OMIT,
```
Collaborator commented:

Weird that we don't have to pass output_format here, as it does contribute to token usage. Can you make an explicit comment about that, so it doesn't look like an oversight?

```python
    tool_def.strict for tool_def in model_request_parameters.tool_defs.values()
)

if has_strict_tools or model_request_parameters.output_mode == 'native':
```
Collaborator commented:

I think there's a scenario where we can send a tool def with strict=True without also sending the structured output beta: if ToolDefinition.strict is None (the default), has_strict_tools will be False, but customize_request_parameters will set strict=schema_transformer.is_strict_compatible, which may be True.

So we should really add this beta depending on the result of _get_tools/_map_tool_definition, not the original ToolDefinitions.
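A sketch of deciding the beta from the mapped tool params rather than the original `ToolDefinition`s (the function name is hypothetical):

```python
def needs_structured_output_beta(tool_params: list[dict], output_mode: str) -> bool:
    """Hypothetical: inspect the final tool dicts, where `strict` may have been
    auto-enabled by customize_request_parameters, instead of the original
    ToolDefinitions, whose strict may still be None."""
    if output_mode == 'native':
        return True
    return any(param.get('strict') is True for param in tool_params)
```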

Collaborator commented:

That means we also don't need to check self.profile.supports_json_schema_output here anymore, as the tool dicts only get strict=True if that value is enabled

```python
transformed = transform_schema(schema)
if before != transformed:
    self.is_strict_compatible = False
return transformed
```
Collaborator commented:

Shouldn't we return the unmodified before if self.is_strict_compatible is False?
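That is, something along these lines (a standalone sketch; `transform` stands in for the SDK transformer):

```python
def transform_if_lossless(schema: dict, transform) -> tuple[dict, bool]:
    """Return (schema_to_send, is_strict_compatible).

    If the transformation changed the schema, treat it as potentially lossy:
    report incompatibility and send the original schema untouched.
    """
    transformed = transform(schema)
    if transformed != schema:
        return schema, False
    return transformed, True
```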


### about static typing

- other codebases don't use types in their test files
Collaborator commented:

Please move to a new PR!

```python
result = await agent.run('What is the capital of the user country?')
# Should return CityLocation since we asked about capital
assert isinstance(result.output, city_location_schema | country_language_schema)
if isinstance(result.output, city_location_schema):  # pragma: no branch
```
Collaborator commented:

We shouldn't need this check if we make the assert explicitly verify city_location_schema
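A self-contained sketch of the tighter assertion (stand-in types, not the real test fixtures):

```python
from dataclasses import dataclass


@dataclass
class CityLocation:  # stand-in for city_location_schema
    city: str


output = CityLocation(city='Mexico City')  # stand-in for result.output

# Assert the concrete type directly; no union check or pragma-guarded
# isinstance branch is needed afterwards.
assert isinstance(output, CityLocation)
assert output.city == 'Mexico City'
```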



```python
def test_lossless_simple_model():
    """Simple BaseModel with basic types should be lossless."""
```
Collaborator commented:

This test doesn't test anything anymore, as passing strict=True will always result in is_strict_compatible is True. I want to make sure that passing basic models like this with strict=None will result in is_strict_compatible is True, so that we use strict as much as possible, except in specific cases where the user is really doing something incompatible.

I'm a little worried that transform_schema is going to return something slightly different than the input even in cases where the conversion is lossless, and we end up incorrectly having is_strict_compatible is False and thus not passing strict=True, even though we could have.

For example, turning {type: 'string', nullable: true} into type: ['string', 'null'] would be a lossless conversion, but our logic would treat it as strict-incompatible because it's not identical. This example may not apply to transform_schema, but you get the idea: inequality does not always mean lossiness. So we should have a bunch of tests for standard scenarios to ensure it works as expected.

I believe the OpenAI tests already have a big parameterized test for how strict ends up.
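The nullable example can be made concrete with a toy transformer (illustrative only; `normalize` is not the real transform_schema):

```python
def normalize(schema: dict) -> dict:
    """Toy rewrite: {'type': 'string', 'nullable': True} -> a type union."""
    out = dict(schema)
    if out.pop('nullable', None) and 'type' in out:
        out['type'] = [out['type'], 'null']
    return out


before = {'type': 'string', 'nullable': True}
after = normalize(before)

assert after == {'type': ['string', 'null']}
# The dict changed, yet no information was lost: inequality alone must not
# be treated as proof of a lossy (strict-incompatible) transformation.
assert after != before
```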

```python
assert transformer.is_strict_compatible is False


def test_lossy_nested_defs():
```
Collaborator commented:

We're now testing a lot of things that we know require lossy transformation, but I think it'll be more valuable to test for things that should not require lossy transformation (i.e. no transformation at all, or lossless transformation) and ensure that those get is_strict_compatible is True (+ possibly a transformed schema).


Successfully merging this pull request may close these issues:

Support Anthropic native JSON output + strict tool calls