Default to native output for groq models that support it#3858
Conversation
```python
return GroqModelProfile(
    supports_json_schema_output=True,
    supports_json_object_output=True,
    default_structured_output_mode='native',
```
On Slack, Rami pointed out:
Quick test results: Tool mode is actually the best option for GPT-OSS + tools, but still has ~10% failure rate. Native mode can't work because Groq explicitly rejects JSON mode + tool calling together. Prompted mode confuses the model into trying to call a "json" tool. (edited)
So we should make sure we only use native mode when there are no function tools.
Similar to how we dynamically determine the structured output mode based on various factors in (I believe) Google and Anthropic.
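That dynamic selection could look something like this minimal sketch (hypothetical names and types — the real logic lives in each model's request preparation and uses the actual model request parameters class):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RequestParameters:
    """Hypothetical stand-in for the real model request parameters."""

    output_mode: str  # 'auto', 'native', 'tool', ...
    function_tools: tuple[str, ...] = ()


def resolve_output_mode(params: RequestParameters, default_mode: str) -> RequestParameters:
    """Resolve 'auto' into a concrete mode based on the tools present."""
    if params.output_mode != 'auto':
        return params
    # Groq rejects JSON mode + tool calling together, so only pick
    # native output when there are no function tools.
    if default_mode == 'native' and not params.function_tools:
        return replace(params, output_mode='native')
    return replace(params, output_mode='tool')


with_tools = RequestParameters(output_mode='auto', function_tools=('get_weather',))
print(resolve_output_mode(with_tools, default_mode='native').output_mode)  # tool
```

With no function tools, the same call resolves 'auto' to 'native' instead.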
```python
)


def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
```
profiles/groq.py is for models by Groq itself (like compound); this function should go into the provider.
Should we add this to a profiles/CLAUDE.md?
```python
supports_json_object_output=True,
supports_json_schema_output=True,
default_structured_output_mode='native',
json_schema_transformer=OpenAIJsonSchemaTransformer,
```
did we verify we need to do these 2 lines?
- i'd rather stick to just the case where we confirmed native works better than tool
- if it needs OpenAIJsonSchemaTransformer, should we set that inside the provider.model_profile method?
I think we should because all of them are using it
Edit: actually I take that back: here it's formatted nicely, it's ugly and verbose in the model_profile
I don't really understand your comment 😄 I wasn't talking about the formatting, but about these fields being set at all
Yep, moved to a centralized constant, and yes, verified they all support this.
I'm not sure how to verify that one would work better than the other, other than running benchmarks.
- Add pragma no cover to unused tool body in fallback test
- Add unit test for prepare_request auto mode with non-native profile

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…c settings take precedence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
)
elif model_request_parameters.output_mode == 'auto':
    if self.profile.default_structured_output_mode == 'native':
        model_request_parameters = replace(model_request_parameters, output_mode='tool')
```
The unresolved issue from Devin's review is correct and needs to be fixed. When forcing output_mode='tool' here, you must also set allow_text_output=False to ensure tool_choice='required' is used instead of tool_choice='auto'.
Without this, the model can respond with plain text instead of calling the output tool, causing unnecessary retries and API calls. This is confirmed by the test cassettes showing text responses that fail JSON validation.
```diff
- model_request_parameters = replace(model_request_parameters, output_mode='tool')
+ model_request_parameters = replace(model_request_parameters, output_mode='tool', allow_text_output=False)
```
@DouweM This is a critical bug that affects the reliability of structured output when function tools are present.
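To illustrate why both fields matter, here is a hypothetical reduction of the request-building logic (names assumed for illustration): allow_text_output is what decides between tool_choice='auto' and tool_choice='required', so flipping output_mode alone still lets the model answer in plain text.

```python
def tool_choice_for(output_mode: str, allow_text_output: bool) -> str:
    """Hypothetical reduction of how tool_choice is derived from request parameters."""
    if output_mode == 'tool' and not allow_text_output:
        # The model is forced to call a tool, so the output tool is always used.
        return 'required'
    # The model may still reply with plain text instead of calling the output tool.
    return 'auto'


print(tool_choice_for('tool', allow_text_output=True))   # auto: the buggy case
print(tool_choice_for('tool', allow_text_output=False))  # required: the fixed case
```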
```python
supports_json_schema_output=True,
default_structured_output_mode='native',
json_schema_transformer=OpenAIJsonSchemaTransformer,
)
```
@DouweM This constant is applied to MoonshotAI and Meta Llama 4 models in addition to GPT-OSS models, but the PR title/description only mentions GPT-OSS. This introduces behavior changes for MoonshotAI and Meta Llama 4 (defaulting to native structured output instead of tool-based).
Was this intentional and verified? If so, the PR title and description should be updated to reflect that this affects multiple model families, not just GPT-OSS. If not, the profile should only be applied to GPT-OSS models for now.
```python
return meta_model_profile(model_name)


def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
```
Inconsistent return type: this function returns ModelProfile (non-nullable), while groq_moonshotai_model_profile and meta_groq_model_profile return ModelProfile | None.
Since harmony_model_profile already returns ModelProfile (non-nullable) after the change in profiles/harmony.py, this is correct. However, the other two functions should also return non-nullable ModelProfile to maintain consistency, since they're now guaranteed to return a value (either the base profile or an empty ModelProfile() updated with the native output profile).
```python
produce tool-based structured output on Groq."""


def groq_moonshotai_model_profile(model_name: str) -> ModelProfile | None:
```
The return type should be ModelProfile (non-nullable) not ModelProfile | None, since this function is guaranteed to return a value - either the result of moonshotai_model_profile(model_name) or ModelProfile(), both updated with _GROQ_NATIVE_OUTPUT_PROFILE.
```diff
- def groq_moonshotai_model_profile(model_name: str) -> ModelProfile | None:
+ def groq_moonshotai_model_profile(model_name: str) -> ModelProfile:
```
```python
return base.update(_GROQ_NATIVE_OUTPUT_PROFILE)


def meta_groq_model_profile(model_name: str) -> ModelProfile | None:
```
For consistency with the other profile functions in this module, the return type should be updated to match. When this function returns a Llama 4 profile (line 56), it's guaranteed to return ModelProfile, but when it returns for other models (line 58), it returns ModelProfile | None.
Consider whether all Llama 4 models handled here should get native output, or just the two specific models. If all handled models should get a profile, change the return type to ModelProfile. If not, keep ModelProfile | None but ensure consistency across all three functions.
```diff
- def harmony_model_profile(model_name: str) -> ModelProfile | None:
+ def harmony_model_profile(model_name: str) -> ModelProfile:
```
This return type change from ModelProfile | None to ModelProfile (non-nullable) is a breaking change that affects the public API. While it makes sense because openai_model_profile is guaranteed to return a non-null value for Harmony models, this changes the function's contract.
Before making this change, verify that all callers can handle the new return type. Looking at the code, groq_gpt_oss_model_profile now relies on this returning non-null, which is fine, but this should be noted as a behavior change in the PR description or commit message.
```diff
  elif isinstance(part, RetryPromptPart):
      if part.tool_name is None:
-         yield chat.ChatCompletionUserMessageParam(  # pragma: no cover
-             role='user', content=part.model_response()
-         )
+         yield chat.ChatCompletionUserMessageParam(role='user', content=part.model_response())
      else:
          yield chat.ChatCompletionToolMessageParam(
              role='tool',
              tool_call_id=_guard_tool_call_id(t=part),
              content=part.model_response(),
          )
+ else:
+     assert_never(part)
```
While adding the else: assert_never(part) clause is good for exhaustiveness checking, removing the pragma: no cover comments from lines 501-502 means these lines now need test coverage.
The tests should verify that RetryPromptPart with tool_name=None correctly yields a ChatCompletionUserMessageParam. If this branch isn't covered by existing tests, you'll need to add coverage to maintain the project's 100% coverage requirement.
```python
assert result.output.city
assert result.output.country

# Verify it used tool output (ToolCallPart) not native output
response = result.response
assert isinstance(response, ModelResponse)
assert any(isinstance(part, ToolCallPart) for part in response.parts), (
    f'Expected tool output fallback when native + tools: {response.parts}'
)
```
@DouweM The manual isinstance checks here deviate from the established testing pattern in this file. The test immediately above (line 5720) uses result.all_messages() == snapshot(...) to capture the complete message history including response parts.
The project's testing guidelines state: "Use assert result == snapshot() for complex structures — more maintainable than manual field assertions". Snapshot testing catches all fields and changes automatically, making tests more comprehensive and easier to maintain.
The author noted they "consciously decided against this" but this creates an inconsistency with the established patterns in this test file. Should this exception be allowed, or should the tests follow the standard pattern?
```python
'mistral': mistral_model_profile,
'moonshotai/': groq_moonshotai_model_profile,
'compound-': groq_model_profile,
'openai/gpt-oss-': groq_gpt_oss_model_profile,
```
The prefix openai/gpt-oss- will match openai/gpt-oss-safeguard-20b (listed in PreviewGroqModelNames), which is a safety classifier model, not a language model. Applying native structured output to a safety classifier doesn't make sense.
The prefix should be more specific to only match the actual GPT-OSS language models (gpt-oss-20b and gpt-oss-120b), or the function should explicitly exclude safeguard models:
```diff
- 'openai/gpt-oss-': groq_gpt_oss_model_profile,
+ 'openai/gpt-oss-20b': groq_gpt_oss_model_profile,
+ 'openai/gpt-oss-120b': groq_gpt_oss_model_profile,
```
Alternatively, if there are other gpt-oss models that should be included, the profile function should check and exclude safeguard:
```python
def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
    """Get profile for OpenAI GPT-OSS models on Groq."""
    if 'safeguard' in model_name.lower():
        # Safeguard models are classifiers, not language models
        return openai_model_profile(model_name)
    base = harmony_model_profile(model_name)
    return base.update(_GROQ_NATIVE_OUTPUT_PROFILE)
```

```python
) from _import_error


_GROQ_NATIVE_OUTPUT_PROFILE = ModelProfile(
```
This significant behavior change (defaulting specific Groq models to native structured output) should be documented in docs/models/groq.md. The documentation should mention:
- Which models default to native structured output (GPT-OSS, MoonshotAI, Meta Llama 4 Scout/Maverick)
- Why native output is preferred for these models
- The automatic fallback to tool-based output when function tools are present
- The limitation that native structured output cannot be used with function tools on Groq
Per the project guidelines: "Document provider features in 3 places: Supported by: in docstrings (IDE hints), compatibility notes in generic docs (selection), detailed provider sections with links to official docs (deep dive)". This PR addresses the implementation but not the documentation.
```python
if model_request_parameters.output_mode == 'native':
    raise UserError(
        'Groq does not support native structured output (JSON mode) with function tools. '
        'Use `output_type=ToolOutput(...)` instead.'
```
The error message suggesting ToolOutput(...) is somewhat misleading. When the user gets this error, they've explicitly set output_mode='native' (perhaps via NativeOutput(...)), but the auto fallback to tool mode happens automatically when they use a plain output_type without wrappers.
Consider a message that better explains the situation:
```diff
  'Groq does not support native structured output (JSON mode) with function tools. '
- 'Use `output_type=ToolOutput(...)` instead.'
+ 'Either remove the function tools or use tool-based structured output instead.'
```
Or if you want to keep the wrapper suggestion pattern for consistency with Anthropic:
```diff
  'Groq does not support native structured output (JSON mode) with function tools. '
- 'Use `output_type=ToolOutput(...)` instead.'
+ 'Use `output_type=ToolOutput(...)` to explicitly use tool-based output, or omit the wrapper to let auto mode handle it.'
```
```python
GROQ_NATIVE_MODELS_WITH_TOOLS = [
    'openai/gpt-oss-120b',
    'moonshotai/kimi-k2-instruct',
]
```
Inconsistent test coverage: meta-llama/llama-4-scout-17b-16e-instruct is tested in test_groq_default_native_output (line 5759) but excluded from this fallback test. Since Meta Llama 4 models receive the native output profile, they should also be tested for the automatic fallback to tool output when function tools are present.
Either add 'meta-llama/llama-4-scout-17b-16e-instruct' to this list, or document why it's intentionally excluded.
```python
assert gpt_oss_profile.supports_json_object_output is True
assert gpt_oss_profile.supports_json_schema_output is True
assert gpt_oss_profile.default_structured_output_mode == 'native'
assert gpt_oss_profile.json_schema_transformer == OpenAIJsonSchemaTransformer
```
Missing test coverage: openai/gpt-oss-safeguard-20b should be tested to verify it does NOT receive the native output profile (since it's a safety classifier, not a language model). This would catch the bug where the openai/gpt-oss- prefix incorrectly matches safeguard models.
Add a test like:
safeguard_profile = provider.model_profile('openai/gpt-oss-safeguard-20b')
assert safeguard_profile is not None
assert safeguard_profile.default_structured_output_mode != 'native' # Should not have native output| for prefix, profile_func in prefix_to_profile.items(): | ||
| model_name = model_name.lower() | ||
| if model_name.startswith(prefix): | ||
| if prefix.endswith('/'): | ||
| model_name = model_name[len(prefix) :] | ||
| return profile_func(model_name) | ||
| model_name_lower = model_name.lower() | ||
| if model_name_lower.startswith(prefix): | ||
| # Strip provider prefix (e.g., 'openai/gpt-oss-120b' -> 'gpt-oss-120b') | ||
| if '/' in model_name_lower: | ||
| model_name_for_profile = model_name_lower.split('/', 1)[-1] | ||
| else: | ||
| model_name_for_profile = model_name_lower | ||
| return profile_func(model_name_for_profile) |
There was a problem hiding this comment.
Performance issue: model_name.lower() is called on every iteration of the loop (line 101). This should be moved outside the loop to avoid redundant string operations:
```diff
- for prefix, profile_func in prefix_to_profile.items():
-     model_name_lower = model_name.lower()
-     if model_name_lower.startswith(prefix):
+ model_name_lower = model_name.lower()
+ for prefix, profile_func in prefix_to_profile.items():
+     if model_name_lower.startswith(prefix):
          # Strip provider prefix (e.g., 'openai/gpt-oss-120b' -> 'gpt-oss-120b')
          if '/' in model_name_lower:
              model_name_for_profile = model_name_lower.split('/', 1)[-1]
          else:
              model_name_for_profile = model_name_lower
          return profile_func(model_name_for_profile)
```
```python
# Groq doesn't support native structured output with function tools.
# This must happen BEFORE super().prepare_request() because the base class
# clears output_tools when output_mode != 'tool'.
if model_request_parameters.function_tools:
```
The check only considers function_tools, but builtin_tools are also converted to tools at request time (see line 283 in _completions_create). If Groq doesn't support native structured output with ANY tools (not just function tools), this check should also include builtin_tools:

```python
if model_request_parameters.function_tools or model_request_parameters.builtin_tools:
```

If builtin_tools ARE compatible with native output on Groq, the current code is correct. However, this should be verified and potentially documented in a comment explaining why builtin_tools are excluded from this check.
Add resolve_auto_output_mode() to atomically set output_mode and allow_text_output, preventing the bug where auto->tool left allow_text_output=True. Also handle tool_use_failed text errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
motivated by https://pydanticlogfire.slack.com/archives/C083V7PMHHA/p1766580626469929
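The resolve_auto_output_mode() helper described in the commit message above could be sketched roughly as follows (a sketch under assumed names and fields, not the merged implementation):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    """Hypothetical stand-in for the real model request parameters."""

    output_mode: str
    allow_text_output: bool
    function_tools: tuple[str, ...] = ()


def resolve_auto_output_mode(params: Params, default_mode: str) -> Params:
    """Resolve 'auto' into a concrete mode, setting both fields atomically."""
    if params.output_mode != 'auto':
        return params
    if default_mode == 'native' and not params.function_tools:
        return replace(params, output_mode='native', allow_text_output=False)
    # Tool mode must also forbid plain-text answers so that tool_choice='required'.
    return replace(params, output_mode='tool', allow_text_output=False)


resolved = resolve_auto_output_mode(
    Params(output_mode='auto', allow_text_output=True, function_tools=('get_weather',)),
    default_mode='native',
)
print(resolved.output_mode, resolved.allow_text_output)  # tool False
```

Setting both fields in one replace() call is what closes the auto->tool gap: there is no intermediate state where output_mode is 'tool' but plain-text responses are still allowed.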