
Default to native output for groq models that support it #3858

Open
dsfaccini wants to merge 19 commits into main from review-groq-impl

Conversation

@dsfaccini
Collaborator

@github-actions
Contributor

github-actions bot commented Dec 26, 2025

Docs Preview

commit: 54d9b1e
Preview URL: https://d2c0750f-pydantic-ai-previews.pydantic.workers.dev

@dsfaccini dsfaccini requested a review from Kludex December 29, 2025 19:15
@dsfaccini dsfaccini added the feature New feature request, or PR implementing a feature (enhancement) label Jan 6, 2026
@dsfaccini dsfaccini added the groq label Jan 6, 2026
return GroqModelProfile(
supports_json_schema_output=True,
supports_json_object_output=True,
default_structured_output_mode='native',
Collaborator

On Slack, Rami pointed out:

Quick test results: Tool mode is actually the best option for GPT-OSS + tools, but still has ~10% failure rate. Native mode can't work because Groq explicitly rejects JSON mode + tool calling together. Prompted mode confuses the model into trying to call a "json" tool. (edited)

So we should make sure we only use native mode when there are no function tools.

This would be similar to how we dynamically determine the structured output mode based on various factors in (I believe) the Google and Anthropic models.
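
A minimal sketch of that rule, using a simplified stand-in for the real request-parameter type (illustration only, not the PR's final implementation):

```python
from dataclasses import dataclass, field, replace


@dataclass
class RequestParams:
    """Simplified stand-in for ModelRequestParameters, for illustration."""

    output_mode: str = 'auto'  # 'auto' | 'native' | 'tool' | 'prompted'
    function_tools: list = field(default_factory=list)


def pick_groq_output_mode(params: RequestParams, default_mode: str) -> RequestParams:
    # Resolve 'auto' to the profile's native default only when no function tools
    # are present; Groq rejects JSON mode combined with tool calling.
    if params.output_mode == 'auto' and default_mode == 'native':
        mode = 'tool' if params.function_tools else 'native'
        return replace(params, output_mode=mode)
    return params
```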

)


def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
Collaborator

profiles/groq.py is for models made by Groq itself (like compound); this function should go into the provider.

Collaborator Author

Should we add this to a profiles/CLAUDE.md?

supports_json_object_output=True,
supports_json_schema_output=True,
default_structured_output_mode='native',
json_schema_transformer=OpenAIJsonSchemaTransformer,
Collaborator

Did we verify that we need these two lines?

Collaborator

  • I'd rather stick to just the case where we confirmed native works better than tool.
  • If it needs OpenAIJsonSchemaTransformer, should we set that inside the provider.model_profile method?

Collaborator Author

@dsfaccini dsfaccini Jan 8, 2026

I think we should, because all of them are using it.

Edit: actually, I take that back. Here it's formatted nicely; it's ugly and verbose in model_profile.

Collaborator

I don't really understand your comment 😄 I wasn't talking about the formatting, but about these fields being set at all

Collaborator Author

Yep, moved it to a centralized constant, and yes, verified that they all support this.

I'm not sure how to verify that one mode would work better than the other, other than by running benchmarks.

@DouweM DouweM added the awaiting author revision, bug (Report that something isn't working, or PR implementing a fix), and size: S (Small PR, ≤100 weighted lines) labels and removed the feature (New feature request, or PR implementing a feature) label Jan 6, 2026
dsfaccini and others added 7 commits January 8, 2026 11:58
- Add pragma no cover to unused tool body in fallback test
- Add unit test for prepare_request auto mode with non-native profile

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dsfaccini dsfaccini requested a review from DouweM January 27, 2026 15:20
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional flags.

Open in Devin Review

github-actions[bot]

This comment was marked as resolved.

github-actions[bot]

This comment was marked as resolved.

dsfaccini and others added 3 commits February 6, 2026 16:24
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…c settings take precedence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

@DouweM DouweM removed the auto-review label Feb 7, 2026
@github-actions github-actions bot added size: M Medium PR (101-500 weighted lines) and removed size: S Small PR (≤100 weighted lines) labels Feb 7, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

github-actions[bot]

This comment was marked as resolved.

)
elif model_request_parameters.output_mode == 'auto':
if self.profile.default_structured_output_mode == 'native':
model_request_parameters = replace(model_request_parameters, output_mode='tool')
Contributor

The unresolved issue from Devin's review is correct and needs to be fixed. When forcing output_mode='tool' here, you must also set allow_text_output=False to ensure tool_choice='required' is used instead of tool_choice='auto'.

Without this, the model can respond with plain text instead of calling the output tool, causing unnecessary retries and API calls. This is confirmed by the test cassettes showing text responses that fail JSON validation.

Suggested change
model_request_parameters = replace(model_request_parameters, output_mode='tool')
model_request_parameters = replace(model_request_parameters, output_mode='tool', allow_text_output=False)

@DouweM This is a critical bug that affects the reliability of structured output when function tools are present.

supports_json_schema_output=True,
default_structured_output_mode='native',
json_schema_transformer=OpenAIJsonSchemaTransformer,
)
Contributor

@DouweM This constant is applied to MoonshotAI and Meta Llama 4 models in addition to GPT-OSS models, but the PR title/description only mentions GPT-OSS. This introduces behavior changes for MoonshotAI and Meta Llama 4 (defaulting to native structured output instead of tool-based).

Was this intentional and verified? If so, the PR title and description should be updated to reflect that this affects multiple model families, not just GPT-OSS. If not, the profile should only be applied to GPT-OSS models for now.

return meta_model_profile(model_name)


def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
Contributor

Inconsistent return type: this function returns ModelProfile (non-nullable), while groq_moonshotai_model_profile and meta_groq_model_profile return ModelProfile | None.

Since harmony_model_profile already returns ModelProfile (non-nullable) after the change in profiles/harmony.py, this is correct. However, the other two functions should also return non-nullable ModelProfile to maintain consistency, since they're now guaranteed to return a value (either the base profile or an empty ModelProfile() updated with the native output profile).

Suggested change
def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:

produce tool-based structured output on Groq."""


def groq_moonshotai_model_profile(model_name: str) -> ModelProfile | None:
Contributor

The return type should be ModelProfile (non-nullable) not ModelProfile | None, since this function is guaranteed to return a value - either the result of moonshotai_model_profile(model_name) or ModelProfile(), both updated with _GROQ_NATIVE_OUTPUT_PROFILE.

Suggested change
def groq_moonshotai_model_profile(model_name: str) -> ModelProfile | None:
def groq_moonshotai_model_profile(model_name: str) -> ModelProfile:

return base.update(_GROQ_NATIVE_OUTPUT_PROFILE)


def meta_groq_model_profile(model_name: str) -> ModelProfile | None:
Contributor

For consistency with the other profile functions in this module, the return type should be updated to match. When this function returns a Llama 4 profile (line 56), it's guaranteed to return ModelProfile, but when it returns for other models (line 58), it returns ModelProfile | None.

Consider whether all Llama 4 models handled here should get native output, or just the two specific models. If all handled models should get a profile, change the return type to ModelProfile. If not, keep ModelProfile | None but ensure consistency across all three functions.



def harmony_model_profile(model_name: str) -> ModelProfile | None:
def harmony_model_profile(model_name: str) -> ModelProfile:
Contributor

This return type change from ModelProfile | None to ModelProfile (non-nullable) is a breaking change that affects the public API. While it makes sense because openai_model_profile is guaranteed to return a non-null value for Harmony models, this changes the function's contract.

Before making this change, verify that all callers can handle the new return type. Looking at the code, groq_gpt_oss_model_profile now relies on this returning non-null, which is fine, but this should be noted as a behavior change in the PR description or commit message.

Comment on lines 500 to 510
elif isinstance(part, RetryPromptPart):
if part.tool_name is None:
yield chat.ChatCompletionUserMessageParam( # pragma: no cover
role='user', content=part.model_response()
)
yield chat.ChatCompletionUserMessageParam(role='user', content=part.model_response())
else:
yield chat.ChatCompletionToolMessageParam(
role='tool',
tool_call_id=_guard_tool_call_id(t=part),
content=part.model_response(),
)
else:
assert_never(part)
Contributor

While adding the else: assert_never(part) clause is good for exhaustiveness checking, removing the pragma: no cover comments from lines 501-502 means these lines now need test coverage.

The tests should verify that RetryPromptPart with tool_name=None correctly yields a ChatCompletionUserMessageParam. If this branch isn't covered by existing tests, you'll need to add coverage to maintain the project's 100% coverage requirement.
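
A hedged sketch of the kind of test that could cover that branch; `map_to_groq_messages` is a hypothetical stand-in for whatever internal helper actually performs the mapping:

```python
from pydantic_ai.messages import RetryPromptPart


def test_retry_prompt_without_tool_name_maps_to_user_message():
    # tool_name defaults to None, so this should take the user-message branch
    # rather than the tool-message branch.
    part = RetryPromptPart(content='Validation failed, please try again')
    messages = list(map_to_groq_messages([part]))  # hypothetical helper name
    assert messages == [{'role': 'user', 'content': part.model_response()}]
```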

Comment on lines +5825 to +5833
assert result.output.city
assert result.output.country

# Verify it used tool output (ToolCallPart) not native output
response = result.response
assert isinstance(response, ModelResponse)
assert any(isinstance(part, ToolCallPart) for part in response.parts), (
f'Expected tool output fallback when native + tools: {response.parts}'
)
Contributor

@DouweM The manual isinstance checks here deviate from the established testing pattern in this file. The test immediately above (line 5720) uses result.all_messages() == snapshot(...) to capture the complete message history including response parts.

The project's testing guidelines state: "Use assert result == snapshot() for complex structures — more maintainable than manual field assertions". Snapshot testing catches all fields and changes automatically, making tests more comprehensive and easier to maintain.

The author noted they "consciously decided against this" but this creates an inconsistency with the established patterns in this test file. Should this exception be allowed, or should the tests follow the standard pattern?
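
For comparison, the snapshot-based form of the same check is roughly this (a sketch; inline-snapshot fills in the expected value on the first run):

```python
from inline_snapshot import snapshot

# Capture the full message history, including response parts, in one assertion
# instead of manual isinstance checks on result.response.parts.
assert result.all_messages() == snapshot()
```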

'mistral': mistral_model_profile,
'moonshotai/': groq_moonshotai_model_profile,
'compound-': groq_model_profile,
'openai/gpt-oss-': groq_gpt_oss_model_profile,
Contributor

The prefix openai/gpt-oss- will match openai/gpt-oss-safeguard-20b (listed in PreviewGroqModelNames), which is a safety classifier model, not a language model. Applying native structured output to a safety classifier doesn't make sense.

The prefix should be more specific to only match the actual GPT-OSS language models (gpt-oss-20b and gpt-oss-120b), or the function should explicitly exclude safeguard models:

Suggested change
'openai/gpt-oss-': groq_gpt_oss_model_profile,
'openai/gpt-oss-20b': groq_gpt_oss_model_profile,
'openai/gpt-oss-120b': groq_gpt_oss_model_profile,

Alternatively, if there are other gpt-oss models that should be included, the profile function should check and exclude safeguard:

def groq_gpt_oss_model_profile(model_name: str) -> ModelProfile:
    """Get profile for OpenAI GPT-OSS models on Groq."""
    if 'safeguard' in model_name.lower():
        # Safeguard models are classifiers, not language models
        return openai_model_profile(model_name)
    base = harmony_model_profile(model_name)
    return base.update(_GROQ_NATIVE_OUTPUT_PROFILE)

) from _import_error


_GROQ_NATIVE_OUTPUT_PROFILE = ModelProfile(
Contributor

This significant behavior change (defaulting specific Groq models to native structured output) should be documented in docs/models/groq.md. The documentation should mention:

  1. Which models default to native structured output (GPT-OSS, MoonshotAI, Meta Llama 4 Scout/Maverick)
  2. Why native output is preferred for these models
  3. The automatic fallback to tool-based output when function tools are present
  4. The limitation that native structured output cannot be used with function tools on Groq

Per the project guidelines: "Document provider features in 3 places: Supported by: in docstrings (IDE hints), compatibility notes in generic docs (selection), detailed provider sections with links to official docs (deep dive)". This PR addresses the implementation but not the documentation.
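
An illustrative snippet of the behavior such docs would describe (the model name and output schema are examples only; the tool fallback is the automatic behavior this PR adds):

```python
from pydantic import BaseModel
from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


# On Groq, GPT-OSS would default to native (JSON mode) structured output.
agent = Agent('groq:openai/gpt-oss-120b', output_type=CityLocation)


@agent.tool_plain
def get_weather(city: str) -> str:
    return 'sunny'


# With a function tool registered, requests fall back to tool-based structured
# output, because Groq rejects JSON mode combined with tool calling.
```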

if model_request_parameters.output_mode == 'native':
raise UserError(
'Groq does not support native structured output (JSON mode) with function tools. '
'Use `output_type=ToolOutput(...)` instead.'
Contributor

The error message suggesting ToolOutput(...) is somewhat misleading. When the user gets this error, they've explicitly set output_mode='native' (perhaps via NativeOutput(...)), but the auto fallback to tool mode happens automatically when they use a plain output_type without wrappers.

Consider a message that better explains the situation:

Suggested change
'Use `output_type=ToolOutput(...)` instead.'
'Groq does not support native structured output (JSON mode) with function tools. '
'Either remove the function tools or use tool-based structured output instead.'

Or if you want to keep the wrapper suggestion pattern for consistency with Anthropic:

Suggested change
'Use `output_type=ToolOutput(...)` instead.'
'Groq does not support native structured output (JSON mode) with function tools. '
'Use `output_type=ToolOutput(...)` to explicitly use tool-based output, or omit the wrapper to let auto mode handle it.'

Comment on lines +5795 to +5798
GROQ_NATIVE_MODELS_WITH_TOOLS = [
'openai/gpt-oss-120b',
'moonshotai/kimi-k2-instruct',
]
Contributor

Inconsistent test coverage: meta-llama/llama-4-scout-17b-16e-instruct is tested in test_groq_default_native_output (line 5759) but excluded from this fallback test. Since Meta Llama 4 models receive the native output profile, they should also be tested for the automatic fallback to tool output when function tools are present.

Either add 'meta-llama/llama-4-scout-17b-16e-instruct' to this list, or document why it's intentionally excluded.
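
That is, something along these lines (assuming the fallback is also expected to hold for the Llama 4 model):

```python
GROQ_NATIVE_MODELS_WITH_TOOLS = [
    'openai/gpt-oss-120b',
    'moonshotai/kimi-k2-instruct',
    'meta-llama/llama-4-scout-17b-16e-instruct',  # also receives the native output profile
]
```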

assert gpt_oss_profile.supports_json_object_output is True
assert gpt_oss_profile.supports_json_schema_output is True
assert gpt_oss_profile.default_structured_output_mode == 'native'
assert gpt_oss_profile.json_schema_transformer == OpenAIJsonSchemaTransformer
Contributor

Missing test coverage: openai/gpt-oss-safeguard-20b should be tested to verify it does NOT receive the native output profile (since it's a safety classifier, not a language model). This would catch the bug where the openai/gpt-oss- prefix incorrectly matches safeguard models.

Add a test like:

safeguard_profile = provider.model_profile('openai/gpt-oss-safeguard-20b')
assert safeguard_profile is not None
assert safeguard_profile.default_structured_output_mode != 'native'  # Should not have native output

Comment on lines 100 to +108
for prefix, profile_func in prefix_to_profile.items():
model_name = model_name.lower()
if model_name.startswith(prefix):
if prefix.endswith('/'):
model_name = model_name[len(prefix) :]
return profile_func(model_name)
model_name_lower = model_name.lower()
if model_name_lower.startswith(prefix):
# Strip provider prefix (e.g., 'openai/gpt-oss-120b' -> 'gpt-oss-120b')
if '/' in model_name_lower:
model_name_for_profile = model_name_lower.split('/', 1)[-1]
else:
model_name_for_profile = model_name_lower
return profile_func(model_name_for_profile)
Contributor

Performance issue: model_name.lower() is called on every iteration of the loop (line 101). This should be moved outside the loop to avoid redundant string operations:

Suggested change
for prefix, profile_func in prefix_to_profile.items():
model_name = model_name.lower()
if model_name.startswith(prefix):
if prefix.endswith('/'):
model_name = model_name[len(prefix) :]
return profile_func(model_name)
model_name_lower = model_name.lower()
if model_name_lower.startswith(prefix):
# Strip provider prefix (e.g., 'openai/gpt-oss-120b' -> 'gpt-oss-120b')
if '/' in model_name_lower:
model_name_for_profile = model_name_lower.split('/', 1)[-1]
else:
model_name_for_profile = model_name_lower
return profile_func(model_name_for_profile)
model_name_lower = model_name.lower()
for prefix, profile_func in prefix_to_profile.items():
if model_name_lower.startswith(prefix):
# Strip provider prefix (e.g., 'openai/gpt-oss-120b' -> 'gpt-oss-120b')
if '/' in model_name_lower:
model_name_for_profile = model_name_lower.split('/', 1)[-1]
else:
model_name_for_profile = model_name_lower
return profile_func(model_name_for_profile)

# Groq doesn't support native structured output with function tools.
# This must happen BEFORE super().prepare_request() because the base class
# clears output_tools when output_mode != 'tool'.
if model_request_parameters.function_tools:
Contributor

The check only considers function_tools, but builtin_tools are also converted to tools at request time (see line 283 in _completions_create). If Groq doesn't support native structured output with ANY tools (not just function tools), this check should also include builtin_tools:

if model_request_parameters.function_tools or model_request_parameters.builtin_tools:

If builtin_tools ARE compatible with native output on Groq, the current code is correct. However, this should be verified and potentially documented in a comment explaining why builtin_tools are excluded from this check.

dsfaccini and others added 2 commits February 16, 2026 19:43
Add resolve_auto_output_mode() to atomically set output_mode and
allow_text_output, preventing the bug where auto->tool left
allow_text_output=True. Also handle tool_use_failed text errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
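
A rough sketch of what such a helper could look like, again with a simplified stand-in for the real parameter type (the actual signature in the PR may differ):

```python
from dataclasses import dataclass, field, replace


@dataclass
class RequestParams:
    """Simplified stand-in for ModelRequestParameters, for illustration."""

    output_mode: str = 'auto'
    function_tools: list = field(default_factory=list)
    allow_text_output: bool = True


def resolve_auto_output_mode(params: RequestParams, default_mode: str) -> RequestParams:
    """Resolve 'auto' into a concrete mode and set allow_text_output in the same
    step, so an auto -> tool fallback can no longer leave allow_text_output=True."""
    if params.output_mode != 'auto':
        return params
    mode = 'tool' if (default_mode == 'native' and params.function_tools) else default_mode
    # allow_text_output=False in tool mode lets tool_choice='required' be used,
    # so the model can't reply with plain text instead of calling the output tool.
    return replace(params, output_mode=mode, allow_text_output=(mode != 'tool'))
```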
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 0 new potential issues.

View 8 additional findings in Devin Review.

Open in Devin Review

Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 0 new potential issues.

View 9 additional findings in Devin Review.

Open in Devin Review


Labels

auto-review, awaiting author revision, bug (Report that something isn't working, or PR implementing a fix), groq, size: M (Medium PR, 101-500 weighted lines)
