feat: Return ParsedChatCompletion[T] for structured output across all providers#812

Open
bilelomrani1 wants to merge 8 commits into mozilla-ai:main from bilelomrani1:feat/parsed-chat-completion

Conversation


@bilelomrani1 bilelomrani1 commented Feb 14, 2026

Description

When response_format is set to a Pydantic model, acompletion now returns a ParsedChatCompletion[T] with a typed message.parsed field. This works uniformly across all providers that support response_format, not just OpenAI.

Motivation

Previously, when using response_format with a Pydantic model, providers returned a plain ChatCompletion and users had to manually deserialize message.content from JSON. The OpenAI SDK's .parse() method handles this automatically, but that behavior was provider-specific and lost during any-llm's response conversion.

Before this PR, there was an inconsistency in response_format behavior across providers: the OpenAI provider used client.chat.completions.parse(), which returned parsed Pydantic objects and performed client-side validation, but any-llm's response conversion pipeline discarded the richer ParsedChatCompletion type, reducing it back to a plain ChatCompletion. Meanwhile, all other providers returned raw JSON strings in message.content with no parsing or validation. This PR bridges that gap by introducing ParsedChatCompletion[T] as a generic subclass of ChatCompletion (mirroring the OpenAI SDK's type hierarchy) and populating message.parsed at the AnyLLM layer so all providers behave consistently.
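The type hierarchy described above can be sketched with standard-library stand-ins (field lists trimmed to the essentials; the real types mirror the OpenAI SDK's Pydantic models, not these dataclasses):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class ChatCompletionMessage:
    role: str
    content: Optional[str] = None

@dataclass
class ParsedChatCompletionMessage(ChatCompletionMessage, Generic[T]):
    # Typed structured output; None when parsing is skipped.
    parsed: Optional[T] = None

@dataclass
class Choice:
    finish_reason: str
    message: ChatCompletionMessage

@dataclass
class ParsedChoice(Choice, Generic[T]):
    message: ParsedChatCompletionMessage[T]

@dataclass
class ChatCompletion:
    id: str
    choices: list[Choice]

@dataclass
class ParsedChatCompletion(ChatCompletion, Generic[T]):
    # A subclass of ChatCompletion, so returning it is non-breaking.
    choices: list[ParsedChoice[T]]
```

Because each `Parsed*` type subclasses its plain counterpart, existing code that expects a `ChatCompletion` keeps working unchanged.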

Design decisions

Unified parsing at the AnyLLM layer, not in providers. All providers (OpenAI, Mistral, etc.) return a plain ChatCompletion from _acompletion. The structured output parsing (converting message.content JSON into a Pydantic model and attaching it to message.parsed) happens once in acompletion(). This is consistent with how Mistral already worked (using client.chat.complete_async(), not a higher-level parse method). The OpenAI provider was switched from client.chat.completions.parse() to client.chat.completions.create() to match this pattern.
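The post-processing step described above might look roughly like this stdlib-only sketch (`json.loads` plus a constructor stands in for Pydantic's `model_validate_json`; `attach_parsed` and `CityInfo` are illustrative names, not the PR's actual code):

```python
import json
from dataclasses import dataclass

@dataclass
class CityInfo:  # stands in for the user's response_format Pydantic model
    name: str
    population: int

def attach_parsed(message_content: str, response_format: type):
    """Parse message.content as JSON and build the response_format model.

    In the PR this happens once in acompletion(), after any provider's
    _acompletion has returned a plain ChatCompletion.
    """
    data = json.loads(message_content)
    return response_format(**data)  # Pydantic would also validate field types

parsed = attach_parsed('{"name": "Paris", "population": 2100000}', CityInfo)
```

Doing this once at the AnyLLM layer means no provider needs a native `.parse()` equivalent.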

No exceptions on truncation/content filter. When finish_reason is "length" or "content_filter", message.parsed is set to None instead of raising an exception. The raw (possibly truncated) content remains available in message.content, and users can inspect choice.finish_reason to detect these cases. I considered raising exceptions to align with the OpenAI SDK's .parse() behavior, but this would add additional breaking changes: existing code using acompletion with response_format does not expect exceptions on these finish reasons. These exceptions could be added in a future major version. Note that this decision reduces breaking changes but does not fully eliminate them; see the Breaking changes section below.
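The truncation/content-filter policy can be sketched as follows (stdlib-only; `maybe_parse` and `Answer` are hypothetical names for illustration):

```python
import json
from dataclasses import dataclass

SKIP_REASONS = {"length", "content_filter"}

@dataclass
class Answer:  # stands in for the user's response_format Pydantic model
    text: str

def maybe_parse(finish_reason: str, content: str, model_cls: type):
    """Return the parsed model, or None when finish_reason disallows parsing.

    Instead of raising (as the OpenAI SDK's .parse() would), truncated or
    filtered responses leave parsed as None; the raw content stays available
    in message.content for callers that inspect choice.finish_reason.
    """
    if finish_reason in SKIP_REASONS:
        return None
    return model_cls(**json.loads(content))

truncated = maybe_parse("length", '{"text": "cut of', Answer)  # parsed is None
complete = maybe_parse("stop", '{"text": "ok"}', Answer)
```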

Breaking changes

When response_format is a Pydantic model, acompletion now validates the response content against the schema via model_validate_json. If the LLM returns JSON that doesn't conform to the model, a pydantic.ValidationError is raised where previously the raw string would sit in message.content without validation.

In practice, the impact depends on the provider:

  • OpenAI: The previous implementation already used client.chat.completions.parse(), which performed Pydantic validation client-side (on top of server-side constrained decoding). Switching to client.chat.completions.create() with any-llm's own model_validate_json call preserves the same validation behavior, so this path should not surface new errors.
  • Other providers: These providers may not always enforce the schema strictly. Responses that previously returned silently with non-conforming JSON in message.content will now raise pydantic.ValidationError. Let me know whether this is acceptable: it is a breaking change, but it aligns better with OpenAI semantics.

The return type change itself (ParsedChatCompletion instead of ChatCompletion) is non-breaking since ParsedChatCompletion is a subclass of ChatCompletion.
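To illustrate the new failure mode, here is a sketch of how callers might guard against it (stdlib stand-in: a ValueError plays the role of pydantic.ValidationError, and `validate_json` is a hypothetical helper, not the library's API):

```python
import json

def validate_json(content: str, required: dict) -> dict:
    """Minimal stand-in for model_validate_json: check required typed fields."""
    data = json.loads(content)
    for name, typ in required.items():
        if name not in data or not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} missing or not {typ.__name__}")
    return data

schema = {"name": str, "population": int}
ok = validate_json('{"name": "Paris", "population": 2100000}', schema)
try:
    validate_json('{"name": "Paris"}', schema)  # non-conforming JSON
    conforming = True
except ValueError:  # real code would catch pydantic.ValidationError
    conforming = False
```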

PR Type

  • 🆕 New Feature

Relevant issues

Fixes #806

Checklist

  • I understand the code I am submitting.
  • I have added unit tests that prove my fix/feature works
  • I have run this code locally and verified it fixes the issue.
  • New and existing tests pass locally
  • Documentation was updated where necessary
  • I have read and followed the contribution guidelines
  • AI Usage:
    • No AI was used.
    • AI was used for drafting/refactoring.
    • This is fully AI-generated.

AI Usage Information

  • AI Model used: Claude Opus 4.6
  • AI Developer Tool used: Claude Code

bilelomrani1 and others added 6 commits February 14, 2026 17:39
Introduce ParsedChatCompletionMessage[T], ParsedChoice[T], and
ParsedChatCompletion[T] as generic subclasses mirroring the OpenAI SDK's
parsed completion types. These add a typed `parsed` field to carry
structured output from response_format.

Refs mozilla-ai#806
Overload acompletion() to return ParsedChatCompletion[T] when
response_format is a Pydantic model class. After _acompletion returns,
convert the result using model_validate(from_attributes=True) and
populate parsed from message.content via model_validate_json.

Mirrors OpenAI SDK's parse_chat_completion logic:
- Raises LengthFinishReasonError on finish_reason="length"
- Raises ContentFilterError on finish_reason="content_filter"
- Skips parsing on refusals (parsed=None)

Works for all providers, not just those with native .parse() support.

Fixes mozilla-ai#806
Cover happy path, refusal, length truncation, content filter,
dict-from-extras normalization, and no-parsing cases.

Refs mozilla-ai#806
Verify that response_format returns a ParsedChatCompletion with a typed
parsed field, not just JSON in message.content.

Refs mozilla-ai#806
njbrake commented Feb 19, 2026

Hi! A few comments from an Opus 4.6 guided review:

completion() at src/any_llm/any_llm.py:368 wraps acompletion(), but its return type is still ChatCompletion | Iterator[ChatCompletionChunk]. When a user passes response_format=MyModel, the synchronous path will return a ParsedChatCompletion at runtime, but the type signature won't reflect that. The isinstance(response, ChatCompletion) check at line 378 works correctly at runtime (since ParsedChatCompletion is a subclass), but callers using static type checking won't get ParsedChatCompletion[T] narrowing.

Consider adding matching @overload signatures to completion(), or at minimum updating its return type to include ParsedChatCompletion[Any].
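The suggested overloads might look like this typing-only sketch (simplified stand-in types; the actual completion() signature in any_llm takes many more parameters):

```python
from typing import Generic, Optional, TypeVar, overload

T = TypeVar("T")

class ChatCompletion: ...

class ParsedChatCompletion(ChatCompletion, Generic[T]): ...

@overload
def completion(response_format: type[T]) -> ParsedChatCompletion[T]: ...
@overload
def completion(response_format: None = None) -> ChatCompletion: ...

def completion(response_format: Optional[type] = None):
    # Runtime behavior is unchanged; only static narrowing improves.
    if response_format is not None:
        return ParsedChatCompletion()
    return ChatCompletion()
```

With these overloads, a type checker narrows `completion(MyModel)` to `ParsedChatCompletion[MyModel]` while `completion()` stays `ChatCompletion`.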

@njbrake njbrake self-requested a review February 19, 2026 14:12

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 84.37500% with 5 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
src/any_llm/providers/openai/base.py   25.00%    2 Missing and 1 partial ⚠️
src/any_llm/any_llm.py                 88.23%    1 Missing and 1 partial ⚠️

Files with missing lines               Coverage Δ
src/any_llm/__init__.py                81.81% <100.00%> (+1.81%) ⬆️
src/any_llm/types/completion.py        97.11% <100.00%> (+0.20%) ⬆️
src/any_llm/utils/exception_handler.py 69.01% <100.00%> (-3.45%) ⬇️
src/any_llm/any_llm.py                 70.76% <88.23%> (-0.30%) ⬇️
src/any_llm/providers/openai/base.py   54.93% <25.00%> (-0.82%) ⬇️

Development

Successfully merging this pull request may close these issues.

ParsedChatCompletion type information lost during response conversion
