feat: Return ParsedChatCompletion[T] for structured output across all providers #812

bilelomrani1 wants to merge 8 commits into mozilla-ai:main from
Conversation
Introduce ParsedChatCompletionMessage[T], ParsedChoice[T], and ParsedChatCompletion[T] as generic subclasses mirroring the OpenAI SDK's parsed completion types. These add a typed `parsed` field to carry structured output from response_format. Refs mozilla-ai#806
Overload acompletion() to return ParsedChatCompletion[T] when response_format is a Pydantic model class. After _acompletion returns, convert the result using model_validate(from_attributes=True) and populate parsed from message.content via model_validate_json. Mirrors OpenAI SDK's parse_chat_completion logic:
- Raises LengthFinishReasonError on finish_reason="length"
- Raises ContentFilterError on finish_reason="content_filter"
- Skips parsing on refusals (parsed=None)

Works for all providers, not just those with native .parse() support. Fixes mozilla-ai#806
Cover happy path, refusal, length truncation, content filter, dict-from-extras normalization, and no-parsing cases. Refs mozilla-ai#806
Verify that response_format returns a ParsedChatCompletion with a typed parsed field, not just JSON in message.content. Refs mozilla-ai#806
Hi! A few comments from an Opus 4.6 guided review: `completion()` at src/any_llm/any_llm.py:368 wraps `acompletion()` but its return type is still `ChatCompletion | Iterator[ChatCompletionChunk]`. When a user passes `response_format=MyModel`, the runtime value is a `ParsedChatCompletion[MyModel]`, but the declared return type does not reflect this. Consider adding matching `@overload` signatures to `completion()`, or at minimum updating its return type to include `ParsedChatCompletion[Any]`.
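The suggested overloads might look roughly like this (a sketch only: the parameter list is heavily simplified and the import path is hypothetical):

```python
from collections.abc import Iterator
from typing import Any, TypeVar, overload

from pydantic import BaseModel

from any_llm.types.completion import (  # hypothetical import path
    ChatCompletion,
    ChatCompletionChunk,
    ParsedChatCompletion,
)

T = TypeVar("T", bound=BaseModel)


@overload
def completion(
    model: str, *, response_format: type[T], **kwargs: Any
) -> ParsedChatCompletion[T]: ...


@overload
def completion(
    model: str, *, response_format: None = None, **kwargs: Any
) -> ChatCompletion | Iterator[ChatCompletionChunk]: ...


def completion(
    model: str, *, response_format: type[T] | None = None, **kwargs: Any
) -> Any:
    """Implementation elided; the real function wraps acompletion()."""
    raise NotImplementedError
```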
Description
When `response_format` is set to a Pydantic model, `acompletion` now returns a `ParsedChatCompletion[T]` with a typed `message.parsed` field. This works uniformly across all providers that support `response_format`, not just OpenAI.
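As a usage sketch (the import path, model string, and schema are illustrative, not taken from the PR):

```python
import asyncio

from pydantic import BaseModel

from any_llm import acompletion  # import path assumed


class CityInfo(BaseModel):
    """Hypothetical structured-output schema."""

    name: str
    population: int


async def main() -> None:
    # A Pydantic class as response_format now yields a
    # ParsedChatCompletion[CityInfo] rather than a plain ChatCompletion.
    result = await acompletion(
        model="openai/gpt-4o-mini",  # provider/model identifier assumed
        messages=[{"role": "user", "content": "Describe Paris as JSON."}],
        response_format=CityInfo,
    )
    city = result.choices[0].message.parsed  # typed CityInfo | None
    if city is not None:
        print(city.name, city.population)


asyncio.run(main())
```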
Motivation

Previously, when using `response_format` with a Pydantic model, providers returned a plain `ChatCompletion` and users had to manually deserialize `message.content` from JSON. The OpenAI SDK's `.parse()` method handles this automatically, but that behavior was provider-specific and lost during any-llm's response conversion.

Before this PR, there was an inconsistency in `response_format` behavior across providers: the OpenAI provider used `client.chat.completions.parse()`, which returned parsed Pydantic objects and performed client-side validation, but any-llm's response conversion pipeline discarded the richer `ParsedChatCompletion` type, reducing it back to a plain `ChatCompletion`. Meanwhile, all other providers returned raw JSON strings in `message.content` with no parsing or validation. This PR bridges that gap by introducing `ParsedChatCompletion[T]` as a generic subclass of `ChatCompletion` (mirroring the OpenAI SDK's type hierarchy) and populating `message.parsed` at the `AnyLLM` layer so all providers behave consistently.
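The generic hierarchy could be sketched as follows (base classes are trimmed stand-ins; the real types in this PR mirror the OpenAI SDK's parsed-completion classes more faithfully):

```python
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


# Stand-ins for the existing completion types, reduced to a few fields.
class ChatCompletionMessage(BaseModel):
    role: str = "assistant"
    content: Optional[str] = None
    refusal: Optional[str] = None


class Choice(BaseModel):
    index: int = 0
    finish_reason: str = "stop"
    message: ChatCompletionMessage


class ChatCompletion(BaseModel):
    id: str = ""
    model: str = ""
    choices: list[Choice] = []


# The parsed variants add a typed `parsed` field carrying structured output.
class ParsedChatCompletionMessage(ChatCompletionMessage, Generic[T]):
    parsed: Optional[T] = None


class ParsedChoice(Choice, Generic[T]):
    message: ParsedChatCompletionMessage[T]


class ParsedChatCompletion(ChatCompletion, Generic[T]):
    choices: list[ParsedChoice[T]]  # type: ignore[assignment]
```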
Design decisions

**Unified parsing at the `AnyLLM` layer, not in providers.** All providers (OpenAI, Mistral, etc.) return a plain `ChatCompletion` from `_acompletion`. The structured output parsing (converting `message.content` JSON into a Pydantic model and attaching it to `message.parsed`) happens once in `acompletion()`. This is consistent with how Mistral already worked (using `client.chat.complete_async()`, not a higher-level parse method). The OpenAI provider was switched from `client.chat.completions.parse()` to `client.chat.completions.create()` to match this pattern.

**No exceptions on truncation/content filter.** When `finish_reason` is `"length"` or `"content_filter"`, `message.parsed` is set to `None` instead of raising an exception. The raw (possibly truncated) content remains available in `message.content`, and users can inspect `choice.finish_reason` to detect these cases. I considered raising exceptions to align with the OpenAI SDK's `.parse()` behavior, but this would add additional breaking changes: existing code using `acompletion` with `response_format` does not expect exceptions on these finish reasons. These exceptions could be added in a future major version. Note that this decision reduces breaking changes but does not fully eliminate them; see the Breaking changes section below. A sketch of this parsing step follows.
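A condensed sketch of that post-processing step (the helper name is hypothetical, and details such as dict-from-extras normalization are omitted):

```python
from typing import TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def attach_parsed(completion, response_format: type[T]) -> None:
    """Populate message.parsed in place on a ParsedChatCompletion.

    Hypothetical helper condensing the acompletion() post-processing.
    """
    for choice in completion.choices:
        message = choice.message
        if choice.finish_reason in ("length", "content_filter"):
            # Final design: no exception here; callers inspect finish_reason.
            message.parsed = None
        elif message.refusal is not None or message.content is None:
            # Nothing parseable on refusals or empty content.
            message.parsed = None
        else:
            # May raise pydantic.ValidationError on non-conforming JSON
            # (see Breaking changes below).
            message.parsed = response_format.model_validate_json(message.content)
```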
Breaking changes

When `response_format` is a Pydantic model, `acompletion` now validates the response content against the schema via `model_validate_json`. If the LLM returns JSON that doesn't conform to the model, a `pydantic.ValidationError` is raised where previously the raw string would sit in `message.content` without validation.

In practice, the impact depends on the provider:
- The OpenAI provider previously used `client.chat.completions.parse()`, which performed Pydantic validation client-side (on top of server-side constrained decoding). Switching to `client.chat.completions.create()` with any-llm's own `model_validate_json` call preserves the same validation behavior, so this path should not surface new errors.
- For all other providers, which previously returned unvalidated JSON, non-conforming `message.content` will now raise `pydantic.ValidationError` (see the example after this list). Let me know if this is an acceptable change; it is breaking, but better aligns with OpenAI semantics.
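For example (schema and payload invented for illustration), a non-conforming payload now fails fast:

```python
from pydantic import BaseModel, ValidationError


class CityInfo(BaseModel):  # hypothetical schema
    name: str
    population: int


# Previously this string would sit unvalidated in message.content;
# now the model_validate_json call inside acompletion raises instead.
try:
    CityInfo.model_validate_json('{"name": "Paris", "population": "lots"}')
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")
```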
The return type change itself (`ParsedChatCompletion` instead of `ChatCompletion`) is non-breaking since `ParsedChatCompletion` is a subclass of `ChatCompletion`.

PR Type
Relevant issues
Fixes #806
Checklist
AI Usage Information