Improve support for structured output in XML using Pydantic XML #30565

adocherty · 2025-03-31T06:57:38Z


Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

I propose a new output parser, PydanticXMLOutputParser, that is similar to PydanticOutputParser but targets XML rather than JSON.

The PydanticXMLOutputParser will:

Take an XML schema defined with Pydantic XML as an argument (in the same way that PydanticOutputParser takes a Pydantic model)
Generate prompt format instructions from the explicit structure represented in that Pydantic XML model
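
To make the two responsibilities above concrete, here is a minimal, dependency-free sketch. It is not the proposed implementation: the `Article` dataclass is a hypothetical stand-in for a Pydantic XML model, and `format_instructions` and `parse` are illustrative names, not existing LangChain or Pydantic XML APIs. It only shows the idea of deriving the prompt template and the validation step from one schema definition.

```python
from dataclasses import dataclass, field, fields
import xml.etree.ElementTree as ET


@dataclass
class Article:
    # Hypothetical stand-in for a Pydantic XML model; the metadata
    # carries the per-element hint that ends up in the prompt.
    title: str = field(metadata={"description": "Title of the article"})
    problem: str = field(metadata={"description": "Summary of the writer's question"})
    answer: str = field(metadata={"description": "Answer to the writer's question"})


def format_instructions(schema, root_tag):
    """Render an XML template whose element text is each field's description."""
    root = ET.Element(root_tag)
    for f in fields(schema):
        ET.SubElement(root, f.name).text = f.metadata.get("description", "")
    ET.indent(root)  # pretty-print; requires Python 3.9+
    template = ET.tostring(root, encoding="unicode")
    return "Reply with XML in exactly this structure:\n" + template


def parse(schema, root_tag, text):
    """Parse a model reply and fail loudly if it does not match the schema."""
    root = ET.fromstring(text)
    if root.tag != root_tag:
        raise ValueError(f"expected <{root_tag}>, got <{root.tag}>")
    values = {}
    for f in fields(schema):
        child = root.find(f.name)
        if child is None:
            raise ValueError(f"missing <{f.name}> element")
        values[f.name] = (child.text or "").strip()
    return schema(**values)


instructions = format_instructions(Article, "article")
reply = "<article><title>T</title><problem>P</problem><answer>A</answer></article>"
article = parse(Article, "article", reply)
```

In the actual proposal, the schema would be a Pydantic XML model and validation would be delegated to that library; the point of the sketch is only that a single schema definition drives both the prompt and the parsing.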

Advantages:

This will greatly improve the ability to obtain XML output that conforms to a schema
It will unite JSON and XML structured outputs through Pydantic

Performance improvement
Below is a comparison of the proposed approach 2 (PydanticXML) to the current approach 1 (Output Parsers).

See this blog for more information: https://medium.com/@docherty/mastering-structured-output-in-llms-3-langchain-and-xml-8bad9e1f43ef

Motivation

Currently, XML structured output is a second-class citizen compared to JSON.

The current XMLOutputParser does not allow the XML schema to be specified explicitly, unlike PydanticOutputParser or JsonOutputParser. As a result, getting XML output that matches a schema of any complexity is not practical, and structured output is effectively limited to JSON.

However, JSON structured outputs are not always the best:

Anthropic models work well with XML (https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags)
Aider has shown that LLMs are not good at returning code in JSON (https://aider.chat/2024/08/14/code-in-json.html)
Constraining an LLM to JSON (or another rigid format) can impair its reasoning ability, and stricter format constraints give worse performance (https://arxiv.org/abs/2408.02442)

Proposal (If applicable)

An example schema definition using Pydantic XML looks like the following. It is similar to the way Pydantic models are used to define JSON structured outputs:

from pydantic_xml import BaseXmlModel, element

class ArticleXML(BaseXmlModel, tag="article"):
    title: str = element(description="Title of the article")
    problem: str = element(
        default="Summary of the writer's question, write concisely"
    )
    answer: str = element(default="Answer the writer's question")

See this blog post for many more details on the proposed Pydantic XML-based structured output feature:

https://medium.com/@docherty/mastering-structured-output-in-llms-3-langchain-and-xml-8bad9e1f43ef

I am happy to implement this feature myself, and work with the community to keep it maintained. Please let me know any thoughts/feedback!


Replies: 0 comments
