Proposal (If applicable)
For example, a schema definition using Pydantic XML looks like the following, similar to the Pydantic models used to define JSON structured outputs:
```python
from pydantic_xml import BaseXmlModel, element


class ArticleXML(BaseXmlModel, tag="article"):
    # Each field maps to a child element of <article>; the description
    # tells the model what to write inside that element.
    title: str = element(description="Title of the article")
    problem: str = element(
        description="Summary of the writer's question, write concisely"
    )
    answer: str = element(description="Answer the writer's question")
```
See this blog post for much more detail on the proposed PydanticXML-based XML structured output feature: https://medium.com/@docherty/mastering-structured-output-in-llms-3-langchain-and-xml-8bad9e1f43ef
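To illustrate how a schema like ArticleXML could drive the prompt, here is a minimal, standard-library-only sketch that renders an XML skeleton from field descriptions, which a parser could include as format instructions (similar in spirit to what PydanticOutputParser.get_format_instructions() does for JSON). This is an assumption for illustration only: `ArticleSchema`, the `"description"` metadata key, and `xml_format_instructions` are hypothetical names, not part of pydantic-xml or LangChain, and a dataclass stands in for the pydantic-xml model.

```python
from dataclasses import dataclass, field, fields
from xml.sax.saxutils import escape


@dataclass
class ArticleSchema:
    """Hypothetical stdlib stand-in for the ArticleXML model (illustration only)."""
    title: str = field(metadata={"description": "Title of the article"})
    problem: str = field(
        metadata={"description": "Summary of the writer's question, write concisely"}
    )
    answer: str = field(metadata={"description": "Answer the writer's question"})


def xml_format_instructions(schema: type, root_tag: str) -> str:
    """Hypothetical helper: render an XML skeleton whose element text is each field's description."""
    lines = [f"<{root_tag}>"]
    for f in fields(schema):
        # Escape &, < and > so the skeleton itself stays well-formed XML.
        desc = escape(f.metadata.get("description", ""))
        lines.append(f"  <{f.name}>{desc}</{f.name}>")
    lines.append(f"</{root_tag}>")
    skeleton = "\n".join(lines)
    return (
        "Respond only with XML matching this structure, "
        "replacing each element's text with your content:\n" + skeleton
    )


print(xml_format_instructions(ArticleSchema, "article"))
```

Using the descriptions as placeholder text keeps the instructions in the same shape as the expected answer; an alternative would be to emit a separate schema description, as PydanticOutputParser does with JSON Schema.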
Feature request
I propose a new output parser, PydanticXMLOutputParser, that works like PydanticOutputParser but targets XML rather than JSON.
The PydanticXMLOutputParser will:
Advantages:
Performance improvement

Below is a comparison of the proposed approach 2 (PydanticXML) to the current approach 1 (Output Parsers).
See this blog for more information: https://medium.com/@docherty/mastering-structured-output-in-llms-3-langchain-and-xml-8bad9e1f43ef
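A parser for this format mainly has to locate the XML in the model's reply, parse it, and check it against the schema. The sketch below shows those steps with only the standard library, loosely mirroring the `parse` method of LangChain output parsers; it is an illustration under assumptions, not the proposed PydanticXMLOutputParser. `SimpleXMLOutputParser` and `Article` are hypothetical names, a dataclass stands in for a pydantic-xml model (which provides its own XML parsing and validation), and only flat, text-only fields are supported.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Article:
    """Hypothetical stand-in for the ArticleXML schema (illustration only)."""
    title: str
    problem: str
    answer: str


class SimpleXMLOutputParser:
    """Hypothetical sketch, not a LangChain or pydantic-xml API:
    extract one XML element from LLM text, parse it, and fill a dataclass."""

    def __init__(self, schema: type, root_tag: str) -> None:
        self.schema = schema
        self.root_tag = root_tag
        tag = re.escape(root_tag)
        # Non-greedy match of the first <root_tag>...</root_tag> block, so any
        # prose the model writes before or after the XML is ignored.
        self._pattern = re.compile(rf"<{tag}\b.*?</{tag}>", re.DOTALL)

    def parse(self, text: str):
        match = self._pattern.search(text)
        if match is None:
            raise ValueError(f"no <{self.root_tag}> element found in output")
        root = ET.fromstring(match.group(0))
        values = {}
        for f in fields(self.schema):
            child = root.find(f.name)
            if child is None:
                raise ValueError(f"missing required element <{f.name}>")
            # itertext() joins text across any nested markup inside the element.
            values[f.name] = "".join(child.itertext()).strip()
        return self.schema(**values)


llm_output = """Sure, here is the article:
<article>
  <title>XML output parsing</title>
  <problem>How can I get XML that follows a schema?</problem>
  <answer>Describe the schema in the prompt and validate the reply.</answer>
</article>
Let me know if you need changes."""

article = SimpleXMLOutputParser(Article, "article").parse(llm_output)
print(article.title)  # XML output parsing
```

ElementTree rejects unescaped `<` and `&` in element text, so answers containing code or other markup would need to be escaped or wrapped in a CDATA section by the model; a production parser would need to handle that case more carefully than this sketch does.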
Motivation
Currently, XML structured output support is a second-class citizen compared to JSON support.
The current XMLOutputParser does not allow the XML schema to be specified explicitly, unlike PydanticOutputParser or JsonOutputParser. As a result, XML output that conforms to a schema of any complexity cannot be obtained, and structured output is effectively limited to JSON.
However, JSON is not always the best format for structured output:
Anthropic models work well with XML: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags
Aider has shown that LLMs are worse at returning code when it is wrapped in JSON (https://aider.chat/2024/08/14/code-in-json.html).
Constraining an LLM to JSON (or other format constraints) can impair the model's reasoning, and stricter format constraints lead to worse performance: https://arxiv.org/abs/2408.02442
I am happy to implement this feature myself, and work with the community to keep it maintained. Please let me know any thoughts/feedback!