Python: [Bug]: Support Standard oneOf + discriminator Polymorphism in JSON Schema → Pydantic Model Builder #3153

@sakhan-rio

Description

Component
_build_pydantic_model_from_json_schema()

Summary:
The current implementation of _build_pydantic_model_from_json_schema() does not support oneOf with discriminator, which is a standard OpenAPI / JSON Schema pattern for polymorphic objects.

When schemas using this pattern are provided, the function silently degrades the schema by resolving polymorphic structures as loosely typed (str / dict). This does not raise errors, but it loses schema intent and type safety.

Why This Matters
oneOf + discriminator is the canonical OpenAPI mechanism for modeling polymorphic request payloads, such as action-based APIs and workflow engines.
These schemas are:

  1. Valid JSON Schema
  2. Commonly generated by OpenAPI tools
  3. Expected to round-trip into typed models

Example Input Schema:

{
    "$defs": {
        "CreateProject": {
            "description": "Action: Create an Azure DevOps project.",
            "properties": {
                "name": {
                    "const": "create_project",
                    "default": "create_project",
                    "type": "string"
                },
                "params": {
                    "$ref": "#/$defs/CreateProjectParams"
                }
            },
            "required": [
                "params"
            ],
            "type": "object"
        },
        "CreateProjectParams": {
            "description": "Parameters for the create_project action.",
            "properties": {
                "projectName": {
                    "minLength": 1,
                    "type": "string"
                },
                "description": {
                    "default": "",
                    "type": "string"
                },
                "template": {
                    "default": "Agile",
                    "type": "string"
                },
                "sourceControl": {
                    "default": "Git",
                    "enum": [
                        "Git",
                        "Tfvc"
                    ],
                    "type": "string"
                },
                "visibility": {
                    "default": "private",
                    "type": "string"
                }
            },
            "required": [
                "orgUrl",
                "projectName"
            ],
            "type": "object"
        },
        "DeployRequest": {
            "description": "Request to deploy Azure DevOps resources.",
            "properties": {
                "projectName": {
                    "minLength": 1,
                    "type": "string"
                },
                "organization": {
                    "minLength": 1,
                    "type": "string"
                },
                "actions": {
                    "items": {
                        "discriminator": {
                            "mapping": {
                                "create_project": "#/$defs/CreateProject",
                                "hello_world": "#/$defs/HelloWorld"
                            },
                            "propertyName": "name"
                        },
                        "oneOf": [
                            {
                                "$ref": "#/$defs/HelloWorld"
                            },
                            {
                                "$ref": "#/$defs/CreateProject"
                            }
                        ]
                    },
                    "type": "array"
                }
            },
            "required": [
                "projectName",
                "organization"
            ],
            "type": "object"
        },
        "HelloWorld": {
            "description": "Action: Prints a greeting message.",
            "properties": {
                "name": {
                    "const": "hello_world",
                    "default": "hello_world",
                    "type": "string"
                },
                "params": {
                    "$ref": "#/$defs/HelloWorldParams"
                }
            },
            "required": [
                "params"
            ],
            "type": "object"
        },
        "HelloWorldParams": {
            "description": "Parameters for the hello_world action.",
            "properties": {
                "name": {
                    "description": "Name to greet",
                    "minLength": 1,
                    "type": "string"
                }
            },
            "required": [
                "name"
            ],
            "type": "object"
        }
    },
    "properties": {
        "params": {
            "$ref": "#/$defs/DeployRequest"
        }
    },
    "required": [
        "params"
    ],
    "type": "object"
}

With the schema above, the current implementation of _build_pydantic_model_from_json_schema():

  • Ignores oneOf
  • Ignores discriminator
  • Ignores const
  • Resolves actions as a loosely typed collection

Resulting model shape (effectively):

class DeployRequest(BaseModel):
    projectName: str
    organization: str
    actions: list[str] | list[dict]
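
To make the loss of type safety concrete, here is a minimal sketch, assuming Pydantic v2. `LooseDeployRequest` is a hypothetical stand-in for the degraded model above, not a class generated by the framework. It shows that an action whose name matches no variant, and which carries no params, still validates.

```python
from typing import Any, Union

from pydantic import BaseModel


class LooseDeployRequest(BaseModel):
    # Hypothetical name; mirrors the degraded shape shown above.
    projectName: str
    organization: str
    actions: Union[list[str], list[dict[str, Any]]]


# The action name matches no variant and "params" is missing entirely.
payload = {
    "projectName": "demo",
    "organization": "contoso",
    "actions": [{"name": "not_a_real_action"}],
}

# Validation succeeds because any dict is an acceptable list item.
loose = LooseDeployRequest.model_validate(payload)
accepted_actions = loose.actions
```

Nothing in the loose model ties `name` to a known action or requires `params`, so malformed payloads only fail later, inside the tool itself.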

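The intended model is (HelloWorldParams and CreateProjectParams are ordinary models built from their $defs entries, omitted here):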

from typing import Annotated, Union, Literal
from pydantic import BaseModel, Field

class HelloWorld(BaseModel):
    name: Literal["hello_world"]
    params: HelloWorldParams

class CreateProject(BaseModel):
    name: Literal["create_project"]
    params: CreateProjectParams

Action = Annotated[
    Union[HelloWorld, CreateProject],
    Field(discriminator="name")
]

class DeployRequest(BaseModel):
    projectName: str
    organization: str
    actions: list[Action]
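
For comparison, a self-contained sketch of how the intended discriminated union behaves, again assuming Pydantic v2. The two params models are trimmed to a single field each for brevity, which is a simplification of the schema above.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class HelloWorldParams(BaseModel):
    name: str = Field(min_length=1)


class HelloWorld(BaseModel):
    name: Literal["hello_world"]
    params: HelloWorldParams


class CreateProjectParams(BaseModel):
    # Trimmed to one field; the schema above defines more.
    projectName: str = Field(min_length=1)


class CreateProject(BaseModel):
    name: Literal["create_project"]
    params: CreateProjectParams


Action = Annotated[Union[HelloWorld, CreateProject], Field(discriminator="name")]
adapter = TypeAdapter(list[Action])

# Each item is routed to the variant selected by its "name" value.
actions = adapter.validate_python(
    [
        {"name": "hello_world", "params": {"name": "Ada"}},
        {"name": "create_project", "params": {"projectName": "demo"}},
    ]
)
variant_names = [type(a).__name__ for a in actions]

# An unknown discriminator value is rejected instead of silently accepted.
try:
    adapter.validate_python([{"name": "delete_project", "params": {}}])
    rejected = False
except ValidationError:
    rejected = True
```

The discriminator lets Pydantic pick the variant from `name` directly, so error messages point at the selected variant rather than listing failures for every member of the union.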

I fixed this locally and was able to generate the correct schema for the AI function. The code sample below includes the fix, for reference. Let me know if it would help and I can also publish it on a branch.

Code Sample

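import json
from collections.abc import Mapping
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Field, create_model


def _build_pydantic_model_from_json_schema(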
    model_name: str,
    schema: Mapping[str, Any],
) -> type[BaseModel]:
    """Creates a Pydantic model from JSON Schema with support for $refs, nested objects, and typed arrays.

    Args:
        model_name: The name of the model to be created.
        schema: The JSON Schema definition (should contain 'properties', 'required', '$defs', etc.).

    Returns:
        The dynamically created Pydantic model class.
    """
    properties = schema.get("properties")
    required = schema.get("required", [])
    definitions = schema.get("$defs", {})

    # Return an empty model when 'properties' is missing or empty
    if not properties:
        return create_model(f"{model_name}_input")

    ## Bug fix
    def _resolve_literal_type(prop_details: dict[str, Any]) -> type | None:
        # const → Literal["value"]
        if "const" in prop_details:
            return Literal[prop_details["const"]]  # type: ignore

        # enum → Literal["a", "b", ...]
        if "enum" in prop_details and isinstance(prop_details["enum"], list):
            enum_values = prop_details["enum"]
            if enum_values:
                return Literal[tuple(enum_values)]  # type: ignore

        return None

    def _resolve_type(prop_details: dict[str, Any], parent_name: str = "") -> type:
        """Resolve JSON Schema type to Python type, handling $ref, nested objects, and typed arrays.

        Args:
            prop_details: The JSON Schema property details
            parent_name: Name to use for creating nested models (for uniqueness)

        Returns:
            Python type annotation (could be int, str, list[str], or a nested Pydantic model)
        """
        # Handle oneOf + discriminator (polymorphic objects) ---> Bug fix
        if "oneOf" in prop_details and "discriminator" in prop_details:
            discriminator = prop_details["discriminator"]
            disc_field = discriminator.get("propertyName")

            variants = []
            for variant in prop_details["oneOf"]:
                if "$ref" in variant:
                    ref = variant["$ref"]
                    if ref.startswith("#/$defs/"):
                        def_name = ref.split("/")[-1]
                        resolved = definitions.get(def_name)
                        if resolved:
                            variant_model = _resolve_type(
                                resolved,
                                parent_name=f"{parent_name}_{def_name}"
                            )
                            variants.append(variant_model)

            if variants and disc_field:
                return Annotated[
                    Union[tuple(variants)],  # type: ignore
                    Field(discriminator=disc_field)
                ]

        # Handle $ref by resolving the reference
        if "$ref" in prop_details:
            ref = prop_details["$ref"]
            # Extract the reference path (e.g., "#/$defs/CustomerIdParam" -> "CustomerIdParam")
            if ref.startswith("#/$defs/"):
                def_name = ref.split("/")[-1]
                if def_name in definitions:
                    # Resolve the reference and use its type
                    resolved = definitions[def_name]
                    return _resolve_type(resolved, def_name)
            # If we can't resolve the ref, default to dict for safety
            return dict

        # Map JSON Schema types to Python types
        json_type = prop_details.get("type", "string")
        match json_type:
            case "integer":
                return int
            case "number":
                return float
            case "boolean":
                return bool
            case "array":
                # Handle typed arrays
                items_schema = prop_details.get("items")
                if items_schema and isinstance(items_schema, dict):
                    # Recursively resolve the item type
                    item_type = _resolve_type(items_schema, f"{parent_name}_item")
                    # Return list[ItemType] instead of bare list
                    return list[item_type]  # type: ignore
                # If no items schema or invalid, return bare list
                return list
            case "object":
                # Handle nested objects by creating a nested Pydantic model
                nested_properties = prop_details.get("properties")
                nested_required = prop_details.get("required", [])

                if nested_properties and isinstance(nested_properties, dict):
                    # Create the name for the nested model
                    nested_model_name = f"{parent_name}_nested" if parent_name else "NestedModel"

                    # Recursively build field definitions for the nested model
                    nested_field_definitions: dict[str, Any] = {}
                    for nested_prop_name, nested_prop_details in nested_properties.items():
                        nested_prop_details = (
                            json.loads(nested_prop_details)
                            if isinstance(nested_prop_details, str)
                            else nested_prop_details
                        )
                         
                        ### Bug fix
                        # nested_python_type = _resolve_type(
                        #     nested_prop_details, f"{nested_model_name}_{nested_prop_name}"
                        # )
                        literal_type = _resolve_literal_type(nested_prop_details)
                        if literal_type is not None:
                            nested_python_type = literal_type
                        else:
                            nested_python_type = _resolve_type(
                                nested_prop_details,
                                f"{nested_model_name}_{nested_prop_name}"
                            )

                        nested_description = nested_prop_details.get("description", "")

                        # Build field kwargs for nested property
                        nested_field_kwargs: dict[str, Any] = {}
                        if nested_description:
                            nested_field_kwargs["description"] = nested_description

                        # Create field definition
                        if nested_prop_name in nested_required:
                            nested_field_definitions[nested_prop_name] = (
                                (
                                    nested_python_type,
                                    Field(**nested_field_kwargs),
                                )
                                if nested_field_kwargs
                                else (nested_python_type, ...)
                            )
                        else:
                            nested_field_kwargs["default"] = nested_prop_details.get("default", None)
                            nested_field_definitions[nested_prop_name] = (
                                nested_python_type,
                                Field(**nested_field_kwargs),
                            )

                    # Create and return the nested Pydantic model
                    return create_model(nested_model_name, **nested_field_definitions)  # type: ignore

                # If no properties defined, return bare dict
                return dict
            case _:
                return str  # default

    field_definitions: dict[str, Any] = {}
    for prop_name, prop_details in properties.items():
        prop_details = json.loads(prop_details) if isinstance(prop_details, str) else prop_details

        # python_type = _resolve_type(prop_details, f"{model_name}_{prop_name}") ---> Bug fix
        literal_type = _resolve_literal_type(prop_details)
        if literal_type is not None:
            python_type = literal_type
        else:
            python_type = _resolve_type(prop_details, f"{model_name}_{prop_name}")

        description = prop_details.get("description", "")

        # Build field kwargs (description, etc.)
        field_kwargs: dict[str, Any] = {}
        if description:
            field_kwargs["description"] = description

        # Create field definition for create_model
        if prop_name in required:
            if field_kwargs:
                field_definitions[prop_name] = (python_type, Field(**field_kwargs))
            else:
                field_definitions[prop_name] = (python_type, ...)
        else:
            default_value = prop_details.get("default", None)
            field_kwargs["default"] = default_value
            if field_kwargs and any(k != "default" for k in field_kwargs):
                field_definitions[prop_name] = (python_type, Field(**field_kwargs))
            else:
                field_definitions[prop_name] = (python_type, default_value)

    return create_model(f"{model_name}_input", **field_definitions)
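
The core of the fix can be shown in isolation. The sketch below assumes Pydantic v2; `build_variant` is a hypothetical helper written for illustration and is not part of agent-framework. It builds the variants with create_model and then assembles the discriminated union from a list, the same way the oneOf branch above does.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, create_model


def build_variant(model_name: str, tag: str) -> type[BaseModel]:
    """Hypothetical helper: a variant whose `name` field is pinned to `tag`."""
    return create_model(
        model_name,
        name=(Literal[tag], ...),      # const -> Literal["value"]
        params=(dict[str, Any], ...),  # params left untyped to keep this short
    )


variants = [
    build_variant("HelloWorld", "hello_world"),
    build_variant("CreateProject", "create_project"),
]

# Subscripting with a tuple is equivalent to writing Union[A, B] by hand.
action_type = Annotated[Union[tuple(variants)], Field(discriminator="name")]

parsed = TypeAdapter(list[action_type]).validate_python(
    [{"name": "create_project", "params": {"projectName": "demo"}}]
)
parsed_variant = type(parsed[0]).__name__
```

The same tuple subscript is what makes `Literal[tuple(enum_values)]` in the fix expand to `Literal["Git", "Tfvc"]`. One caveat: Pydantic only accepts a discriminator on a union with more than one variant, so a oneOf with a single resolvable $ref would likely need a separate code path.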

Error Messages / Stack Traces

Package Versions

agent-framework==1.0.0b260106

Python Version

No response

Additional Context

No response

Metadata

Labels

bug, python

Projects

Status

In Review
