⚡️ Speed up method OpenAIJsonSchemaTransformer.transform by 18% #28

Open
wants to merge 1 commit into
base: try-refinement
from

codeflash-ai[bot]
@codeflash-ai codeflash-ai bot commented Jul 22, 2025

📄 18% (0.18x) speedup for OpenAIJsonSchemaTransformer.transform in pydantic_ai_slim/pydantic_ai/profiles/openai.py

⏱️ Runtime: 71.5 microseconds → 60.8 microseconds (best of 80 runs)
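
The headline percentage can be reproduced from the two runtimes above. This is a minimal sketch, assuming the speedup is defined as the original time divided by the optimized time, minus one; the variable names are illustrative only.

```python
# Runtimes reported in this PR, in microseconds.
original_us = 71.5
optimized_us = 60.8

# Assumed definition: relative speedup = (original / optimized) - 1.
speedup = original_us / optimized_us - 1

print(f"{speedup:.1%}")  # prints 17.6%, which rounds to the reported 18%
```

The per-test figures in the listing below appear consistent with the same definition, allowing for rounding of the displayed timings.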

📝 Explanation and details

REFINEMENT Here is an optimized version of your program for both runtime and memory use, preserving all existing logic and return values and keeping function signatures unchanged. The changes focus on:

  • Avoiding repeated lookups and redundant work (e.g., schema.pop/get).
  • Reducing allocations and making iteration more efficient.
  • Using local variable references where doing so avoids repeated global and attribute lookups.
  • Minor logic rearrangement to avoid unnecessary work when possible.
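
The local-reference idea in the list above can be sketched as follows. This is an illustration only: `Walker` and `describe` are hypothetical stand-ins, not the actual `OpenAIJsonSchemaTransformer` code from this PR.

```python
from typing import Any, Optional, Tuple

JsonSchema = dict[str, Any]


class Walker:
    """Hypothetical stand-in for a schema transformer (not the pydantic_ai class)."""

    def __init__(self, strict: Optional[bool] = None) -> None:
        self.strict = strict

    def describe(self, schema: JsonSchema) -> Tuple[Any, Any, bool]:
        # Resolve the attribute and the bound method once, then reuse the locals.
        strict = self.strict
        get = schema.get

        schema_type = get('type')
        description = get('description')
        return schema_type, description, strict is True
```

Local names are resolved faster than attribute or global lookups in CPython, so hoisting them can help in hot paths, although the gain per call is small.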

Key optimizations:

  • Use local lookups for self.strict and schema.get.
  • Replace repeated pop calls with a single dict comprehension for incompatible keys.
  • Exit early in some logic branches.
  • Reduce unnecessary allocations and method calls.
  • Build the notes_string more efficiently, in place.
  • Use setdefault for 'properties' when enforcing the strict required-property listing (avoids unnecessary dict creation).
  • Remove keys in one tight loop and avoid checking for key existence multiple times.
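
Two of the techniques listed above, collecting incompatible keys in one comprehension and using `setdefault` for `'properties'`, can be sketched as below. The helper names, the shortened key tuple, and the description format are assumptions made for illustration; they are not taken from the PR's diff.

```python
from typing import Any

# Shortened, hypothetical key list; the real list appears in the tests below.
INCOMPATIBLE_KEYS = ('minLength', 'maxLength', 'pattern')


def strip_incompatible(schema: dict[str, Any]) -> dict[str, Any]:
    # One comprehension finds every incompatible key that is present.
    removed = {key: schema[key] for key in INCOMPATIBLE_KEYS if key in schema}
    for key in removed:
        del schema[key]

    if removed:
        # Assumed format: append the removed constraints to the description.
        notes = ', '.join(f'{k}={v}' for k, v in removed.items())
        description = schema.get('description')
        schema['description'] = f'{description} ({notes})' if description else notes
    return schema


def require_all_properties(schema: dict[str, Any]) -> dict[str, Any]:
    # setdefault returns the existing dict, or stores and returns a new one.
    properties = schema.setdefault('properties', {})
    schema['required'] = list(properties)
    schema['additionalProperties'] = False
    return schema
```

For example, `strip_incompatible({'type': 'string', 'description': 'Hello', 'minLength': 2})` returns `{'type': 'string', 'description': 'Hello (minLength=2)'}` under these assumptions.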

All comments are preserved per your instructions. The code will return exactly the same values as before, but will run faster and allocate less.
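
One way to check a "same values as before" claim is to run both implementations on independent deep copies of the same input and compare the results. The sketch below uses two hypothetical functions (`drop_title_original`, `drop_title_optimized`) rather than the real transformer, and is not Codeflash's actual verification harness.

```python
import copy
from typing import Any, Callable

Schema = dict[str, Any]


def same_behavior(
    original: Callable[[Schema], Schema],
    optimized: Callable[[Schema], Schema],
    schema: Schema,
) -> bool:
    # Each implementation gets its own deep copy because both mutate in place.
    before = original(copy.deepcopy(schema))
    after = optimized(copy.deepcopy(schema))
    return before == after


def drop_title_original(schema: Schema) -> Schema:
    # Checks for the key, then removes it: two dictionary operations.
    if 'title' in schema:
        schema.pop('title')
    return schema


def drop_title_optimized(schema: Schema) -> Schema:
    # pop with a default removes the key if present in a single operation.
    schema.pop('title', None)
    return schema
```

Calling `same_behavior(drop_title_original, drop_title_optimized, {'title': 'T', 'type': 'string'})` returns `True`, since both variants leave `{'type': 'string'}`.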

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 64 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 10 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import copy
from abc import ABC
# function to test
from dataclasses import dataclass
from typing import Any

# imports
import pytest  # used for our unit tests
from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer

_STRICT_INCOMPATIBLE_KEYS = [
    'minLength',
    'maxLength',
    'pattern',
    'format',
    'minimum',
    'maximum',
    'multipleOf',
    'patternProperties',
    'unevaluatedProperties',
    'propertyNames',
    'minProperties',
    'maxProperties',
    'unevaluatedItems',
    'contains',
    'minContains',
    'maxContains',
    'minItems',
    'maxItems',
    'uniqueItems',
]

_sentinel = object()


JsonSchema = dict[str, Any]


@dataclass(init=False)
class JsonSchemaTransformer(ABC):
    """Walks a JSON schema, applying transformations to it at each level."""

    def __init__(
        self,
        schema: JsonSchema,
        *,
        strict: bool | None = None,
        prefer_inlined_defs: bool = False,
        simplify_nullable_unions: bool = False,
    ):
        self.schema = schema

        self.strict = strict
        self.is_strict_compatible = True  # Can be set to False by subclasses to set `strict` on `ToolDefinition` when not set explicitly by the user

        self.prefer_inlined_defs = prefer_inlined_defs
        self.simplify_nullable_unions = simplify_nullable_unions

        self.defs: dict[str, JsonSchema] = self.schema.get('$defs', {})
        self.refs_stack: list[str] = []
        self.recursive_refs = set[str]()
from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer

# Alias for test code
transform = OpenAIJsonSchemaTransformer({}).transform

# ------------------- UNIT TESTS -------------------

# Basic Test Cases

def test_remove_title_and_schema():
    """Test that 'title' and '$schema' are removed from the schema."""
    schema = {'title': 'MyTitle', '$schema': 'http://json-schema.org', 'type': 'string'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.83μs -> 1.50μs (22.2% faster)

def test_remove_discriminator():
    """Test that 'discriminator' is removed."""
    schema = {'type': 'object', 'discriminator': 'kind', 'properties': {}}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.08μs -> 1.75μs (19.0% faster)

def test_default_key_strict_removed():
    """Test that 'default' is removed in strict mode."""
    schema = {'type': 'string', 'default': 'abc'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.54μs -> 1.29μs (19.3% faster)

def test_default_key_non_strict_kept():
    """Test that 'default' is kept in non-strict mode."""
    schema = {'type': 'string', 'default': 'abc'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.38μs -> 1.08μs (27.0% faster)

def test_ref_root_and_anyof():
    """Test $ref at root and with siblings is handled correctly."""
    schema = {'$ref': 'root', 'description': 'desc'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    transformer.root_ref = 'root'
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.88μs -> 1.58μs (18.4% faster)

def test_ref_not_root_and_anyof():
    """Test $ref not at root and with siblings."""
    schema = {'$ref': 'other', 'description': 'desc'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    transformer.root_ref = 'root'
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.54μs -> 1.25μs (23.3% faster)

def test_ref_root_no_siblings():
    """Test $ref at root and no siblings."""
    schema = {'$ref': 'root'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    transformer.root_ref = 'root'
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.33μs -> 1.04μs (28.0% faster)

def test_incompatible_keys_strict():
    """Test that all strict-incompatible keys are removed and added to description in strict mode."""
    schema = {
        'type': 'string',
        'minLength': 2,
        'maxLength': 10,
        'pattern': '[a-z]+',
        'description': 'desc'
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 3.17μs -> 2.92μs (8.57% faster)

def test_incompatible_keys_strict_no_description():
    """Test that description is added if not present."""
    schema = {
        'type': 'number',
        'minimum': 1,
        'maximum': 2,
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.25μs -> 2.08μs (8.02% faster)

def test_incompatible_keys_non_strict():
    """Test that incompatible keys are not removed in non-strict mode."""
    schema = {
        'type': 'string',
        'minLength': 2,
        'maxLength': 10,
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.67μs -> 1.21μs (37.9% faster)

def test_oneof_to_anyof_strict():
    """Test that 'oneOf' is converted to 'anyOf' in strict mode."""
    schema = {'type': 'string', 'oneOf': [{'type': 'string'}, {'type': 'number'}]}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.38μs -> 1.17μs (17.9% faster)

def test_oneof_non_strict():
    """Test that 'oneOf' is not removed in non-strict mode."""
    schema = {'type': 'string', 'oneOf': [{'type': 'string'}, {'type': 'number'}]}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.17μs -> 1.00μs (16.7% faster)

def test_object_additional_properties_and_required():
    """Test that 'additionalProperties' is set to False and 'required' is set to all property keys in strict mode."""
    schema = {'type': 'object', 'properties': {'a': {'type': 'number'}, 'b': {'type': 'string'}}}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.62μs -> 1.38μs (18.2% faster)

def test_object_no_properties():
    """Test that empty properties/required is set when missing."""
    schema = {'type': 'object'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.17μs -> 1.88μs (15.6% faster)

def test_object_non_strict_missing_fields():
    """Test that is_strict_compatible is False if object is missing strict-required fields in non-strict mode."""
    schema = {'type': 'object', 'properties': {'a': {'type': 'string'}}}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.25μs -> 1.04μs (20.1% faster)

def test_object_non_strict_all_fields_present():
    """Test that is_strict_compatible is True if all strict-required fields are present and correct in non-strict mode."""
    schema = {
        'type': 'object',
        'properties': {'a': {'type': 'string'}, 'b': {'type': 'number'}},
        'required': ['a', 'b'],
        'additionalProperties': False
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.92μs -> 1.67μs (14.9% faster)

def test_object_non_strict_missing_required():
    """Test that is_strict_compatible is False if required does not include all properties."""
    schema = {
        'type': 'object',
        'properties': {'a': {'type': 'string'}, 'b': {'type': 'number'}},
        'required': ['a'],
        'additionalProperties': False
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.71μs -> 1.38μs (24.2% faster)

# Edge Test Cases

def test_empty_schema():
    """Test that an empty schema is handled gracefully."""
    schema = {}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.12μs -> 917ns (22.7% faster)

def test_schema_with_all_incompatible_keys():
    """Test schema with all strict-incompatible keys."""
    schema = {k: 1 for k in _STRICT_INCOMPATIBLE_KEYS}
    schema['type'] = 'string'
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 5.62μs -> 4.83μs (16.4% faster)
    for k in _STRICT_INCOMPATIBLE_KEYS:
        pass
    # Description should list all keys
    for k in _STRICT_INCOMPATIBLE_KEYS:
        pass

def test_schema_with_unusual_types():
    """Test schema with an unknown type."""
    schema = {'type': 'funkyType', 'properties': {'a': {'type': 'string'}}}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.21μs -> 958ns (26.1% faster)

def test_properties_not_a_dict():
    """Test properties is not a dict (invalid schema)."""
    schema = {'type': 'object', 'properties': None}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output

def test_required_not_a_list():
    """Test required is not a list (invalid schema)."""
    schema = {'type': 'object', 'properties': {'a': {'type': 'string'}}, 'required': None}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=None)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.58μs -> 1.04μs (51.9% faster)

def test_properties_keys_not_strings():
    """Test properties with non-string keys (invalid but possible in Python dicts)."""
    schema = {'type': 'object', 'properties': {1: {'type': 'string'}, 2: {'type': 'number'}}}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.67μs -> 1.38μs (21.2% faster)

def test_description_preserved_when_no_incompatible():
    """Test that description is preserved when no incompatible keys."""
    schema = {'type': 'string', 'description': 'Keep me!'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.21μs -> 958ns (26.1% faster)

def test_description_appended_when_incompatible():
    """Test that description is appended to when incompatible keys exist."""
    schema = {'type': 'string', 'description': 'Hello', 'minLength': 2}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.25μs -> 1.96μs (14.9% faster)

def test_schema_with_extra_fields():
    """Test that extra, unknown fields are preserved."""
    schema = {'type': 'string', 'x-extra': 123}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.17μs -> 916ns (27.3% faster)

# Large Scale Test Cases

def test_large_number_of_properties():
    """Test object with many properties."""
    n = 500
    props = {f'field{i}': {'type': 'string'} for i in range(n)}
    schema = {'type': 'object', 'properties': props}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.88μs -> 2.67μs (7.84% faster)

def test_large_incompatible_keys():
    """Test schema with many incompatible keys and a long description."""
    schema = {k: i for i, k in enumerate(_STRICT_INCOMPATIBLE_KEYS)}
    schema['type'] = 'string'
    schema['description'] = 'A' * 100
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 6.04μs -> 5.38μs (12.4% faster)
    for k in _STRICT_INCOMPATIBLE_KEYS:
        pass

def test_large_oneof():
    """Test schema with a large oneOf list."""
    n = 300
    schema = {'oneOf': [{'type': 'string', 'minLength': i} for i in range(n)], 'type': 'string'}
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 1.50μs -> 1.25μs (20.0% faster)

def test_large_ref_siblings():
    """Test schema with $ref and many siblings."""
    n = 200
    schema = {'$ref': 'root'}
    for i in range(n):
        schema[f'x{i}'] = i
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    transformer.root_ref = 'root'
    codeflash_output = transformer.transform(schema); out = codeflash_output # 2.12μs -> 1.71μs (24.4% faster)
    for i in range(n):
        pass

def test_large_object_with_incompatible_and_properties():
    """Test large object with many properties and incompatible keys."""
    n = 400
    props = {f'f{i}': {'type': 'string'} for i in range(n)}
    schema = {
        'type': 'object',
        'properties': props,
        'minProperties': 2,
        'maxProperties': 10,
        'description': 'desc'
    }
    transformer = OpenAIJsonSchemaTransformer(copy.deepcopy(schema), strict=True)
    codeflash_output = transformer.transform(schema); out = codeflash_output # 4.21μs -> 4.12μs (2.04% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from abc import ABC
from copy import deepcopy
# function to test
from dataclasses import dataclass
from typing import Any

# imports
import pytest  # used for our unit tests
from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer

_STRICT_INCOMPATIBLE_KEYS = [
    'minLength',
    'maxLength',
    'pattern',
    'format',
    'minimum',
    'maximum',
    'multipleOf',
    'patternProperties',
    'unevaluatedProperties',
    'propertyNames',
    'minProperties',
    'maxProperties',
    'unevaluatedItems',
    'contains',
    'minContains',
    'maxContains',
    'minItems',
    'maxItems',
    'uniqueItems',
]

_sentinel = object()

JsonSchema = dict[str, Any]

@dataclass(init=False)
class JsonSchemaTransformer(ABC):
    """Walks a JSON schema, applying transformations to it at each level."""

    def __init__(
        self,
        schema: JsonSchema,
        *,
        strict: bool | None = None,
        prefer_inlined_defs: bool = False,
        simplify_nullable_unions: bool = False,
    ):
        self.schema = schema
        self.strict = strict
        self.is_strict_compatible = True
        self.prefer_inlined_defs = prefer_inlined_defs
        self.simplify_nullable_unions = simplify_nullable_unions
        self.defs: dict[str, JsonSchema] = self.schema.get('$defs', {})
        self.refs_stack: list[str] = []
        self.recursive_refs = set[str]()
from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer


# Alias for clarity in tests
def transform(schema: dict, *, strict: bool | None = True) -> dict:
    return OpenAIJsonSchemaTransformer(deepcopy(schema), strict=strict).transform(schema)

# -------------------
# Unit Tests
# -------------------

# ========== BASIC TEST CASES ==========


def test_schema_with_oneOf_in_non_strict():
    # Should keep oneOf and mark as not strict-compatible
    schema = {
        "oneOf": [
            {"type": "string"},
            {"type": "integer"}
        ]
    }
    transformer = OpenAIJsonSchemaTransformer(deepcopy(schema), strict=None)
    transformer.transform(schema) # 1.62μs -> 1.42μs (14.7% faster)


from pydantic_ai.profiles.openai import OpenAIJsonSchemaTransformer

def test_OpenAIJsonSchemaTransformer_transform():
    OpenAIJsonSchemaTransformer.transform(OpenAIJsonSchemaTransformer({}, strict=True), {'maximum': ''})

def test_OpenAIJsonSchemaTransformer_transform_2():
    OpenAIJsonSchemaTransformer.transform(OpenAIJsonSchemaTransformer({}, strict=None), {'pattern': 0, 'default': 0, 'title': ''})

def test_OpenAIJsonSchemaTransformer_transform_3():
    OpenAIJsonSchemaTransformer.transform(OpenAIJsonSchemaTransformer({}, strict=True), {'oneOf': 0})

def test_OpenAIJsonSchemaTransformer_transform_4():
    OpenAIJsonSchemaTransformer.transform(OpenAIJsonSchemaTransformer({}, strict=True), {'default': ''})

def test_OpenAIJsonSchemaTransformer_transform_5():
    OpenAIJsonSchemaTransformer.transform(OpenAIJsonSchemaTransformer({}, strict=None), {'oneOf': ''})

To edit these changes, git checkout codeflash/optimize-OpenAIJsonSchemaTransformer.transform-mdewgsc2 and push.

@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) Jul 22, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 22, 2025 19:00