
habema
Contributor

@habema habema commented Sep 9, 2025

When creating custom function tools, Pydantic models with optional fields defaulting to None produced JSON schemas containing literal null values. LLM providers rejected these schemas with errors like:

"tools.1.custom.input_schema: JSON schema is invalid. It must match JSON Schema draft 2020-12"

Root cause: The ensure_strict_json_schema function only removed None from the default field, but left other null values intact.

Fix: Added logic to remove all null values from schemas to comply with JSON Schema Draft 2020-12.

Test: Added test case verifying null values are properly stripped while preserving valid fields.
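As an illustration of the failure mode described above (assuming Pydantic v2; `ToolArgs` is a made-up model, not one from the codebase), an optional field with a None default leaks a literal null into the generated schema:

```python
# Illustrative repro, assuming Pydantic v2 is installed.
# ToolArgs is a hypothetical model, not from the real codebase.
from typing import Optional

from pydantic import BaseModel


class ToolArgs(BaseModel):
    query: str
    limit: Optional[int] = None  # the None default leaks into the schema


schema = ToolArgs.model_json_schema()
print(schema["properties"]["limit"])
# includes "default": None, which serializes to a literal null in JSON
```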

@seratch
Member

seratch commented Sep 10, 2025

@codex review this

@seratch
Member

seratch commented Sep 10, 2025

Overall, I feel this is a practical change. The only thing I'd like to confirm is that there is no edge case that could be affected by this.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


Codex Review: Here are some suggestions.


Comment on lines +110 to +118
# Remove all null values to comply with JSON Schema Draft 2020-12
# This prevents LLM providers from rejecting schemas with null values
keys_to_remove = []
for key, value in json_schema.items():
if value is None:
keys_to_remove.append(key)

for key in keys_to_remove:
json_schema.pop(key)


[P1] Preserve strictness when additionalProperties is None

The new loop removes every key whose value is None. For object schemas that arrive with "additionalProperties": None, the earlier branch that defaults objects to additionalProperties=False is skipped because the key is present. After this block runs, the key is popped entirely, so the returned schema has no additionalProperties entry and therefore permits extra properties by default. This regresses the purpose of ensure_strict_json_schema, which should always disallow additional fields. Consider treating None as “not provided” before the object-type check or coercing it to False rather than dropping the key.

Useful? React with 👍 / 👎.
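The regression flagged above can be reproduced in isolation. The standalone snippet below (not the real function) applies the same blanket null-removal to an object schema whose additionalProperties arrives as None:

```python
# Standalone demonstration of the edge case: blanket null-removal drops
# "additionalProperties" entirely, so the cleaned schema no longer
# forbids extra properties.
schema = {
    "type": "object",
    "properties": {"x": {"type": "string"}},
    "additionalProperties": None,  # "not provided" rather than False
}

keys_to_remove = [key for key, value in schema.items() if value is None]
for key in keys_to_remove:
    schema.pop(key)

print("additionalProperties" in schema)  # False: the strictness marker is gone
```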

Contributor Author

Good catch

@seratch
Member

seratch commented Sep 12, 2025

I am not yet fully confident about having this change. It may break something that is not mentioned here.

I asked ChatGPT and got some suggestions. I don't follow these blindly, but I wanted to share them first. We may want to continue discussing and do some research toward a viable solution.

Recommendations for ensure_strict_json_schema (null-removal)

Context (short)

The current implementation recursively walks a generated JSON Schema and removes keys whose value is None to avoid provider rejections (e.g., JSON schema is invalid. It must match JSON Schema draft 2020-12). This approach fixes many provider validation failures (especially when None appears as default or metadata), but a blanket removal of all key: null has some risks for legitimate schemas.

Recommendation summary

  1. Narrow the deletion rule: Do not remove every key: None. Instead remove None only for a conservative allowlist of metadata/annotation keys that are safe to drop. Suggested allowlist:

    title, description, examples, default, $comment
    
  2. Preserve validation keywords that can be legitimately null: Do not remove keywords such as const (e.g., {"const": null} is a valid constraint) or other validation keywords where null is meaningful.

  3. Filter None entries inside arrays: For composition keywords (anyOf, oneOf, allOf), filter out None entries and non-dict entries before recursing. This prevents TypeErrors and handles generators that leave None in lists.

  4. Offer a non-destructive mode: Provide an optional inplace: bool = True parameter. When inplace=False, operate on a deep copy so callers that expect the original schema preserved are not surprised.

  5. Add explicit tests for corner cases: Include unit tests for const: None, enum containing null, anyOf: [None, {...}], default: None, and $ref + sibling-key combinations.

  6. Optional: logging for removals: Consider emitting debug logs that list removed keys (only at debug level) to help detect surprising removals during development.

Why this approach

  • Fixes the immediate problem: Most Pydantic-generated schemas that cause provider rejections have None only in annotation fields like default and description. Removing those keys resolves provider validation errors while preserving validation semantics.
  • Avoids breaking legitimate schema semantics: Some JSON Schema keywords can validly be null. A careful allowlist preserves intended validation behavior (e.g., const: null).

Minimal safe code sketch

Below is an illustrative patch replacing blanket removal with an allowlist-based removal and filtering None-entries from composition arrays. (Adapt variable names to your codebase.)

import copy

_DROP_NONE_KEY_ALLOWLIST = {"title", "description", "examples", "default", "$comment"}

def _ensure_strict_json_schema(json_schema: dict, *, inplace: bool = True, path: str = "") -> dict:
    if not inplace:
        json_schema = copy.deepcopy(json_schema)

    # Recursively clean composition lists first (anyOf/oneOf/allOf)
    for comp in ("anyOf", "oneOf", "allOf"):
        if comp in json_schema and isinstance(json_schema[comp], list):
            filtered = [v for v in json_schema[comp] if v is not None]
            json_schema[comp] = [
                _ensure_strict_json_schema(v, inplace=True, path=f"{path}/{comp}[{i}]")
                for i, v in enumerate(filtered)
            ]

    # Recurse into common structural fields
    if "properties" in json_schema and isinstance(json_schema["properties"], dict):
        for k, v in list(json_schema["properties"].items()):
            json_schema["properties"][k] = _ensure_strict_json_schema(v, inplace=True, path=f"{path}/properties/{k}")

    # ... handle items, $defs, patternProperties, etc. similarly ...

    # Remove only allowlisted keys with None values
    keys_to_remove = [k for k, v in json_schema.items() if v is None and k in _DROP_NONE_KEY_ALLOWLIST]
    for k in keys_to_remove:
        json_schema.pop(k, None)

    return json_schema

Notes:

  • Keep const and enum untouched. If you need to normalize enum entries, do that explicitly (e.g., remove None from enum only if you consciously want to disallow null as an enum member).
  • Ensure $ref handling respects spec semantics; avoid naive $ref inlining unless you're certain about the ramifications.

Tests & rollout

  • Add unit tests covering:

    • {"const": None} remains and validates null instances.
    • {"default": None} is safely removed (if that is the chosen behavior) and documented.
    • {"anyOf": [None, {"type": "string"}]} does not raise and results in a valid anyOf list.
    • $ref + sibling keywords cases to ensure no accidental semantic changes.
  • Add a CI smoke-test that validates cleaned schemas against a JSON Schema Draft 2020-12 validator (e.g., ajv or a tuned python-jsonschema) to catch regressions.
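For concreteness, here is a hedged sketch of a few of those unit tests against a self-contained allowlist cleaner (`clean` and `_ALLOW` are illustrative names, not the real codebase):

```python
# Minimal allowlist-based cleaner mirroring the sketch above, inlined so
# the test cases below are self-contained.
_ALLOW = {"title", "description", "examples", "default", "$comment"}


def clean(schema: dict) -> dict:
    # Filter None entries out of composition lists before recursing.
    for comp in ("anyOf", "oneOf", "allOf"):
        if isinstance(schema.get(comp), list):
            schema[comp] = [clean(v) for v in schema[comp] if v is not None]
    # Drop None values only for allowlisted metadata keys.
    for key in [k for k, v in schema.items() if v is None and k in _ALLOW]:
        schema.pop(key)
    return schema


assert clean({"const": None}) == {"const": None}                # preserved
assert clean({"type": "string", "default": None}) == {"type": "string"}
assert clean({"anyOf": [None, {"type": "string"}]}) == {"anyOf": [{"type": "string"}]}
```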

Bottom line

  • For Pydantic-generated schemas (the reported failure mode), narrowing removals to metadata keys and filtering None in lists is low-risk and will reliably fix provider rejections.
  • For arbitrary or hand-crafted schemas, blanket deletion of key: null is unsafe: it can silently change validation behavior (notably const: null). The allowlist + tests approach gives a safer balance between compatibility with providers and preserving legitimate schema semantics.

@seratch seratch marked this pull request as draft September 12, 2025 05:56
@github-actions
Contributor

This PR is stale because it has been open for 10 days with no activity.

@github-actions github-actions bot added the stale label Sep 23, 2025
@github-actions
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Sep 30, 2025