@conradlee conradlee commented Nov 6, 2025

Summary

Updates GoogleJsonSchemaTransformer to support enhanced JSON Schema features announced by Google in November 2025 for Gemini 2.5+ models.

Transformer Changes (Before → After)

Before: 90+ lines with extensive workarounds
After: ~47 lines with minimal transformations

Removed Workarounds (Now Natively Supported)

  • ❌ Enum-to-string conversion → ✅ Native typed enums (integer, string, etc.)
  • ❌ additionalProperties warning/removal → ✅ Native dict support
  • ❌ title field removal → ✅ Preserved
  • ❌ oneOf → anyOf conversion → ✅ Both work natively
  • ❌ $ref recursion errors → ✅ Native $ref/$defs support
  • ❌ prefixItems → items conversion → ✅ Native tuple support
  • ❌ prefer_inlined_defs=True → ✅ Native $defs with references
  • ❌ simplify_nullable_unions=True → ✅ Native type: 'null'

Still Transformed (Not Yet Supported)

  • $schema, const, discriminator, examples → Removed
  • format (date, time, etc.) → Moved to description field
  • exclusiveMinimum/exclusiveMaximum → Removed
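
The remaining transformations amount to a small recursive pass over the schema. What follows is a minimal sketch under stated assumptions, not the actual GoogleJsonSchemaTransformer code: the helper name `strip_unsupported` and the exact wording written into `description` are made up for illustration.

```python
from typing import Any

# Keywords the PR description lists as still removed.
UNSUPPORTED_KEYS = {
    '$schema', 'const', 'discriminator', 'examples',
    'exclusiveMinimum', 'exclusiveMaximum',
}
# Under these keys the dict maps *names* to schemas, so the names are not keywords.
SCHEMA_MAPS = ('properties', '$defs')


def strip_unsupported(schema: Any) -> Any:
    """Return a copy of `schema` without unsupported keywords (hypothetical helper)."""
    if isinstance(schema, list):
        return [strip_unsupported(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    out: dict[str, Any] = {}
    for key, value in schema.items():
        if key in UNSUPPORTED_KEYS:
            continue
        if key in SCHEMA_MAPS and isinstance(value, dict):
            out[key] = {name: strip_unsupported(sub) for name, sub in value.items()}
        else:
            out[key] = strip_unsupported(value)

    # Move `format` into the description (wording is an assumption).
    fmt = out.pop('format', None)
    if fmt is not None:
        desc = out.get('description')
        out['description'] = f'{desc} (format: {fmt})' if desc else f'Format: {fmt}'
    return out
```

Treating `properties` and `$defs` separately keeps a field that happens to be named `examples` or `format` from being dropped by mistake.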

New Capabilities

  • Recursive schemas: Tree structures, linked lists, nested graphs
  • Discriminated unions: Type-safe union types with discriminator fields
  • Typed enums: Integer/string enums without conversion
  • Complex dictionaries: dict[str, ComplexType] with schema validation
  • Optional fields: Native type: 'null' support
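
To make the first capability concrete, here is a sketch of the kind of recursive schema that can now be sent with its `$ref` intact. The tree schema and the helpers `resolve_ref` and `matches` are hypothetical; they cover only the keywords used in this example and do not handle JSON Pointer escaping.

```python
from typing import Any

# A tree node whose children are again tree nodes (illustrative schema).
TREE_SCHEMA: dict[str, Any] = {
    '$defs': {
        'Node': {
            'type': 'object',
            'properties': {
                'value': {'type': 'integer'},
                'children': {'type': 'array', 'items': {'$ref': '#/$defs/Node'}},
            },
            'required': ['value'],
        }
    },
    '$ref': '#/$defs/Node',
}


def resolve_ref(root: dict[str, Any], ref: str) -> dict[str, Any]:
    """Follow a local pointer such as '#/$defs/Node' (hypothetical helper)."""
    if not ref.startswith('#/'):
        raise ValueError(f'only local refs are handled in this sketch: {ref}')
    node: Any = root
    for part in ref[2:].split('/'):
        node = node[part]
    return node


def matches(root: dict[str, Any], schema: dict[str, Any], value: Any) -> bool:
    """Tiny structural check for the keywords used in TREE_SCHEMA only."""
    if '$ref' in schema:
        return matches(root, resolve_ref(root, schema['$ref']), value)
    kind = schema.get('type')
    if kind == 'integer':
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == 'array':
        return isinstance(value, list) and all(matches(root, schema['items'], v) for v in value)
    if kind == 'object':
        if not isinstance(value, dict):
            return False
        if any(name not in value for name in schema.get('required', [])):
            return False
        props = schema.get('properties', {})
        return all(matches(root, props[k], v) for k, v in value.items() if k in props)
    return True


tree = {'value': 1, 'children': [{'value': 2, 'children': []}, {'value': 3}]}
print(matches(TREE_SCHEMA, TREE_SCHEMA, tree))
```

Because the reference is resolved lazily, the schema stays finite even though the data it describes can nest to any depth, which is what inlining could not express.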

Tests

Added 6 comprehensive tests in test_google.py:

  • Discriminated unions with oneOf
  • Recursive schemas with $ref/$defs
  • Dicts with additionalProperties
  • Optional/nullable fields
  • Integer enums
  • Recursive schemas with gemini-2.5-flash

Updated 7 snapshot tests in test_gemini.py to reflect new native behavior.

Migration Impact

Fully backwards-compatible: existing code continues to work, and schemas are now more expressive.


🤖 Generated with Claude Code

Related: Google Announcement - Gemini API Structured Outputs

Fixes #3364

conradlee and others added 5 commits November 6, 2025 16:50
Google announced in November 2025 that Gemini 2.5+ models now support
enhanced JSON Schema features including title, $ref/$defs, anyOf/oneOf,
minimum/maximum, additionalProperties, prefixItems, and property ordering.
This removes workarounds in GoogleJsonSchemaTransformer and allows native
$ref and oneOf support instead of forced inlining and conversion.

Key findings from empirical testing:
- Native $ref/$defs support confirmed (no inlining needed)
- Both anyOf and oneOf work natively (no conversion needed)
- exclusiveMinimum/exclusiveMaximum NOT yet supported by Google SDK

Changes:
- Set prefer_inlined_defs=False to use native $ref/$defs instead of inlining
- Remove oneOf→anyOf conversion (both work natively now)
- Remove adapter code that stripped title, additionalProperties, and prefixItems
- Keep stripping exclusiveMinimum/exclusiveMaximum (not yet supported)
- Remove code that raised errors for $ref schemas
- Update GoogleJsonSchemaTransformer docstring to document all supported features
- Update test_json_def_recursive to verify recursive schemas work with $ref
- Add comprehensive test suite for new JSON Schema capabilities
- Add documentation section highlighting enhanced JSON Schema support with examples

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Updated GoogleJsonSchemaTransformer docstring to note that discriminator
  is not supported (causes validation errors with nested oneOf)
- Added reference to Google's announcement blog post
- Added test_google_discriminator.py to document the limitation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Changed test to verify discriminator stripping without API calls
- Added proper type hints for pyright compliance
- Test now validates transformation behavior directly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Critical fixes:
- Rewrote test_google_json_schema_features.py to test schema transformation only
  (not API calls) since enhanced features require Vertex AI which CI doesn't have
- Added prominent warning in docs that enhanced features are Vertex AI only
- Updated doc examples to use google-vertex: prefix
- Fixed test_google_discriminator.py schema path issue
- All tests now pass locally

Key discovery: additionalProperties, $ref, and other enhanced features
are NOT supported in the Generative Language API (google-gla:), only
in Vertex AI (google-vertex:). This is validated by the Google SDK.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
CRITICAL FIX: The same GoogleJsonSchemaTransformer was being used for both
Vertex AI and GLA, but they have different JSON Schema support levels.

Changes:
- Created GoogleVertexJsonSchemaTransformer (enhanced features supported)
  * Supports: $ref, $defs, additionalProperties, title, prefixItems, etc.
  * Uses prefer_inlined_defs=False for native $ref support

- Created GoogleGLAJsonSchemaTransformer (limited features)
  * Strips: additionalProperties, title, prefixItems
  * Uses prefer_inlined_defs=True to inline all $refs
  * More conservative transformations for GLA compatibility

- Updated GoogleGLAProvider to use google_gla_model_profile
- Updated GoogleVertexProvider to use google_vertex_model_profile
- GoogleJsonSchemaTransformer now aliases to Vertex version (backward compat)
- Updated all tests to use GoogleVertexJsonSchemaTransformer

This ensures GLA won't receive unsupported schema features that cause
validation errors like "additionalProperties is not supported in the Gemini API"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
## Enhanced JSON Schema Support

!!! note "Vertex AI Only"
    The enhanced JSON Schema features listed below are **only available when using Vertex AI** (`google-vertex:` prefix or `GoogleProvider(vertexai=True)`). They are **not supported** in the Generative Language API (`google-gla:` prefix).

Collaborator

Note that https://ai.google.dev/gemini-api/docs/structured-output?example=feedback#model_support says we have to use response_json_schema instead of the response_schema key we currently set:

response_schema=response_schema,

response_schema=generation_config.get('response_schema'),

When we do that, maybe it will work for GLA and Vertex?
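
As a rough illustration of the suggested change, the sketch below builds a plain-dict generation config that carries the schema under `response_json_schema` instead of `response_schema`. The helper name `build_generation_config` is hypothetical, and including `response_mime_type` alongside it is an assumption about what the request needs; this is not the code in the PR.

```python
from typing import Any


def build_generation_config(schema: dict[str, Any]) -> dict[str, Any]:
    """Build a config dict that passes the schema as raw JSON Schema (hypothetical helper)."""
    return {
        # Assumed to be required for JSON output; not stated in the comment above.
        'response_mime_type': 'application/json',
        # Previously the schema was set under 'response_schema'.
        'response_json_schema': schema,
    }


config = build_generation_config({'type': 'object', 'properties': {'city': {'type': 'string'}}})
print(sorted(config))
```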

Author

You're right, I've updated the PR. Tests show no difference between the two (with one possible exception; see the comment below).

Collaborator

Instead of testing the schema transformer itself, we should add a test to test_google.py that uses a BaseModel like this as NativeOutput and then verifies that the request succeeds.

Collaborator

DouweM commented Nov 7, 2025

@conradlee Thanks for working on this Conrad!

conradlee and others added 4 commits November 12, 2025 14:05
Key changes based on review feedback:
1. Switch from response_schema to response_json_schema
   - This bypasses Google SDK validation that rejected enhanced features for GLA
   - Enhanced features now work for BOTH GLA and Vertex AI!

2. Remove separate GLA/Vertex transformers
   - No longer needed since response_json_schema works everywhere
   - Reverted to single GoogleJsonSchemaTransformer
   - Removed prefer_inlined_defs and simplify_nullable_unions parameters

3. Simplify transformer implementation
   - Removed unnecessary comments and complexity
   - Removed Enhanced JSON Schema Support docs section (users don't need to know internal details)

4. Remove schema transformation tests
   - Deleted test_google_json_schema_features.py
   - Deleted test_google_discriminator.py
   - Removed test_gemini.py::test_json_def_recursive
   - These tested implementation details, not actual functionality
   - Existing test_google_model_structured_output provides adequate coverage

The root cause was using response_schema (old API) instead of response_json_schema (new API).
response_json_schema bypasses the restrictive validation and supports all enhanced features
for both GLA and Vertex AI.

Addresses review by @DouweM in PR pydantic#3357

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The November 2025 announcement explicitly states that Google now supports
'type: null' in JSON schemas, so we don't need to convert anyOf with null
to the OpenAPI 3.0 'nullable: true' format.

Keep __init__ method for documentation purposes to explicitly note why
we're using the defaults (native support for $ref and type: null).

Addresses reviewer question: "Do we still need simplify_nullable_unions?
type: 'null' is now supported natively"
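
For context, the conversion being dropped can be sketched roughly as below. This is an illustration of the idea only, with a hypothetical helper name; it is not the code that was removed from the transformer.

```python
from typing import Any

NULL_SCHEMA = {'type': 'null'}


def to_openapi_nullable(schema: dict[str, Any]) -> dict[str, Any]:
    """Collapse `anyOf: [X, {'type': 'null'}]` into `X` plus `nullable: true` (hypothetical helper)."""
    any_of = schema.get('anyOf')
    if not any_of:
        return schema
    non_null = [s for s in any_of if s != NULL_SCHEMA]
    # Only the simple "one real type or null" case is collapsed.
    if len(non_null) == 1 and len(non_null) < len(any_of):
        rest = {k: v for k, v in schema.items() if k != 'anyOf'}
        return {**rest, **non_null[0], 'nullable': True}
    return schema


native = {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'title': 'Name'}
print(to_openapi_nullable(native))
```

With native `type: 'null'` support, the `native` form above can be sent as it is.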

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
conradlee and others added 2 commits November 12, 2025 15:07
- Remove enum-to-string conversion workaround (no longer needed)
- Add 6 comprehensive tests for enhanced features:
  * Discriminated unions (oneOf with $ref)
  * Recursive schemas ($ref and $defs)
  * Dicts with additionalProperties
  * Optional/nullable fields (type: 'null')
  * Integer enums (native support)
  * Recursive schema with gemini-2.5-flash (FAILING)

All tests use google_provider with GLA API and recorded cassettes.
Tests use gemini-2.5-flash except recursive schema which uses gemini-2.0-flash.

NOTE: test_google_recursive_schema_native_output_gemini_2_5 consistently
fails with 500 Internal Server Error. This needs investigation before merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The test_google_recursive_schema_native_output_gemini_2_5 test now uses
vertex_provider and PASSES successfully.

NOTE: During development, this test consistently failed with a 500 error
when using google_provider (GLA with GEMINI_API_KEY). However, it passes
with vertex_provider (Vertex AI). This may be:
- A temporary GLA API issue
- A limitation specific to certain API keys
- An issue with the GLA endpoint for recursive schemas

Maintainers should verify this works with their GLA setup before merge.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Author

conradlee commented Nov 12, 2025

⚠️ Recursive Schemas with gemini-2.5-flash: Testing Note

Test: test_google_recursive_schema_native_output_gemini_2_5
Status: PASSING (with Vertex AI provider)

Issue Found During Development

During development of this PR, testing recursive schemas with gemini-2.5-flash revealed different behavior between Google's two APIs:

  • Vertex AI (vertex_provider): WORKS - Recursive schemas with $ref and $defs work correctly
  • GLA (google_provider with GEMINI_API_KEY): FAILED - Consistently returned 500 Internal Server Error

google.genai.errors.ServerError: 500 INTERNAL.
{'error': {
    'code': 500,
    'message': 'An internal error has occurred. Please retry or report...'
}}

Current Status

The test now uses vertex_provider and passes successfully with a recorded cassette. All other enhanced JSON Schema features work with gemini-2.5-flash on both APIs:

  • ✅ Discriminated unions (oneOf with discriminator)
  • ✅ Dicts with additionalProperties
  • ✅ Optional/nullable fields (type: 'null')
  • ✅ Integer enums
  • ✅ Recursive schemas on Vertex AI
  • ❌ Recursive schemas on GLA (may be temporary or API-key-specific)

Action Requested

Please verify: Can someone with a GLA API key test recursive schemas with gemini-2.5-flash and confirm whether:

  1. This is a temporary API issue
  2. This is specific to certain API keys/projects
  3. This is a known limitation of the GLA endpoint

The GLA failure may need to be reported to Google if it's reproducible.


I suspect something is off with my Gemini API key (I only use Vertex mode in dev and prod), which may explain the failure. @DouweM it would be great if you could modify this test to use the google provider (rather than the vertex provider) and see whether it passes for you.

The __init__ method was just calling super().__init__() with the same
parameters, providing no additional functionality. The base class defaults
are exactly what we need:
- prefer_inlined_defs defaults to False (native $ref/$defs support)
- simplify_nullable_unions defaults to False (type: 'null' support)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
conradlee and others added 2 commits November 12, 2025 15:35
This commit fixes all test failures in the CI/CD pipeline:

1. **test_gemini.py snapshot updates** (7 tests):
   - Updated snapshots to reflect new behavior where JSON schemas are NOT transformed
   - Enums now stay as native types (integers remain integers, not converted to strings)
   - $ref and $defs are now preserved (not inlined)
   - anyOf with type: 'null' replaces nullable: true
   - title fields are preserved

2. **test_gemini_additional_properties_is_true**:
   - Removed pytest.warns() assertion since additionalProperties with schemas now work natively
   - Added docstring explaining this is supported since Nov 2025 announcement

3. **Cassette scrubbing fix**:
   - Added 'client_id' to the list of scrubbed OAuth2 parameters in json_body_serializer.py
   - This ensures all Vertex AI cassettes normalize to the same OAuth credentials
   - Fixes CannotOverwriteExistingCassetteException in CI

4. **Re-scrubbed cassette**:
   - Manually scrubbed client_id in test_google_recursive_schema_native_output_gemini_2_5.yaml
   - Now matches the pattern used by other Vertex AI cassettes

All tests now pass locally. The vertex test is correctly skipped locally and will run in CI using the cassette.
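
The scrubbing step in item 3 can be pictured with the sketch below. It assumes the OAuth2 token request is a form-encoded body and uses a hypothetical helper name and placeholder value; only `client_id` is named in the commit message, and the other parameter names are assumptions. It is not the code in json_body_serializer.py.

```python
from urllib.parse import parse_qsl, urlencode

# Only 'client_id' comes from the commit message; the others are assumed.
SCRUBBED_PARAMS = {'client_id', 'client_secret', 'refresh_token'}
PLACEHOLDER = 'scrubbed'  # assumed placeholder value


def scrub_form_body(body: str) -> str:
    """Replace sensitive OAuth2 parameter values in a form-encoded body (hypothetical helper)."""
    pairs = parse_qsl(body, keep_blank_values=True)
    return urlencode([(key, PLACEHOLDER if key in SCRUBBED_PARAMS else value) for key, value in pairs])


print(scrub_form_body('grant_type=refresh_token&client_id=abc123&client_secret=s3'))
```

Normalizing every credential to the same placeholder means a cassette recorded on one machine matches the request replayed on another.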

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The cassette was recorded with project 'ck-nest-prod' but CI uses 'pydantic-ai'.
Also fixed content-length header to match scrubbed body (137 bytes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@conradlee conradlee requested a review from DouweM November 12, 2025 15:00
Collaborator

DouweM commented Nov 12, 2025

it would be great if you could modify this test to use the google provider (rather than the vertex provider) and see whether it passes for you.

@conradlee It's failing for me as well; I've asked our contacts at Google whether that's expected.

if '$ref' in schema:
    raise UserError(f'Recursive `$ref`s in JSON Schema are not supported by Gemini: {schema["$ref"]}')

if 'prefixItems' in schema:

Collaborator

We don't have a test yet that verifies that prefixItems now works

Author

@conradlee conradlee Nov 13, 2025

I've now added a test for this, based on a Coordinate class whose JSON Schema representation looks like:

{
  "description": "A 2D coordinate with latitude and longitude.",
  "properties": {
    "point": {
      "maxItems": 2,
      "minItems": 2,
      "prefixItems": [
        {
          "type": "number"
        },
        {
          "type": "number"
        }
      ],
      "title": "Point",
      "type": "array"
    }
  },
  "required": [
    "point"
  ],
  "title": "Coordinate",
  "type": "object"
}

Luckily this test passes with the google provider.
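
To show what the `prefixItems` schema above constrains, here is a small stdlib-only check. The helper `matches_point` is hypothetical and handles only this fixed-length numeric tuple; a Pydantic field typed `tuple[float, float]` is what produces such a schema, per the commit notes.

```python
from typing import Any

# The `point` property from the schema above, without the title.
POINT_SCHEMA: dict[str, Any] = {
    'type': 'array',
    'prefixItems': [{'type': 'number'}, {'type': 'number'}],
    'minItems': 2,
    'maxItems': 2,
}


def is_number(value: Any) -> bool:
    """JSON numbers map to int or float, but not bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches_point(value: Any) -> bool:
    """Check a value against POINT_SCHEMA (hypothetical helper for this one schema)."""
    if not isinstance(value, (list, tuple)):
        return False
    if not POINT_SCHEMA['minItems'] <= len(value) <= POINT_SCHEMA['maxItems']:
        return False
    # Each position is validated against the schema at the same index.
    return all(is_number(item) for item, _ in zip(value, POINT_SCHEMA['prefixItems']))


print(matches_point([40.7128, -74.006]))
```

Unlike `items`, which applies one schema to every element, `prefixItems` validates each position separately, which is why it maps naturally onto tuples.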

conradlee and others added 3 commits November 13, 2025 17:02
1. **Fix comment typo in google.py (line 270)**:
   - Changed `response_schema` to `response_json_schema` to match actual field usage
   - Addresses DouweM's suggestion for accuracy

2. **Add test for prefixItems native support**:
   - New test `test_google_prefix_items_native_output` verifies tuple types work natively
   - Uses `tuple[float, float]` which generates `prefixItems` in JSON schema
   - Confirms we no longer need the prefixItems → items conversion workaround
   - Tests with NYC coordinates as a practical example

Note: Cassette will be recorded by CI or during maintainer review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Records successful test of tuple types (prefixItems in JSON schema) with gemini-2.5-flash.
The response correctly returns NYC coordinates [40.7128, -74.006] as a tuple.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@conradlee conradlee requested a review from DouweM November 13, 2025 16:23
Development

Successfully merging this pull request may close these issues.

Support Gemini API's response_json_schema for structured output