Google SDK Issues / Request to Not Deprecate GeminiModel #3071

@natestraub

Description

Question

Hey all,

Wasn't sure where to put this since it's not really a Pydantic AI-specific bug: we first noticed this issue in LiveKit's Agent SDK, and it just so happens that the new GoogleModel/GoogleProvider in Pydantic AI relies on the same underlying SDK. My main ask here is that you don't fully deprecate or remove the older GeminiModel and GoogleVertexProvider implementations until it's actually resolved.

Google GenAI SDK Bug Details
Google's python-genai SDK has a persistent issue that causes increased generation latency (it's particularly bad with multimodal requests):

The increased latency is anywhere from 3-5 seconds on average across 50 parallel eval runs, even when using aiohttp and overriding the async client settings with HTTP/2. I've tested this under a variety of network conditions and the result is always the same. Google claims this is fixed, but even after upgrading to 1.37 and installing google-genai[aiohttp] it doesn't appear to be resolved. I know this issue is SDK-specific because plain REST calls to their API work as expected (e.g. the older GeminiModel implementation, and also how LiteLLM implemented Gemini).
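
For reference, the kind of comparison described above can be sketched with a small stdlib-only harness like the one below. This is an illustrative sketch, not the actual eval code: `call_model` is a hypothetical stand-in for the real SDK or REST request, and the stats reported are simple wall-clock aggregates.

```python
import asyncio
import time


async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the real request
    # (google-genai SDK call or a raw REST call).
    await asyncio.sleep(0.01)
    return "ok"


async def measure_latency(n_runs: int = 50) -> dict[str, float]:
    """Run n_runs parallel requests and report wall-clock latency stats."""

    async def timed_call() -> float:
        start = time.perf_counter()
        await call_model("describe this image")
        return time.perf_counter() - start

    samples = await asyncio.gather(*(timed_call() for _ in range(n_runs)))
    return {
        "mean_s": sum(samples) / len(samples),
        "max_s": max(samples),
        "min_s": min(samples),
    }


if __name__ == "__main__":
    print(asyncio.run(measure_latency()))
```

Swapping the body of `call_model` between the SDK path and a plain REST path is enough to surface a per-request latency gap like the one reported.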

Here's the configuration we're using for the HTTPX client. It's passed to GeminiModel and others via the http_client field, and passed as async_client_args in the HttpOptionsDict given to Google's GenAI SDK client, which is then used to create the GoogleProvider:

import ssl
from typing import Any

import httpx


def get_async_client_args() -> dict[str, Any]:
    """
    Get httpx.AsyncClient configuration arguments optimized for LLM providers.

    Returns:
        dict: Configuration arguments to pass to httpx.AsyncClient or as async_client_args
    """
    context = ssl.create_default_context()
    return {
        'http2': True,
        'http1': False,
        'timeout': httpx.Timeout(
            timeout=600,
            connect=5,
        ),
        'limits': httpx.Limits(
            max_connections=100,
            max_keepalive_connections=20,
            keepalive_expiry=30.0,
        ),
        # Note: httpx only applies http1/http2/verify/limits when building its
        # default transport; the explicit transport below supersedes them.
        'transport': httpx.AsyncHTTPTransport(http2=True),
        'verify': context,
    }

Again, everything works fine if we just use the older code (that's marked deprecated). The only issue with GeminiModel is that with include_thinking: True it mangles the thinking tokens into the text response, which breaks structured outputs. I was going to fix this in the implementation and I'm happy to send a PR.
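
To illustrate why the mangling breaks structured outputs: if thought parts get concatenated into the text response, the result is no longer valid JSON, whereas filtering them out first keeps the structured payload intact. The part shape and `thought` field below are a simplified, hypothetical model of the Gemini response, not the actual SDK types.

```python
import json

# Hypothetical response parts, loosely modeled on the Gemini candidate/part
# shape: each part carries text plus a boolean `thought` marker.
parts = [
    {"text": "Let me reason about the schema first...", "thought": True},
    {"text": '{"city": "Berlin", "population": 3878100}', "thought": False},
]

# Naive concatenation (the mangled behavior): thinking tokens end up mixed
# into the text response, so it is no longer valid JSON.
mangled = "".join(p["text"] for p in parts)

# Filtering out thought parts before joining preserves the structured output.
clean = "".join(p["text"] for p in parts if not p.get("thought"))
result = json.loads(clean)
print(result["city"])  # Berlin
```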

Additional Context

Pydantic AI v1.0.13
Python 3.12

Metadata

    Labels

    question (Further information is requested)
