Description
Question
Hey all,
Wasn't sure where to put this since it isn't really a Pydantic AI-specific bug. We first noticed the issue in LiveKit's Agent SDK, and it just so happens that the new GoogleModel/GoogleProvider in Pydantic AI relies on the same underlying SDK. My main ask here is that you don't fully deprecate or remove the older GeminiModel and GoogleVertexProvider implementations until this is actually resolved.
Google GenAI SDK Bug Details
Google's python-genai SDK has a persistent issue that causes increased latency in generation times (particularly bad with multimodal requests):
- Bug: aiohttp did not reuse shared connection (googleapis/python-genai#1206)
- Bug: SDK does not reuse aiohttp session (googleapis/python-genai#1074)
- async generate_content is very slow (googleapis/python-genai#557)
The increased latency averages anywhere from 3-5 seconds across 50 parallel eval runs, even when using aiohttp and overriding the async client settings with HTTP/2. I've tested this under a variety of network conditions and the result is always the same. They claim this is fixed, but even after upgrading to 1.37 and using google-genai[aiohttp], it doesn't appear to be resolved. I know this issue is SDK-specific because plain REST calls to their API perform as expected (e.g. the older GeminiModel implementation, and also how LiteLLM implemented Gemini).
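For anyone wanting to reproduce the measurement, here's a minimal sketch of the kind of harness we used for the parallel eval runs. The function names are hypothetical and the model call is stubbed with a sleep; swap in a real SDK call to compare the two code paths:

```python
import asyncio
import statistics
import time

async def timed_call(delay: float) -> float:
    """Stand-in for one model request; returns the observed latency.
    Replace the sleep with a real generate_content call when benchmarking."""
    start = time.perf_counter()
    await asyncio.sleep(delay)  # simulate network + generation time
    return time.perf_counter() - start

async def run_parallel(n_runs: int, delay: float) -> list[float]:
    """Fire n_runs requests concurrently and collect per-request latency."""
    return await asyncio.gather(*(timed_call(delay) for _ in range(n_runs)))

latencies = asyncio.run(run_parallel(50, 0.01))
print(f"avg latency: {statistics.mean(latencies):.3f}s over {len(latencies)} runs")
```

Running the same harness against both the python-genai path and a plain REST client is what surfaced the 3-5 second gap for us.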
Here's the configuration we're using for the HTTPX client. It's passed to GeminiModel and others via the http_client field, and passed as async_client_args in the HttpOptionsDict of Google's GenAI SDK client, which is then used to create the GoogleProvider:
import ssl
from typing import Any

import httpx


def get_async_client_args() -> dict[str, Any]:
    """
    Get httpx.AsyncClient configuration arguments optimized for LLM providers.

    Returns:
        dict: Configuration arguments to pass to httpx.AsyncClient or as async_client_args
    """
    context = ssl.create_default_context()
    return {
        'http2': True,
        'http1': False,
        'timeout': httpx.Timeout(
            timeout=600,
            connect=5,
        ),
        'limits': httpx.Limits(
            max_connections=100,
            max_keepalive_connections=20,
            keepalive_expiry=30.0,
        ),
        'transport': httpx.AsyncHTTPTransport(http2=True),
        'verify': context,
    }
Again, everything works fine if we just use the older code (that's marked deprecated). The only issue with GeminiModel is that, with include_thinking: True, it mangles the thinking tokens into the text response, which breaks structured outputs. I was going to fix this in the implementation and I'm happy to send a PR.
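The fix I have in mind is roughly the following: keep thinking content separate from response text when assembling the model output, assuming the REST response flags reasoning parts with a "thought" boolean as the Gemini API does. The function name and dict shapes here are illustrative, not the actual GeminiModel internals:

```python
def split_thinking_parts(parts: list[dict]) -> tuple[str, str]:
    """Separate thinking tokens from the final text so structured-output
    parsing only ever sees the real response text."""
    thinking: list[str] = []
    text: list[str] = []
    for part in parts:
        # Parts flagged with "thought": True carry reasoning tokens,
        # not response text, per the Gemini API's thinking support.
        if part.get('thought'):
            thinking.append(part.get('text', ''))
        else:
            text.append(part.get('text', ''))
    return ''.join(thinking), ''.join(text)

parts = [
    {'text': 'Let me reason about this...', 'thought': True},
    {'text': '{"answer": 42}'},
]
thinking, text = split_thinking_parts(parts)
# text == '{"answer": 42}' — safe to feed to structured-output validation
```

With the two streams separated, the thinking content can still be surfaced (e.g. as a thinking part on the response) without corrupting the JSON the output validator parses.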
Additional Context
Pydantic AI v1.0.13
Python 3.12