Skip to content

[Bug]: Gen AI Eval SDK run_inference() fails when evaluating Multi-Agent #2578

@mzshwanlee

Description

@mzshwanlee

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/evaluation/create_agent_and_run_evaluation.ipynb

What happened?

Summary

The client.evals.run_inference() method in the Gen AI Evaluation SDK fails to evaluate
agents that use Multi-Agent architecture.

Environment

  • SDK Version: google-cloud-aiplatform[evaluation] (latest)
  • Agent Framework: Google ADK (Agent Development Kit)
  • Agent Structure: Multi-Agent with sub_agents (root_agent → search_agent, reservation_agent)
  • Tools: MCP Toolbox via toolbox_core library
  • Deployment: Vertex AI Agent Engine (Reasoning Engine)
  • Region: us-central1

Steps to Reproduce

  1. Deploy a Multi-Agent using ADK with the following structure:
    mcp_toolset = McpToolset(toolbox_server_url=TOOLBOX_SERVER_URL)

    search_agent = Agent(
    name="search_agent",
    tools=[analyze_text, get_current_datetime, mcp_toolset],
    ...
    )

    root_agent = Agent(
    name="test",
    sub_agents=[reservation_agent, search_agent],
    ...
    )
    2. Run evaluation using Gen AI Eval SDK:
    agent_dataset_with_inference = client.evals.run_inference(
    agent=AGENT_RESOURCE,
    src=agent_dataset,
    )
    3. The evaluation fails with a parsing error.

Expected Behavior

The SDK should successfully run inference on the deployed Multi-Agent and collect
intermediate_events and response for evaluation.

Actual Behavior

The SDK returns an error:
{
"error": "Failed to parse agent run response [...] to intermediate events and final response: 'text'"
}## Root Cause (from Agent Engine logs)

The actual error in Agent Engine logs shows an asyncio context issue:

Relevant log output

{"error": "Failed to parse agent run response [{'model_version': 'gemini-2.5-flash', 'content': {'parts': [{'function_call': {'id': 'adk-946f20ca-e878-4870-8b11-216eeadb4fec', 'args': {'agent_name': 'search_agent'}, 'name': 'transfer_to_agent'}, 'thought_signature': 'Cp4HAY89a1_kpY2YcICNaaeVDSipHZ5pI2OCglzkjVDUcO-pvOVTjIAHXZooc99UnWWf8DQG_qRgh5BOePI8u08PT8y54U5uPLUhJUbrWt4zVxsOJCUW9_XgIJCONckbqj09olhLwQzX_lGATHRjmFqY9r2IAdI6HgzuUvNGR_Ol7mVC3rgHO83WGkDYhhF85_fl_zjH1wt9ju37r2OMEr-wJOOOWs0TZk8ELgxLqtdZi4tubuCOw2gbBKq3x-Pxp5jONSiSKKuvDR-SR1Dsaogvt-yr2Lqu2X3a1YV-4CnsCbfDVh40OTzCx-mrSURYvuQNdpNO08iIiUXoBxqc5NG5QBjzvXhYJrixVSly4aeI1HnYWnXXGr0MNx9fSL7h2uakRwsd1mCffYWAt3zSWP3bw1W44c7zb4AlwnBb27s5yeEiHYFgyjG0PT9S4JRBgseaQbewd9wktHEtGY461kYuhgNUEYF9-MvU0c2EsJIX7rd4cxT1KnOsRk-coBAEbwX450CWkKzyfE3t0_zoArG3uJm4WU4nL7Xfj2d_tmXvX-18xS4O6P5FtDmhpwxwPB1AW1vbi6xHx6XSi9hKcTHFki_R78ABIvBfXjox4rkGn12BI6sbPRMMtlUQoNNo82UinlLlCKxhwI46sQhCNb3T_i4BeXb43rdxp7HTg6iDY7GfwTcqXSrx6gXTvbl6jR9qAFlHvvmdgU9g26GSMltA_PfouuAMh0ku7jVdmxyE9e_1ToTp3ByGVhvJQhOJtuupDkmtBzbxlE5cffdh_TOwOOItxIMAtN3h2gKdvUlUT33VXiZnxDO89ezOGqMD4ILljDtUg6ikdMb4AOSDVyl_iAFG3Z98yO7A3dq9G5_GjyBFFRh9xmSjlNGGt1oBCYbuKYDdYC7gE9KEOA5f3rJ6hs0V1k8aqo9FPtYresNVZX5jxLa_K_u6TLgsjBptVw7BC72DcFR7f0J78YwuwjKGn2ZynfwnNYq099pzWSyq9GurMmA7gxl2MunxlBnRO4nODjFAi30Mp3IwKBdgeeqm0vmmduky5qEw8iBgFPIWM4hBqc_k380UOcUXoRSEK4Oi5Aqlcxfkyaz0xD8KxS9iPaiu0Dom0pkIH7Jh4vywIuA7KWU9_hNP94RvaM19CzcwefXscKFi17O9nkCHi5_QCeWnqXWdW5YPhvbynUx3KCGDq8KSgSLWg5E5xoH62aiTIshKALoNx_QHHcAgWZI='}], 'role': 'model'}, 'finish_reason': 'STOP', 'usage_metadata': {'candidates_token_count': 11, 'candidates_tokens_details': [{'modality': 'TEXT', 'token_count': 11}], 'prompt_token_count': 1215, 'prompt_tokens_details': [{'modality': 'TEXT', 'token_count': 1215}], 'thoughts_token_count': 215, 'total_token_count': 1441, 'traffic_type': 'ON_DEMAND'}, 'avg_logprobs': -4.73370985551314, 'invocation_id': 'e-5814a0db-4e23-4ab3-85db-74fa2f0224f5', 'author': 'catchtable', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'requested_auth_configs': {}, 'requested_tool_confirmations': {}}, 'long_running_tool_ids': [], 'id': 'ea958c9f-339d-47e0-a5a3-0ca83f666a3f', 'timestamp': 1767945596.923423}, {'content': {'parts': [{'function_response': {'id': 'adk-946f20ca-e878-4870-8b11-216eeadb4fec', 'name': 'transfer_to_agent', 'response': {'result': None}}}], 'role': 'user'}, 'invocation_id': 'e-5814a0db-4e23-4ab3-85db-74fa2f0224f5', 'author': 'catchtable', 'actions': {'state_delta': {}, 'artifact_delta': {}, 'transfer_to_agent': 'search_agent', 'requested_auth_configs': {}, 'requested_tool_confirmations': {}}, 'id': '806afc44-3b0b-431f-be3d-4a1bfc0ac8e0', 'timestamp': 1767945598.723608}] to intermediate events and final response: 'text'"}

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions